Compass

Revision #5 from 11/13/2019 12:05 p.m.

Compass is a Search Engine mapping framework for Java, built on the Lucene API. It is designed for fast search and maps data objects to the Search Engine. Unlike a traditional database, Compass' primary goal is to "simplify the integration of Search Engine into any application". Because it supports fast index operations and optimization, this DBMS can also be used as a lightweight application data source.

History

In 2006, Shay Banon was looking for an application framework that could retrieve information quickly and efficiently. Finding nothing that fit, he decided to write his own. He first built a recipe management application named iCook for his wife, then generalized its search capabilities so they could be applied to many domain models. The result was Compass. Although Banon considered Lucene itself a strong, durable Search Engine, he found its low-level API unattractive and difficult to work with. He therefore kept Lucene for its functionality and added a layer of abstraction on top to make it simpler to use. As the project grew and gained support, Banon released it on SourceForge under an open source license.

Isolation Levels

Read Committed

Compass provides a "lucene" transaction isolation level (also known as batch insert). It is similar to read committed, but it is faster and better suited to long-running batch dirty operations. Unlike read committed, the lucene isolation level does not make dirty operations visible to get/load/find operations performed within the same transaction. This isolation level can also define a merge factor so that index merges are done at commit time. For performance, Lucene buffers a transaction's changes in memory, up to 16.0 MB by default, before flushing them to the index.
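The difference between the two behaviours can be sketched with a toy model. This is a minimal illustration under stated assumptions, not Compass code. The class `IsolationSketch`, its methods, and the byte-size estimate are all hypothetical. With `seeOwnWrites` set to false, it approximates the lucene level: dirty writes stay invisible until `commit()`, and a full RAM buffer is flushed without becoming visible.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model of two isolation behaviours; not the Compass API.
class IsolationSketch {
    // Assumed default that mirrors Lucene's 16.0 MB RAM buffer.
    static final long DEFAULT_RAM_BUFFER_BYTES = 16L * 1024 * 1024;

    private final Map<String, String> committed = new HashMap<>();      // visible to readers
    private final Map<String, String> ramBuffer = new LinkedHashMap<>(); // newest dirty writes
    private final Map<String, String> flushed = new LinkedHashMap<>();   // flushed, not yet committed
    private final boolean seeOwnWrites; // true ~ read committed, false ~ "lucene"
    private final long ramBufferBytes;
    private long bufferedBytes = 0;

    IsolationSketch(boolean seeOwnWrites, long ramBufferBytes) {
        this.seeOwnWrites = seeOwnWrites;
        this.ramBufferBytes = ramBufferBytes;
    }

    void save(String id, String text) {
        ramBuffer.put(id, text);
        bufferedBytes += 2L * (id.length() + text.length()); // rough UTF-16 size estimate
        if (bufferedBytes >= ramBufferBytes) {
            flush(); // buffer full: move changes out of RAM, but do not publish them
        }
    }

    private void flush() {
        flushed.putAll(ramBuffer);
        ramBuffer.clear();
        bufferedBytes = 0;
    }

    String find(String id) {
        if (seeOwnWrites) { // read-committed-like: the transaction reads its own dirty writes
            if (ramBuffer.containsKey(id)) return ramBuffer.get(id);
            if (flushed.containsKey(id)) return flushed.get(id);
        }
        return committed.get(id); // "lucene"-like: only committed data is visible
    }

    void commit() {
        flush();
        committed.putAll(flushed);
        flushed.clear();
    }

    int pendingFlushed() { return flushed.size(); }
}
```

Setting `seeOwnWrites` to true approximates read committed, where the transaction can read its own uncommitted changes.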

Indexes

Inverted Index (Full Text)

A Compass index is made up of sub-indexes, each of which maps to a Lucene index (an inverted index). Index partitioning, one of Compass' highlighted features, is used to manage complexity and increase performance. Partitioning is driven by a configurable sub-index hash function, which can be applied to different searchable objects to decide which sub-index stores each one.
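Below is a minimal sketch of hash-partitioned inverted indexes. It assumes a simple modulo hash over each object's alias and id. The class `PartitionedInvertedIndex`, the `alias#id` key format, and the tokenizer are all hypothetical. Compass' actual sub-index hash functions and Lucene's analyzers are configurable and more involved.

```java
import java.util.*;

// Hypothetical sketch: several inverted indexes (term -> document keys),
// with a modulo hash deciding which sub-index stores each object.
class PartitionedInvertedIndex {
    private final List<Map<String, Set<String>>> subIndexes = new ArrayList<>();

    PartitionedInvertedIndex(int size) {
        for (int i = 0; i < size; i++) subIndexes.add(new HashMap<>());
    }

    // Assumed hash: floorMod keeps the result non-negative for any hashCode.
    int subIndexFor(String alias, String id) {
        return Math.floorMod((alias + "#" + id).hashCode(), subIndexes.size());
    }

    void index(String alias, String id, String text) {
        Map<String, Set<String>> postings = subIndexes.get(subIndexFor(alias, id));
        for (String term : text.toLowerCase().split("\\W+")) { // naive tokenizer
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(alias + "#" + id);
            }
        }
    }

    // A search fans out to every sub-index and merges the postings it finds.
    Set<String> search(String term) {
        Set<String> hits = new TreeSet<>();
        for (Map<String, Set<String>> postings : subIndexes) {
            hits.addAll(postings.getOrDefault(term.toLowerCase(), Collections.emptySet()));
        }
        return hits;
    }
}
```

In this sketch, each write touches exactly one sub-index, while every search has to consult all of them. That fan-out is the cost of the partitioning.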

One of its modules, Compass Gps, includes a JDBC integration that indexes database content selected by configurable SQL expressions.

Query Interface

Custom API

Because its main goal is simplicity, Compass exposes all of its operations through a single interface that abstracts away the complexities of reading, searching, and writing objects. Compass defines its own API, but that API is built primarily on Lucene's three main classes: IndexReader, Searcher, and IndexWriter. It is also intentionally similar to ORM frameworks and APIs such as Hibernate, JDO, and JPA, which makes it easier for developers to learn how to interact with Compass.

Compass' architecture is layered around three main types that interact with the Search Engine: Compass, CompassSession, and CompassTransaction. The interaction starts with the CompassConfiguration class, which loads the mapping files and builds the Compass instance. Working against an existing or newly created index, the Compass instance then opens CompassSession objects, through which data is saved, loaded, and searched. CompassTransaction groups those operations into units of work.
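The configuration → Compass → session → transaction layering can be sketched with hypothetical stand-in classes: `ToyConfiguration`, `ToyCompass`, `ToySession`, and `ToyTransaction`. They only mirror the flow described above, in the Hibernate-like style the section mentions. They are not the Compass API, and the `alias#id` keys and substring search are simplifications.

```java
import java.util.*;

// Hypothetical stand-ins for the layering described above; not Compass's own classes.
class ToyConfiguration {
    private final Set<String> aliases = new HashSet<>();

    ToyConfiguration addMapping(String alias) { // stands in for loading a mapping file
        aliases.add(alias);
        return this;
    }

    ToyCompass buildCompass() { return new ToyCompass(aliases); }
}

class ToyCompass {
    final Set<String> aliases;
    final Map<String, String> index = new TreeMap<>(); // shared index: "alias#id" -> text

    ToyCompass(Set<String> aliases) { this.aliases = new HashSet<>(aliases); }

    ToySession openSession() { return new ToySession(this); }
}

class ToyTransaction {
    final Map<String, String> pending = new LinkedHashMap<>();
    private final ToyCompass compass;

    ToyTransaction(ToyCompass compass) { this.compass = compass; }

    void commit() { // publish this unit of work to the shared index
        compass.index.putAll(pending);
        pending.clear();
    }
}

class ToySession {
    private final ToyCompass compass;
    private ToyTransaction tx;

    ToySession(ToyCompass compass) { this.compass = compass; }

    ToyTransaction beginTransaction() {
        tx = new ToyTransaction(compass);
        return tx;
    }

    void save(String alias, String id, String text) {
        if (!compass.aliases.contains(alias)) throw new IllegalArgumentException("no mapping for " + alias);
        if (tx == null) throw new IllegalStateException("no active transaction");
        tx.pending.put(alias + "#" + id, text);
    }

    List<String> find(String word) { // naive substring search over committed entries
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : compass.index.entrySet()) {
            if (e.getValue().toLowerCase().contains(word.toLowerCase())) hits.add(e.getKey());
        }
        return hits;
    }
}
```

A typical sequence is: build once from the configuration, open a session per unit of work, begin a transaction, save or find, and then commit.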

Storage Architecture

In-Memory

Compass can keep its index in memory for efficient indexing and search. With a RAM (random access memory) based index store, the index lives only in memory and is not written to long-term storage. A local in-memory cache also improves performance for sub-indexes that are searched frequently. All operations, including reading, writing, and locking, are performed in the in-memory directory.
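A RAM-resident directory and a small cache of frequently searched sub-indexes can be sketched with JDK collections. `RamDirectorySketch` and `SubIndexCache` are hypothetical names. The least-recently-used eviction policy is an assumption chosen for illustration; the text does not say how Compass' local cache evicts entries.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

// Hypothetical in-memory "directory": files and locks exist only on the JVM heap,
// so nothing in it survives the process.
class RamDirectorySketch {
    private final Map<String, byte[]> files = new HashMap<>();
    private final Set<String> locks = new HashSet<>();

    synchronized void writeFile(String name, String content) {
        files.put(name, content.getBytes(StandardCharsets.UTF_8));
    }

    synchronized String readFile(String name) {
        byte[] data = files.get(name);
        if (data == null) throw new NoSuchElementException(name);
        return new String(data, StandardCharsets.UTF_8);
    }

    // Locking is also done in memory: a lock is held while its name is in the set.
    synchronized boolean obtainLock(String name) { return locks.add(name); }

    synchronized void releaseLock(String name) { locks.remove(name); }
}

// Hypothetical LRU cache keyed by sub-index name (assumed eviction policy).
class SubIndexCache<V> extends LinkedHashMap<String, V> {
    private final int capacity;

    SubIndexCache(int capacity) {
        super(16, 0.75f, true); // access order: get() moves an entry to the "recent" end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > capacity; // evict the least recently used sub-index
    }
}
```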

Data Model

Document / XML Object-Oriented

Compass aims to be as accessible as possible by working in different environments and integrating with different data models. As a Java Search Engine framework, Compass lets users explicitly map their data to the Search Engine through OSEM (Object/Search Engine Mapping), XSEM (XML/Search Engine Mapping), JSEM (JSON/Search Engine Mapping), and RSEM (Resource/Search Engine Mapping). In general, Compass is most effective when classes follow the POJO (Plain Old Java Object) programming model.

OSEM

OSEM uses annotations to map Java objects to the Search Engine. Each mapped class must carry the required mapping annotations and be classified as either a root or a non-root searchable class. Root searchable classes are the ones returned as hits when a search matches. Non-root classes are indexed only as parts of the root objects that reference them.
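The root/non-root distinction can be illustrated with reflection over custom annotations. The annotations `@SketchSearchable`, `@SketchId`, and `@SketchProperty` are hypothetical stand-ins defined here, as are the mapper and the `$alias`/`$id` keys; none of them are Compass' own annotations. The sketch only shows a root class being flattened into a searchable document and a non-root class being rejected as a stand-alone hit.

```java
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.util.*;

// Hypothetical annotations, loosely modeled on the OSEM idea.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface SketchSearchable { boolean root() default true; }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface SketchId {}

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface SketchProperty { String name() default ""; }

class OsemSketchMapper {
    // Flattens a root searchable object into a property-name -> value document.
    static Map<String, String> toDocument(Object obj) {
        Class<?> type = obj.getClass();
        SketchSearchable meta = type.getAnnotation(SketchSearchable.class);
        if (meta == null || !meta.root()) {
            throw new IllegalArgumentException(type.getSimpleName() + " is not a root searchable class");
        }
        Map<String, String> doc = new TreeMap<>();
        doc.put("$alias", type.getSimpleName().toLowerCase());
        try {
            for (Field f : type.getDeclaredFields()) {
                f.setAccessible(true);
                if (f.isAnnotationPresent(SketchId.class)) {
                    doc.put("$id", String.valueOf(f.get(obj)));
                } else if (f.isAnnotationPresent(SketchProperty.class)) {
                    String name = f.getAnnotation(SketchProperty.class).name();
                    doc.put(name.isEmpty() ? f.getName() : name, String.valueOf(f.get(obj)));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }
}

@SketchSearchable
class Recipe {
    @SketchId long id;
    @SketchProperty String title;
    @SketchProperty(name = "chef") String author;
    String internalNotes = "not indexed"; // no annotation, so it is skipped

    Recipe(long id, String title, String author) {
        this.id = id;
        this.title = title;
        this.author = author;
    }
}

@SketchSearchable(root = false)
class Ingredient {
    @SketchProperty String name = "salt";
}
```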

XSEM

Similar to OSEM, XSEM maps an XML structure to the Search Engine using XPath expressions. An "XmlObject" wraps an XML element (a document, node, or attribute). It exposes the element's name and value, along with the expressions that can be executed against it. Using built-in converters, Compass parses XML content into an "XmlObject" representation in the Search Engine for read and write operations.
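XPath-driven mapping can be sketched with the JDK's built-in DOM parser and `javax.xml.xpath`. The class `XsemSketch` and its property-to-XPath map are hypothetical. They show the idea of extracting searchable values from XML by expression; they do not reproduce Compass' XmlObject or converter API.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical XPath-based mapper: each searchable property is defined by an expression.
class XsemSketch {
    private final Map<String, String> mappings = new LinkedHashMap<>(); // property -> XPath

    XsemSketch map(String property, String xpathExpression) {
        mappings.put(property, xpathExpression);
        return this;
    }

    Map<String, String> toDocument(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> doc = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : mappings.entrySet()) {
                doc.put(e.getKey(), xpath.evaluate(e.getValue(), dom)); // string value of the match
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not map XML", e);
        }
    }
}
```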

JSEM

JSEM also allows JSON to be mapped explicitly to the Search Engine. Because explicit mapping can be cumbersome, Compass also lets JSON elements be mapped dynamically and recursively, which allows JSEM to be used as a generic indexing service.
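Dynamic, recursive mapping can be sketched as a walk over already-parsed JSON. To stay within the JDK, this sketch assumes JSON has been parsed into nested `Map`s (objects), `List`s (arrays), and scalars. The class `JsemSketch` and the dotted-path naming are hypothetical choices, not JSEM's actual field naming.

```java
import java.util.*;

// Hypothetical dynamic mapper: no per-field declarations; every scalar found
// by recursion becomes a property named by its dotted path.
class JsemSketch {
    static Map<String, List<String>> flatten(Object json) {
        Map<String, List<String>> doc = new TreeMap<>();
        walk("", json, doc);
        return doc;
    }

    private static void walk(String path, Object node, Map<String, List<String>> doc) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                walk(path.isEmpty() ? key : path + "." + key, e.getValue(), doc);
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                walk(path, item, doc); // array items share their parent's path
            }
        } else if (node != null) {
            doc.computeIfAbsent(path, p -> new ArrayList<>()).add(String.valueOf(node));
        }
    }
}
```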

RSEM

Even without a domain model, Compass' functionality is available through RSEM. Instead of Objects or XML, RSEM works directly with resources (named collections of properties) and still offers the same functionality.
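A resource can be sketched as an alias plus named, possibly multi-valued properties, built without any domain class. `ResourceSketch` and its methods are hypothetical and only loosely modeled on the description above. The substring `matches` check stands in for a real search.

```java
import java.util.*;

// Hypothetical resource: an alias plus named properties; no domain class needed.
class ResourceSketch {
    final String alias;
    private final Map<String, List<String>> properties = new LinkedHashMap<>();

    ResourceSketch(String alias) { this.alias = alias; }

    ResourceSketch add(String name, String value) { // a property may hold several values
        properties.computeIfAbsent(name, n -> new ArrayList<>()).add(value);
        return this;
    }

    String getValue(String name) { // first value, or null when absent
        List<String> values = properties.get(name);
        return values == null || values.isEmpty() ? null : values.get(0);
    }

    List<String> getValues(String name) {
        return properties.getOrDefault(name, Collections.emptyList());
    }

    boolean matches(String term) { // case-insensitive substring match over all values
        String t = term.toLowerCase();
        return properties.values().stream()
                .flatMap(List::stream)
                .anyMatch(v -> v.toLowerCase().contains(t));
    }
}
```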

Storage Organization

Indexed Sequential Access Method (ISAM)

As mentioned above, the Compass Gps module includes a JDBC integration that extracts and indexes database content using SQL expressions. Compass organizes its storage as an index partitioned into sub-indexes, each of which is a Lucene index. Because the index is transactional, operations performed through Compass remain simple and efficient.
