Compass

Search Engine

Compass is a Java Search Engine framework built on top of the Lucene API. It is designed for fast search and supports mapping data objects to the Search Engine. Unlike a traditional database, Compass' primary goal is to "simplify the integration of Search Engine into any application". With fast index operations and optimization, this DBMS can be used as a lightweight application datasource.

History

In 2006, Shay Banon was searching for an application framework that prioritized speed and performance. Finding nothing suitable, he decided to develop his own. He initially built it into iCook, a recipe management application for his wife, and later extended its search capability to a wide range of domain models, creating Compass. Although Lucene is a robust Search Engine, Banon thought its low-level usage and API made it less user-friendly and not suited to all users. He therefore kept Lucene for its functionality but added a layer of abstraction to make it approachable for users of all backgrounds. As the DBMS grew and gained support, Banon released it on SourceForge under an open source license.

Concurrency Control

Optimistic Concurrency Control (OCC)

Lucene is not transactional, meaning it cannot undo changes. However, it does include an inter and outer process locking mechanism that can be used for transaction locking. Locking is done only at the "sub-index" level, on the index of the alias/searchable content. Locks are applied only for dirty operations, so read operations do not require a lock, which maximizes throughput. In addition, all transactions are managed through a special lock file.

Though Lucene is not transactional, the Compass Search Engine provides additional support for transaction management, including the Lucene and Async transaction processors. The Lucene transaction processor isolates changes made by different transactions, allows merges at commit time, and does not make dirty operations visible to other operations within the same transaction. This separation is similar to the Optimistic Concurrency Control protocol, which creates a private workspace for each transaction.

The Async transaction processor, on the other hand, does not require a lock during a dirty operation. It accumulates transactions and processes them asynchronously and concurrently: changes can continue to occur while a commit is in progress, and a background thread waits for further transactions to process.
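
The private-workspace idea described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Compass or Lucene code: the class names `SharedIndex` and `Workspace` and their methods are hypothetical, the "index" is just a map from id to content, and the validation phase of a full OCC protocol is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a shared sub-index; not a Compass or Lucene class.
class SharedIndex {
    private final Map<String, String> committed = new ConcurrentHashMap<>();
    private final ReentrantLock writeLock = new ReentrantLock();

    // Reads never take the write lock and only see committed entries.
    String read(String id) {
        return committed.get(id);
    }

    Workspace begin() {
        return new Workspace(this);
    }

    // Only the merge at commit time is serialized by the lock.
    void merge(Map<String, String> changes) {
        writeLock.lock();
        try {
            committed.putAll(changes);
        } finally {
            writeLock.unlock();
        }
    }
}

// Hypothetical private workspace: dirty operations are buffered here until commit.
class Workspace {
    private final SharedIndex index;
    private final Map<String, String> pending = new LinkedHashMap<>();

    Workspace(SharedIndex index) {
        this.index = index;
    }

    void save(String id, String content) {
        pending.put(id, content);
    }

    void commit() {
        index.merge(pending);
        pending.clear();
    }

    void rollback() {
        pending.clear();
    }
}

class WorkspaceDemo {
    public static void main(String[] args) {
        SharedIndex index = new SharedIndex();
        Workspace tx = index.begin();
        tx.save("recipe-1", "tomato soup");
        System.out.println(index.read("recipe-1")); // prints "null": not committed yet
        tx.commit();
        System.out.println(index.read("recipe-1")); // prints "tomato soup"
    }
}
```

Buffering changes per transaction keeps readers lock-free, at the cost of changes staying invisible until they are merged.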

Data Model

Document / XML Object-Oriented

As a Java Search Engine framework, Compass allows users to explicitly map an Object domain model to a Search Engine. The supported mappings are OSEM (Object/Search Engine Mapping), XSEM (XML/Search Engine Mapping), JSEM (JSON/Search Engine Mapping), and RSEM (Resource/Search Engine Mapping). In general, Compass works best when classes follow the POJO (Plain Old Java Object) programming model.

OSEM

Using annotations, OSEM allows Java Objects to be mapped to the Search Engine. These Objects must have specific attributes and be classified as either a root searchable class or a non-root searchable class. Root searchable classes define what counts as a hit when searching for an element.

XSEM

Similar to OSEM, XSEM allows an XML structure to be mapped to the Search Engine using XPath expressions. An "XmlObject" is used to define an XML element, which includes the name and value of the document, node, or attribute, and the expressions to execute against it. With built-in converters, Compass parses XML content into an "XmlObject" representation for read and write operations in the Search Engine.

JSEM

JSEM also allows explicit mapping of JSON to the Search Engine. However, since explicit mapping can be cumbersome, Compass allows JSON elements to be mapped dynamically and recursively. This allows JSEM to be used as a generic indexing service.
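
To make "dynamically and recursively" concrete, here is a minimal sketch that walks a parsed JSON structure and collects searchable field values. It is an illustration only: `DynamicJsonMapper` is a hypothetical name, the JSON is assumed to be already parsed into nested `Map`/`List` values, and the dotted field-path naming is an assumption rather than Compass' documented behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the Compass API.
class DynamicJsonMapper {

    // Flattens nested maps and lists into field path -> values, with no explicit mapping.
    static Map<String, List<String>> toFields(Map<String, Object> json) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        walk("", json, fields);
        return fields;
    }

    private static void walk(String path, Object node, Map<String, List<String>> fields) {
        if (node instanceof Map) {
            // Objects: recurse into each property, extending the field path.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String child = path.isEmpty() ? String.valueOf(e.getKey()) : path + "." + e.getKey();
                walk(child, e.getValue(), fields);
            }
        } else if (node instanceof List) {
            // Arrays: every element contributes to the same field path.
            for (Object item : (List<?>) node) {
                walk(path, item, fields);
            }
        } else if (node != null) {
            // Scalars: record the value under the current path.
            fields.computeIfAbsent(path, p -> new ArrayList<>()).add(String.valueOf(node));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> recipe = new LinkedHashMap<>();
        recipe.put("title", "Tomato soup");
        recipe.put("tags", List.of("quick", "vegan"));
        recipe.put("author", Map.of("name", "Ana"));
        // prints {title=[Tomato soup], tags=[quick, vegan], author.name=[Ana]}
        System.out.println(toFields(recipe));
    }
}
```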

RSEM

Even without a domain model, users can access Compass' functionality through RSEM. This interface works with resources other than Objects and XML and still offers the same functionality.

Indexes

Inverted Index (Full Text)

A Compass index is made up of multiple sub-indexes, and each sub-index maps to a Lucene index, which is an inverted index. Index partitioning is one of Compass' highlighted features and is used to manage complexity and increase performance. It is implemented through a configurable sub index hash function, which can be applied to different searchable objects.

One of its modules, Compass Gps, includes a Jdbc integration feature that allows database content to be indexed using configurable SQL expressions.
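
The combination of inverted indexes and hash-based partitioning can be sketched as follows. This is a simplified illustration, not Compass' implementation: `PartitionedInvertedIndex` is a hypothetical class, the modulo hash over alias and id is an assumed partitioning scheme, and tokenization is reduced to lowercasing and splitting on non-word characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; each map plays the role of one sub-index (term -> document keys).
class PartitionedInvertedIndex {
    private final List<Map<String, Set<String>>> subIndexes = new ArrayList<>();

    PartitionedInvertedIndex(int size) {
        for (int i = 0; i < size; i++) {
            subIndexes.add(new HashMap<>());
        }
    }

    // Assumed sub index hash: alias and id are hashed, then reduced modulo the partition count.
    int subIndexFor(String alias, String id) {
        return Math.floorMod((alias + "#" + id).hashCode(), subIndexes.size());
    }

    void add(String alias, String id, String text) {
        Map<String, Set<String>> postings = subIndexes.get(subIndexFor(alias, id));
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(alias + "#" + id);
            }
        }
    }

    // A search consults every sub-index and unions the matching document keys.
    Set<String> search(String term) {
        Set<String> hits = new TreeSet<>();
        for (Map<String, Set<String>> postings : subIndexes) {
            hits.addAll(postings.getOrDefault(term.toLowerCase(), Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        PartitionedInvertedIndex index = new PartitionedInvertedIndex(4);
        index.add("recipe", "1", "Tomato soup");
        index.add("recipe", "2", "Chicken soup, spicy");
        System.out.println(index.search("soup")); // prints [recipe#1, recipe#2]
    }
}
```

Because each write touches only one sub-index, a lock held on that sub-index does not block writers that hash elsewhere.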

Isolation Levels

Read Committed

Compass uses the Lucene transaction processor (also known as batch insert), which is most similar to the read committed isolation level but is faster and better suited to long-running batch dirty operations. Unlike read committed, the Lucene isolation level does not show dirty operations to get/load/find operations that take place during the same transaction. This isolation level can also define a merge factor to control merges at commit time. To yield better performance, the Lucene processor bounds the amount of transactional memory, allocating 16 MB by default.

Joins

Hash Join

Compass assigns an alias to each mapping definition of searchable content and uses it to partition the content into different sub indexes. Sub index hashing defines which sub index an alias maps to, and a constant mapping allows several searchable classes to be joined under the same sub index.

There are also mappings between the database and the Compass index: the database returns its SQL results as a ResultSet object, which is mapped to a set of Compass Resource objects. For a join operation, the mapping places the SQL join result under a single resource mapping with the same alias.

Query Execution

Materialized Model

The Hibernate Gps device gives Compass the ability to index objects stored in a database. With mappings, Compass can fetch all related data from the database. With this default query, each indexing process emits its entire result at once. As a result, Compass focuses on keeping the index optimized with a manageable number of segments: since output arrives all at once, Compass merges small segments into larger ones to optimize its operations.
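
The idea of merging small segments until a manageable number remains can be sketched as below. This is a deliberately simplified model, not Lucene's or Compass' actual merge policy: `SegmentMerger` and `optimize` are hypothetical names, segments are represented only by their sizes, and the rule "always merge the two smallest" is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of segment merging; not a Lucene or Compass class.
class SegmentMerger {

    // Repeatedly merges the two smallest segments until at most maxSegments remain.
    static List<Integer> optimize(List<Integer> segmentSizes, int maxSegments) {
        if (maxSegments < 1) {
            throw new IllegalArgumentException("maxSegments must be at least 1");
        }
        PriorityQueue<Integer> heap = new PriorityQueue<>(segmentSizes);
        while (heap.size() > maxSegments) {
            int merged = heap.poll() + heap.poll(); // merged segment holds both inputs
            heap.add(merged);
        }
        List<Integer> result = new ArrayList<>(heap);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Four segments of 1, 2, 3 and 10 documents, reduced to two segments.
        System.out.println(optimize(List.of(1, 2, 3, 10), 2)); // prints [6, 10]
    }
}
```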

Query Interface

Custom API

Compass uses a single interface for all of its operations, including reading, searching, and writing objects. Compass defines its own API, which builds primarily on Lucene's three main classes: IndexReader, Searcher, and IndexWriter. It is also intentionally similar to certain ORM frameworks, like Hibernate, JDO, or JPA, to make it easier for developers to learn how to interact with Compass.

Compass' architecture is layered and composed of three main classes that interact with the Search Engine: Compass, CompassSession, and CompassTransaction. The interaction starts with loading mapping files through the CompassConfiguration class, which creates Compass. Using an existing or newly created index, the Compass class creates a CompassSession object to begin working with data in the Search Engine. Search results provide scores, resources, and mapped objects. CompassTransaction can optionally be used to manage transactions for fine-grained control.

Storage Architecture

In-Memory

Compass uses in-memory storage for efficient indexing and search. With a RAM (random access memory) based index store, long-term storage is not needed. A local in-memory cache improves performance for sub indexes that are searched frequently. All operations, including reading, writing, and locking, are performed in the in-memory directory.

Storage Model

Custom

To keep searches efficient, the storage location depends on the index of the searchable content. Lucene provides a Directory object that adds a layer of abstraction on top of the index storage. Compass includes a few options for storing the index: the file system, the Java 1.4 NIO mmap feature, and the Java 1.4 NIO feature. The latter two yield better performance than a simple file system based configuration.

Storage Organization

Indexed Sequential Access Method (ISAM)

As mentioned before, the Compass Gps module includes a Jdbc integration that extracts and indexes database content using SQL expressions. Compass organizes its storage as an index structure partitioned into sub indexes, each of which is a Lucene index.

Stored Procedures

Not Supported

Compass does not support stored procedures. Although it cannot save prepared SQL code for repeated reuse, Compass provides an "all" field by default that aggregates over all fields. It is not a saved query, but a search can simply target the "all" property. The property is also customizable: it can be enabled or disabled, renamed, and removed as the default search property.

System Architecture

Shared-Everything

Compass is a distributed database, and a Compass instance is meant to be shared between application threads. An application can hold multiple Compass instances, each with a different configuration.

Compass provides a data management platform called Compass Needle Terracotta that allows Lucene indices to be stored in a distributed manner. The Terracotta directory is partitioned in memory as byte arrays and shared or "network attached" between nodes. Each node is connected and managed by the Terracotta server to make the appropriate changes.

Website

http://www.compass-project.org/

Source Code

https://github.com/kimchy/compass

Tech Docs

http://www.compass-project.org/docs/2.2.0/reference/html/

Developer

Shay Banon

Country of Origin

GB

Start Year

2006

End Year

2009

Former Name

iCook

Project Type

Open Source

Written in

Java

Supported languages

Java

Operating Systems

OS X

Licenses

Apache v2
