Espresso

Espresso is an internal distributed document-oriented database management system written by LinkedIn. It serves as the source-of-truth primary store for many downstream systems and applications such as Company Pages, Unified Social Content Platform, and InMail. Because of this role, Espresso's design principles focus on the requirements of production environments, which include, but are not limited to, operability, availability, scalability, and elasticity. For operability, it provides a flexible data model and APIs to support various applications, and it is designed to be highly compatible with the rest of the data ecosystem at LinkedIn. For availability, it has a carefully designed fault-tolerance mechanism; for example, a warm standby of each Espresso cluster is always maintained at a geographically remote disaster recovery data center. For scalability, most of its methods, including cluster management and data management, avoid centralized processing and synchronized operations. For elasticity, it supports online cluster expansion with little downtime.

Espresso employs RESTful APIs as its user-level interface. Because the interface is based on JSON over HTTP, it provides both flexibility and resource transparency for downstream application developers.
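
As a rough illustration of what a JSON-over-HTTP interface looks like from a client's point of view, the following Python sketch builds (but does not send) a write request. The host name, the database/table/key path layout, and the helper name build_put_request are all assumptions made for this example; they are not taken from Espresso's actual API.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_put_request(base_url, database, table, key_parts, document):
    """Build an HTTP PUT request carrying a JSON document.

    The /<database>/<table>/<key...> path layout is a hypothetical
    convention for this sketch, not a documented Espresso URL scheme.
    """
    segments = [database, table, *key_parts]
    # Percent-encode each segment so keys cannot alter the path structure.
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    body = json.dumps(document).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/{path}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, database, table, and key values.
request = build_put_request(
    "http://espresso.example.com",
    "MailboxDB",
    "Messages",
    ["alice", 42],
    {"subject": "Hello"},
)
```

Because the request is plain HTTP with a JSON body, any language with an HTTP client could issue the same call, which is the kind of flexibility the paragraph above describes.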

Espresso is derived from MySQL. It has two system-level internal building blocks: a cluster management system and a change capture system. Apache Helix, which relies on Apache ZooKeeper, performs cluster management. LinkedIn Databus served as the first-generation change capture system but was later replaced by Kafka. The change capture system also plays an important role in data replication. In addition, Espresso uses Apache Lucene as a library-level building block, which provides basic support for full-text inverted indexes.

History

The Espresso project was first planned and designed in early 2011. Its mission at the time was to fill a gap in LinkedIn's data infrastructure, which lacked a well-designed, highly consistent database system that was also scalable and agile. Drawing on its experience developing earlier non-relational systems such as Voldemort, a distributed key-value store, the engineering team at LinkedIn spent one year writing Espresso and deployed it in production in June 2012.

Although LinkedIn once planned to open source Espresso, the plan was later shelved. Espresso currently remains an internal system.

Data Model

Document / XML

Espresso adopts a hierarchical document-oriented data model: documents belong to both tables and document groups, and tables and document groups in turn belong to databases. Different levels of this hierarchy use different schema definition formats; for example, database and table schemas are defined in JSON, whereas document schemas are defined in Avro. Because a document group is a logical concept with no explicit representation, it has no schema.

The database is the largest unit of the data model. Tables and document groups are the two parallel units at the next level of the hierarchy, but they differ considerably. A table contains arbitrary documents that share a common key structure. A document group, by contrast, does not actively own any documents; any two documents with the same partitioning key belong to the same document group. A table can therefore span multiple document groups, and a document group can span multiple tables. The most basic unit of the data model is the document. A document contains schema-defined content with various data structures and is conceptually similar to a row in a SQL database system. Documents are identified by their primary keys.
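
The relationship between tables and document groups can be made concrete with a small Python sketch. It assumes, purely for illustration, that the partitioning key is the first component of a document's primary key; the Document type, the table names, and the sample keys are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Document(NamedTuple):
    table: str   # table the document belongs to
    key: tuple   # primary key; key[0] is treated as the partitioning key (assumption)
    body: dict   # schema-defined content


def document_groups(docs):
    """Group documents by partitioning key, regardless of their table."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc.key[0]].append(doc)
    return dict(groups)


docs = [
    Document("Mailbox", ("alice", "msg-1"), {"subject": "Hi"}),
    Document("Mailbox", ("bob", "msg-7"), {"subject": "Lunch?"}),
    Document("Contacts", ("alice", "c-3"), {"name": "Bob"}),
]
groups = document_groups(docs)

# Tables that appear in the "alice" document group.
alice_tables = {doc.table for doc in groups["alice"]}
# Document groups that contain at least one Mailbox document.
mailbox_groups = {k for k, g in groups.items() if any(d.table == "Mailbox" for d in g)}
```

In this sample, the "alice" group holds documents from both the Mailbox and Contacts tables, while the Mailbox table has documents in both the "alice" and "bob" groups. Note that the group is never stored as an object of its own; it is computed from the keys.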

Although the document group is a passive component of the data model, it plays an important role in data management; for example, secondary indexes are mainly built within a document group, and the documents of a document group are usually stored on the same node.
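
One common way to keep all documents with the same partitioning key on one node is to hash that key to a partition and map partitions to nodes. The Python sketch below shows this idea under stated assumptions: the MD5-based hash, the modulo partition-to-node mapping, and the node names are illustrative choices, not a description of Espresso's actual placement logic.

```python
import hashlib


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partitioning key to a partition with a stable hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


def node_for(partition_key: str, num_partitions: int, nodes: list) -> str:
    """Pick the node that hosts the key's partition (simple modulo mapping)."""
    return nodes[partition_for(partition_key, num_partitions) % len(nodes)]


# Hypothetical cluster layout.
nodes = ["node-a", "node-b", "node-c"]
mailbox_node = node_for("alice", 8, nodes)
contacts_node = node_for("alice", 8, nodes)
```

Because placement depends only on the partitioning key, documents from different tables that share a key land on the same node, which is what makes it practical to build a secondary index within a single document group.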

Storage Architecture

Disk-oriented

Storage Model

Custom

Query Interface

Custom API

System Architecture

Shared-Nothing