Espresso

Espresso is an internal, distributed, document-oriented database management system developed by LinkedIn. It serves as the source-of-truth primary store for many downstream systems and applications, such as Company Pages, the Unified Social Content Platform, and InMail. Because of this role, Espresso's design principles focus on the requirements of production environments, which include, but are not limited to, operability, availability, scalability, and elasticity. For operability, it provides a flexible data model and APIs to support a variety of applications, and it is designed to be highly compatible with the rest of the data ecosystem at LinkedIn. For availability, it has a carefully designed fault-tolerance mechanism; for example, there is always a warm standby of the Espresso clusters at a geographically remote disaster recovery data center. For scalability, most of the methods it adopts, including those for cluster management and data management, avoid centralized processing and synchronized operations. For elasticity, it supports online cluster expansion with little downtime.

Espresso exposes RESTful APIs as its user-level interface. Because the interface is based on JSON over HTTP, it gives downstream application developers both flexibility and resource transparency: documents are addressed as HTTP resources and can be read or written by any HTTP client.
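As an illustration of what a JSON-over-HTTP write might look like from a client's point of view, here is a minimal sketch in Python. The host name, port, the database/table/resource path layout, and the document fields are all assumptions made for the example; the text above does not specify Espresso's URL scheme or payload format. The request is only constructed, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint: not a real Espresso host.
BASE_URL = "http://espresso.example.internal:12345"


def build_put_request(database, table, resource_id, document):
    """Build (but do not send) an HTTP PUT that carries a JSON document.

    The /<database>/<table>/<resource_id> path layout is an assumption
    made for this sketch.
    """
    # Percent-encode each path segment so spaces and slashes are safe.
    path = "/".join(quote(part, safe="") for part in (database, table, resource_id))
    body = json.dumps(document).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/{path}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical database, table, key, and document.
request = build_put_request(
    "MailboxDB", "Messages", "member 42", {"subject": "Hello", "unread": True}
)
```

Passing the resulting object to `urllib.request.urlopen` would issue the call against a reachable server. Because the payload is plain JSON, the same request could be produced by any HTTP client in any language, which is the flexibility the interface is meant to provide.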

Espresso has two system-level internal building blocks: a cluster management system and a change capture system. Apache Helix, which integrates Apache ZooKeeper, performs the cluster management function. LinkedIn Databus was used as the first-generation change capture system but was later replaced by Kafka. The change capture system also plays an important role in data replication. In addition, Espresso uses Apache Lucene as a library-level building block, which provides basic support for full-text inverted indexes.
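To make the link between change capture and replication concrete, the following is a toy, in-memory sketch of the general idea: a primary records every write in an ordered change log, and a replica applies the entries it has not yet seen. This is not Espresso, Databus, or Kafka code; the `Change`, `Primary`, and `Replica` names and the sequence-number scheme are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    seq: int                   # position in the ordered change log
    key: str
    document: Optional[dict]   # None marks a delete


@dataclass
class Primary:
    store: Dict[str, dict] = field(default_factory=dict)
    log: List[Change] = field(default_factory=list)

    def put(self, key: str, document: dict) -> None:
        # Apply the write locally, then record it in the change log.
        self.store[key] = document
        self.log.append(Change(len(self.log) + 1, key, document))

    def delete(self, key: str) -> None:
        self.store.pop(key, None)
        self.log.append(Change(len(self.log) + 1, key, None))


@dataclass
class Replica:
    store: Dict[str, dict] = field(default_factory=dict)
    last_applied: int = 0

    def catch_up(self, log: List[Change]) -> None:
        # Apply unseen changes in log order; already-applied entries are skipped,
        # so calling catch_up repeatedly is safe.
        for change in log:
            if change.seq <= self.last_applied:
                continue
            if change.document is None:
                self.store.pop(change.key, None)
            else:
                self.store[change.key] = change.document
            self.last_applied = change.seq


primary = Primary()
primary.put("doc-1", {"title": "first"})
primary.put("doc-2", {"title": "second"})
primary.delete("doc-1")

replica = Replica()
replica.catch_up(primary.log)
```

After `catch_up`, the replica holds the same documents as the primary. Tracking the last applied sequence number is what lets a replica resume from where it stopped instead of replaying the whole log.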

History

The Espresso project was first planned and designed in early 2011. Its mission at that time was to fill a gap in LinkedIn's data infrastructure, which lacked a well-designed, highly consistent database system that offered both scalability and agility. Drawing on its experience developing earlier storage systems such as the Voldemort key-value store, the engineering team at LinkedIn spent one year writing Espresso and deployed it in production in June 2012.

Although LinkedIn once planned to open-source Espresso, the plan was later shelved. Espresso remains an internal system.

Data Model

Document / XML

Query Interface

Custom API

Storage Architecture

Disk-oriented

System Architecture

Shared-Nothing

Storage Model

Custom