Espresso

Espresso is an internal, distributed, document-oriented database management system developed at LinkedIn. It serves as the source-of-truth primary store for many downstream systems and applications, e.g., company pages and InMail. Because of this role, its design focuses on the requirements of production environments, which include, but are not limited to, operability, availability, scalability, and elasticity. For operability, it provides a flexible data model and APIs that support a variety of applications, and it is designed to be highly compatible with the rest of the production data ecosystem at LinkedIn. For availability, it has carefully designed fault-tolerance mechanisms; for example, it keeps a warm standby at a geographically remote disaster recovery data center. For scalability, most of the methods it adopts, including those for cluster management and data management, avoid centralized processing and synchronized operations. For elasticity, it supports online cluster expansion with almost no downtime.

Espresso exposes RESTful APIs as its user-level interface, which give downstream application developers both flexibility and resource transparency. Because the APIs are built on HTTP, Espresso is language-agnostic: any client that can issue HTTP requests can use it.
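As a rough illustration of what an HTTP-based document interface looks like from the client side, the sketch below builds requests using only Python's standard library. The host name, the database and table names, the `/<database>/<table>/<key>` path layout, and the helper `build_document_request` are all assumptions made for illustration, not Espresso's documented API. No request is actually sent.

```python
import json
import urllib.request
from urllib.parse import quote


def build_document_request(base_url, database, table, key, document=None):
    """Build (but do not send) an HTTP request for a single document.

    The path layout used here is a hypothetical example, not a
    confirmed Espresso URI scheme.
    """
    # Percent-encode each path segment so keys with special characters are safe.
    path = "/".join(quote(part, safe="") for part in (database, table, key))
    url = f"{base_url.rstrip('/')}/{path}"
    if document is None:
        # No body: read the document.
        return urllib.request.Request(url, method="GET")
    # With a body: create or overwrite the document.
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, database, table, and key.
read = build_document_request("http://espresso.example.com", "MailboxDB", "Messages", "42")
write = build_document_request(
    "http://espresso.example.com", "MailboxDB", "Messages", "42", {"subject": "hello"}
)
print(read.get_method(), read.full_url)
print(write.get_method(), write.full_url)
```

Because the interface is plain HTTP, the same two requests could be issued from any language with an HTTP client, which is the sense in which the store is language-agnostic.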

Espresso has two internal building blocks at the system level: a cluster management system and a change capture system. Apache Helix, which is built on Apache ZooKeeper, handles cluster management. LinkedIn Databus served as the first-generation change capture system but was later replaced by Kafka. The change capture system also plays an important role in data replication. In addition, Espresso uses Apache Lucene as a library building block to provide inverted indexes.
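The general idea of using change capture for replication can be sketched as follows: a primary records every committed write in an ordered log, and a replica replays the entries it has not yet seen. This is a toy, in-memory model under stated assumptions; the classes `Change`, `Primary`, and `Replica` are hypothetical and do not reflect the actual Databus, Kafka, or Espresso interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    seq: int        # position in the primary's commit order
    key: str
    value: object   # None marks a delete


@dataclass
class Primary:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def put(self, key, value):
        # Apply the write, then record it in commit order.
        self.data[key] = value
        self.log.append(Change(len(self.log) + 1, key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.log.append(Change(len(self.log) + 1, key, None))


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    applied: int = 0  # highest sequence number applied so far

    def catch_up(self, log):
        # Replay only unseen changes, in commit order.
        for change in log:
            if change.seq <= self.applied:
                continue
            if change.value is None:
                self.data.pop(change.key, None)
            else:
                self.data[change.key] = change.value
            self.applied = change.seq


primary = Primary()
primary.put("a", 1)
primary.put("b", 2)

replica = Replica()
replica.catch_up(primary.log)   # replica now mirrors the first two writes

primary.delete("a")
replica.catch_up(primary.log)   # only the delete is replayed this time
print(replica.data == primary.data)
```

Tracking the last applied sequence number lets the replica resume after an interruption without reapplying changes it has already seen.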

Data Model

Document / XML

Storage Model

Custom

Query Interface

Custom API

Storage Architecture

Disk-oriented

System Architecture

Shared-Nothing