Cosmos DB

Cosmos DB is a globally distributed, multi-model, schema-less document database with tunable consistency that provides high throughput and high availability across geographic regions. It is designed for the data storage needs of large-scale, distributed, Internet-scale applications. Most of Microsoft's internal services, such as Bing, Office 365, and Ads, as well as many external services, use Cosmos DB for storage. It provides 99.99% availability regardless of the number of regions associated with the data. It offers turn-key distribution, which can be used to replicate data to specific replica instances so that users across the globe get low-latency access.

History

It started as an internal Microsoft project called 'Florence' for storing large-scale unstructured data generated by several of Microsoft's internal services. It was renamed DocumentDB in 2014 and released to the public as the Azure service 'Azure Cosmos DB' in 2017.

Checkpoints

Non-Blocking

Cosmos DB periodically checkpoints its document index (a Bw-Tree) to reduce recovery time when a node fails.
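To see why checkpointing shortens recovery, here is a minimal single-threaded sketch, assuming a node that logs every index write and occasionally snapshots its state. The class Node and its methods are hypothetical. The sketch does not model the non-blocking aspect (there are no concurrent writers); it only shows that recovery replays the log suffix after the last checkpoint rather than the whole log.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical index node: in-memory state plus a write-ahead log."""
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)         # (lsn, key, value) records
    checkpoint: dict = field(default_factory=dict)  # snapshot of state
    checkpoint_lsn: int = 0                         # last LSN covered by the snapshot

    def write(self, key, value):
        lsn = len(self.log) + 1
        self.log.append((lsn, key, value))  # log first, then apply
        self.state[key] = value

    def take_checkpoint(self):
        # Copy the current state and remember how much of the log it covers.
        self.checkpoint = dict(self.state)
        self.checkpoint_lsn = len(self.log)

    def recover(self):
        """Rebuild state after a crash; return how many log records were replayed."""
        state = dict(self.checkpoint)
        replayed = 0
        for lsn, key, value in self.log:
            if lsn > self.checkpoint_lsn:  # only records newer than the checkpoint
                state[key] = value
                replayed += 1
        self.state = state
        return replayed
```

With 1,000 writes before a checkpoint and 10 after it, recovery replays only the 10 newer records.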

Concurrency Control

Optimistic Concurrency Control (OCC)

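Cosmos DB uses OCC to prevent lost updates. Every document carries a system-generated 'ETag' that changes whenever the document is written. A client that wants a conditional update sends the ETag it last read in the 'If-Match' HTTP header, and the server rejects the write (HTTP 412 Precondition Failed) if the document's current ETag no longer matches.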
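The ETag check can be sketched with an in-memory store. DocumentStore and PreconditionFailed are hypothetical names, and random UUIDs stand in for the service's ETag values; this is an illustration of the optimistic pattern, not the Cosmos DB SDK.

```python
import uuid


class PreconditionFailed(Exception):
    """Raised when the caller's ETag is stale (HTTP 412 in the real service)."""


class DocumentStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (etag, body)

    def create(self, doc_id, body):
        etag = uuid.uuid4().hex
        self._docs[doc_id] = (etag, dict(body))
        return etag

    def read(self, doc_id):
        etag, body = self._docs[doc_id]
        return etag, dict(body)

    def replace(self, doc_id, body, if_match):
        current_etag, _ = self._docs[doc_id]
        if if_match != current_etag:
            # Someone else wrote the document after we read it: abort.
            raise PreconditionFailed(doc_id)
        new_etag = uuid.uuid4().hex  # every successful write gets a fresh ETag
        self._docs[doc_id] = (new_etag, dict(body))
        return new_etag
```

If two clients read the same ETag, the first replace succeeds and the second raises PreconditionFailed; the loser re-reads the document and retries with the new ETag.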

Data Model

Column Family / Wide-Column, Key/Value, Graph

It is a multi-model service and supports document, key-value, graph, and column-family data models.

Indexes

Bw-Tree

Cosmos DB is a NoSQL document database that indexes the contents of documents directly. The index is the union of the words of all documents, so a query can filter on any word of any document in the database. It is represented as a schema-agnostic tree whose nodes are all the words in the document set and whose values are the documents in which each word appears. This schema-less index is stored in a Bw-Tree. To support fast random writes on SSDs and disks, Cosmos DB persists Bw-Tree modifications in log-structured storage. Rather than updating the tree in place, it appends delta records, which avoids cache invalidation and reduces write amplification on SSDs. Cosmos DB also supports blind incremental updates to the Bw-Tree, so it can apply a partial write to a record without first reading the record into memory.
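The delta-record and blind-update ideas can be illustrated with a small, single-threaded sketch. It assumes whitespace-separated tokens as the indexed "words" and one posting page per word; PostingPage, TermIndex, and max_chain are hypothetical names. A real Bw-Tree also relies on a mapping table and atomic compare-and-swap on page pointers, which this sketch leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class PostingPage:
    """Hypothetical page holding the posting list for one word.

    Updates are appended as delta records instead of rewriting the base
    page in place, so adding a posting never reads the existing postings.
    """
    base: frozenset = frozenset()
    deltas: list = field(default_factory=list)  # ("add" | "remove", doc_id), oldest first

    def blind_add(self, doc_id):
        self.deltas.append(("add", doc_id))  # blind: the base page is not read

    def blind_remove(self, doc_id):
        self.deltas.append(("remove", doc_id))

    def read(self):
        # A reader folds the delta chain over the base page.
        postings = set(self.base)
        for op, doc_id in self.deltas:
            if op == "add":
                postings.add(doc_id)
            else:
                postings.discard(doc_id)
        return postings

    def consolidate(self):
        # Replace base + chain with a new base page (a new version, not an in-place edit).
        self.base = frozenset(self.read())
        self.deltas = []


class TermIndex:
    def __init__(self, max_chain=4):
        self.pages = {}            # word -> PostingPage
        self.max_chain = max_chain  # consolidate once a chain grows past this length

    def index_document(self, doc_id, text):
        for word in set(text.lower().split()):
            page = self.pages.setdefault(word, PostingPage())
            page.blind_add(doc_id)
            if len(page.deltas) > self.max_chain:
                page.consolidate()

    def lookup(self, word):
        page = self.pages.get(word.lower())
        return page.read() if page else set()
```

Writes stay cheap because they only append; the cost of merging moves to readers and to occasional consolidation, which is bounded by max_chain.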

Because the database is distributed, index modifications must be replicated to every replica of a data shard. Cosmos DB replicates delta records asynchronously to bring the secondaries up to date with the primary replica. When a new document is created on the primary, it is fully analyzed to extract all of its words; these words are inserted into the index, and the same word stream is sent to the secondaries.
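A minimal sketch of this flow, assuming a toy analyzer (lower-cased whitespace tokens) and an in-memory inbox per secondary in place of a network channel. Primary, Secondary, analyze, and apply_pending are hypothetical names. The point it shows is that the document is analyzed once on the primary and the secondaries receive the resulting word stream, applying it later.

```python
from collections import deque


def analyze(text):
    # Hypothetical analyzer: lower-cased whitespace tokens stand in for indexed words.
    return sorted(set(text.lower().split()))


class Secondary:
    def __init__(self):
        self.index = {}       # word -> set of document ids
        self.inbox = deque()  # word stream received from the primary, not yet applied

    def apply_pending(self):
        # Applied asynchronously, so the secondary may lag behind the primary.
        while self.inbox:
            doc_id, words = self.inbox.popleft()
            for word in words:
                self.index.setdefault(word, set()).add(doc_id)


class Primary:
    def __init__(self, secondaries):
        self.index = {}
        self.secondaries = secondaries

    def create(self, doc_id, text):
        words = analyze(text)  # the document is analyzed only here
        for word in words:
            self.index.setdefault(word, set()).add(doc_id)
        for secondary in self.secondaries:
            # Ship the word stream without waiting for it to be applied.
            secondary.inbox.append((doc_id, words))
```

Until a secondary drains its inbox, its index is behind the primary's; after apply_pending it matches.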

Isolation Levels

Snapshot Isolation

Multi-document transactions are executed as JavaScript stored procedures under snapshot isolation, within a single logical partition.
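Snapshot isolation is commonly implemented with multi-versioning plus a first-committer-wins check on write conflicts. The sketch below shows that general technique; it is not a description of Cosmos DB's internals, and MVCCStore, Transaction, and WriteConflict are hypothetical names.

```python
class WriteConflict(Exception):
    pass


class MVCCStore:
    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._last_ts = 0    # timestamp of the most recent commit

    def begin(self):
        # The snapshot is every version committed up to now.
        return Transaction(self, self._last_ts)

    def _read_as_of(self, key, ts):
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None

    def _commit(self, txn):
        # First-committer-wins: abort if another transaction committed a newer
        # version of any key we wrote after our snapshot was taken.
        for key in txn.writes:
            versions = self._versions.get(key, [])
            if versions and versions[-1][0] > txn.start_ts:
                raise WriteConflict(key)
        self._last_ts += 1
        for key, value in txn.writes.items():
            self._versions.setdefault(key, []).append((self._last_ts, value))
        return self._last_ts


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # buffered until commit

    def read(self, key):
        if key in self.writes:  # read your own writes
            return self.writes[key]
        return self.store._read_as_of(key, self.start_ts)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.store._commit(self)
```

Two transactions that start together both see the same snapshot; if both update the same key, the second to commit is aborted instead of silently overwriting the first.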

Joins

Not Supported

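Only self-joins are supported: a JOIN can combine a document with the elements of its own nested arrays, but not with other documents.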
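The semantics of a self-join such as SELECT f.id, c.givenName FROM Families f JOIN c IN f.children can be emulated in a few lines: each document is paired with each element of its own array, and rows never mix two documents. The function self_join and the sample data are hypothetical, for illustration only.

```python
def self_join(documents, array_field):
    """Emulate 'FROM d JOIN x IN d.<array_field>' for a list of dict documents.

    Each output row pairs a document with one element of its own array.
    Documents with a missing or empty array produce no rows.
    """
    for doc in documents:
        for element in doc.get(array_field, []):
            yield doc, element


families = [
    {"id": "AndersenFamily", "children": [{"givenName": "Henriette"}]},
    {"id": "WakefieldFamily", "children": [{"givenName": "Jesse"}, {"givenName": "Lisa"}]},
]

# Projection equivalent to SELECT f.id, c.givenName
rows = [(f["id"], c["givenName"]) for f, c in self_join(families, "children")]
```

Here rows holds one tuple per child, each tagged with its own family's id; no row ever pairs a child with a different family.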

Query Interface

SQL, Stored Procedures

It supports the SQL, MongoDB, Cassandra, Gremlin, and Table APIs.

Storage Architecture

Hybrid

It uses both in-memory and disk-based log-structured merge trees to store documents.

Storage Model

Hybrid

Stored Procedures

Supported

Application logic can be written as stored procedures, triggers, and user-defined functions (UDFs) using JavaScript.

System Architecture

Shared-Nothing

The Cosmos DB service is deployed on many replicated, shared-nothing nodes across geographic regions for high availability, low latency, and high throughput. Some or all of these nodes form a replica set that serves requests for a data shard containing documents. One replica in the set is elected primary and applies writes to the shard in a total order. A write is acknowledged once a write quorum (W), a subset of the replicas, has made it durable. Reads are served by a read quorum (R), a subset of the replicas, sized to deliver the consistency level the user has configured: Strong, Bounded Staleness, Session, Consistent Prefix, or Eventual.
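Why quorum sizes matter can be shown with versioned replicas. This sketch assumes a hypothetical replica set of N = 4 and W = 3 (not necessarily the service's actual configuration) and, as a worst case, reads from the replicas the write reached last. When R + W > N, every read quorum overlaps the write quorum and sees the latest write; a smaller R can return stale data, which is acceptable only for weaker consistency levels.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int = 0
    value: object = None


def quorum_write(replicas, w, version, value):
    # The write is acknowledged once w replicas apply it; the rest lag behind.
    for replica in replicas[:w]:
        replica.version, replica.value = version, value


def quorum_read(replicas, r):
    # Worst case for overlap: contact the r replicas the write reached last.
    contacted = replicas[-r:]
    newest = max(contacted, key=lambda rep: rep.version)
    return newest.value


replicas = [Replica() for _ in range(4)]  # hypothetical N = 4
quorum_write(replicas, 3, version=1, value="v1")
```

After this write, quorum_read(replicas, 2) returns "v1" because 3 + 2 > 4, while quorum_read(replicas, 1) may hit only the lagging replica and return None.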

Data is partitioned at the logical level and replicated at the storage layer in units of physical partitions to achieve the desired availability and throughput.
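One way to picture the logical-to-physical mapping is hash-range partitioning: hash each logical partition key and let every physical partition own a contiguous range of the hash space, so many logical partitions share one physical partition. The sketch below assumes MD5 truncated to 32 bits and evenly sized ranges purely for illustration; PartitionMap and hash_key are hypothetical, and the service's real hash function and range layout are not modeled.

```python
import bisect
import hashlib

HASH_SPACE = 2 ** 32


def hash_key(partition_key: str) -> int:
    # Stand-in hash (first 4 bytes of MD5); not the service's actual hash function.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class PartitionMap:
    def __init__(self, physical_count: int):
        # Physical partition i owns hash values in [bounds[i-1], bounds[i]).
        step = HASH_SPACE // physical_count
        self.upper_bounds = [step * (i + 1) for i in range(physical_count - 1)]
        self.upper_bounds.append(HASH_SPACE)

    def physical_partition(self, partition_key: str) -> int:
        # First range whose exclusive upper bound is greater than the hash.
        return bisect.bisect_right(self.upper_bounds, hash_key(partition_key))
```

Because the mapping depends only on the key's hash, every document with the same logical partition key lands on the same physical partition, whose replica set then provides the replication described above.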

Views

Not Supported

Cosmos DB relies on its Bw-Tree index to serve real-time queries and does not use views.
