CosmosDB is a globally distributed, consistent, schema-less, multi-model document database that provides high-throughput and availability across various geographical regions. It is used to solve data storage problems of large-scale distributed Internet-scale applications. Most of the Microsoft internal services such as Bing, Office 365, Ads, etc. and many other external services use Cosmos DB for their storage needs. It provides 99.99% availability regardless of a number of regions associated with data.
Cosmos DB is a NoSQL document database which performs Indexing directly on document's contents. The index is a union of all documents words and can be queried on any word of any document present in the database. It is represented as a schema-agnostic tree, where the tree nodes are all possible words of document set and values are the associated documents in which the word is present. To represent this schema-less index, Bw-Tree is used. To support fast random writes on SSDs and Disks, Cosmos DB also employs Log-Structured merge trees, to store Bw-Tree modifications. It uses delta-record updates instead of in-place updates in the tree to avoid cache invalidation and write amplification on SSDs. Cosmos DB supports blind-incremental updates to its Bw-Tree, so as to perform partial writes to any record without reading it to the memory.\
As the database is distributed, Index modifications have to be replicated to all of the replicas of a data shard. Cosmos DB performs asynchronous replication of delta-records to make secondaries consistent with the primary replica. When a new document is created on the primary, it is completely analyzed to extract all of the words and these words are inserted into the Index, while also transferring the word stream to the secondaries.
Cosmos DB service is deployed on several replicated shared-nothing nodes across geographical regions for high-availability, low-latency, and high throughput. Some or all of these distributed nodes form a replica set for serving requests on a data shard. Among the nodes, one of them is elected as a master to perform total-ordered writes on the data shard. Writes are done on the write-quorum (W), a subset of the replica nodes, to ensure that the data is durable. Reads are performed on read-quorum (R), a subset of replica nodes, to get the desired consistency levels (Strong, Bounded-staleness, Session, Consistent Prefix, Eventual) as configured by users.
Multi-version Concurrency Control (MVCC)
Cosmos DB is a service that runs on a massive scale distributed system on a large cluster of servers over multiple distributed geographic regions. To make sure that meta-data is consistent across the regions, Cosmos DB uses a Paxos like protocol.
https://azure.microsoft.com/en-us/services/cosmos-db/
https://docs.microsoft.com/en-us/azure/cosmos-db/
Microsoft
2015
DocumentDB
C#, Go, Java, JavaScript, Python