CouchDB

Revision #8 from 11/30/2018 6:20 p.m.

CouchDB ("cluster of unreliable commodity hardware") is a document-oriented NoSQL DBMS.

History

Couch is an acronym for "cluster of unreliable commodity hardware." The CouchDB project was created in April 2005 by Damien Katz, a former Lotus Notes developer at IBM. He self-funded the project for almost two years and released it as open source under the GNU General Public License.

In February 2008, it became an Apache Incubator project and was relicensed under the Apache License.[4] A few months later, it graduated to a top-level project. The first stable version, 1.0, was released in July 2010.

In early 2012, Katz left the project to focus on Couchbase Server.

Since Katz's departure, the Apache CouchDB project has continued, releasing 1.2 in April 2012 and 1.3 in April 2013. In July 2013, the CouchDB community merged the codebase for BigCouch, Cloudant's clustered version of CouchDB, into the Apache project. The BigCouch clustering framework is included in the current release of Apache CouchDB.

Native clustering is supported as of version 2.0.0, which also introduced the Mango query server: a simple, JSON-based way to query CouchDB without writing JavaScript or MapReduce functions.
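A Mango query is a JSON document sent to a database's _find endpoint. The sketch below only builds such a request with Python's standard library and does not send it; the server address, the "albums" database, and its "title" and "year" fields are hypothetical.

```python
import json
import urllib.request


def build_find_request(base_url, db, selector, fields=None, limit=None):
    """Build (but do not send) a POST /{db}/_find request for a Mango query."""
    body = {"selector": selector}
    if fields is not None:
        body["fields"] = fields  # restrict which fields of each document are returned
    if limit is not None:
        body["limit"] = limit    # cap the number of matching documents
    return urllib.request.Request(
        f"{base_url}/{db}/_find",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical query: albums released after 2000.
req = build_find_request(
    "http://localhost:5984",
    "albums",
    selector={"year": {"$gt": 2000}},
    fields=["_id", "title", "year"],
    limit=10,
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen against a running server would return the matching documents in a "docs" array; no JavaScript map function is involved.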

Compression

Naïve (Record-Level)

CouchDB reduces disk usage through compaction, similar to VACUUM in SQLite. The number of stored revisions (and their tombstones) can be configured through the _revs_limit endpoint. Compaction can be triggered either manually or automatically.
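Both operations are plain HTTP calls: POST /{db}/_compact starts compaction, and PUT /{db}/_revs_limit sets the revision limit from a bare JSON integer. The sketch below only builds the requests; the server address and the "albums" database are assumptions, and a real server would also require admin credentials.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumed address of a local CouchDB server


def compact_request(db: str) -> urllib.request.Request:
    # POST /{db}/_compact asks the server to compact the database; the request
    # must declare a JSON content type even though the body is empty.
    return urllib.request.Request(
        f"{BASE_URL}/{db}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def revs_limit_request(db: str, limit: int) -> urllib.request.Request:
    # PUT /{db}/_revs_limit takes a bare JSON integer as its body.
    return urllib.request.Request(
        f"{BASE_URL}/{db}/_revs_limit",
        data=json.dumps(limit).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# "albums" is a hypothetical database name.
for req in (revs_limit_request("albums", 100), compact_request("albums")):
    print(req.get_method(), req.full_url, req.data)
```

The server accepts a compaction request and performs the work in the background, so a client that needs to know when it has finished has to check the database's status afterwards.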

Checkpoints

Non-Blocking

In CouchDB, every change to a document appends a new record to the database file instead of modifying existing data. As a result, taking a snapshot of the file to capture the latest version of the database never blocks writers.
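The following toy model (a hypothetical AppendOnlyFile class, not CouchDB's real file format) shows why appends make snapshots cheap: bytes already written never change, so a copier only needs to remember where the file ended when it started.

```python
class AppendOnlyFile:
    """Toy append-only byte store (not CouchDB's real on-disk format)."""

    def __init__(self) -> None:
        self._data = bytearray()

    def append(self, record: bytes) -> int:
        # Writers only add bytes at the end; existing bytes are never rewritten.
        offset = len(self._data)
        self._data += record
        return offset

    def size(self) -> int:
        return len(self._data)

    def read_prefix(self, length: int) -> bytes:
        # Bytes before `length` can no longer change, so copying them needs no lock on writers.
        return bytes(self._data[:length])


f = AppendOnlyFile()
f.append(b"doc1:v1;")
snapshot_len = f.size()   # a backup notes where the file currently ends...
f.append(b"doc1:v2;")     # ...while a writer keeps appending
backup = f.read_prefix(snapshot_len)
print(backup)    # b'doc1:v1;'
print(f.size())  # 16
```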

Isolation Levels

Snapshot Isolation

Because of MVCC, a read request in CouchDB always sees the most recent snapshot of the database as of the moment the request began; writes that commit while the request is in progress are not visible to it.
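A minimal copy-on-write sketch of this visibility rule, using a hypothetical in-memory VersionedStore class: each write publishes a new immutable version, and a reader keeps whichever version was current when its request began. CouchDB achieves this with append-only B+trees rather than by copying a whole dictionary, so the sketch models only which data a reader sees, not the storage cost.

```python
from types import MappingProxyType


class VersionedStore:
    """Toy copy-on-write store: each commit publishes a new read-only version."""

    def __init__(self) -> None:
        self._current = MappingProxyType({})

    def snapshot(self):
        # A reader grabs the version that is current when its request starts.
        return self._current

    def put(self, key, value) -> None:
        # A writer copies the latest version and publishes the modified copy;
        # versions already handed to readers are never mutated.
        new_version = dict(self._current)
        new_version[key] = value
        self._current = MappingProxyType(new_version)


store = VersionedStore()
store.put("doc1", {"title": "v1"})
view = store.snapshot()             # a read request begins
store.put("doc1", {"title": "v2"})  # a concurrent write commits
print(view["doc1"])                 # {'title': 'v1'}: the reader's snapshot is unchanged
print(store.snapshot()["doc1"])     # {'title': 'v2'}: a new request sees the new version
```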

Indexes

B+Tree

CouchDB indexes documents by their document ID and by their update sequence number; both indexes are organized as B+trees.
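A simplified sketch of the two indexes, assuming a hypothetical DocIndexes class in which a dict (for ID lookups) and a sorted list maintained with bisect (for sequence order) stand in for the on-disk B+trees. Each update gets the next sequence number and replaces the document's previous entry in the by-sequence index, so a range scan over that index lists every document changed after a given sequence number exactly once.

```python
import bisect


class DocIndexes:
    """Toy by-ID and by-sequence indexes; a dict and a sorted list stand in for B+trees."""

    def __init__(self) -> None:
        self.update_seq = 0
        self.by_id = {}    # doc ID -> sequence number of the document's latest update
        self.by_seq = []   # sorted (sequence number, doc ID) pairs

    def update(self, doc_id: str) -> int:
        self.update_seq += 1
        old_seq = self.by_id.get(doc_id)
        if old_seq is not None:
            # Each document keeps a single entry in the by-sequence index:
            # the entry for its previous update is removed.
            i = bisect.bisect_left(self.by_seq, (old_seq, doc_id))
            del self.by_seq[i]
        self.by_id[doc_id] = self.update_seq
        # New sequence numbers are always the largest, so this lands at the end.
        bisect.insort(self.by_seq, (self.update_seq, doc_id))
        return self.update_seq

    def changes_since(self, since: int) -> list:
        # Range scan: (since + 1,) sorts after every pair with sequence <= since
        # and before every pair with a larger sequence.
        start = bisect.bisect_left(self.by_seq, (since + 1,))
        return self.by_seq[start:]


idx = DocIndexes()
for doc_id in ["apple", "banana", "apple"]:
    idx.update(doc_id)
print(idx.by_id)             # {'apple': 3, 'banana': 2}
print(idx.changes_since(0))  # [(2, 'banana'), (3, 'apple')]
print(idx.changes_since(2))  # [(3, 'apple')]
```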

Logging

Shadow Paging

CouchDB uses shadow paging instead of a separate log: it only ever appends to the current database file, and this append-only design is what provides its MVCC behavior.
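Shadow paging can be illustrated with path copying. In this toy sketch (the ShadowPagedTree class and its two-level layout are hypothetical and far simpler than CouchDB's B+trees), the "file" is a Python list that is only ever appended to, and a node's position in the list serves as its pointer. Updating a key appends a new leaf, then a new root pointing at that leaf, and finally a new header; older headers still reference the old root, so earlier versions stay readable.

```python
class ShadowPagedTree:
    """Toy two-level tree stored in an append-only list of nodes."""

    NUM_LEAVES = 4

    def __init__(self) -> None:
        self.file = []  # append-only "database file"; list positions act as pointers
        leaves = [self._append({}) for _ in range(self.NUM_LEAVES)]
        root = self._append(tuple(leaves))
        self.header = self._append({"root": root})

    def _append(self, node) -> int:
        self.file.append(node)
        return len(self.file) - 1

    def _leaf_slot(self, key: str) -> int:
        return sum(key.encode()) % self.NUM_LEAVES  # deterministic toy partitioning

    def get(self, key: str, header=None):
        # Reading through an older header sees the tree exactly as it was then.
        header = self.header if header is None else header
        root = self.file[self.file[header]["root"]]
        return self.file[root[self._leaf_slot(key)]].get(key)

    def put(self, key: str, value) -> None:
        root = self.file[self.file[self.header]["root"]]
        slot = self._leaf_slot(key)
        # Shadow copy of the leaf: the old leaf is left untouched.
        new_leaf = dict(self.file[root[slot]])
        new_leaf[key] = value
        leaf_ptr = self._append(new_leaf)
        # Shadow copy of the root, pointing at the new leaf.
        new_root_ptr = self._append(root[:slot] + (leaf_ptr,) + root[slot + 1:])
        # Committing means appending a new header that points at the new root.
        self.header = self._append({"root": new_root_ptr})


tree = ShadowPagedTree()
tree.put("doc1", "v1")
old_header = tree.header
tree.put("doc1", "v2")
print(tree.get("doc1"))              # v2
print(tree.get("doc1", old_header))  # v1
print(len(tree.file))                # 12: 6 initial nodes plus 3 per update
```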

Query Interface

HTTP / REST

CouchDB provides a RESTful HTTP API for reading and updating documents.
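Documents map onto URLs of the form /{db}/{docid}, and ordinary HTTP verbs act on them. The sketch below builds (without sending) requests to create or update, read, and delete a document; the server address, database name, and document ID are hypothetical. Updates and deletions must name the document's current revision, which is how CouchDB's MVCC surfaces in the API.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5984"  # assumed address of a local CouchDB server


def doc_url(db: str, doc_id: str) -> str:
    # Names are path segments, so characters such as "/" must be percent-encoded.
    return f"{BASE_URL}/{urllib.parse.quote(db, safe='')}/{urllib.parse.quote(doc_id, safe='')}"


def put_doc(db: str, doc_id: str, doc: dict) -> urllib.request.Request:
    # PUT creates the document; to update it, `doc` must carry its current "_rev".
    return urllib.request.Request(
        doc_url(db, doc_id),
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_doc(db: str, doc_id: str) -> urllib.request.Request:
    return urllib.request.Request(doc_url(db, doc_id), method="GET")


def delete_doc(db: str, doc_id: str, rev: str) -> urllib.request.Request:
    # DELETE names the revision being deleted in the "rev" query parameter.
    query = urllib.parse.urlencode({"rev": rev})
    return urllib.request.Request(f"{doc_url(db, doc_id)}?{query}", method="DELETE")


req = put_doc("albums", "6e1295ed", {"title": "Blue", "year": 1971})
print(req.get_method(), req.full_url)
print(delete_doc("albums", "6e1295ed", "1-abc").full_url)
```

Sending an update whose "_rev" is stale is rejected by the server with a 409 Conflict, so clients re-read the document and retry.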

Storage Model

Custom

Because CouchDB is append-only, the critical file header is kept at the tail of the file; each write appends a new copy of the header rather than updating it in place.

The body of a file header contains the following fields:

8 bits -- File format version (currently 10)
48 bits -- Update sequence number counter: the sequence number that will appear in the by-sequence index for the next update
48 bits -- Purge sequence number
48 bits -- Purged documents pointer
16 bits -- Size of the by-sequence B-tree root
16 bits -- Size of the by-ID B-tree root
16 bits -- Size of the local documents B-tree root

The B-tree roots follow, in the same order as their sizes, and are encoded as B-tree node pointers as described in the "Node Pointers" section.
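The header layout above can be decoded field by field. This sketch assumes big-endian integers and treats each B-tree root as opaque bytes, since the node-pointer encoding is not given here; parse_header and FileHeader are hypothetical names, and the example bytes are made up rather than taken from a real database file.

```python
from dataclasses import dataclass


@dataclass
class FileHeader:
    version: int
    update_seq: int
    purge_seq: int
    purged_docs_ptr: int
    by_seq_root: bytes
    by_id_root: bytes
    local_docs_root: bytes


# (name, width in bits) of the fixed-size fields, in the order listed above.
FIXED_FIELDS = [
    ("version", 8),
    ("update_seq", 48),
    ("purge_seq", 48),
    ("purged_docs_ptr", 48),
    ("by_seq_size", 16),
    ("by_id_size", 16),
    ("local_size", 16),
]


def parse_header(body: bytes) -> FileHeader:
    values, pos = {}, 0
    for name, bits in FIXED_FIELDS:
        width = bits // 8
        # Assumption: multi-byte integers are stored big-endian.
        values[name] = int.from_bytes(body[pos:pos + width], "big")
        pos += width
    roots = []
    for size_field in ("by_seq_size", "by_id_size", "local_size"):
        size = values[size_field]
        roots.append(body[pos:pos + size])  # opaque node-pointer bytes
        pos += size
    if pos > len(body):
        raise ValueError("header body is truncated")
    return FileHeader(values["version"], values["update_seq"], values["purge_seq"],
                      values["purged_docs_ptr"], *roots)


# Made-up example: version 10, update_seq 42, roots of 2, 1 and 0 bytes.
body = (bytes([10]) + (42).to_bytes(6, "big") + bytes(12)
        + (2).to_bytes(2, "big") + (1).to_bytes(2, "big") + (0).to_bytes(2, "big")
        + b"\xaa\xbb" + b"\xcc")
header = parse_header(body)
print(header.version, header.update_seq, header.by_seq_root, header.by_id_root)
```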

To make the file header easy to locate, the database file is organized into 4096-byte blocks.
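A sketch of how block alignment makes the newest header findable: a reader can start at the last block and step backwards 4096 bytes at a time. It assumes, for illustration, that each block begins with a one-byte marker where 1 means a header starts in that block; find_last_header_block is a hypothetical name. A real reader would presumably also validate the header it finds and keep scanning backwards if the newest one turned out to be incomplete.

```python
BLOCK_SIZE = 4096
HEADER_MARKER = 0x01  # assumption: first byte of a block in which a header starts


def find_last_header_block(data: bytes) -> int:
    """Return the offset of the last block starting with the header marker, or -1."""
    # Start at the block containing the file's last byte and walk backwards.
    block = (len(data) - 1) // BLOCK_SIZE
    while block >= 0:
        offset = block * BLOCK_SIZE
        if data[offset] == HEADER_MARKER:
            return offset
        block -= 1
    return -1


# Made-up file: three zero-marked blocks plus a partial fourth block,
# with a header marker at the start of block 1.
data = bytearray(3 * BLOCK_SIZE)
data[BLOCK_SIZE] = HEADER_MARKER
data += b"\x00partial"
print(find_last_header_block(bytes(data)))  # 4096
```

Because headers always begin on a block boundary, the scan inspects one byte per block instead of parsing every record in the file.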

The data in the file are organized as variable-length chunks.

Stored Procedures

Not Supported

Storage Architecture

Disk-oriented

CouchDB stores data on disk, and all updates are synchronously flushed to disk.
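One way to make a synchronous commit durable is to force the data to disk first and only then write the header that points to it, so a crash can never leave a header referencing unwritten data. The sketch below shows that ordering with os.fsync; durable_commit is a hypothetical helper, and the byte strings stand in for real document and header encodings.

```python
import os
import tempfile


def durable_commit(path: str, data: bytes, header: bytes) -> None:
    """Append data, force it to disk, then append and force a header."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()             # move Python's buffer into the operating system
        os.fsync(f.fileno())  # block until the OS reports the data is on the device
        f.write(header)       # written only after the data it refers to is durable
        f.flush()
        os.fsync(f.fileno())


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "toy.couch")
    durable_commit(db_path, b"doc-bytes", b"HEADER")
    with open(db_path, "rb") as f:
        contents = f.read()
print(contents)  # b'doc-bytesHEADER'
```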

Joins

Not Supported

Data in CouchDB is stored as documents, which makes join operations largely unnecessary. Joins are replaced by denormalization, that is, storing related data together in the same document.
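A small sketch of denormalization: normalized order, customer, and product records (all hypothetical) are combined into one self-contained order document, so reading an order never needs a join. The function name and field names are illustrative only.

```python
def denormalize_order(order, customers, products):
    """Build one self-contained order document from normalized records."""
    customer = customers[order["customer_id"]]
    return {
        "_id": order["_id"],
        "type": "order",
        # Customer fields are copied into the order instead of referenced by ID,
        # so displaying the order needs no second lookup.
        "customer": {"name": customer["name"], "city": customer["city"]},
        "items": [
            {
                "sku": sku,
                "name": products[sku]["name"],
                "qty": qty,
                "unit_price": products[sku]["price"],
            }
            for sku, qty in order["lines"]
        ],
    }


# Hypothetical normalized records, as they might look in a relational schema.
customers = {"c1": {"name": "Ada", "city": "London"}}
products = {"p1": {"name": "Notebook", "price": 3.5}}
order = {"_id": "order-1001", "customer_id": "c1", "lines": [("p1", 2)]}

doc = denormalize_order(order, customers, products)
print(doc["customer"]["name"], doc["items"][0]["name"])  # Ada Notebook
```

The trade-off is on the write side: if a customer's city changes, every order document that embedded the old value must be rewritten, or deliberately kept as a historical snapshot.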