CouchDB ("cluster of unreliable commodity hardware") is a document-oriented NoSQL DBMS.[04]
- Source Code
- https://github.com/apache/couchdb[02]
- Developer
- Country of Origin
- US
- Start Year
- 2005 [04]
- Acquired By
- Project Type
- Open Source
- Written in
- Erlang
- Supported Languages
- C, C#, Erlang, Haskell, Java, JavaScript, Lisp, Lua, Objective-C, OCaml, Perl, PHP, PL/SQL, Python, Ruby, Smalltalk
- License
- Apache v2
History[04]
Couch is an acronym for cluster of unreliable commodity hardware. The CouchDB project was created in April 2005 by Damien Katz, a former Lotus Notes developer at IBM. He self-funded the project for almost two years and released it as an open source project under the GNU General Public License.
In February 2008, it became an Apache Incubator project and was offered under the Apache License instead. A few months later, it graduated to a top-level project. This led to the first stable version being released in July 2010.
In early 2012, Katz left the project to focus on Couchbase Server.
Since Katz's departure, the Apache CouchDB project has continued, releasing 1.2 in April 2012 and 1.3 in April 2013. In July 2013, the CouchDB community merged the codebase for BigCouch, Cloudant's clustered version of CouchDB, into the Apache project. The BigCouch clustering framework is included in the current release of Apache CouchDB.
Native clustering is supported as of version 2.0.0. In addition, the new Mango Query Server provides a simple JSON-based way to perform CouchDB queries without JavaScript or MapReduce.
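To make the JSON-based query style concrete, the sketch below builds (but does not send) a request for the `_find` endpoint. The server URL, the `movies` database, and the field names are assumptions chosen purely for illustration.

```python
import json
import urllib.request


def build_find_request(base_url, db, selector, fields=None, limit=25):
    """Build (but do not send) a POST request for the _find endpoint."""
    body = {"selector": selector, "limit": limit}
    if fields is not None:
        body["fields"] = fields
    return urllib.request.Request(
        url=f"{base_url}/{db}/_find",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical query: documents whose "year" is greater than 2010.
request = build_find_request(
    "http://localhost:5984",      # assumed local server address
    "movies",                     # hypothetical database name
    {"year": {"$gt": 2010}},      # Mango selector, plain JSON
    fields=["_id", "title"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would execute it against a running server; the selector itself is ordinary JSON, so no JavaScript map function is involved.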
Checkpoints[05]
In CouchDB, any change to a document simply appends a new record to the database file. Because existing data is never overwritten, taking a file-system snapshot to capture the latest version of the database is always non-blocking.
Compression[06]
CouchDB performs compaction to reduce disk usage, similar to VACUUM in SQLite. The number of stored revisions (and their tombstones) can be configured through the _revs_limit URL endpoint. Compaction can be triggered manually or run automatically.
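As a rough illustration of the manual path, the following sketch prepares the two HTTP requests involved: one that triggers compaction and one that sets the revision limit. The base URL and the `orders` database name are assumptions, and a real server would additionally require admin credentials.

```python
import urllib.request


def compact_request(base_url, db):
    """POST /{db}/_compact asks the server to compact one database."""
    return urllib.request.Request(
        url=f"{base_url}/{db}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def revs_limit_request(base_url, db, limit):
    """PUT /{db}/_revs_limit sets how many revisions are tracked per document."""
    return urllib.request.Request(
        url=f"{base_url}/{db}/_revs_limit",
        data=str(limit).encode("ascii"),  # the body is a bare JSON number
        method="PUT",
    )


base = "http://localhost:5984"  # assumed local server address
compact = compact_request(base, "orders")          # "orders" is hypothetical
revs = revs_limit_request(base, "orders", 100)
print(compact.get_method(), compact.full_url)
print(revs.get_method(), revs.full_url, revs.data)
```

Neither request is sent here; `urllib.request.urlopen` would submit them to a running instance.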
Concurrency Control[07]
CouchDB uses MVCC as its concurrency control policy for read operations: each client sees a consistent snapshot for the duration of the read.
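The idea of a reader holding a consistent snapshot while writers continue can be sketched with a toy in-memory log. This is not CouchDB's implementation; the class and method names are hypothetical, and the log is a Python list rather than a file.

```python
class AppendOnlyStore:
    """Toy store: every write appends a record; nothing is updated in place."""

    def __init__(self):
        self._log = []  # list of (doc_id, body) records

    def put(self, doc_id, body):
        self._log.append((doc_id, dict(body)))

    def snapshot(self):
        # A snapshot only remembers how long the log was when it was taken.
        return Snapshot(self._log, len(self._log))


class Snapshot:
    """Read-only view that ignores records appended after it was created."""

    def __init__(self, log, end):
        self._log = log
        self._end = end

    def get(self, doc_id):
        # Scan backwards so the newest visible version wins.
        for index in range(self._end - 1, -1, -1):
            key, body = self._log[index]
            if key == doc_id:
                return dict(body)
        return None


store = AppendOnlyStore()
store.put("doc-a", {"n": 1})
reader = store.snapshot()        # a read request begins here
store.put("doc-a", {"n": 2})     # a concurrent writer appends a new version
old_view = reader.get("doc-a")
new_view = store.snapshot().get("doc-a")
print(old_view, new_view)
```

The earlier reader keeps seeing the version that existed when it started, and the writer never has to wait for it.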
Indexes[10][11]
Documents in CouchDB are indexed by their name (document ID) and by sequence ID; both indexes are organized as B-trees.
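A minimal sketch of how the two indexes relate, assuming each document appears in the by-sequence index only at its latest sequence number. It uses a dict and sorted lists instead of B-trees, and all names are hypothetical.

```python
import bisect


class ToyIndexes:
    """By-ID and by-sequence indexes kept side by side (not real B-trees)."""

    def __init__(self):
        self.update_seq = 0
        self.by_id = {}        # doc_id -> latest sequence number
        self._seqs = []        # sorted sequence numbers
        self._seq_ids = []     # doc_id at the same position as in _seqs

    def update(self, doc_id):
        old_seq = self.by_id.get(doc_id)
        if old_seq is not None:
            # Drop the document's previous by-sequence entry.
            pos = bisect.bisect_left(self._seqs, old_seq)
            del self._seqs[pos]
            del self._seq_ids[pos]
        self.update_seq += 1
        self.by_id[doc_id] = self.update_seq
        self._seqs.append(self.update_seq)   # new seq is always the largest
        self._seq_ids.append(doc_id)
        return self.update_seq

    def changes_since(self, seq):
        # Binary search for the first entry with a sequence number > seq.
        start = bisect.bisect_right(self._seqs, seq)
        return self._seq_ids[start:]


index = ToyIndexes()
index.update("a")
index.update("b")
index.update("a")   # "a" moves to the end of the by-sequence index
print(index.by_id, index.changes_since(0), index.changes_since(2))
```

The by-ID index answers "where is this document?", while the by-sequence index answers "what changed since sequence N?".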
Isolation Levels[12]
Because of MVCC, a read request in CouchDB always sees the most recent snapshot of the database as of the time the request began.
Joins[13]
Data in CouchDB is stored as self-contained documents, so join operations are usually unnecessary. Instead of joining, related data is denormalized or stored together in the same document.
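The denormalization approach can be illustrated with plain dictionaries standing in for JSON documents. The document shapes and field names below are hypothetical.

```python
# Normalized form: comments are separate documents that point at their post.
post = {"_id": "post:1", "type": "post", "title": "Hello"}
comments = [
    {"_id": "comment:1", "type": "comment", "post_id": "post:1", "text": "First"},
    {"_id": "comment:2", "type": "comment", "post_id": "post:1", "text": "Second"},
    {"_id": "comment:3", "type": "comment", "post_id": "post:2", "text": "Other"},
]


def denormalize(post_doc, comment_docs):
    """Return a copy of the post with its comments embedded."""
    embedded = dict(post_doc)
    embedded["comments"] = [
        {"text": c["text"]}
        for c in comment_docs
        if c["post_id"] == post_doc["_id"]
    ]
    return embedded


combined = denormalize(post, comments)
print(combined)
```

With the embedded form, one document fetch returns the post and its comments together, at the cost of rewriting the whole document when a comment is added.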
Logging[05]
CouchDB uses shadow paging as its logging method: it only appends to the current database file, which is also what provides its MVCC behavior.
Storage Architecture[10]
CouchDB stores data on disk, and all updates are synchronously flushed to disk.
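What "synchronously flushed" means in practice can be sketched with an append followed by an explicit `fsync`. This is a generic illustration of the pattern, not CouchDB's code; the file name and record contents are made up.

```python
import os
import tempfile


def append_durably(path, record):
    """Append a record and force it to stable storage before returning."""
    with open(path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)  # where the new record will start
        f.write(record)
        f.flush()                 # push Python's buffer to the OS
        os.fsync(f.fileno())      # ask the OS to write it to disk
    return offset


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.couch")   # hypothetical file name
    first = append_durably(path, b"rev-1\n")
    second = append_durably(path, b"rev-2\n")
    with open(path, "rb") as f:
        contents = f.read()

print(first, second, contents)
```

The write is acknowledged only after `os.fsync` returns, which trades latency for durability.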
Storage Model[15]
Because CouchDB is append-only, the critical header of the database file is at the tail of the file, and it is accessed and re-appended by each append operation.
The values in the body of a file header are:
- 8 bits -- File format version (currently 10)
- 48 bits -- Update sequence number counter. This is the sequence number that will appear in the by-sequence index for the next update.
- 48 bits -- Purge sequence number
- 48 bits -- Purged documents pointer
- 16 bits -- Size of by-sequence B-tree root
- 16 bits -- Size of by-ID B-tree root
- 16 bits -- Size of local documents B-tree root
The B-tree roots, in the order of the sizes, are B-tree node pointers as described in the "Node Pointers" section.
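The field layout listed above can be decoded with a few lines of Python. This is a sketch that assumes the integers are stored big-endian and that the three roots follow the fixed fields back to back; the function name and the sample bytes are hypothetical.

```python
# (name, size in bytes) for the fixed part: 8 + 48*3 + 16*3 bits = 25 bytes.
FIELDS = [
    ("version", 1),
    ("update_seq", 6),
    ("purge_seq", 6),
    ("purge_ptr", 6),
    ("by_seq_root_size", 2),
    ("by_id_root_size", 2),
    ("local_root_size", 2),
]
FIXED_SIZE = sum(size for _, size in FIELDS)


def parse_header_body(buf):
    """Decode the fixed fields, then slice out the three B-tree roots."""
    if len(buf) < FIXED_SIZE:
        raise ValueError("header body is shorter than the fixed fields")
    header = {}
    offset = 0
    for name, size in FIELDS:
        # Big-endian byte order is an assumption of this sketch.
        header[name] = int.from_bytes(buf[offset:offset + size], "big")
        offset += size
    # Roots appear in the same order as their size fields.
    for root in ("by_seq_root", "by_id_root", "local_root"):
        size = header[root + "_size"]
        header[root] = buf[offset:offset + size]
        offset += size
    return header


# Made-up header body: version 10, update_seq 42, two non-empty roots.
sample = (
    (10).to_bytes(1, "big")
    + (42).to_bytes(6, "big")
    + (0).to_bytes(6, "big")
    + (0).to_bytes(6, "big")
    + (3).to_bytes(2, "big")
    + (2).to_bytes(2, "big")
    + (0).to_bytes(2, "big")
    + b"abc"
    + b"de"
)
header = parse_header_body(sample)
print(header)
```

The root bytes are returned undecoded, since interpreting them requires the node pointer format referenced above.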
To make the file header locatable, the database file is organized into 4096-byte blocks.
The data in the file is organized as variable-length chunks.
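One way block alignment helps find the most recent header is a backward scan over block boundaries. The sketch below assumes that the first byte of every 4096-byte block is a marker, with 1 meaning a header starts there and 0 meaning ordinary data; that marker convention and the function name are assumptions, not something stated in this article.

```python
BLOCK_SIZE = 4096


def find_last_header_offset(data):
    """Scan block boundaries from the end of the file toward the start."""
    if not data:
        return None
    # Start at the boundary of the last (possibly partial) block.
    block_start = ((len(data) - 1) // BLOCK_SIZE) * BLOCK_SIZE
    while block_start >= 0:
        if data[block_start] == 1:   # assumed header marker
            return block_start
        block_start -= BLOCK_SIZE
    return None


# Made-up file image: header, data, header, then a partial data block.
image = (
    b"\x01" + bytes(BLOCK_SIZE - 1)
    + b"\x00" + bytes(BLOCK_SIZE - 1)
    + b"\x01" + bytes(BLOCK_SIZE - 1)
    + b"\x00" + b"tail-data"
)
offset = find_last_header_offset(image)
print(offset)
```

Because headers can only begin on a block boundary, the scan inspects one byte per block instead of parsing every chunk in the file.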
Storage Organization[16]
CouchDB implements an append-only B+tree and uses a copy-on-write method to update both the database file and the indexes.
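Copy-on-write updates can be illustrated with path copying on an immutable tree. For brevity the sketch uses a plain binary search tree rather than a B+tree, and it lives in memory rather than in an append-only file; the names are hypothetical.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, value):
    """Return a new root; only nodes on the path to `key` are copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return root._replace(left=insert(root.left, key, value))
    if key > root.key:
        return root._replace(right=insert(root.right, key, value))
    return root._replace(value=value)


def lookup(root, key):
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


v1 = insert(insert(insert(None, "m", "1"), "c", "2"), "x", "3")
v2 = insert(v1, "c", "2b")   # update "c" without touching v1

print(lookup(v1, "c"), lookup(v2, "c"), v2.right is v1.right)
```

The old root remains a valid, complete tree after the update, and untouched subtrees are shared between versions, which is the property that lets readers keep using an older root while a writer publishes a new one.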
System Architecture[17]
CouchDB is a peer-based distributed database system: each peer can serve the same data to users, and peers share nothing with one another.
Citations
- Apache CouchDB apache.org
- GitHub - apache/couchdb: Seamless multi-primary syncing database with an intuitive HTTP/JSON API, designed for reliability · GitHub github.com
- http://docs.couchdb.org/en/stable/index.html couchdb.org
- Apache CouchDB - Wikipedia wikipedia.org
- 1.1. Technical Overview — Apache CouchDB® 3.5 Documentation couchdb.org
- 5.1. Compaction — Apache CouchDB® 3.5 Documentation couchdb.org
- 1.1. Technical Overview — Apache CouchDB® 3.5 Documentation couchdb.org
- 2. JSON Structure Reference — Apache CouchDB® 3.5 Documentation couchdb.org
- 1.7. The Core API — Apache CouchDB® 3.5 Documentation couchdb.org
- 1.1. Technical Overview — Apache CouchDB® 3.5 Documentation couchdb.org
- https://github.com/apache/couchdb/blob/master/src/couch/src/couch_btree.erl github.com
- 1.3. Eventual Consistency — Apache CouchDB® 3.5 Documentation couchdb.org
- 3.2.3. Joins With Views — Apache CouchDB® 3.5 Documentation couchdb.org
- 1.1. Technical Overview — Apache CouchDB® 3.5 Documentation couchdb.org
- Format · couchbaselabs/couchstore Wiki · GitHub github.com
- The Power of B-trees couchdb.org
- 1.1. Technical Overview — Apache CouchDB® 3.5 Documentation couchdb.org
- Apache CouchDB apache.org