LevelDB is a key/value store built by Google. It can support an ordered mapping from string keys to string values. The core storage architecture of LevelDB is a log-structured merge tree (LSM), which is a write-optimized B-tree variant. It is optimized for large sequential writes as opposed to small random writes.
Two-Phase Locking (Deadlock Prevention)
Leveldb only allow one process to open at one time. The operation system will use the locking scheme to prevent concurrent access. Within one process, Leveldb can be accessed by multiple threads. For multi-writers, it will only allow the first writer to write to database and other writers will be blocked. For read-write conflicts, readers can retrieve data from immutable which is seperated from writing process. The updated version will come into effect in compaction process.
Skip List Log-Structured Merge Tree
It uses skip list in MemTable. Aside from that, LSM-tree is one type of write-optimized B-tree variants consisting of key-value pairs. The LSM-tree is a persistent key-value store optimized for insertions and deletions. LevelDB is an open source LSM-tree implementation.
Keys and values in leveldb are byte arrays with arbitrary length. It supports basic operations like Put(), Get(), Delete(). It also support Batch operations: Batch(). The whole process of operations will run together and return result in a single Batch operation. However, it does not support SQL queries because this is not a SQL type database. Aside from that, it has no support for indexing.
It puts temporarily accessed data into MemTable and periodically moves data from MemTable into Immutable SSTable. Aside from that, it adopts compaction to reduce the invalid data in each level and then generates one new block at the next level.
It uses mmap or native read syscall to read SSTable for querying. The OS caches SSTable for LevelDB. As LevelDB supports using snappy to compress the value, to avoid uncompress data for each query, it introduces Cache.
N-ary Storage Model (Row/Record)
SSTable uses NSM to arrange data. It contains a set of arbitrary, sorted key-value pairs. At the end of the block, it provides the start offset and key value for each block. So bloom filter can be used to search for target block.
In leveldb immutable are stored on the disk which can be shared by different cluster nodes. There are totally 7 levels plus at most two in-memory tables. The procedure can be described as firstly the system buffers write operations in an in-memory table called MEMTable and flushes data to disk when it becomes full. On the disk, tables are organized into levels. Each level contains multiple tables called SSTable. The down level maintains larger capacity than the upper level. When the upper level is full, the system needs to push data to the down level, which might need to read and write multiple SSTables.
https://github.com/google/leveldb
2011
C++, Clojure, Cocoa, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, Matlab, Objective-C, Ocaml, PHP, Python, Ruby, Scala, Visual Basic
BSD, iOS, Linux, OS X, Windows