LevelDB is a key/value store built by Google. It provides an ordered mapping from string keys to string values. It is an open-source implementation of the LSM-tree (log-structured merge-tree), a write-optimized alternative to the B-tree for storing key-value pairs. An LSM-tree turns small random writes into large sequential writes.
Two Googlers, Jeff Dean and Sanjay Ghemawat, were inspired by the design of the Bigtable tablet. A tablet in Bigtable is a segment of a table, split along chosen rows. They wanted to build an open-source system with the characteristics of the Bigtable tablet. They also hoped LevelDB could support Chrome's IndexedDB implementation. This is the origin of LevelDB.
Keys and values in LevelDB are byte arrays of arbitrary length. It supports the basic operations Put(), Get(), and Delete(). It also supports batch operations through Batch(): all operations in a batch are applied together and return a single result. It does not support SQL queries, because it is not a SQL database, and it has no built-in support for indexes.
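The operations described above can be illustrated with a minimal in-memory sketch. This is not LevelDB's API: the class name `ToyOrderedStore` and its methods `put`, `get`, `delete`, `write_batch`, and `scan` are hypothetical names chosen for illustration, and the batch here is made all-or-nothing simply by validating every operation before applying any of them.

```python
import bisect


class ToyOrderedStore:
    """Hypothetical in-memory stand-in for an ordered byte-string key/value store."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        return self._values.get(key)

    def delete(self, key: bytes) -> None:
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def write_batch(self, ops) -> None:
        """Apply ("put", key, value) and ("delete", key) operations together."""
        # Validate everything first so a malformed batch changes nothing.
        for op in ops:
            valid_put = op[0] == "put" and len(op) == 3
            valid_delete = op[0] == "delete" and len(op) == 2
            if not (valid_put or valid_delete):
                raise ValueError(f"unsupported batch operation: {op!r}")
        for op in ops:
            if op[0] == "put":
                self.put(op[1], op[2])
            else:
                self.delete(op[1])

    def scan(self, start: bytes = b"", end=None):
        """Yield (key, value) pairs in key order for start <= key < end."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and (end is None or self._keys[i] < end):
            key = self._keys[i]
            yield key, self._values[key]
            i += 1
```

Because keys are kept sorted, a range scan returns pairs in key order regardless of insertion order, which is what "ordered mapping" means in practice.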
N-ary Storage Model (Row/Record)
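SSTables use NSM to arrange data. An SSTable contains a set of arbitrary key-value pairs sorted by key and grouped into blocks. At the end of the table, an index records the start offset and a key for each block, so a lookup can find the block that may hold the target key without scanning the whole table. A Bloom filter can additionally be used to skip blocks that cannot contain the key.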
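A minimal sketch of the block index lookup, under simplifying assumptions: blocks are Python lists rather than byte ranges in a file, the list position stands in for the block's start offset, the index key is assumed to be the last key of each block, and the Bloom filter is omitted. The function names `build_table` and `lookup` are hypothetical.

```python
import bisect


def build_table(pairs, block_size=2):
    """Sort the pairs, split them into fixed-size blocks, and build an index.

    Returns (blocks, index), where index[i] is the last key stored in blocks[i].
    """
    pairs = sorted(pairs)
    blocks = [pairs[i:i + block_size] for i in range(0, len(pairs), block_size)]
    index = [block[-1][0] for block in blocks]
    return blocks, index


def lookup(blocks, index, key):
    """Binary-search the index, then scan only the one candidate block."""
    i = bisect.bisect_left(index, key)  # first block whose last key >= key
    if i == len(blocks):
        return None  # key is larger than every key in the table
    for k, v in blocks[i]:
        if k == key:
            return v
    return None
```

Only one block is read per lookup, which is the point of keeping the index separate from the data.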
In LevelDB, immutable SSTables are stored on disk, where they can be shared by different cluster nodes. There are 7 on-disk levels in total, plus at most two in-memory tables. Writes are first buffered in an in-memory table called the MemTable, which is flushed to disk when it becomes full. On disk, tables are organized into levels, and each level contains multiple SSTables. Each lower level has a larger capacity than the level above it. When an upper level is full, the system pushes data down to the next level, which may require reading and rewriting multiple SSTables.
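The write path described above can be sketched as a toy model. The class `ToyLSM` and all of its parameters are hypothetical; it keeps a single in-memory table rather than two, measures capacity in entries, assumes a growth factor of 10 between levels, and merges a full level wholesale into the next one, whereas a real implementation picks individual tables to compact.

```python
class ToyLSM:
    """Toy model: one memtable plus a stack of on-disk levels."""

    def __init__(self, memtable_limit=2, level0_entries=4, num_levels=7):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.level0_entries = level0_entries
        # Each level is a list of sorted tables, newest table last.
        self.levels = [[] for _ in range(num_levels)]

    def _capacity(self, level):
        # Assumption for illustration: each level holds 10x the one above.
        return self.level0_entries * (10 ** level)

    def _size(self, level):
        return sum(len(table) for table in self.levels[level])

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # A full memtable becomes an immutable sorted table in level 0.
        self.levels[0].append(sorted(self.memtable.items()))
        self.memtable = {}
        self._compact(0)

    def _compact(self, level):
        last = len(self.levels) - 1
        if level >= last or self._size(level) <= self._capacity(level):
            return
        merged = {}
        # Older data first, so newer values overwrite older ones.
        for table in self.levels[level + 1] + self.levels[level]:
            merged.update(table)
        self.levels[level] = []
        self.levels[level + 1] = [sorted(merged.items())]
        self._compact(level + 1)

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for tables in self.levels:          # upper levels hold newer data
            for table in reversed(tables):  # newest table first
                for k, v in table:
                    if k == key:
                        return v
        return None
```

Reads check the memtable first and then the levels from top to bottom, so the most recent value for a key always wins even when older copies still sit in lower levels.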
Two-Phase Locking (Deadlock Prevention)
LevelDB allows only one process to open a database at a time; it relies on an operating-system lock to prevent concurrent access from other processes. Within one process, LevelDB can be accessed by multiple threads. When several writers compete, only the first writer is allowed to write to the database and the other writers are blocked. For read-write conflicts, readers retrieve data from immutable tables, which are separate from the write path. Updated versions take effect during compaction.
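The in-process behavior above (one writer at a time, readers served from immutable data) can be sketched as follows. This is an illustration only, not LevelDB's mechanism: the class `ToySingleWriterStore` is hypothetical, and it uses a copy-on-write dictionary as a stand-in for immutable tables.

```python
import threading


class ToySingleWriterStore:
    """Writers serialize on one lock; readers never take it."""

    def __init__(self):
        self._write_lock = threading.Lock()
        # Treated as immutable: replaced wholesale, never mutated in place.
        self._current = {}

    def put(self, key, value):
        with self._write_lock:        # other writers block here
            updated = dict(self._current)
            updated[key] = value
            self._current = updated   # publish the new version in one step

    def get(self, key):
        return self._current.get(key)  # lock-free read of the current version

    def snapshot(self):
        # The returned mapping is never modified by later writes.
        return self._current
```

A reader that holds a snapshot keeps seeing the same data while writers publish newer versions, which mirrors how immutable tables keep reads separate from writes.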
https://github.com/google/leveldb
2011
C++, Clojure, Cocoa, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, Matlab, Objective-C, Ocaml, PHP, Python, Ruby, Scala, Visual Basic