LevelDB is one key/value store built by Google. It can support an ordered mapping from string to string. LSM-tree is one type of write-optimized B-tree variants consisting of key-value pairs. It allows large sequential writes as opposed to small random writes. LevelDB is an open source LSM-tree implementation.
Two googlers Jeff Dean and Sanjay Ghemawat were inspired by the design scheme of bigtable tablet. Tablets in bigtable are defined as segments of the table split along chosen row. They wanted to build one open-source system containing the characteristic of bigtable tablet. Aside from that, they hoped leveldb can support chrome in its IndexedDB implementation. This is the origin of leveldb.
It puts temporarily accessed data into MemTable and periodically move data from MemTable into Immutable MemTable. Aside form that, it adopts compaction to reduce the invalid data in each level and then generates one new block at next level.
Before every insertion, update or delete, system need to add the message to log. In case of node's failure, uncommited messages can be retrieved and do operation again for recovery.
Keys and values in leveldb are byte arrays with arbitrary length. It supports basic operations like Put(), Get(), Delete(). It also support Batch operations: Batch(). The whole process of operations will run together and return result in a single Batch operation. However, it does not support SQL queries because this is not a SQL type database. Aside from that, it has no support for indexing.
SSTable uses NSM to arrange data. It contains a set of arbitrary, sorted key-value pairs. At the end of the block, it provides the start offset and key value for each block. So bloom filter can be used to search for target block.
In leveldb immutable are stored on the disk which can be shared by different cluster nodes. There are totally 7 levels plus at most two in-memory tables. The procedure can be described as firstly the system buffers write operations in an in-memory table called MEMTable and flushes data to disk when it becomes full. On the disk, tables are organized into levels. Each level contains multiple tables called SSTable. The down level maintains larger capacity than the upper level. When the upper level is full, the system needs to push data to the down level, which might need to read and write multiple SSTables.
Leveldb only allow one process to open at one time. The operation system will use the locking scheme to prevent concurrent access. Within one process, Leveldb can be accessed by multiple threads. For multi-writers, it will only allow the first writer to write to database and other writers will be blocked. For read-write conflicts, readers can retrieve data from immutable which is seperated from writing process. The updated version will come into effect in compaction process.
When operation logging file exceeds over the limit, it will do checkpoints. Data will be flushed to the disk. And compaction scheme will be called. So data will go down levels. Aside from that, leveldb will generate new logging file and memtable for new use.
It uses skip list in MemTable. Aside from that, LSM-tree is one type of write-optimized B-tree variants consisting of key-value pairs. The LSM-tree is a persistent key-value store optimized for insertions and deletions. LevelDB is an open source LSM-tree implementation.