Kdb+ supports on-disk compression with the following algorithms:
kdb+ algorithm: the default compression algorithm
gzip: supports multiple compression levels; a higher level yields a better compression ratio but needs more computation time
Google Snappy: faster than the previous two algorithms, but with a lower compression ratio
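To make the level/ratio trade-off concrete, here is a minimal Python sketch that compresses a serialized column block by block with zlib (the DEFLATE codec that gzip also uses) at several levels. The block size of 2**17 bytes and the names compress_column and decompress_column are assumptions for illustration only; this is not kdb+'s compressed file format.

```python
import array
import zlib

# Hypothetical block size (an assumption for this sketch, not kdb+'s format).
BLOCK_SIZE = 2 ** 17


def compress_column(raw: bytes, level: int, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Compress a serialized column one fixed-size block at a time with zlib at `level`."""
    return [zlib.compress(raw[i:i + block_size], level)
            for i in range(0, len(raw), block_size)]


def decompress_column(blocks: list[bytes]) -> bytes:
    """Inverse of compress_column: decompress every block and concatenate them."""
    return b"".join(zlib.decompress(block) for block in blocks)


if __name__ == "__main__":
    # A toy column of 64-bit integers with many repeated values.
    column = array.array("q", [100 + (i % 50) for i in range(200_000)])
    raw = column.tobytes()
    for level in (1, 6, 9):
        blocks = compress_column(raw, level)
        size = sum(len(b) for b in blocks)
        print(f"level {level}: {len(raw)} -> {size} bytes in {len(blocks)} blocks")
```

On repetitive data like this, higher levels usually spend more CPU time for smaller output. Compressing per block rather than per file also lets a reader decompress only the blocks it actually needs.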
Deterministic Concurrency Control
Kdb+ uses partition-based timestamp ordering. Each transaction is assigned a timestamp when it begins, and within each partition, transactions are executed in timestamp order.
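The following Python sketch models partition-based timestamp ordering in simplified form: a global counter assigns each transaction a timestamp when it begins, and each partition keeps a min-heap so it always executes the pending transaction with the smallest timestamp first, regardless of arrival order. TimestampOrderingScheduler and its methods are hypothetical names; this illustrates the technique and is not kdb+'s internal code.

```python
import heapq
import itertools
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(order=True)
class Txn:
    ts: int                            # only the timestamp takes part in ordering
    name: str = field(compare=False)


class TimestampOrderingScheduler:
    """Hypothetical scheduler: timestamp at begin, per-partition ordered execution."""

    def __init__(self) -> None:
        self._clock = itertools.count(1)   # monotonically increasing timestamps
        self._queues: dict[str, list[Txn]] = defaultdict(list)

    def begin(self, name: str) -> Txn:
        # Each transaction receives its timestamp when it begins.
        return Txn(next(self._clock), name)

    def submit(self, txn: Txn, partition: str) -> None:
        # Transactions may reach a partition in any order; the heap restores timestamp order.
        heapq.heappush(self._queues[partition], txn)

    def run_partition(self, partition: str) -> list[str]:
        # Execute (here: record) pending transactions in ascending timestamp order.
        queue = self._queues[partition]
        executed = []
        while queue:
            executed.append(heapq.heappop(queue).name)
        return executed


if __name__ == "__main__":
    sched = TimestampOrderingScheduler()
    t1, t2, t3 = sched.begin("t1"), sched.begin("t2"), sched.begin("t3")
    # Arrival order at partition p0 differs from timestamp order.
    for txn in (t3, t1, t2):
        sched.submit(txn, "p0")
    print(sched.run_partition("p0"))  # prints ['t1', 't2', 't3']
```

Because timestamps are fixed at begin time, every partition agrees on the same relative order for the transactions it shares, which is what makes the outcome deterministic.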
Kdb+ has both in-memory and on-disk storage. New data is held in memory, and older data is flushed to disk. The flush is controlled by the event engine, which by default flushes in-memory data to disk on a daily basis. The rationale behind this design is to keep each day's new data in memory for fast queries.
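A minimal Python sketch of this in-memory/on-disk split, assuming one JSON file per day stands in for an on-disk date partition. DailyFlushStore and its roll method are hypothetical; roll is the end-of-day hook that an external scheduler, playing the event engine's role, would call.

```python
import datetime as dt
import json
import os
import tempfile


class DailyFlushStore:
    """Hypothetical store: today's rows in memory, earlier days in per-date files."""

    def __init__(self, root: str, today: dt.date) -> None:
        self.root = root
        self.today = today
        self.memory: list[dict] = []   # rows for the current day

    def _path(self, day: dt.date) -> str:
        return os.path.join(self.root, f"{day.isoformat()}.json")

    def insert(self, row: dict) -> None:
        # New data always lands in memory first.
        self.memory.append(row)

    def roll(self, new_day: dt.date) -> None:
        # End-of-day flush: persist the current day, then start an empty in-memory day.
        with open(self._path(self.today), "w") as f:
            json.dump(self.memory, f)
        self.memory = []
        self.today = new_day

    def query(self, day: dt.date) -> list[dict]:
        # Today's data is served from memory; earlier days are read from disk.
        if day == self.today:
            return list(self.memory)
        path = self._path(day)
        if not os.path.exists(path):
            return []
        with open(path) as f:
            return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = DailyFlushStore(root, dt.date(2024, 1, 1))
        store.insert({"sym": "A", "price": 10.0})
        store.roll(dt.date(2024, 1, 2))   # called by the scheduler at day end
        store.insert({"sym": "A", "price": 10.5})
        print(store.query(dt.date(2024, 1, 1)), store.query(dt.date(2024, 1, 2)))
```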
Decomposition Storage Model (Columnar)
Kdb+ uses DSM for both in-memory and on-disk storage.
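To illustrate DSM, this Python sketch decomposes row records into one typed array per column and writes each column to its own file, so reading one column never touches the others. The function names and the raw file-per-column layout are assumptions made for illustration; this is not kdb+'s actual file format.

```python
import array
import os
import tempfile


def to_columns(rows: list[dict], schema: dict[str, str]) -> dict[str, array.array]:
    """Decompose row records into one typed array per column (array typecodes in schema)."""
    cols = {name: array.array(code) for name, code in schema.items()}
    for row in rows:
        for name in schema:
            cols[name].append(row[name])
    return cols


def write_columns(directory: str, cols: dict[str, array.array]) -> None:
    """Write every column to its own file inside `directory`."""
    os.makedirs(directory, exist_ok=True)
    for name, arr in cols.items():
        with open(os.path.join(directory, name), "wb") as f:
            arr.tofile(f)


def read_column(directory: str, name: str, code: str) -> array.array:
    """Load a single column; other column files are never opened."""
    arr = array.array(code)
    with open(os.path.join(directory, name), "rb") as f:
        arr.frombytes(f.read())
    return arr


if __name__ == "__main__":
    rows = [{"size": 100, "price": 10.5}, {"size": 200, "price": 10.25}]
    cols = to_columns(rows, {"size": "q", "price": "d"})
    with tempfile.TemporaryDirectory() as root:
        table_dir = os.path.join(root, "trade")
        write_columns(table_dir, cols)
        print(list(read_column(table_dir, "price", "d")))
```

An analytic query such as an average over price only has to read the price file, which is the main payoff of the columnar layout.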
Kdb+ uses a Lambda architecture on each node, with the following properties:
Data currently in use is stored in memory, while historical data is stored on disk.
New data arrives from streaming sources.
The event engine distributes data to downstream subscribers, including the real-time database engine and the streaming query engine.
Once a day, under the control of the event engine, the real-time database writes its contents down to the on-disk historical database for analytical use.
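The data flow above can be sketched in Python as a publish/subscribe loop. EventEngine, RealTimeDB and StreamingVWAP are hypothetical classes, a plain dictionary stands in for the on-disk historical database, and the streaming query (a running volume-weighted average price) is an example chosen only for illustration.

```python
from collections import defaultdict


class EventEngine:
    """Hypothetical event engine: fans records out and drives the daily cycle."""

    def __init__(self) -> None:
        self.subscribers = []

    def subscribe(self, sub) -> None:
        self.subscribers.append(sub)

    def publish(self, record: dict) -> None:
        for sub in self.subscribers:
            sub.on_data(record)

    def end_of_day(self, day: str) -> None:
        for sub in self.subscribers:
            sub.on_end_of_day(day)


class RealTimeDB:
    """Holds today's records in memory; writes them down to the historical store at day end."""

    def __init__(self, hdb: dict) -> None:
        self.rows: list[dict] = []
        self.hdb = hdb   # stand-in for the on-disk historical database

    def on_data(self, record: dict) -> None:
        self.rows.append(record)

    def on_end_of_day(self, day: str) -> None:
        self.hdb[day] = self.rows
        self.rows = []


class StreamingVWAP:
    """Example streaming query: running volume-weighted average price per symbol."""

    def __init__(self) -> None:
        self.notional = defaultdict(float)
        self.volume = defaultdict(int)

    def on_data(self, r: dict) -> None:
        self.notional[r["sym"]] += r["price"] * r["size"]
        self.volume[r["sym"]] += r["size"]

    def on_end_of_day(self, day: str) -> None:
        # Design choice in this sketch: the running aggregate restarts each day.
        self.notional.clear()
        self.volume.clear()

    def vwap(self, sym: str) -> float:
        return self.notional[sym] / self.volume[sym]


if __name__ == "__main__":
    hdb: dict = {}
    engine, rdb, query = EventEngine(), RealTimeDB(hdb), StreamingVWAP()
    engine.subscribe(rdb)
    engine.subscribe(query)
    engine.publish({"sym": "A", "price": 10.0, "size": 100})
    engine.publish({"sym": "A", "price": 12.0, "size": 300})
    print(query.vwap("A"), len(rdb.rows))
    engine.end_of_day("2024-01-01")
    print(len(hdb["2024-01-01"]), len(rdb.rows))
```

Both subscribers see the same stream, so the real-time database (batch path) and the streaming query (speed path) stay consistent without reading from each other.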