DBDB.io The Encyclopedia of Database Systems · Est. 2017
Database of Databases

Database Entry

Materialize


Materialize is a streaming database. It ingests data from sources (i.e., Kafka, Postgres CDC), incrementally maintains materialize views, and allows users to query or subscribe the data using Postgres wire protocol or sinks data to sinks like Kafka.

Source Code
https://github.com/MaterializeInc/materialize[02]
Country of Origin
US
Start Year
2019 [16]
Coding Agents
Project Type
Commercial
Written in
Rust
Compatible With
PostgreSQL
Operating System
Linux

Database Entry

Materialize


Materialize is a streaming database. It ingests data from sources (i.e., Kafka, Postgres CDC), incrementally maintains materialize views, and allows users to query or subscribe the data using Postgres wire protocol or sinks data to sinks like Kafka.

History[04]


Materialize builds on top of differential dataflow, which is a differential computation framework for timely dataflow, introduced in Naiad system from Microsoft Research.

Before the v0.27 release, Materialize mainly supports in-memory computation. Materialized views are only maintained in memory. It requires the data source to persist and replay the changes after restarting Materialize. On October 12, 2022, Materialize 0.26 LTS entered long-term-support mode and 0.27 debuted as the cloud-native release of Materialize. It added support for source data persistence, materialized views with data replicated and persisted within cluster, and other changes for the cloud platform.

Materialize is licensed under BSL license. Each version of Materialize will be transformed to Apache 2.0 License 4 years after its release.

Checkpoints


Concurrency Control


Data Model


Foreign Keys


Indexes[05]


Differential dataflow stores changes of a sequence of data (collection) by key and time in an in-memory index called arrangement, so that operators can query the changes of value for a given key in a period of time.

The index is maintained in-memory and provides multi-version access to the data.

Isolation Levels[06][07]


Materialize supports strong consistency stream processing, and two isolation levels: serializable and strict serializable.

As a streaming system, Materialize ingests data with timestamps and do not directly handle transactions. Upstream systems like Postgres CDC will send data in transition commit order. Data are processed in order of the row's ingestion timestamp, or virtual time.

Materialize supports strong consistency in terms of consistency in streaming processing, which means the result stored in materialized views should be the same as executing a query over the collection of the source data at a given time. In contrary, an eventual consistency streaming system may produce a materialized view which does not correspond to a query over the source data collection at a given time.

If a user creates non-streaming source (i.e., a table) in Materialize, strict serializable isolation level guarantees reads from materialized views always reflect the insertions, by waiting the changes to propagate to all materialized views. Otherwise, serializable level only guarantees a user can read a consistent snapshot over all materialized views.

Joins[08][09]


Differential dataflow only supports index joins. If a user requests to join two tables, differential dataflow will first create an index (arrangement) over the two tables using join columns as the index key, and then perform index joins.

Materialize supports joins by using index joins in differential dataflow.

If user has already created all required indexes for a multi-way join in advance, Materialize will use a special kind of join strategy called delta join to execute the query. Delta join supports joining multiple tables, does not maintain any additional data structures and therefore achieves no additional storage and memory cost.

Parallel Execution[10]


Timely dataflow supports running different operators in the same query plan on different workers, therefore achieving inter query parallelism. At the same time, it also shards a logical operator to multiple physical ones, so as to operate on independent parts of the data concurrently.

Query Execution


Query Interface[11]


SQL

Materialize supports Postgres SQL with custom APIs. For example, CREATE CONNECTION, CREATE SOURCE and CREATE SINK are used to connect Materialize to external sources and sinks (i.e., Kafka, Postgres CDC). SUBSCRIBE allows users to retrieve streaming updates from the Postgres wire protocol.

Storage Architecture[12]


Materialize supports storing data to different storage backends, including S3, filesystem, and in-memory backend (for testing).

Storage Model[13]


Materialize persists data in the form of key-value pairs.

As of Materialize 0.48, the storage component is only used for persisting sources and materialized views. Materialize has a distributed key-value storage service, and the compute workers can read data from and write data to the storage service. Indexes and internal states are only maintained in memory.

Storage Organization


System Architecture


Views[14][15]


Materialize supports two kinds of views: temporary views and materialized views.

Users can create temporary views by using CREATE VIEW statement. It simply stores the SQL query in the system and can be used in other queries within the session.

Users can create materialized views by using CREATE MATERIALIZED VIEW. The views are incrementally maintained and reflect real-time updates. The content of materialized views are persisted into the storage service and users can directly query the materialized views to get latest data. Users can also subscribe materialized views to get real-time updates or sink the data to third-party services.

Citations

18 sources
  1. The Live Data Layer for Apps and AI Agents | Materialize materialize.com
  2. GitHub - MaterializeInc/materialize: The live data layer for apps and AI agents. Create up-to-the-second views into your business, just using SQL · GitHub github.com
  3. Home | Materialize Documentation materialize.com
  4. About Us & Our Values | Materialize materialize.com
  5. https://etos.cs.brown.edu/publications/papers/shared-arrangements:vldb20.pdf brown.edu Dead — Check Archive
  6. Consistency Guarantees in Data Streaming | Materialize materialize.com
  7. Blog | Materialize materialize.com
  8. Joins in Materialize | Materialize materialize.com
  9. Delta Joins and Late Materialization | Materialize materialize.com
  10. When to use Timely Dataflow github.io
  11. SQL commands | Materialize Documentation materialize.com
  12. materialize/src/persist/src at main · MaterializeInc/materialize · GitHub github.com
  13. materialize/doc/developer/platform/architecture-storage.md at main · MaterializeInc/materialize · GitHub github.com
  14. CREATE MATERIALIZED VIEW | Materialize Documentation materialize.com
  15. CREATE VIEW | Materialize Documentation materialize.com
  16. Initial commit github.com
  17. https://github.com/MaterializeInc/materialize/commit/248f143a50aa926a694bba6748d7bdf4e3f6cc6b github.com
  18. https://github.com/MaterializeInc/materialize/commit/16d611fb45fd6d02840f23772d460e0d6c0ba48e github.com
Revision #8 Last Updated: