Accumulo

Viewing Revision #18 from 2022-06-27 03:18 View Current

Apache Accumulo is a sorted, distributed key-value store based on Google's Bigtable design and built on top of HDFS and Apache ZooKeeper. First designed and developed by a team at the NSA, Accumulo aims to support big data storage and processing while enforcing fine-grained data access control. In particular, the NSA team extended the Bigtable design so that Accumulo can control access to individual data elements. Accumulo is currently an open source project under the Apache v2 license.[06][07][08][09]

Website: https://accumulo.apache.org/[01]
Source Code: https://github.com/apache/accumulo[02] Accessed: Jul 26, 2026 Last Commit: Jul 24, 2026
Tech Docs: https://accumulo.apache.org/docs/2.x/[03]
Developer: National Security Agency
Country of Origin: US
Start Year: 2008 [08]
Project Type: Open Source
Written in: Java
Inspired By: Cloud BigTable
License: Apache v2
Twitter: @ApacheAccumulo[05]
Wikipedia: https://en.wikipedia.org/wiki/Apache_Accumulo[04]

NoSQL

History[08]

A team of software engineers at the United States National Security Agency (NSA) started building Accumulo as a clone of Google's Bigtable system in July 2008. They later released Accumulo as a public open source incubator project at the Apache Software Foundation in September 2011.

Compression[08]

Prefix Compression

Accumulo employs two compression techniques. The first runs GZip or LZO on blocks of data stored on disk. The second is relative-key encoding, which stores a prefix shared by consecutive keys only once, so that each following key only needs to store its difference from the previous key.
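The idea behind relative-key encoding can be illustrated with a minimal sketch. It assumes keys are plain strings that are already sorted, and it does not reproduce Accumulo's actual on-disk format; the function names `prefix_encode` and `prefix_decode` are hypothetical.

```python
from os.path import commonprefix  # character-wise common prefix of strings


def prefix_encode(sorted_keys):
    """Encode sorted keys as (shared_prefix_length, suffix) pairs."""
    encoded = []
    previous = ""
    for key in sorted_keys:
        # Number of leading characters this key shares with the previous key.
        shared = len(commonprefix([previous, key]))
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the original keys from (shared_prefix_length, suffix) pairs."""
    keys = []
    previous = ""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


keys = ["user0001:name", "user0001:phone", "user0002:name"]
encoded = prefix_encode(keys)
# encoded == [(0, "user0001:name"), (9, "phone"), (7, "2:name")]
```

Because the keys are kept sorted, neighbouring keys tend to share long prefixes, which is why this kind of encoding pays off.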

Concurrency Control[08]

Multi-version Concurrency Control (MVCC)

Accumulo guarantees ACID properties per row.

Data Model[10]

Column Family / Wide-Column

Based on Google's Bigtable, Accumulo is a column-family DBMS. It stores key-value pairs on disk and always keeps the keys sorted. Values are stored as byte arrays, and neither their size nor their type is restricted. Keys consist of three components: a row ID, a column and a time stamp. Keys are sorted first by row ID, then by column, and finally by time stamp. This implies that values in the same row are stored together, and that different rows don't have to contain the same number of columns. Time stamps are used to support multiple versions of the same key. The column component of the key can be further divided into three fields: column family, column qualifier and column visibility. Column families are defined by the application designer to group columns with similar functions, so that Accumulo stores them close together on disk for faster access. Note that unlike Bigtable and HBase, Accumulo column families need not be declared before use. Column visibility is Accumulo's unique feature; it allows data with different sensitivity levels to be stored in the same physical tables.
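To make column visibility more concrete, here is a minimal sketch of how a visibility expression might be checked against a reader's authorizations. It assumes expressions combine labels with `&` (and), `|` (or) and parentheses, and that an empty expression is readable by everyone. The function `is_visible` is hypothetical, performs no validation of malformed input, and is not Accumulo's own evaluator.

```python
import re

# A token is either a label (letters, digits, underscore) or one of & | ( ).
TOKEN = re.compile(r"[A-Za-z0-9_]+|[&|()]")


def is_visible(expression, authorizations):
    """Return True if the authorization set satisfies the expression."""
    if expression == "":
        return True  # assumption: an empty expression is visible to everyone
    tokens = TOKEN.findall(expression)
    position = 0

    def parse_term():
        nonlocal position
        token = tokens[position]
        position += 1
        if token == "(":
            result = parse_expression()
            position += 1  # skip the closing ")"
            return result
        return token in authorizations

    def parse_expression():
        nonlocal position
        result = parse_term()
        # Operators are applied left to right; mixed & and | are expected
        # to be grouped explicitly with parentheses.
        while position < len(tokens) and tokens[position] in ("&", "|"):
            operator = tokens[position]
            position += 1
            right = parse_term()
            result = (result and right) if operator == "&" else (result or right)
        return result

    return parse_expression()


can_read = is_visible("admin|(audit&finance)", {"audit", "finance"})
```

In this sketch a reader holding both `audit` and `finance` can see the entry, while a reader holding only `audit` cannot, even though both entries could live in the same table.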

Query Interface[11][12]

Custom API Command-line / Shell

Accumulo provides the user with two ways to interact with the system. The first is a client API, with support for C++, Python, Java and Ruby. The second is a simple shell that allows the user to examine table contents, update configuration settings, insert, update and delete values, etc.

Storage Architecture[13]

Disk-oriented

Accumulo is a disk-oriented database that relies on HDFS to store data.

Storage Model[08]

N-ary Storage Model (Row/Record)

Accumulo is a schema-less column-family key-value datastore. As described in the Data Model section, key-value pairs are stored together, sorted by row ID, column and time stamp.
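The sort order can be sketched as a comparison key over the fields of each entry. The `Key` tuple and `sort_order` function below are hypothetical names for illustration. The sketch also assumes that time stamps sort in descending order within a cell so that the newest version comes first; this detail is not stated in this entry.

```python
from collections import namedtuple

# Illustrative key layout: row ID, the three column fields, and a time stamp.
Key = namedtuple("Key", "row family qualifier visibility timestamp")


def sort_order(key):
    # Ascending by row and column fields; the time stamp is negated so that
    # the newest version of a cell sorts first (assumption, see above).
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


entries = [
    (Key("row2", "info", "name", "", 5), b"Bob"),
    (Key("row1", "info", "name", "", 3), b"Alice"),
    (Key("row1", "info", "name", "", 7), b"Alicia"),
    (Key("row1", "contact", "email", "", 4), b"a@example.com"),
]
entries.sort(key=lambda entry: sort_order(entry[0]))

for key, value in entries:
    print(key.row, key.family, key.qualifier, key.timestamp, value)
```

After sorting, all entries for `row1` are adjacent, and the two versions of `info:name` sit next to each other with the more recent one first.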

Stored Procedures

Not Supported

System Architecture[08]

Shared-Disk

Relying on HDFS to manage files, Accumulo applies a Shared-Nothing architecture. Each Accumulo node has its own CPU, memory and disk, and owns a shard of the data. Since each table is partitioned into tablets that are scattered across different nodes, these nodes are also called tablet servers. Accumulo can also split a large tablet in two and redistribute the resulting tablets as new data arrives.
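Tablet splitting can be pictured with a small sketch. It assumes a tablet is simply a sorted list of row IDs and uses a row-count threshold as a stand-in for whatever size limit triggers a split; `split_tablet` and `max_rows` are hypothetical names, not part of Accumulo.

```python
def split_tablet(rows, max_rows):
    """Split a sorted list of row IDs in two once it exceeds max_rows."""
    if len(rows) <= max_rows:
        return [rows]  # small enough, keep as a single tablet
    middle = len(rows) // 2
    # Both halves stay sorted and together cover the same contiguous range.
    return [rows[:middle], rows[middle:]]


tablets = split_tablet(["a", "b", "c", "d"], max_rows=3)
# tablets == [["a", "b"], ["c", "d"]]
```

Each resulting half can then be assigned to a different tablet server, which is how load is spread as a table grows.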

Unlike some other DBMSs, Accumulo maintains sorted key-value pairs, so data is partitioned by sorted key range instead of by hashing. Since disks are faster at sequential access than random access, range-based distribution allows Accumulo to scan consecutive keys faster than systems that use hashing. However, this incurs the overhead of storing the mapping between each portion of the sorted set of key-value pairs and its tablet server. This mapping is stored in a metadata table.
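A minimal sketch of such a range-based lookup follows. The split points, server names and the `locate` function are all hypothetical, and the mapping is held in two in-memory lists rather than a real metadata table. Each tablet is assumed to cover the rows after the previous split point up to and including its own split point.

```python
from bisect import bisect_left

# Hypothetical mapping: sorted split points and the server owning each range.
# Tablet 0 holds rows <= "g", tablet 1 holds rows in ("g", "p"],
# and tablet 2 holds everything after "p".
split_points = ["g", "p"]
servers = ["tserver-1", "tserver-2", "tserver-3"]


def locate(row):
    """Return the server responsible for the given row ID."""
    # bisect_left finds the first split point that is >= row, which is
    # the index of the tablet whose range contains the row.
    return servers[bisect_left(split_points, row)]


owner = locate("hat")
```

Because neighbouring rows fall into the same range, a scan over consecutive keys usually touches one tablet server, whereas a hash-based scheme would scatter those keys across the cluster.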

Views

Not Supported

Compatible Systems

PrestoDB

Derivative Systems

Sqrrl

Embeddings

Rya

Citations

13 sources

[01] Apache Accumulo. apache.org. Modified: 2026-07-08. Accessed: 2026-07-14.
[02] GitHub - apache/accumulo: Apache Accumulo. github.com. Accessed: 2026-06-03.
[03] Accumulo Documentation - Setup. apache.org. Modified: 2026-02-09. Accessed: 2026-06-05.
[04] Apache Accumulo - Wikipedia. wikipedia.org. Modified: 2025-12-30. Accessed: 2026-06-04.
[05] https://twitter.com/ApacheAccumulo. twitter.com.
[06] Open Source & Open Standards | Cloudera. hortonworks.com (dead link, check archive). Modified: 2026-05-24. Accessed: 2026-06-07.
[07] https://accumulo.apache.org/1.7/accumulo_user_manual.html#_introduction. apache.org (dead link, check archive). Accessed: 2026-05-24.
[08] Architecture and Data Model - Accumulo [Book]. oreilly.com (dead link, check archive). Accessed: 2026-06-07.
[09] Bigtable - Wikipedia. wikipedia.org. Modified: 2026-01-29. Accessed: 2026-06-04.
[10] https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf. googleusercontent.com (dead link, check archive). Accessed: 2026-05-24.
[11] Accumulo Documentation - Accumulo Shell. apache.org. Modified: 2026-02-09. Accessed: 2026-06-07.
[12] Accumulo Documentation - Accumulo Clients. apache.org. Modified: 2026-02-09. Accessed: 2026-06-07.
[13] Accumulo Documentation - Setup. apache.org. Modified: 2026-02-09. Accessed: 2026-06-08.

Revision #18 Last Updated: 2022-06-26 23:18