Teradata is a relational database management system designed for large data warehouse applications and built on a Massively Parallel Processing (MPP) architecture. It optimizes query performance through parallelism, using the AMP as its basic unit of work. An AMP (Access Module Processor) is a virtual processor that manages a portion of the database: AMPs receive execution plans from the parser and can receive, manipulate, and store data. A fixed number of AMPs, defined for the system, shares the work of every task in a Teradata system, including queries, data loads, backups, and index builds.
Teradata was founded on research done at Caltech and was incorporated in 1979 with the goal of building a database computer that could manage more than a terabyte of data. NCR acquired Teradata in 1991, and the following year Teradata became the first system to store more than 1 terabyte of data. In 2007, Teradata was spun off from NCR as a separate company. In 2011, Teradata acquired both Aprimo and Aster Data Systems Inc., beginning its involvement in the big data market. Between 2016 and 2017, Teradata became available on both AWS and Azure.
By default, Teradata takes a Start-of-Data checkpoint and an End-of-Data checkpoint. If a job fails before the End-of-Data checkpoint is taken, the restarted job repeats all work done after the Start-of-Data checkpoint. For further protection, Teradata also provides interval checkpoints: in addition to the two default checkpoints, the user can specify a time interval at which checkpoints are taken, so that a restarted job only repeats the work done since the most recent checkpoint.
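The effect of checkpoint placement on a restart can be illustrated with a toy Python model. It is a sketch under stated assumptions, not Teradata's restart logic: the `LoadJob` class and its methods are hypothetical, and the checkpoint interval is counted in records rather than in time so that the example is deterministic.

```python
from typing import List, Optional


class LoadJob:
    """Hypothetical load job that checkpoints by record count (not Teradata's API)."""

    def __init__(self, records: List[str], interval: Optional[int] = None) -> None:
        self.records = records
        self.interval = interval      # None: only Start-of-Data / End-of-Data checkpoints
        self.checkpoint = 0           # Start-of-Data checkpoint at position 0
        self.loaded: List[str] = []   # rows written to the target
        self.work_done = 0            # records processed across all attempts

    def run(self, fail_at: Optional[int] = None) -> bool:
        """Resume from the last checkpoint; return True if the job completes."""
        # Discard rows written after the last checkpoint; they will be redone.
        del self.loaded[self.checkpoint:]
        for pos in range(self.checkpoint, len(self.records)):
            if fail_at is not None and pos == fail_at:
                return False          # simulated failure; the checkpoint survives
            self.loaded.append(self.records[pos])
            self.work_done += 1
            if self.interval and (pos + 1) % self.interval == 0:
                self.checkpoint = pos + 1   # interval checkpoint
        self.checkpoint = len(self.records)  # End-of-Data checkpoint
        return True


records = [f"row{i}" for i in range(10)]

default_job = LoadJob(records)
default_job.run(fail_at=7)     # fails; only the Start-of-Data checkpoint exists
default_job.run()              # restarts from record 0
print(default_job.work_done)   # 17

interval_job = LoadJob(records, interval=3)
interval_job.run(fail_at=7)    # fails after the checkpoint at record 6
interval_job.run()             # restarts from record 6
print(interval_job.work_done)  # 11
```

With only the default checkpoints, a failure at record 7 forces all 10 records to be reprocessed (17 records of work in total); with an interval of 3 records, the restart resumes at record 6 (11 records of work).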
Compression: Dictionary Encoding, Run-Length Encoding, Naïve (Page-Level)
Teradata allows for the following possible compression options.
Multivalue Compression (MVC)
MVC compresses repeating values in a column once those values are listed in the compression list of the column definition. When a column value matches a listed value, the database stores that value only once, in the table header, regardless of how many times it occurs in the column. No decompression is necessary when accessing the data from memory.
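A minimal Python sketch of the idea behind MVC follows. The function names and the in-memory representation (a header list plus a small code per row) are hypothetical and only model the concept; Teradata's physical row format is different.

```python
from typing import List, Tuple, Union

# A stored cell is either ("code", index into the header) or ("raw", value).
Cell = Tuple[str, Union[int, str]]


def mvc_compress(column: List[str], compress_list: List[str]) -> Tuple[List[str], List[Cell]]:
    """Store each listed value once in a header; rows keep only a small code."""
    header = list(compress_list)
    position = {value: i for i, value in enumerate(header)}
    cells: List[Cell] = []
    for value in column:
        if value in position:
            cells.append(("code", position[value]))  # compressed: code only
        else:
            cells.append(("raw", value))             # not listed: stored in full
    return header, cells


def mvc_read(header: List[str], cells: List[Cell]) -> List[str]:
    """Reading is a header lookup; there is no decompression algorithm to run."""
    return [header[x] if kind == "code" else x for kind, x in cells]


states = ["NY", "CA", "NY", "TX", "NY", "CA"]
header, cells = mvc_compress(states, ["NY", "CA"])
print(header)  # ['NY', 'CA']
print(cells)   # [('code', 0), ('code', 1), ('code', 0), ('raw', 'TX'), ('code', 0), ('code', 1)]
```

The lookup in `mvc_read` mirrors the point above: a compressed value is recovered by indexing the header rather than by running a decompression routine.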
Algorithmic Compression
This is generally used as an alternative to MVC when column values are mostly unique. Teradata includes several standard compression algorithms that can be used to compress many data types (ARRAY, BYTE, VARBYTE, BLOB, CHARACTER, VARCHAR, CLOB, JSON, and DATASET). Users can also supply custom compression algorithms.
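To illustrate the pluggable nature of algorithmic compression, the sketch below pairs a compress function with a decompress function per column and applies them to each stored value. It is a conceptual model only: `zlib` stands in for a built-in algorithm, the run-length pair plays the role of a user-supplied custom algorithm, and the `AlgorithmicColumn` class is hypothetical rather than part of any Teradata API.

```python
import zlib
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AlgorithmicColumn:
    """Hypothetical column that applies a compress/decompress pair to each value."""
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]
    stored: List[bytes] = field(default_factory=list)

    def insert(self, value: bytes) -> None:
        self.stored.append(self.compress(value))

    def get(self, row: int) -> bytes:
        return self.decompress(self.stored[row])


def rle_compress(data: bytes) -> bytes:
    """Toy custom algorithm: (run length, byte) pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        out += bytes([j - i, data[i]])
        i = j
    return bytes(out)


def rle_decompress(data: bytes) -> bytes:
    """Inverse of rle_compress: expand each (count, byte) pair."""
    out = bytearray()
    for k in range(0, len(data), 2):
        out += bytes([data[k + 1]]) * data[k]
    return bytes(out)


# A built-in style algorithm (zlib as a stand-in) and a custom one.
builtin = AlgorithmicColumn(zlib.compress, zlib.decompress)
custom = AlgorithmicColumn(rle_compress, rle_decompress)
for col in (builtin, custom):
    col.insert(b"A" * 1000)
    print(len(col.stored[0]), col.get(0) == b"A" * 1000)
```

Swapping the function pair changes the algorithm without changing how the column is used, which is the property that makes custom algorithms possible.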
Row Compression
Row compression is a lossless method. It stores a repeating set of column values a single time, while the non-repeating column values that belong to that set are stored as extensions of the base set.
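The base-set-plus-extensions layout can be sketched in Python as follows. The function names, the list-of-dicts row representation, and the choice of which columns form the base set are all hypothetical; the sketch only demonstrates that grouping rows this way is lossless.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def row_compress(rows: List[Row], base_cols: Sequence[str]) -> List[Tuple[tuple, List[Row]]]:
    """Store each repeating base value set once, with the other values as extensions."""
    groups: Dict[tuple, List[Row]] = {}
    for row in rows:
        base = tuple(row[c] for c in base_cols)
        extension = {c: v for c, v in row.items() if c not in base_cols}
        groups.setdefault(base, []).append(extension)
    return list(groups.items())


def row_decompress(compressed: List[Tuple[tuple, List[Row]]], base_cols: Sequence[str]) -> List[Row]:
    """Rebuild every original row from its base set and one extension."""
    rows: List[Row] = []
    for base, extensions in compressed:
        for extension in extensions:
            row = dict(zip(base_cols, base))
            row.update(extension)
            rows.append(row)
    return rows


orders = [
    {"cust": 1, "region": "E", "order_id": 10},
    {"cust": 1, "region": "E", "order_id": 11},
    {"cust": 2, "region": "W", "order_id": 12},
]
packed = row_compress(orders, ("cust", "region"))
print(packed)
# [((1, 'E'), [{'order_id': 10}, {'order_id': 11}]), ((2, 'W'), [{'order_id': 12}])]
```

The base set `(1, 'E')` is stored once even though two rows share it. Decompression returns rows grouped by base set, so the original row order is only preserved when rows with the same base set were already adjacent.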
Concurrency Control: Two-Phase Locking (Deadlock Detection)
For concurrency control, Teradata uses a proxy locking strategy. The system requires each request to obtain a pseudo lock before it obtains an actual lock. For each table, proxy locking designates one AMP that manages the actual locks, and lock requests for that table are queued on this AMP. Not all deadlocks can be prevented with this strategy. By default, Teradata checks for deadlocks globally every four minutes and locally every thirty seconds. Teradata does not adhere to two-phase locking.
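The queueing part of proxy locking can be modelled with a small Python class. This is a hypothetical sketch, not Teradata's lock manager: hashing the table name with CRC32 to pick the gatekeeper AMP and supporting only exclusive locks are simplifying assumptions, and acquiring the actual locks after the pseudo lock is granted is not modelled.

```python
import zlib
from collections import defaultdict, deque
from typing import Deque, Dict


class ProxyLockManager:
    """Hypothetical model: exclusive table locks queued on one gatekeeper AMP per table."""

    def __init__(self, num_amps: int) -> None:
        self.num_amps = num_amps
        # queues[amp][table]: FIFO of transactions; the head holds the pseudo lock.
        self.queues: Dict[int, Dict[str, Deque[str]]] = defaultdict(lambda: defaultdict(deque))

    def gatekeeper(self, table: str) -> int:
        # A deterministic hash, so every request for a table reaches the same AMP.
        return zlib.crc32(table.encode()) % self.num_amps

    def request(self, txn: str, table: str) -> bool:
        """Queue txn on the table's gatekeeper AMP; True once it is at the head."""
        queue = self.queues[self.gatekeeper(table)][table]
        if txn not in queue:
            queue.append(txn)
        return queue[0] == txn

    def release(self, txn: str, table: str) -> None:
        queue = self.queues[self.gatekeeper(table)][table]
        if queue and queue[0] == txn:
            queue.popleft()


locks = ProxyLockManager(num_amps=4)
print(locks.request("T1", "orders"))  # True: T1 is at the head of the queue
print(locks.request("T2", "orders"))  # False: T2 waits behind T1
locks.release("T1", "orders")
print(locks.request("T2", "orders"))  # True
```

Because every request for a table passes through a single queue, two transactions cannot be granted the same table's lock in opposite orders on different AMPs, which is the kind of deadlock this strategy targets.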
Data Model: Relational, Document / XML, Graph
Teradata primarily supports the relational model. It also supports document store, graph (Teradata Aster), and time series models as secondary models.
Joins: Sort-Merge Join, Index Nested Loop Join
Natural Join
Rows from both tables are matched on all columns that share the same name. The resulting table retains one column for each pair of matching columns.
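A natural join can be sketched in Python over lists of dictionaries. The function name is hypothetical, the nested-loop evaluation is chosen for clarity rather than performance, and the sketch assumes every row within one input has the same columns.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Nested-loop natural join on every column name the two inputs share."""
    if not left or not right:
        return []
    common = [c for c in left[0] if c in right[0]]  # assumes uniform rows per input
    result = []
    for l_row in left:
        for r_row in right:
            if all(l_row[c] == r_row[c] for c in common):
                merged = dict(l_row)
                merged.update(r_row)  # each shared column appears once in the output
                result.append(merged)
    return result


employees = [{"dept_id": 1, "name": "Ann"}, {"dept_id": 2, "name": "Bo"}]
departments = [{"dept_id": 1, "dept_name": "Sales"}]
print(natural_join(employees, departments))
# [{'dept_id': 1, 'name': 'Ann', 'dept_name': 'Sales'}]
```

If the inputs share no column names, the condition is vacuously true and the result is the cross product, which matches the usual definition of a natural join.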
Theta Join
A theta join merges tables based on a comparison condition such as less than, greater than, or equal to.
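A theta join generalizes the equi-join to any comparison operator. The hypothetical helper below takes the operator as a parameter (from Python's `operator` module) and pairs every row combination for which the comparison holds.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def theta_join(left: List[Row], right: List[Row], left_col: str,
               op: Callable[[Any, Any], bool], right_col: str) -> List[Tuple[Row, Row]]:
    """Pair every left/right row for which op(left[left_col], right[right_col]) holds."""
    return [(l, r) for l in left for r in right if op(l[left_col], r[right_col])]


orders = [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}]
tiers = [{"tier": "low", "limit": 100}, {"tier": "high", "limit": 200}]
pairs = theta_join(orders, tiers, "amount", operator.le, "limit")
print([(o["id"], t["tier"]) for o, t in pairs])
# [(1, 'low'), (1, 'high'), (2, 'high')]
```

Passing `operator.eq` instead of `operator.le` turns the same function into an equi-join.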
Teradata also supports inner and outer joins.
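The join methods listed for this section include the sort-merge join. As a generic illustration of that technique (not Teradata's implementation), the hypothetical function below sorts both inputs on the join key and then merges them, pairing every row in a run of equal keys on one side with every row in the matching run on the other.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def sort_merge_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Inner equi-join on `key` using sort, then a single merge pass."""
    ls = sorted(left, key=lambda r: r[key])
    rs = sorted(right, key=lambda r: r[key])
    i = j = 0
    out: List[Tuple[Row, Row]] = []
    while i < len(ls) and j < len(rs):
        lk, rk = ls[i][key], rs[j][key]
        if lk < rk:
            i += 1                    # advance the side with the smaller key
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit their product.
            i_end = i
            while i_end < len(ls) and ls[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(rs) and rs[j_end][key] == rk:
                j_end += 1
            for a in ls[i:i_end]:
                for b in rs[j:j_end]:
                    out.append((a, b))
            i, j = i_end, j_end
    return out


left = [{"k": 2, "v": "a"}, {"k": 1, "v": "b"}, {"k": 2, "v": "c"}]
right = [{"k": 2, "w": "x"}, {"k": 3, "w": "y"}]
print([(l["v"], r["w"]) for l, r in sort_merge_join(left, right, "k")])
# [('a', 'x'), ('c', 'x')]
```

After sorting, each input is scanned once, which is why this method suits large inputs that are already ordered, or cheap to order, on the join key.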
Query Execution: Tuple-at-a-Time Model, Vectorized Model
During execution, the parsing engine first transforms the user's query into an execution plan and relays it to a message-passing layer known as the BYNET, which serves as the communication channel between the parser and the AMPs. Execution plans travel down from the parser to the AMP layer, while query results are channeled back up from the AMPs to the parser. Teradata supports query parallelism by hash-partitioning data across all AMPs in the system, so that relational operations can execute in parallel across AMPs. Teradata supports both inter-operator and intra-operator parallelism, and it uses a vectorized model of query execution.
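The sketch below models hash partitioning in Python: each row is assigned to an AMP by hashing its key, each simulated AMP computes a partial aggregate over its own rows, and the partial results are combined as they would be on the way back to the parser. CRC32 stands in for Teradata's row-hash function and threads stand in for AMPs; both are assumptions made for illustration, and in CPython the threads show the structure of the computation rather than a real speedup.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (primary-index key, value)


def amp_for(key: str, num_amps: int) -> int:
    # Stand-in row hash (assumption): CRC32 of the key, modulo the AMP count.
    return zlib.crc32(key.encode()) % num_amps


def distribute(rows: List[Row], num_amps: int) -> List[List[Row]]:
    """Hash-partition rows so that each simulated AMP owns a disjoint slice."""
    amps: List[List[Row]] = [[] for _ in range(num_amps)]
    for key, value in rows:
        amps[amp_for(key, num_amps)].append((key, value))
    return amps


def local_sum(amp_rows: List[Row]) -> Counter:
    """Work done independently on one AMP: SUM(value) GROUP BY key."""
    totals: Counter = Counter()
    for key, value in amp_rows:
        totals[key] += value
    return totals


def parallel_group_sum(rows: List[Row], num_amps: int = 4) -> Dict[str, int]:
    amps = distribute(rows, num_amps)
    # One worker per AMP; each aggregates only the rows it owns.
    with ThreadPoolExecutor(max_workers=num_amps) as pool:
        partials = list(pool.map(local_sum, amps))
    # Combine the partial results, as they would flow back towards the parser.
    combined: Counter = Counter()
    for partial in partials:
        combined.update(partial)
    return dict(combined)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 5)]
print(parallel_group_sum(rows))  # totals a=4, b=2, c=5 (key order may vary)
```

Because all rows with the same key hash to the same AMP, each AMP's partial totals cover a disjoint set of keys, so no AMP needs data from another to finish its part of the aggregation.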