EsgynDB

EsgynDB provides operational ease of running OLTP, ODS, BI and Analytical workloads on a single Big Data platform. EsgynDB is the only Big Data SQL solution that provides a pluggable data management framework, for disparate data sources, to handle mixed workloads (read while writing in real-time) minimizing data movement and duplication. EsgynDB reduces ETL costs by transforming data (ELT) within the database. It’s Massively Parallel Processing (MPP) architecture ensures that the strictest SLAs can be accomplished by executing queries in parallel. A mature ANSI SQL engine enables application portability over HDFS-based Data Lakes; proven to handle petabytes of data at high concurrency, and a shared nothing architecture allowing for transparent scalability with increasing cluster or data size.

History

EsgynDB has 20+ years of innovation and patented technology. HP invested more than $300 million into this technology just since 2006. The result is a database that has extensive capabilities and has been hardened across a wide variety of workloads, from OLTP, with its Tandem NonStop heritage.

Isolation Levels

Read Committed

In a high concurrency environment traditional 2 phase locking (2PL) databases suffer because transactions must wait to acquire locks, which can be expensive. A common approach to raise the concurrency is to use MVCC where each Write operation is assigned a timestamp or version and multiple versions of a row are maintained in the database. MVCC alone has limitations that allow for both P2 and P3, however EsgynDB uses an Optimistic Concurrency Control (OCC) algorithm during the voting phase of the transaction and blocks the transaction from committing. Although MVCC allows a given transaction to see both P2 and P3 phenomenon while the transaction is active, OCC detects the phenomenon and prevents the transaction from committing.

The SQL Isolation level provided by EsgynDB is greater than Read Committed and less than Snapshot Isolation, however, data integrity is guaranteed by the OCC algorithm at commit time, so the correctness of data is guaranteed regardless of whether or not a phenomenon might have occurred within the transaction. EsgynDB cannot distinguish between the case where a phenomenon might have occurred versus when one actually did occur. In order to ensure the absolute highest data integrity, we opt for the strictest rules and prevent transaction commit if a phenomenon might have occurred.

Logging

Logical Logging

Views

Materialized Views

EsgynDB handles the mapping of SQL requests from the application transparently into native calls for the format being accessed. It provides a relational schema abstraction on top of HBase. In this way, traditional relational database objects (tables, views, secondary indexes) are supported using familiar DDL/DML semantics including object naming, column definition and data type support. It supports tables in these formats as external tables as well.

Indexes

K-D Tree

EsgynDB uses a patented technology called Multi-dimensional Access Method (MDAM) that efficiently uses index even when leading columns of the index don’t have predicates on them, substantially reduces the need for secondary indexes and facilitates salting & computed / divisioning columns to cluster data.

Stored Procedures

Supported

EsgynDB supports Stored Procedures in Java (SPJ) so that users can write operational or business procedures in Java code, to be executed on the server side, when invoked by client applications. When there are multiple SQL statement executions as part of a procedure, with data needing to be exchanged between the client and server for each of these statements, it is more efficient to use stored procedures to push that processing to the server side. Oracle PL/SQL and ANSI SQL stored procedures can be converted to SPJs. EsgynDB also supports Apache Hive HPL/SQL stored procedures natively.

In addition EsgynDB supports generalized User Defined Functions (UDFs). UDFs can be written in Java or C++. Several kinds of UDFs are supported including scalar UDFs and table-valued UDFs.

Compression

Dictionary Encoding

Joins

Hash Join

Complex join support including nested joins, hash joins and merge joins

Query Execution

Tuple-at-a-Time Model

EsgynDB’ s SQL executor uses a data flow and scheduler-driven task model to execute the optimized query plan. Each operator of the plan is an independent task and data flows between operators through in-memory queues (up and down) or by inter-process communication. Queues between tasks allow operators to exchange multiple requests or result rows at a time. A scheduler coordinates the execution of tasks and runs whenever it has data in one of its operators' input queues. In most cases, the EsgynDB executor can process queries with data flowing entirely through memory, providing superior performance and reduced dependency on disk space and I/O bandwidth. Only for a large hash join or sort, where EsgynDB detects memory pressure, does it overflow to disk. The executor incorporates several types of parallelism, such as partitioned, pipeline and operator parallelism.

Concurrency Control

Optimistic Concurrency Control (OCC)

EsgynDB distributed transaction manager (DTM) provides support for implicit (auto-commit) and explicit (BEGIN, COMMIT, ROLLBACK WORK) transaction control. EsgynDB allows multiple transactions to be accessing the same rows concurrently. However, in the case of update, the first transaction to commit wins and other transactions are notified at commit that the transaction failed due to update conflict.

Foreign Keys

Supported

System Architecture

Shared-Nothing

Elastic scalability: nodes and storage across the cluster can be dynamically increased or decreased with no downtime, while transactions and queries are being processed. The very next transaction or query will leverage the reconfigured compute or storage resources.

Query Interface

SQL

Access to EsgynDB services layer is completed via a standard ODBC/JDBC interface using an Esgyn supplied Windows or Linux client driver. Both type 2 and type 4 JDBC drivers are supported and the choice is dependent on the application requirements for response times, number of connections, security, and other factors. EsgynDB extends that support with an ADO.NET driver.

EsgynDB SQL supports more than 150 built-in functions that include aggregate, character, date-time, mathematical, and OLAP functions. These include all ANSI functions as well as many commonly used functions from other databases, such as DECODE and TO_CHAR from Oracle. With support for BI and analytical workloads EsgynDB now sports more OLAP functions such as LEAD and LAG. It supports multiple windows in an OLAP query. It has functionality common to such workloads, such as ROLLUP and GROUPING. INTERSECT and EXCEPT operations have been added.

Query Compilation

Code Generation

EsgynDB’ s optimizer is based on the Cascades optimization framework. SQL Normalizer – the parsed SQL statement is passed to the normalizer which performs unconditional transformations, including subquery transformations, of the SQL into a canonical form which renders the SQL in a form that can be optimized internally. SQL Analyzer – analyzes alternative join connectivity patterns, table access paths and methods, matching partition information, etc. to be used by the optimizer’s rules. The results are passed to the plan generator for consideration in costing various plan alternatives. Table Statistics – captured equal-height histogram statistics identify data distributions for column data and correlations between columns. Sampling is used for large tables to reduce the overhead of generating the statistics. Cardinality Estimator – cardinalities, data skew, and histograms are computed for intermediate results throughout the operator tree. Cost Estimator – estimates node, I/O, and message cost for each operator while accounting for data skew at the operator level. Plan Generator – using cost estimates the optimizer considers alternative plans and chooses the plan which has the lowest cost. Where feasible the optimizer will select plans that incorporate SQL pushdown, sort elimination, and in-memory storage vs. overflow to disk. Also, it determines the optimal degree of parallelism including serial plans.

Storage Model

Hybrid

EsgynDB database objects are stored into native Hadoop database structures. These include the following formats: HBase that provides a Big Table, or wide-column key-value, data model, ORC or Parquet files which provides a column store data model, text files used for staging data such as comma separated value or log data and key-value sequence file

Storage Architecture

Disk-oriented

Data Model

Relational

EsgynDB is relational, supports ANSI SQL and ACID transactions. EsgynDB can query HBase structured tables, native HBase tables, Hive tables (text, ORC and Parquet format) in the same query. Therefore EsgynDB can access in the same query structured, semi-structured and unstructured data. HBase or Hive tables can be defined as EsgynDB external tables.

EsgynDB Logo
Website

https://esgyn.com/

Tech Docs

https://esgyn.com/developers/

Developer

https://esgyn.com/developers/

Country of Origin

US

Start Year

2015

Project Type

Commercial

Written in

C++, Java

Supported languages

C, C#, C++, Java, PL/SQL, SQL

Derived From

Trafodion

Operating Systems

Linux