Infobright

Infobright is a column-oriented, high-performance analytic database engine designed for fast queries over large volumes of data. Besides query speed, it achieves much higher data compression than typical row-oriented database engines. Infobright is built on the MySQL database environment, so it is compatible with business intelligence tools that work with MySQL, which shortens the learning curve for customers. The architecture has three parts: Data Packs (DP), Data Pack Nodes (DPN), and Knowledge Nodes (KN). Together, these three parts make up the Knowledge Grid architecture of Infobright. When data is loaded into a table, the rows are split into groups with a fixed number of rows, and each group is decomposed into a separate data pack for each column. As a result, every data pack in a group holds the same number of rows and contains values from a single column, which compresses better than row-oriented storage. The average data pack compression ratio is approximately 20:1, and Infobright can handle up to 50 TB of data for analytic applications. Infobright is aimed at data analysis rather than transactional workloads: it does not support INSERT, DELETE, or UPDATE operations.
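The load-time decomposition described above can be sketched in a few lines. This is a minimal illustration, not Infobright's implementation: the names `PACK_ROWS`, `DataPackNode`, and `build_packs` are hypothetical, the pack size is shrunk to four rows so the output is easy to inspect, and the statistics kept per pack are limited to the ones the text mentions (min, max, sum) plus a row count.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

PACK_ROWS = 4  # hypothetical demo size; not Infobright's actual pack size


@dataclass
class DataPackNode:
    """Summary statistics kept for one data pack (illustrative subset)."""
    minimum: int
    maximum: int
    total: int
    count: int


def build_packs(
    rows: Sequence[Dict[str, int]], pack_rows: int = PACK_ROWS
) -> Tuple[Dict[str, List[List[int]]], Dict[str, List[DataPackNode]]]:
    """Split rows into fixed-size groups, then into one pack per column."""
    packs: Dict[str, List[List[int]]] = {}
    nodes: Dict[str, List[DataPackNode]] = {}
    for start in range(0, len(rows), pack_rows):
        group = rows[start:start + pack_rows]
        for column in group[0]:
            values = [row[column] for row in group]
            packs.setdefault(column, []).append(values)
            nodes.setdefault(column, []).append(
                DataPackNode(min(values), max(values), sum(values), len(values))
            )
    return packs, nodes


# Ten rows with two integer columns, loaded into packs of four rows each.
rows = [{"a": i, "b": 100 - i} for i in range(10)]
packs, nodes = build_packs(rows)
print(packs["a"])     # three packs for column "a"; the last one is partial
print(nodes["b"][0])  # statistics for the first pack of column "b"
```

Each column ends up with the same number of packs, and each pack carries its own statistics record, which is what later allows queries to skip packs without decompressing them.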

History

Infobright was founded in 2005. It issued the first free release of its software in September 2008 and launched its community [1] at the same time, becoming an open source company. In 2009, Infobright published patents on data compression [2], query optimization [3], and data organization [4]. In the same year, Sun Unified Storage products were certified for use with Infobright, with large improvements in query speed reported [5]. In February 2016, Gartner recognized Infobright as a challenger in the Data Warehouse and Data Management Solutions for Analytics space, alongside HPE, Amazon Web Services, 1010data, and MarkLogic [6]. In July 2016, Infobright moved away from its open source community edition to focus on direct customer sales and the original equipment manufacturer (OEM) market.

Query Interface

Custom API

Infobright supports the following interfaces and language bindings: ODBC, JDBC, C API, C++, Delphi, Eiffel, Java, SmallTalk, Lisp, REALbasic, PHP, Visual Basic, Ruby, Perl, and Python.

Views

Virtual Views

Infobright supports views but not materialized views. It handles complex queries well: the Knowledge Grid narrows down the data that a complex query over a large dataset has to touch. In addition, Infobright supports approximate queries, which can greatly reduce query time on massive amounts of data.

Concurrency Control

Two-Phase Locking (Deadlock Detection)

Infobright supports ACID transactions with immediate consistency. Because it does not support UPDATE, DELETE, INSERT, ALTER, or FOREIGN KEY, concurrency control is relatively simple. When a modification is needed, Infobright takes a lock on the whole table.

Joins

Nested Loop Join

Infobright uses the Knowledge Grid to process joins, aiming to decompress as few data packs as possible. Take the following query as an example:

SELECT MAX(X.D) FROM T JOIN X ON T.B = X.C WHERE T.A > 6

Suppose the data packs are as shown in "https://drive.google.com/open?id=0B1fwCLZ9xWQtbGJiRkRzaERDSmM". The Knowledge Grid records the relationships between packs of different tables, as shown in "https://drive.google.com/open?id=0B1fwCLZ9xWQtQUM0bFVXWTNxRUU": 0 means that a pack in one table has no values in common with a pack in the other table, and 1 means that the two packs share values.

Infobright handles the query as follows. First, it uses the DPN information to find the packs that can satisfy the WHERE condition T.A > 6. The optimizer marks A1, A2, and A4 as irrelevant, A3 as relevant, and A5 as suspect, so for table T only A3, A5, B3, and B5 remain. The Knowledge Grid shows that B3 has no relation with X, so B3 is removed, and A3 goes with it because it covers the same rows. The packs of T that remain are A5 and B5. Because B5 matches only C2, Infobright only needs to analyze the values of D in the corresponding pack. In this way the amount of decompressed data is greatly reduced, and Infobright achieves a fast join without using an index.
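The pruning steps for this query can be traced with a small sketch. Everything numeric here is an assumption: the linked figures are not reproduced in this entry, so the min/max values in `a_stats`, the three packs assumed for X.C, and the 0/1 entries in `pack_to_pack` are hypothetical values chosen only so that the outcome matches the walkthrough (B3 relates to nothing in X, B5 relates only to C2). The function names are illustrative and not part of any Infobright API.

```python
from typing import Dict, List, Tuple

# Hypothetical (min, max) statistics for the five packs of T.A.
a_stats: Dict[int, Tuple[int, int]] = {
    1: (0, 5), 2: (1, 6), 3: (7, 12), 4: (2, 4), 5: (3, 9),
}

# Hypothetical pack-to-pack node for T.B = X.C.
# pack_to_pack[b][c] == 1 means pack b of T.B may share a value with pack c of X.C.
pack_to_pack: Dict[int, Dict[int, int]] = {
    1: {1: 1, 2: 0, 3: 0},
    2: {1: 0, 2: 1, 3: 1},
    3: {1: 0, 2: 0, 3: 0},
    4: {1: 1, 2: 1, 3: 0},
    5: {1: 0, 2: 1, 3: 0},
}


def classify_greater_than(
    stats: Dict[int, Tuple[int, int]], threshold: int
) -> Dict[int, str]:
    """Classify each pack for the predicate value > threshold using min/max only."""
    result: Dict[int, str] = {}
    for pack, (low, high) in stats.items():
        if high <= threshold:
            result[pack] = "irrelevant"  # no value can qualify
        elif low > threshold:
            result[pack] = "relevant"    # every value qualifies
        else:
            result[pack] = "suspect"     # must be decompressed to decide
    return result


def plan_join(
    stats: Dict[int, Tuple[int, int]],
    grid: Dict[int, Dict[int, int]],
    threshold: int,
) -> Dict[int, List[int]]:
    """Map each surviving pack of T to the X.C packs it may join with."""
    plan: Dict[int, List[int]] = {}
    for pack, state in classify_greater_than(stats, threshold).items():
        if state == "irrelevant":
            continue
        partners = [c for c, bit in sorted(grid[pack].items()) if bit == 1]
        if partners:  # a pack with no partner in X cannot contribute to the join
            plan[pack] = partners
    return plan


status = classify_greater_than(a_stats, 6)
plan = plan_join(a_stats, pack_to_pack, 6)
print(status)
print(plan)
```

With these assumed inputs, only row group 5 of T survives and it pairs with pack 2 of X, so the engine would decompress a handful of packs rather than both tables.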

Storage Model

Decomposition Storage Model (Columnar)

The storage model of Infobright is DSM. Because Infobright focuses on storing huge amounts of data and speeding up queries, column orientation is the better fit, for two reasons. First, unlike a row-based store, where each row mixes values of different data types, a column holds values of a single data type, which lets the compression algorithm be tuned to each type of data. In this way Infobright achieves a market-leading data compression ratio (from 10:1 to 40:1) and greatly reduces disk I/O. Second, most analytic queries involve only some of the columns, so a column-oriented DBMS can retrieve just the data it needs, which improves query speed.
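To illustrate why single-type columns compress well, the sketch below applies two generic textbook encodings, run-length encoding for a low-cardinality string column and delta encoding for a slowly increasing integer column. These are stand-ins chosen for illustration only; the entry does not describe Infobright's actual compression algorithms, and the column contents are invented.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(values: List[str]) -> List[Tuple[str, int]]:
    """Run-length encode: collapse each run of equal values into (value, length)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


def delta_encode(values: List[int]) -> List[int]:
    """Store the first value, then the difference between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original values by accumulating the differences."""
    values: List[int] = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


# Invented column contents: a repetitive string column and a timestamp-like column.
country = ["DE"] * 4 + ["US"] * 3 + ["PL"]
timestamp = [1000, 1001, 1002, 1003, 1005]

country_runs = rle_encode(country)      # 8 strings become 3 runs
timestamp_deltas = delta_encode(timestamp)  # large values become small differences

print(country_runs)
print(timestamp_deltas)
```

In a row-oriented layout the string and integer values would be interleaved, so neither encoding would apply cleanly; storing each column on its own is what makes a per-type choice possible.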

Isolation Levels

Read Committed

Infobright supports the read committed isolation level and is effectively read-only. Because it does not support modifying tables or their contents, queries only read tables, and a command such as DROP TABLE locks the whole table. Queries therefore only ever see committed data.

Stored Procedures

Supported

Infobright supports stored procedures. Its stored procedure language follows the MySQL ANSI-92 standard. When defining a stored procedure, use the DELIMITER keyword to change the statement delimiter before the definition, and change it back once the definition is finished. Sample code for an Infobright stored procedure is available at "https://drive.google.com/open?id=0B1fwCLZ9xWQtYzh2ZVVDekV5NDg". The procedure converts a date string in 'YYYYMMDD' format to a string in 'YYYY-MM-DD' format.

Indexes

Not Supported

Infobright has no explicit indexes. Data Pack Nodes (DPNs) and the Knowledge Grid serve as substitutes. Each DPN holds summary statistics (such as max, min, and sum) for the compressed data in its data pack. The Knowledge Grid stores more advanced information (such as dependencies between multiple tables or multiple columns) and helps locate the needed packs while decompressing as little data as possible. For example, suppose a query looks for rows where the value of a certain column falls within a specific range. The Infobright optimizer sorts packs into three types: relevant packs (every value qualifies), irrelevant packs (no value qualifies), and suspect packs (some values may qualify). The query does not need to decompress relevant or irrelevant packs to evaluate the condition; only suspect packs have to be decompressed and searched. In this way the DPNs act like an index. The Knowledge Grid also acts like an index because it records relationships between tables: for a join, Infobright can first use the DPNs of both tables to find candidate data packs, then use Knowledge Nodes to relate those packs to each other and return the result. Both structures avoid index maintenance, whose cost slows updates and inserts as a database grows. Compared with indexes, Infobright also performs better on unexpected queries: it is hard to build an efficient index for a query that was not anticipated, whereas Infobright builds the Knowledge Grid dynamically and adapts its contents in response to queries.
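The three-way classification for a range condition needs nothing more than each pack's min and max. The sketch below shows one way to express it; `PackStatus`, `classify_range`, and the statistics in `pack_stats` are hypothetical names and values used for illustration, not Infobright internals.

```python
from enum import Enum
from typing import List, Tuple


class PackStatus(Enum):
    RELEVANT = "relevant"      # every value satisfies the predicate
    IRRELEVANT = "irrelevant"  # no value can satisfy the predicate
    SUSPECT = "suspect"        # some values may satisfy it; must decompress


def classify_range(pack_min: int, pack_max: int, low: int, high: int) -> PackStatus:
    """Classify one pack for the predicate low <= value <= high."""
    if pack_max < low or pack_min > high:
        return PackStatus.IRRELEVANT
    if low <= pack_min and pack_max <= high:
        return PackStatus.RELEVANT
    return PackStatus.SUSPECT


# Hypothetical (min, max) statistics for four packs of one column.
pack_stats: List[Tuple[int, int]] = [(0, 9), (10, 19), (15, 40), (50, 60)]

# Evaluate the condition 10 <= value <= 30 against the statistics only.
statuses = [classify_range(lo, hi, 10, 30) for lo, hi in pack_stats]
to_decompress = [i for i, status in enumerate(statuses) if status is PackStatus.SUSPECT]

print([status.value for status in statuses])
print(to_decompress)
```

In this example only one of the four packs has to be decompressed, and no structure had to be built or maintained in advance for this particular column or range.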

Storage Architecture

Disk-oriented

Infobright is a disk-oriented DBMS, although it keeps the Knowledge Grid in memory. The Knowledge Grid is created automatically and records information about the data when data is loaded or when users execute queries. It is the key structure for query processing and is what improves Infobright's query speed. Any remaining RAM can be used to hold uncompressed data packs, but most data packs and tuples are stored on disk; Infobright does not keep all of its data in memory.

Data Model

Column Family / Wide-Column

Data is stored column by column. Columnar storage achieves a high compression ratio and speeds up analytic workloads.

System Architecture

Shared-Nothing

Infobright is a shared-nothing DBMS and does not rely on special hardware such as GPUs or FPGAs. Its architecture combines a columnar database with the Knowledge Grid to optimize analytics (compressing, storing, and retrieving data) and to provide a scalable, flexible analytic DBMS without indexes. Infobright uses MySQL's pluggable storage engine architecture to provide full database functionality. Its advantages are: 1) high query speed, because a column-oriented database reads only the required data and the Knowledge Grid speeds queries up further through the way it organizes information about the data; 2) short load times, because Infobright does not need to build indexes; 3) a market-leading compression ratio, because Infobright tailors its compression algorithms to each type of data.

Query Compilation

Not Supported

Website

https://infobright.com

Developer

Infobright

Start Year

2005

End Year

2016

Project Type

Commercial

Supported languages

C++

Operating Systems

Linux, Solaris

Licenses

Proprietary