Open-source MySQL Data Warehouse solution: Infobright

Source: Internet
Author: User
Infobright is a column-based database based on unique patented knowledge grid technology. Infobright is an open-source MySQL Data Warehouse solution that introduces the column storage solution, high-strength data compression, and optimized Statistical Computing (Class

Infobright is a column-based database based on unique patented knowledge grid technology. Infobright is an open-source MySQL Data Warehouse solution that introduces the column storage solution, high-strength data compression, and optimized Statistical Computing (Class

Infobright is a column-based database based on unique patented knowledge grid technology. Infobright is an open-source MySQL Data Warehouse solution that introduces the column storage solution, high-strength data compression, and optimized Statistical Computing (such as sum/avg/group ), infobright is based on mysql, but it does not support mysql, because it comes with. Mysql can be roughly divided into the logic layer and the physical storage engine. infobright mainly implements a storage engine, but because its own storage logic is basically different from that of relational databases, it cannot be directly attached to mysql as the plug-in InnoDB. Its logic layer is the logic layer of mysql and its own optimizer.

Infobright features

Advantages:

Query performance of large data volumes is strong and stable: The same SELECT query statement under the conditions of millions, tens of millions, and hundreds of millions of records is faster than ordinary MySQL storage engines such as MyISAM and InnoDB ~ 60 times. Efficient query mainly relies on the specially designed storage structure to optimize the query. However, the optimization effect depends on the database structure and query statement design.

Large data volumes: terabytes of data and billions of records. Data volume storage depends on your own high-speed data loading tool (GB/hour) and high data compression ratio (>)

High data compression ratio: It is said that the average data compression rate can reach 10 to 1. It can even reach 40: 1, greatly saving data storage space. High data compression ratios mainly rely on column-based storage and patent-pending flexible compression algorithms.

Column-based storage: No index or partition is required. Even if the data volume is huge, the query speed is fast. It is used in a data warehouse and cannot be used to process a large amount of data. You do not need to create an index to avoid the problem of maintaining indexes and indexes expanding with data. Data in each column is compressed and stored in blocks. Each knowledge grid node records the statistical information in the block, instead of indexing, to accelerate search.

Quick response to complex aggregate queries: Suitable for complex analytical SQL queries, such as SUM, COUNT, AVG, and GROUP

Value of Infobright

Save design costs. No complex data warehouse model design requirements (such as star model and snowflake model), no materialized view, Data Partition, and index creation required

Save storage resources. The high compression ratio is usually, and some applications may reach 40: 1

Integration is widely used. Compatible with many BI suites, such as Pentaho, Cognos, and Jaspersof

Reduce O & M costs. As the database grows, the query and loading performance remains stable, and the implementation and management are simple, requiring minimal management.

Commercial guarantee. The first open-source warehouse analysis database supported by the business is the officially recommended warehouse integration architecture of Oracle/MySQL.

Use Cases of Infobright

Big Data analysis applications. Webpage/online analysis, mobile analysis, customer behavior analysis, analysis marketing and advertising

Log/event management system. China Telecom detailed ticket analysis and report, system/network security certification records

Data mart. Specific data warehouses of enterprises and institutions, providing data warehouses for Small and Medium-sized Enterprises

Embedded analysis. Provides Embedded analysis applications for independent software vendors/SaaS providers

Restrictions:

Data update Is Not supported: for the Community edition, Infobright can only import DATA using the "load data infile" method, but does not support INSERT, UPDATE, and DELETE. This makes it very difficult to modify the data, thus limiting its use as a data warehouse for real-time data services.

High concurrency is not supported: only more than 10 concurrent queries are supported. Although more than 10 concurrent queries in a single database are sufficient for general applications, however, low machine utilization is always a bad thing for investors, especially when there are many concurrent requests.

Master-slave backup and horizontal scaling are not provided. If you do not have a master-slave backup and want to back up the data, you can load the data at the same time, but you can only verify the final data consistency, so that the slave machine stops service for a long time during data loading; in terms of horizontal scaling, it is not a distributed storage system.

Comparison with MySQL

Infobright is suitable for Data Warehouse scenarios: Non-transaction, non-real-time, non-Multi-concurrency; analysis-oriented; stores established facts, such as logs, or a large amount of summarized data. Therefore, it is not suitable for responding to requests from website users. In fact, it takes much longer to retrieve a record than mysql, but it takes records faster than mysql.

Mysql usually occupies more space than the actual data file because it also has an index. The compression capability of infobright is very powerful, and different types of data are compressed by column.

The service format is the same as that of the interface mysql. You can enable the infobright service in a way similar to mysql, and then the applications connected to mysql can connect and query infobright in a similar way. This is a good news for skilled mysql users, and the learning cost is basically 0.

Infobright has two release editions: open-source ICE and closed-source commercial IEE. ICE provides sufficient functions, but cannot INSERT, DELETE, UPDATE, and only load data infile. In addition to providing more adequate functions, IEE is said to be faster in query.

The Community ICE version has been tested by major domestic enterprises and has a low investment in generating systems, mainly due to the following reasons:

Limits on DML and alter statements

Incremental load export and import at regular intervals

The built-in MyISAM is difficult to support high concurrency. to make full use of server resources, you must enable another MySQL instance.

Poor Support for Chinese and other multi-byte characters

Only single-core scheduling is supported.

Lack of original factory support

Differences between ICE and IEE

IEE includes additional features tailored to the work needs of most enterprises, such as better query performance, DML statement support, and distributed import. In addition, the IEE version also contains a certain level of Infobright factory or agent support rescue services, product training, and so on.

Significant query performance differences. Although IEE and ICE both have significantly higher query performance than Oracle, SQL Server, MySQL, and other row-based databases, IEE is 50-500% faster than ICE. This obvious gap comes from the unique multi-thread scheduling module in the IEE Core Engine (introduced from IEE3.5 ). in ICE, an independent query can only use a single CPU core, while other query processes can only use other cores. For complex queries that require filtering and distinguishing large amounts of data, the use of the IEE multi-thread scheduling module can significantly save query time.

DML statements are supported. IEE supports standard SQL data operation languages and uses insert, update, and delete statements to manipulate data. However, ICE only supports Load data infile for data import, and all data needs to be re-imported for any data changes. Using DML statements reduces data query performance and increases with the number of times.

Supports DDL statements. Including alter table rename, add column, and drop column (but the column operation can only take effect for the last column)

Support for Hadoop APIs (through DLP)

Advanced Replication and high availability. The IEE version includes the Master/Slave function and is based on SQL statement.

Easier import and faster import speed. IEE supports the distributed import tool-DLP. It also includes standard MySQL native loader for processing the import of some complex data. On the other hand, it also shows that IBloader has poor fault tolerance.

Simultaneous Load or DML consistent Query

Support temporary tables

Other Commercial licenses, after-sales support, etc.

Architecture

MySQL-based internal architecture-Infobright adopts a similar internal architecture as MySQL. The following figure shows the architecture of Infobright:

Infobright

The gray part is the original module of mysql, and the white and blue parts are the ones of infobright.

Infobright has the same two-layer structure as mysql:

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.