In April 2013, IBM announced DB2 10.5, which adds the BLU Acceleration feature. On the surface, BLU Acceleration looks like column-oriented storage combined with in-memory computing, but there are technical details worth a DBA's attention. Database consultant Curt Monash gave a detailed explanation of the BLU accelerator on his blog, giving DBAs a deeper understanding of this new DB2 10.5 feature.
The BLU accelerator is a feature of the new DB2 release that behaves like a columnar analytic database. When a query joins BLU and non-BLU tables, all the BLU tables are joined first, and the result set is then joined with the remaining DB2 tables. With this release, column-oriented storage has now been added to every mainstream enterprise database product except Oracle.
IBM claims that BLU scales nearly linearly up to 64 CPU cores on a single server, with horizontal (scale-out) support planned. IBM recommends using the BLU accelerator for all DB2 tables under analytical workloads. The first release is optimized for databases on the order of 10 TB, though it is capable of handling up to 20 TB of data.
Technical highlights of the BLU accelerator include:
Query execution is fully pipelined, and table scans can be shared across concurrent queries.
Data skipping effectively reduces I/O.
Vectorization based on single-instruction, multiple-data (SIMD) instructions.
A probabilistic cache replaces traditional LRU (Least Recently Used) replacement: frequently referenced data blocks become more likely to remain in memory. This feature is backed by sophisticated randomized algorithms.
Automated workload management: IBM believes resource contention between queries is the main cause of wasted resources, and that this gives the BLU accelerator a concurrency advantage over traditional DB2 databases.
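Data skipping is generally implemented by keeping a small summary (for example, the minimum and maximum value) for each block of a column, so that blocks which cannot possibly satisfy a predicate are never read. The following is a minimal sketch of that general technique; the block size, the helper names `build_synopsis` and `scan_greater_than`, and the in-memory representation are all illustrative assumptions, not DB2's actual synopsis mechanism.

```python
# Minimal sketch of data skipping: keep a (min, max) synopsis per block of a
# column, and read only blocks whose range could satisfy the predicate.
# Illustrative only -- not DB2's actual synopsis tables.

BLOCK_SIZE = 4  # hypothetical block size for the sketch

def build_synopsis(column):
    """Split a column into fixed-size blocks and record (min, max) per block."""
    blocks = [column[i:i + BLOCK_SIZE] for i in range(0, len(column), BLOCK_SIZE)]
    return blocks, [(min(b), max(b)) for b in blocks]

def scan_greater_than(column, threshold):
    """Return values > threshold, skipping blocks whose max rules them out."""
    blocks, synopsis = build_synopsis(column)
    hits, blocks_read = [], 0
    for block, (lo, hi) in zip(blocks, synopsis):
        if hi <= threshold:      # no row in this block can qualify: skip its I/O
            continue
        blocks_read += 1
        hits.extend(v for v in block if v > threshold)
    return hits, blocks_read

values = [1, 2, 3, 4, 10, 11, 12, 13, 5, 6, 7, 8]
hits, blocks_read = scan_greater_than(values, 9)
print(hits, blocks_read)  # only the one block containing 10..13 is read
```

Here only one of three blocks is scanned for the predicate `> 9`; on disk-resident data of real size, the skipped blocks translate directly into saved I/O.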
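IBM has not published the details of BLU's randomized cache replacement, but the general class of algorithm it alludes to can be sketched: instead of maintaining an exact LRU ordering, the cache samples a few candidates at eviction time and evicts the least recently used one in the sample, so hot blocks are statistically likely to survive. Everything below (the class name, the sample size, the tick-based recency) is an illustrative assumption about that general class, not IBM's actual policy.

```python
import random

# Sketch of randomized (sampled) cache replacement: on eviction, pick the
# least recently used block among a small random sample rather than tracking
# an exact global LRU order. Illustrative of the general class of randomized
# algorithms only -- IBM's actual policy is not public.

class SampledLRUCache:
    def __init__(self, capacity, sample_size=3, seed=0):
        self.capacity = capacity
        self.sample_size = sample_size
        self.store = {}            # block_id -> (value, last_access_tick)
        self.tick = 0
        self.rng = random.Random(seed)

    def get(self, block_id):
        self.tick += 1
        entry = self.store.get(block_id)
        if entry is not None:
            self.store[block_id] = (entry[0], self.tick)  # refresh recency
            return entry[0]
        return None

    def put(self, block_id, value):
        self.tick += 1
        if block_id not in self.store and len(self.store) >= self.capacity:
            # Evict the least recently used block among a random sample.
            k = min(self.sample_size, len(self.store))
            sample = self.rng.sample(list(self.store), k)
            victim = min(sample, key=lambda b: self.store[b][1])
            del self.store[victim]
        self.store[block_id] = (value, self.tick)

cache = SampledLRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" is now the more recently used block
cache.put("c", 3)    # the sample covers both entries here, so "b" is evicted
print(sorted(cache.store))
```

With large caches the sample covers only a fraction of the entries, which is what makes the policy probabilistic: a frequently referenced block is unlikely to be the oldest in any given sample.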
In addition, BLU's column-oriented storage model gives it advantages in data compression. BLU compression includes approximate Huffman encoding, prefix encoding, and delta compression. IBM also says that all of the compression schemes are order-preserving, so range predicates can be evaluated directly on compressed data; that is, data can be processed and analyzed without decompressing it first. This is the biggest highlight of BLU's compression.
As with other columnar database systems, write performance is a potential bottleneck for the BLU accelerator. IBM's answer is that moving new data into the database (both in memory and on disk) is actually one of BLU's strengths, and it supports the following methods:
The BLU accelerator supports LOAD as well as SQL-based INSERT, UPDATE, and DELETE, plus utilities such as INGEST, IMPORT, and EXPORT (BACKUP and RESTORE can also serve as ways to move data). Because the syntax and semantics of data import are unchanged, IBM DataStage and other third-party ETL tools can continue to be used without modification.
Unlike other columnar vendors, which protect write performance by inserting new rows into a delta region of the table and asynchronously moving them into the column store (a second loading pass that carries a significant performance cost), IBM adds data directly to the main table and uses batched conversion to absorb the latency of column-level processing. Processing data in batches largely eliminates the inherent per-row overhead of a columnar store and avoids reprocessing the data a second time.
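The idea of batched conversion can be sketched in miniature: freshly inserted rows accumulate in a small buffer attached to the table itself, and a batch of them is converted to columnar arrays in one pass, so the per-row cost of column-level processing is amortized. The class, the batch size, and the `scan` behavior below are illustrative assumptions, not DB2's internal storage format.

```python
# Sketch of batched row-to-column conversion: new rows accumulate in a small
# pending buffer on the main table and are converted to columnar arrays in
# batches, amortizing per-row columnar overhead. Illustrative only.

class ColumnarTable:
    def __init__(self, column_names, batch_size=3):
        self.columns = {name: [] for name in column_names}
        self.pending = []            # freshly inserted rows, still row-form
        self.batch_size = batch_size

    def insert(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Convert all pending rows to columnar form in one batch."""
        for row in self.pending:
            for name, value in zip(self.columns, row):
                self.columns[name].append(value)
        self.pending.clear()

    def scan(self, name):
        """Queries see converted data plus any not-yet-converted rows."""
        idx = list(self.columns).index(name)
        return self.columns[name] + [row[idx] for row in self.pending]

t = ColumnarTable(["id", "amount"], batch_size=3)
for row in [(1, 10), (2, 20), (3, 30), (4, 40)]:
    t.insert(row)
print(t.scan("amount"))  # first three rows converted in a batch, one pending
```

Note that `scan` reads the pending buffer as well, so queries see new data immediately even though columnar conversion happens in batches.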
In addition, IBM has added a new column-organized logging scheme for the BLU accelerator. On the surface it behaves like DB2's traditional log-based transaction recovery, but the log data is reorganized by column, and with XOR-based logging BLU can greatly reduce log space. Finally, BLU accelerator tables can coexist with traditional database tables, and IBM has made the integration between the two seamless.
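A plausible reading of XOR-based logging, sketched below under that assumption rather than from DB2's published log format: log the XOR of the before- and after-images of a page instead of both images. Unchanged bytes XOR to zero, so the record compresses extremely well, and either image can be reconstructed from the other plus the XOR record, supporting both redo and undo.

```python
import random
import zlib

# Sketch of XOR-based logging: log before-image XOR after-image. Unchanged
# bytes become zeros (highly compressible), and either image is recoverable
# from the other plus the XOR record. An illustration of the general
# technique, not DB2's actual log format.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

rng = random.Random(0)
before = rng.randbytes(1000)               # a 1000-byte page of arbitrary data
after = bytearray(before)
after[10:14] = b"\x01\x02\x03\x04"         # a small update to the page
after = bytes(after)

xor_record = xor_bytes(before, after)      # almost entirely zero bytes
redo = xor_bytes(before, xor_record)       # before ^ diff -> after  (redo)
undo = xor_bytes(after, xor_record)        # after  ^ diff -> before (undo)

print(redo == after, undo == before)
print(len(zlib.compress(xor_record)) < len(zlib.compress(after)))
```

The compressed XOR record is a tiny fraction of the compressed page image, which is where the log-space savings come from.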