The In-Memory option (DBIM) of Oracle Database 12c loads the data of all rows in a table into memory. Why not simply keep the most frequently accessed data blocks in memory, as the Buffer Cache does?
Access Patterns of the In-Memory Column Store and the Buffer Cache
The reason is that the two serve different access patterns. The Buffer Cache is designed for OLTP applications, whose access pattern is non-uniform: some rows in a table are accessed far more frequently than others, so caching only 10% of the data can cover 95% of data accesses. Under that assumption, caching 10% of the data can improve performance by up to 20 times, because only the remaining 5% of accesses still have to go to disk.
The In-Memory column store, by contrast, serves analytical applications, which access only a few columns but must scan the data of every row in the table, so caching just some of the rows is of little value. For example, even if the column store made scans 100 times faster, caching only 10% of a table would speed up a full scan by only about 1.1 times, not 100 times, because the other 90% of the rows must still be read the slow way. Therefore, DBIM lets you populate an entire table, selected columns, selected partitions, or an entire tablespace, but it does not let you use a WHERE condition to populate only some rows.
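Both the 20x and the 1.1x figures follow from Amdahl's law. The sketch below is illustrative only: it assumes, as the text does, that in-memory access is effectively free compared with disk in the Buffer Cache case and that the column store makes its share of a scan 100 times faster. The function name `overall_speedup` is hypothetical.

```python
import math

def overall_speedup(fraction_accelerated: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the work gets faster.

    fraction_accelerated: share of the work that benefits (0..1).
    local_speedup: how much faster that share becomes (math.inf = free).
    """
    remaining = 1.0 - fraction_accelerated           # work that is not sped up
    accelerated = fraction_accelerated / local_speedup
    return 1.0 / (remaining + accelerated)

# Buffer Cache (OLTP): 95% of accesses hit the cached 10% of the data;
# assume a memory hit is effectively free compared with a disk read.
oltp = overall_speedup(0.95, math.inf)

# Column store (analytics): a full scan touches every row, so caching 10%
# of the rows accelerates only 10% of the scan, even at 100x.
analytics = overall_speedup(0.10, 100.0)

print(f"OLTP: {oltp:.1f}x, analytics: {analytics:.2f}x")  # OLTP: 20.0x, analytics: 1.11x
```

Note that the analytics result barely moves even with an infinitely fast column store: `overall_speedup(0.10, math.inf)` is still only about 1.11, which is why populating whole tables, columns, or partitions makes more sense than populating hot rows.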
Therefore, for analytical applications, the In-Memory column store outperforms row-based storage, even when the row-format table is kept entirely in the Buffer Cache (for example, via ALTER TABLE tablename …). The most important reason is that the columnar format is very well suited to analytical applications.
Column Storage Format
The following figure shows why columnar storage is suitable for analysis.
With traditional row-based storage, an analytical query on, say, the 4th column must read the data row by row, including the irrelevant data in columns 1 through 3.
With column-based storage, only the 4th column needs to be read. This avoids wasted I/O, so efficiency naturally improves.
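The difference can be sketched in a few lines of Python. This is a toy model, not how Oracle lays out blocks: the 4-column, 1,000-row table is hypothetical, and "values touched" is used as a rough proxy for the data that has to be read.

```python
# Hypothetical 4-column table with 1,000 rows.
NUM_ROWS = 1000
rows = [(i, i * 2, i * 3, i % 7) for i in range(NUM_ROWS)]  # row-major layout

def scan_row_store(rows, col):
    """Sum one column from row-major data; every field of every row is touched."""
    total = touched = 0
    for row in rows:
        touched += len(row)   # the whole row is stored (and read) together
        total += row[col]
    return total, touched

def to_column_store(rows):
    """Pivot row-major data into one list per column."""
    return [list(values) for values in zip(*rows)]

def scan_column_store(columns, col):
    """Sum one column from column-major data; only that column is touched."""
    values = columns[col]
    return sum(values), len(values)

columns = to_column_store(rows)
row_total, row_touched = scan_row_store(rows, 3)       # the "4th column" (index 3)
col_total, col_touched = scan_column_store(columns, 3)
print(row_touched, col_touched)  # 4000 1000
```

Both scans return the same sum, but the row-major scan touches four times as many values, one per column in the table.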
Let's take a look at the test results Oracle presented at OpenWorld 2013:
With both the row format and the column format held in memory, DBIM was nearly 800 times faster, and a single core processed billions of rows per second. Incredible, isn't it?
SIMD
SIMD (Single Instruction, Multiple Data), a technique long used in high-performance computing and image processing, applies one instruction to a whole batch of data values at once. That makes it a natural fit for columnar data, where the values of a column are stored contiguously.
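Pure Python cannot issue SIMD instructions, so the sketch below only emulates the pattern: each step compares a fixed-width batch of contiguous column values and produces a bitmask of matching rows, the kind of result a vectorized predicate yields. The `LANES = 8` width is a hypothetical choice (for example, eight 32-bit integers in a 256-bit register), and the function names are illustrative, not Oracle APIs.

```python
LANES = 8  # hypothetical vector width, e.g. 8 x 32-bit values per 256-bit register

def simd_style_greater_than(column: list[int], threshold: int) -> int:
    """Emulate a vectorized `value > threshold` filter over one column.

    Each outer iteration stands in for one vector compare over LANES
    contiguous values; bit i of the result is set if row i matches.
    """
    mask = 0
    for base in range(0, len(column), LANES):
        batch = column[base:base + LANES]        # contiguous column values
        lane_bits = 0
        for lane, value in enumerate(batch):     # done in parallel on real hardware
            if value > threshold:
                lane_bits |= 1 << lane
        mask |= lane_bits << base                # place this batch's bits
    return mask

def matching_rows(mask: int, num_rows: int) -> list[int]:
    """Decode a bitmask into row positions."""
    return [i for i in range(num_rows) if (mask >> i) & 1]

column = [5, 12, 3, 40, 7, 8, 15, 1, 22, 9]
mask = simd_style_greater_than(column, 8)
print(matching_rows(mask, len(column)))  # [1, 3, 6, 8, 9]
```

The bitmask output is convenient because later predicates can be combined with bitwise AND/OR before any row is materialized.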
Storage Index
Storage indexes already exist in Exadata, and DBIM applies the same idea. Each column's data is divided into In-Memory Compression Units (IMCUs), and the minimum and maximum values of each IMCU are precomputed and maintained in real time. When a query's WHERE condition is evaluated, any IMCU whose value range cannot satisfy it is skipped, saving I/O and time. The principle is similar to partition pruning.
However, because these values live only in memory, they must be recomputed after the database is restarted.
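A minimal sketch of min/max pruning, assuming a single integer column and an equality predicate. `Unit`, `UNIT_ROWS`, and the function names are hypothetical stand-ins for IMCUs and their min/max metadata, not Oracle internals.

```python
from dataclasses import dataclass

UNIT_ROWS = 4  # hypothetical; real IMCUs hold far more rows

@dataclass
class Unit:
    start: int          # position of the unit's first row
    values: list[int]
    low: int            # precomputed minimum
    high: int           # precomputed maximum

def build_units(column: list[int]) -> list[Unit]:
    """Split a column into fixed-size units and record each unit's min/max."""
    units = []
    for start in range(0, len(column), UNIT_ROWS):
        chunk = column[start:start + UNIT_ROWS]
        units.append(Unit(start, chunk, min(chunk), max(chunk)))
    return units

def scan_equals(units: list[Unit], target: int) -> tuple[list[int], int]:
    """Return matching row positions and how many units were actually scanned."""
    rows, scanned = [], 0
    for unit in units:
        if target < unit.low or target > unit.high:
            continue                    # min/max prove no row here can match
        scanned += 1
        rows.extend(unit.start + i for i, v in enumerate(unit.values) if v == target)
    return rows, scanned

column = [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]
rows, scanned = scan_equals(build_units(column), 11)
print(rows, scanned)  # [5] 1
```

Only one of the three units is scanned; the other two are ruled out by their min/max values alone. Pruning works best when similar values are clustered together, as with partitioning.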
Compression
Columnar data is usually compressed, because a single column tends to contain many repeated values, and compression is enabled by default in DBIM.
Compression not only lets more data fit in memory but also reduces the amount of data that must be read. However, if a table receives heavy OLTP activity, avoid the higher compression levels (such as MEMCOMPRESS FOR CAPACITY HIGH), which spend more CPU on compression and decompression.
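To illustrate why repeated values compress well, here is a sketch of dictionary encoding followed by run-length encoding, two common columnar techniques. It is an illustration under assumptions: it does not reproduce the specific algorithms behind DBIM's compression levels, and the function names are hypothetical.

```python
def dictionary_encode(column):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]

def run_length_encode(codes):
    """Collapse runs of identical codes into (code, run_length) pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return [tuple(run) for run in runs]

def decode(dictionary, runs):
    """Rebuild the original column from the dictionary and the runs."""
    return [dictionary[code] for code, length in runs for _ in range(length)]

states = ["CA", "CA", "CA", "NY", "NY", "TX", "TX", "TX", "TX"]
dictionary, codes = dictionary_encode(states)
runs = run_length_encode(codes)
print(dictionary, runs)  # ['CA', 'NY', 'TX'] [(0, 3), (1, 2), (2, 4)]
```

Nine strings shrink to three dictionary entries plus three runs. Because the dictionary is sorted, a predicate such as `state = 'NY'` can be evaluated on the integer codes without decompressing.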
Join and Aggregation Optimization in Memory
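A Bloom filter built on the join keys of the smaller table can turn a join into a filter applied during the column scan of the larger table, discarding non-matching rows early. This speeds up joins, especially when the data is in memory.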
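A simplified sketch of a Bloom-filter join, assuming a small dimension table (`dim_rows`, a dict from join key to attribute) and a fact-table join-key column (`fact_keys`). The class, its sizes, and the SHA-256-based hashing are illustrative choices, not how Oracle builds its filters. A Bloom filter can return false positives but never false negatives, so the scan discards most non-matching rows cheaply and an exact lookup removes the rest.

```python
import hashlib

class BloomFilter:
    """Bit-array Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

def bloom_filtered_join(dim_rows: dict, fact_keys: list):
    """Join fact rows to dim_rows, using a Bloom filter to prune the fact scan."""
    bloom = BloomFilter()
    for key in dim_rows:                    # build side: small dimension table
        bloom.add(key)
    result = []
    for pos, key in enumerate(fact_keys):   # probe side: scan of the fact key column
        if not bloom.might_contain(key):
            continue                        # definitely no match: skip cheaply
        if key in dim_rows:                 # exact check removes false positives
            result.append((pos, key, dim_rows[key]))
    return result

dims = {1: "A", 3: "C"}
print(bloom_filtered_join(dims, [1, 2, 3, 4, 1]))
# [(0, 1, 'A'), (2, 3, 'C'), (4, 1, 'A')]
```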
Key vectors work on a principle similar to Bloom filters, and they also allow aggregation results to be built on the fly during the scan. For more information, see the white paper.
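A simplified sketch of the key-vector idea behind in-memory aggregation, assuming dimension keys are dense integers starting at 0 so that they can index a plain list. All names and data are hypothetical, and Oracle's actual key vectors and VECTOR GROUP BY operation differ in detail. The key vector maps each dimension key to a group id (0 meaning "filtered out"), so a single pass over the fact columns both filters and aggregates.

```python
def build_key_vector(dim_rows, predicate, group_of):
    """Map each dense dimension key (its list index) to a group id; 0 = filtered out."""
    key_vector = [0] * len(dim_rows)
    group_ids = {}
    for key, row in enumerate(dim_rows):
        if predicate(row):
            group = group_of(row)
            key_vector[key] = group_ids.setdefault(group, len(group_ids) + 1)
    return key_vector, group_ids

def vector_group_by(fact_keys, measures, key_vector, group_ids):
    """Scan the fact columns once, filtering and aggregating via the key vector."""
    sums = [0] * (len(group_ids) + 1)   # slot 0 is never updated
    for key, amount in zip(fact_keys, measures):
        gid = key_vector[key]           # one array lookup per fact row
        if gid:                         # 0 means the dimension row was filtered out
            sums[gid] += amount
    return {group: sums[gid] for group, gid in group_ids.items()}

# Hypothetical dimension table: list index = dimension key.
stores = [
    {"region": "EU", "open": True},
    {"region": "US", "open": True},
    {"region": "EU", "open": True},
    {"region": "APAC", "open": False},
]
kv, ids = build_key_vector(stores, lambda r: r["open"], lambda r: r["region"])
print(vector_group_by([0, 1, 2, 3, 0], [10, 20, 30, 40, 5], kv, ids))
# {'EU': 45, 'US': 20}
```

The design choice is that the join, the dimension filter, and the GROUP BY collapse into one array lookup per fact row, with no hash table probe on the large table.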
Reference
In-Memory Column Store versus the Buffer Cache