Analysis of column-oriented and row-oriented databases

Source: Internet
Author: User

1. Row-oriented database

Definition: A row-oriented database stores all the field values of a record together as one row, and the rows are stored contiguously, one after another.

Figure 1

This layout is both intuitive and efficient for OLTP operations: writing a record into the database, modifying some fields of one record, or deleting an entire record.

However, running reports and analysis on a row-oriented database shows how inefficient this storage format can be. Tasks such as tracking sales and profit changes by province, or summarizing performance by department, operate on only a few fields. A row-oriented database reads data page by page, so when it analyzes sales and profit it also reads every other field of each contract, such as customer name, contract time, and account manager, wasting a lot of valuable I/O.

The first remedy is to add indexes, which work like the directory at the front of a dictionary and allow quick positioning. But as analysis scenarios become more complex and more varied, DBAs find that an index usually helps only a subset of queries and analyses. Optimizing every query and analysis scenario of an enterprise-wide BI system is almost impossible, both combinatorially and in terms of cost: the number of indexes required grows exponentially with the column combinations to be covered, and every index consumes storage space and maintenance time.
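The wasted I/O can be illustrated with a small sketch. The record layout, field widths, and sample contracts below are assumptions made up for illustration; a real database reads whole pages rather than single records, but the proportion of useful bytes is the point.

```python
import struct

# Hypothetical fixed-width contract record (an assumption for illustration):
# customer name, contract time, account manager, sales, profit.
ROW_FORMAT = "<20s10s20sdd"              # "<" disables padding
ROW_SIZE = struct.calcsize(ROW_FORMAT)   # 20 + 10 + 20 + 8 + 8 bytes

contracts = [
    (b"Acme", b"2013-01-05", b"Li", 1200.0, 300.0),
    (b"Globex", b"2013-02-11", b"Wang", 800.0, 150.0),
    (b"Initech", b"2013-03-20", b"Zhao", 500.0, 50.0),
]

# Row-oriented layout: the fields of each record are stored contiguously.
row_store = b"".join(struct.pack(ROW_FORMAT, *c) for c in contracts)


def total_sales_and_profit(store):
    """Scan every record in full, although only two fields are needed."""
    sales = profit = 0.0
    bytes_read = 0
    for offset in range(0, len(store), ROW_SIZE):
        record = struct.unpack_from(ROW_FORMAT, store, offset)
        bytes_read += ROW_SIZE
        sales += record[3]
        profit += record[4]
    return sales, profit, bytes_read


sales, profit, bytes_read = total_sales_and_profit(row_store)
bytes_needed = len(contracts) * 16       # two 8-byte floats per record
print(sales, profit, bytes_read, bytes_needed)
```

In this sketch only 48 of the 198 bytes scanned contribute to the result; the rest is customer name, contract time, and account manager data that the query never uses.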
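As a rough sketch of both points, the code below builds a simple index that maps each value of one column to the positions of the matching rows, then counts how many column combinations a query workload could in principle ask for. The table contents and the helper names `build_index` and `count_column_combinations` are hypothetical, and counting non-empty column subsets is only one simplified way to express the combinatorial growth.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample table (an assumption for illustration).
rows = [
    {"province": "Zhejiang", "department": "East", "sales": 1200.0},
    {"province": "Guangdong", "department": "South", "sales": 800.0},
    {"province": "Zhejiang", "department": "East", "sales": 500.0},
]


def build_index(table, column):
    """Map each value of `column` to the positions of the rows holding it."""
    index = defaultdict(list)
    for position, row in enumerate(table):
        index[row[column]].append(position)
    return dict(index)


# The index jumps straight to the matching rows instead of scanning all of them.
province_index = build_index(rows, "province")
zhejiang_sales = sum(rows[i]["sales"] for i in province_index["Zhejiang"])


def count_column_combinations(n_columns):
    """Count the non-empty subsets of columns a query could filter or group on."""
    return sum(
        1
        for k in range(1, n_columns + 1)
        for _ in combinations(range(n_columns), k)
    )


print(zhejiang_sales)
print(count_column_combinations(3), count_column_combinations(10))
```

The index on `province` helps only queries that filter on province. Covering every possible combination of columns would need 7 indexes for 3 columns and 1023 for 10, which is why indexing alone does not scale to an enterprise-wide BI workload.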

2. Column-oriented database

Definition: A column-oriented database stores the values of the same data column together. When a data row is inserted, the value for each of its columns is stored in a different place. In the example above, a column-oriented database needs to read only the columns that store sales and profit, whereas a row-oriented database also reads customer name, contract time, account manager, and every other column. Column storage therefore greatly improves the efficiency of OLAP queries over large data volumes. Of course, a column-oriented database is not a cure-all: each time a data row is read, the value of each column must be read separately and then merged to form the row. A column-oriented database is therefore a poor fit if each query involves only a small amount of data or if most queries need entire rows.

Figure 2
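A minimal in-memory sketch of this layout, assuming one Python list per column and made-up sample data (the column names and the helpers `insert_row`, `column_total`, and `fetch_row` are hypothetical):

```python
# Column-oriented layout: one list per column; position i in every list
# belongs to the same logical row.
columns = {
    "customer": ["Acme", "Globex", "Initech"],
    "contract_time": ["2013-01-05", "2013-02-11", "2013-03-20"],
    "account_manager": ["Li", "Wang", "Zhao"],
    "sales": [1200.0, 800.0, 500.0],
    "profit": [300.0, 150.0, 50.0],
}


def insert_row(store, row):
    """Inserting one row writes to every column, each in a different place."""
    for name, values in store.items():
        values.append(row[name])


def column_total(store, name):
    """An aggregate touches only the single column it needs."""
    return sum(store[name])


def fetch_row(store, position):
    """Rebuilding a full row means visiting every column and merging the values."""
    return {name: values[position] for name, values in store.items()}


insert_row(columns, {
    "customer": "Umbrella",
    "contract_time": "2013-04-02",
    "account_manager": "Chen",
    "sales": 700.0,
    "profit": 100.0,
})
total_sales = column_total(columns, "sales")
second_row = fetch_row(columns, 1)
print(total_sales, second_row)
```

`column_total` reads one list regardless of how many columns the table has, while `insert_row` and `fetch_row` must touch all of them, which mirrors the trade-off between analytical and whole-row access described above.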

Many column-oriented databases also support column groups (called locality groups in Bigtable), which store together the values of multiple data columns that are frequently accessed together. If the columns being read belong to the same column group, the database can read their values from the same place in one pass and avoid merging separate columns. A column group is a hybrid of row and column storage that can meet both OLTP and OLAP query requirements.

Because the values within a single data column are highly repetitive, column-oriented databases have a significant advantage in compression. For example, Google Bigtable can reach a compression ratio of more than 15 times.

In addition, index optimizations specific to column storage are possible. For example, a sex column has only two values, "male" and "female", so a bitmap index can be created on it. Suppose "male" corresponds to the bitmap 100101, indicating that rows 1, 4, and 6 hold "male", and "female" corresponds to the bitmap 011010, indicating that rows 2, 3, and 5 hold "female". To find the number of males or females, you only need to count the 1 bits in the corresponding bitmap. And because the 0s and 1s in a bitmap index are highly repetitive, the bitmaps can be compressed further with a special encoding method.

Figure 3
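The bitmap index can be sketched in a few lines. The source does not name the "special encoding method" used for compression, so run-length encoding is used here purely as an assumed example, and the function names are hypothetical.

```python
def build_bitmap_index(values):
    """Map each distinct value to a bit string; bit i is '1' if row i holds it."""
    return {
        value: "".join("1" if v == value else "0" for v in values)
        for value in set(values)
    }


def run_length_encode(bits):
    """Collapse consecutive equal bits into (bit, run_length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)
        else:
            runs.append((bit, 1))
    return runs


# The six rows from the example in the text.
sex = ["male", "female", "female", "male", "female", "male"]
index = build_bitmap_index(sex)
male_count = index["male"].count("1")   # counting is just counting 1 bits

# Long runs of identical bits shrink to a handful of pairs.
encoded = run_length_encode("0000000011111111")
print(index["male"], index["female"], male_count, encoded)
```

On the six-row example the bitmaps are too short and too mixed for run-length encoding to help; the benefit appears when a bitmap contains long runs of the same bit, as in the 16-bit string above, which collapses to two pairs.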

3. Differences between row-oriented and column-oriented databases

http://fifiole.blog.163.com/blog/static/169459225201111610351428/

Reference documents:

http://book.2cto.com/201309/32718.html

http://fifiole.blog.163.com/blog/static/169459225201111610351428/
