Row Database vs Column Database

Source: Internet
Author: User

I. Introduction

There are two main approaches to big data storage: row storage and column storage. The industry has debated the two at length, focusing on which can handle massive amounts of data more efficiently while keeping security, reliability, and integrity in view. Judging by current developments, the relational database has not adapted to these enormous storage and computing requirements and has largely been pushed out of the game. Among well-known big data processing systems, HBase (in the Hadoop ecosystem) is a column store, MongoDB is a document-oriented row store, and LEXST is a binary row store.

As the names imply, these two kinds of database store data very differently. In a row database, the values of each row are stored next to one another on disk; in general, you can think of each row as a contiguous run of bytes. SQL Server, Oracle, and MySQL are traditional examples of row databases. In a column database, all data is stored column by column, a design built from the outset for data analysis in data warehouses and big data environments.

Figure: row store data layout

Figure: column store data layout

A row store reads and writes in the same order, starting with the first column and ending with the last. A column store reads some or all of the values in a single column's data set; when writing, it splits each row record into its columns and appends each column's value to the end of the corresponding column.
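To make the write path concrete, here is a minimal in-memory sketch, assuming Python lists stand in for disk files. The classes `RowStore` and `ColumnStore`, the column names, and the `appends` counter are hypothetical illustrations, not the API of any real database; they only show that a row store appends a record once, while a column store splits it and appends one value per column.

```python
from typing import Any, Dict, List, Sequence


class RowStore:
    """Hypothetical row store: each record is appended as one unit."""

    def __init__(self, columns: Sequence[str]) -> None:
        self.columns = list(columns)
        self.rows: List[tuple] = []  # stands in for one contiguous file
        self.appends = 0  # number of append operations performed

    def insert(self, record: Sequence[Any]) -> None:
        # The whole row is written in a single append, first column to last.
        self.rows.append(tuple(record))
        self.appends += 1

    def read_column(self, name: str) -> List[Any]:
        # Every full row is read; the unneeded fields are discarded in memory.
        idx = self.columns.index(name)
        return [row[idx] for row in self.rows]


class ColumnStore:
    """Hypothetical column store: one list (file) per column."""

    def __init__(self, columns: Sequence[str]) -> None:
        self.columns = list(columns)
        self.data: Dict[str, List[Any]] = {c: [] for c in self.columns}
        self.appends = 0

    def insert(self, record: Sequence[Any]) -> None:
        # The row is split, and each value is appended to the end of its column.
        for name, value in zip(self.columns, record):
            self.data[name].append(value)
            self.appends += 1

    def read_column(self, name: str) -> List[Any]:
        # Only the requested column's data set is touched.
        return list(self.data[name])


cols = ["id", "last", "first", "salary"]
records = [(1, "smith", "joe", 40000), (2, "jones", "mary", 50000),
           (3, "johnson", "cathy", 44000)]
row_store, col_store = RowStore(cols), ColumnStore(cols)
for rec in records:
    row_store.insert(rec)
    col_store.insert(rec)

print(row_store.appends, col_store.appends)  # 3 12
print(col_store.read_column("salary"))       # [40000, 50000, 44000]
```

With four columns, the three records cost 3 appends in the row store and 12 in the column store, while reading the salary column from the column store touches only that column's list.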

Consider a small table of three records, each holding an ID, a last name, a first name, and a salary. The table is stored in the computer's memory (RAM) and storage (hard disk). Although memory and disk work differently, the operating system stores data in both the same way: the database must map the two-dimensional table onto a one-dimensional sequence of bytes, which the operating system then writes to memory or disk. A row database stores the values of one row together, then the next row, and so on:

1,smith,joe,40000;2,jones,mary,50000;3,johnson,cathy,44000;

A column database stores the values of one column together, then the next column, and so on:

1,2,3;smith,jones,johnson;joe,mary,cathy;40000,50000,44000;

This is a simplified illustration.

II. Comparison
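The two byte layouts given earlier can be reproduced with a short sketch. It assumes the four fields are ID, last name, first name, and salary, and uses commas and semicolons as separators just as the text does; real databases use binary encodings, so this only illustrates the ordering. Note that in the column-major layout the ID values `1,2,3` form the first group.

```python
# The three example records (ID, last name, first name, salary).
table = [
    (1, "smith", "joe", 40000),
    (2, "jones", "mary", 50000),
    (3, "johnson", "cathy", 44000),
]


def row_major(rows):
    # Row store: all values of one row together, then the next row.
    return "".join(",".join(str(v) for v in row) + ";" for row in rows)


def column_major(rows):
    # Column store: all values of one column together, then the next column.
    columns = zip(*rows)  # transpose rows into columns
    return "".join(",".join(str(v) for v in col) + ";" for col in columns)


print(row_major(table))
# 1,smith,joe,40000;2,jones,mary,50000;3,johnson,cathy,44000;
print(column_major(table))
# 1,2,3;smith,jones,johnson;joe,mary,cathy;40000,50000,44000;
```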

As the layouts above show, a row store completes each write in a single operation. If the write goes through the operating system's file system, the file system can report whether the write succeeded or failed, so the integrity of the data can be determined. A column store must split each row record into its columns and save them separately, so it performs far more writes than a row store; add the time the disk head spends moving and positioning on the platter, and the actual time cost is higher still. Therefore, row storage has a major advantage for writes.
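The write-cost argument can be expressed as a rough model, assuming each separate write on a spinning disk pays one seek plus one transfer. The function `write_cost_ms` and its default figures (8 ms per seek, 0.5 ms per transfer) are hypothetical placeholders, not measurements; the point is only that the column layout multiplies the seek count by the number of columns.

```python
def write_cost_ms(num_columns: int, layout: str,
                  seek_ms: float = 8.0, transfer_ms: float = 0.5) -> float:
    """Rough cost of writing one record: one seek plus one transfer per write.

    The default seek and transfer times are hypothetical placeholders,
    not measurements of any real disk.
    """
    if layout == "row":
        writes = 1  # the whole record goes to one location
    elif layout == "column":
        writes = num_columns  # one positioned write per column file
    else:
        raise ValueError(f"unknown layout: {layout!r}")
    return writes * (seek_ms + transfer_ms)


for layout in ("row", "column"):
    print(layout, write_cost_ms(4, layout))
# row 8.5
# column 34.0
```

The model deliberately ignores caching and batching; many real column stores buffer incoming rows in memory and flush each column in bulk, which amortizes these seeks.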

Data modification is also a write process. The difference is that a modification marks the existing record on disk as deleted. A row store writes once at the specified location, whereas a column store must position the disk on each of several columns and write them separately, so the process again takes as many times more operations as there are columns. Therefore, row storage also has the advantage for data modification.

When reading, a row store typically reads an entire row. If only a few columns are needed, the remaining columns are redundant, and to keep processing time short they are usually discarded in memory. A column store reads some or all of a single column's data set; to read multiple columns, the disk head must move and position itself on the next column before it can continue reading.

Next, consider how the data is laid out in each format. Column storage has no ambiguity, because every value in a column has the same type: if a column's type is integer (int), its data set contains only integers. This makes parsing easy. Row storage is much more complex, because a single row record holds values of several types, and parsing requires frequent conversions between those types, which consumes CPU and increases parsing time. Therefore, column storage parses data more efficiently, which is an advantage for big data analysis.
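The parsing argument can be illustrated by extracting the salary column from both layouts. This sketch assumes a hypothetical text encoding for rows (comma-separated fields, one record per line) and a packed array of signed 64-bit integers for the column, built with Python's standard `array` module; it is not how any particular database encodes its data.

```python
from array import array
from typing import List

# Hypothetical row encoding: one text line per record, fields of mixed type.
row_bytes = b"1,smith,joe,40000\n2,jones,mary,50000\n3,johnson,cathy,44000\n"

# Hypothetical column encoding: the salary column as packed signed ints
# (typecode "q", at least 8 bytes each, native byte order).
salary_bytes = array("q", [40000, 50000, 44000]).tobytes()


def salaries_from_rows(data: bytes) -> List[int]:
    # Each row is decoded and split into all of its fields; the three
    # unneeded fields are discarded and the salary text is converted to int.
    result = []
    for line in data.decode("ascii").splitlines():
        fields = line.split(",")
        result.append(int(fields[3]))
    return result


def salaries_from_column(data: bytes) -> List[int]:
    # Every value has the same type and width, so the buffer is
    # reinterpreted as an int array in one call, with no per-field parsing.
    col = array("q")
    col.frombytes(data)
    return col.tolist()


print(salaries_from_rows(row_bytes))       # [40000, 50000, 44000]
print(salaries_from_column(salary_bytes))  # [40000, 50000, 44000]
```

The row version splits every record, throws away three fields, and converts text to an integer for each value, while the column version reinterprets one homogeneous buffer in a single call.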

Clearly, both storage formats have strengths and weaknesses. A row store writes each record in one operation, takes less time than a column store, and can guarantee data integrity; its drawback is that reads produce redundant data. With a small amount of data this effect can be ignored, but with large volumes it may hurt processing efficiency. A column store falls behind row storage in write efficiency and data integrity; its advantage lies in reading, which produces no redundant data. That advantage still matters in big data fields with less strict integrity requirements, such as Internet services.

When choosing a database, ask yourself which workload is the most critical one it must support. Even if you need both kinds of operations, when the core business is an online transaction processing (OLTP) application, a row-oriented database, backed by decades of optimization, may be the best choice. If your business does not need fast OLTP processing but must quickly analyze massive amounts of static data, that is, online analytical processing (OLAP), then a column database is the better fit.
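The two workload shapes can be contrasted on the same three records held in both layouts. The schema, the function names, and the use of Python lists as storage are hypothetical; the sketch only shows that an OLTP-style lookup wants a whole record, which a row store keeps together, while an OLAP-style aggregate reads one column, which a column store keeps together.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str, str, int]

# Row layout: each record kept together.
rows: List[Record] = [
    (1, "smith", "joe", 40000),
    (2, "jones", "mary", 50000),
    (3, "johnson", "cathy", 44000),
]

# Column layout: each field's values kept together (hypothetical column names).
columns: Dict[str, list] = {
    name: list(values)
    for name, values in zip(("id", "last", "first", "salary"), zip(*rows))
}


def oltp_lookup_rows(rows: List[Record], emp_id: int) -> Optional[Record]:
    # OLTP-style point query: the matching record is returned as one unit.
    for row in rows:
        if row[0] == emp_id:
            return row
    return None


def oltp_lookup_columns(columns: Dict[str, list], emp_id: int) -> tuple:
    # The same query on a column layout must visit every column to
    # reassemble the record. Raises ValueError if emp_id is absent.
    i = columns["id"].index(emp_id)
    return tuple(values[i] for values in columns.values())


def olap_total_salary(columns: Dict[str, list]) -> int:
    # OLAP-style aggregate: only the "salary" column is scanned.
    return sum(columns["salary"])


print(oltp_lookup_rows(rows, 2))        # (2, 'jones', 'mary', 50000)
print(oltp_lookup_columns(columns, 2))  # (2, 'jones', 'mary', 50000)
print(olap_total_salary(columns))       # 134000
```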

III. Summary

The characteristics of the two storage formats mean that neither can be a perfect solution. If data integrity and reliability are the primary concern, row storage is the only option; column storage can approach it only with more disks and improved software design. If the workload is mainly storing data, row store write performance is much higher than column store. Column storage is best suited to applications that frequently read the data set of a single column. If you read more than one column at a time, either scheme can be chosen as appropriate: with row storage, the design should reduce or avoid redundant columns; with column storage, to maintain read and write efficiency, save each column on a different disk wherever possible and have multiple threads read and write the columns in parallel. This avoids disk contention and improves processing efficiency. Whichever scheme you choose, keeping related data together on disk is an effective way to reduce head movement and shorten read time.
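The advice to place each column on its own disk and read the columns with several threads can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`. Here temporary directories stand in for separate disks, and the helpers `write_columns`, `read_column`, and `read_columns_parallel` are hypothetical names. On a single real disk the threads would still contend for the same head, so any benefit depends on the column files actually living on different devices.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence


def write_columns(base_dirs: Sequence[str],
                  columns: Dict[str, list]) -> Dict[str, str]:
    # Save each column to its own file, spreading the files round-robin
    # across base_dirs (each directory stands in for a separate disk).
    paths = {}
    for i, (name, values) in enumerate(columns.items()):
        base = base_dirs[i % len(base_dirs)]
        path = os.path.join(base, f"{name}.col")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(str(v) for v in values))
        paths[name] = path
    return paths


def read_column(path: str) -> List[str]:
    # Read one column's whole data set sequentially.
    with open(path, encoding="utf-8") as f:
        return f.read().split("\n")


def read_columns_parallel(paths: Dict[str, str]) -> Dict[str, List[str]]:
    # One worker thread per column file, so reads on different disks can overlap.
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        futures = {name: pool.submit(read_column, p) for name, p in paths.items()}
        return {name: fut.result() for name, fut in futures.items()}


columns = {"id": [1, 2, 3], "salary": [40000, 50000, 44000]}
with tempfile.TemporaryDirectory() as disk1, tempfile.TemporaryDirectory() as disk2:
    result = read_columns_parallel(write_columns([disk1, disk2], columns))
print(result)  # {'id': ['1', '2', '3'], 'salary': ['40000', '50000', '44000']}
```

Values come back as text because the sketch stores each column as plain lines; a real column store would keep a typed binary encoding per column.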

This matches my experience with the system I am currently building at my company, which is based on HBase. We classify users' data and build a corresponding product for each category of data. This process runs both online and offline, executes very frequently, and serves a very large number of users. Although write efficiency is lower, what matters more is running big data analysis along the data-category dimension and rendering the results to pages in real time.
