The difference between relational databases and HBase in data storage models

Source: Internet
Author: User

Nowadays, BigTable-style (column-family) databases are used more and more widely, and they are very powerful. But many people still use them as if they were relational databases, designing tables, storage, and queries with the old relational mindset. This article uses HBase as an example to explain how the data model needs to change.

In a traditional relational database (MySQL, Oracle), data is stored as follows:

Figure One

This is a typical way of storing data. I divide each record into three parts: primary key, record attributes, and indexed fields. We build indexes on the indexed fields to get the effect of a secondary index.
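As a minimal sketch of this layout, the snippet below uses Python's built-in `sqlite3` module as a stand-in for MySQL or Oracle. The table `item` and its columns (`name` and `price` as record attributes, `province` and `category` as indexed fields) are hypothetical; the article does not give a schema. Each indexed field gets its own secondary index, and the query plan confirms that a lookup by `province` goes through that index.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL/Oracle (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE item (
           id       INTEGER PRIMARY KEY,  -- primary key
           name     TEXT,                 -- record attributes
           price    REAL,
           province TEXT,                 -- indexed fields
           category TEXT
       )"""
)
# One secondary index per indexed field.
conn.execute("CREATE INDEX idx_item_province ON item(province)")
conn.execute("CREATE INDEX idx_item_category ON item(category)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?, ?, ?)",
    [
        (1, "phone A", 1999.0, "Zhejiang", "mobile phone"),
        (2, "laptop B", 5999.0, "Zhejiang", "computer"),
        (3, "phone C", 2999.0, "Jiangsu", "mobile phone"),
    ],
)

query = "SELECT id, name FROM item WHERE province = ? ORDER BY id"
rows = conn.execute(query, ("Zhejiang",)).fetchall()
print(rows)  # [(1, 'phone A'), (2, 'laptop B')]

# The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("Zhejiang",)).fetchall()
print([row[-1] for row in plan])
```

`EXPLAIN QUERY PLAN` is SQLite-specific; MySQL and Oracle expose the same information through their own `EXPLAIN` statements.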

However, as the business grows, query conditions become more complex and require more indexed fields, and many records have no value at all for many of those fields, for example:

Figure Two

Figure Two shows 6 indexed fields; in practice there may be hundreds or more, and queries have to filter on several indexed fields at once. Query performance keeps getting worse, until the query requirements can no longer be met. This is where the limitations of relational databases begin to show, and why many people start turning to NoSQL.

Column-family databases are very powerful, and many people want to move their data from MySQL to HBase, storing it the same way as in Figure One or Figure Two: the primary key becomes the rowkey, and each of the other fields is stored as a separate column under one column family. However, there is no good BigTable-based secondary indexing scheme, so the indexed fields cannot be queried efficiently; the only option is to scan the whole table.
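To see why this straight port falls short, here is a minimal sketch that models an HBase table as a Python dict keyed by rowkey, with every field as a column in one column family `f`. The dict model, the family name `f`, and the helpers `get` and `scan_where` are hypothetical stand-ins, not the HBase client API. A lookup by rowkey goes straight to one row, while a lookup by an "indexed" field has to visit every row; in real HBase that corresponds to a Scan with a filter such as `SingleColumnValueFilter`, which still reads the whole table on the server side.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for an HBase table:
# rowkey -> {"family:qualifier": value}. HBase sorts rows by rowkey bytes,
# so primary keys are zero-padded strings here.
Table = Dict[str, Dict[str, str]]

table: Table = {
    "0001": {"f:name": "phone A", "f:province": "Zhejiang", "f:category": "mobile phone"},
    "0002": {"f:name": "laptop B", "f:province": "Zhejiang", "f:category": "computer"},
    "0003": {"f:name": "phone C", "f:province": "Jiangsu", "f:category": "mobile phone"},
}


def get(t: Table, rowkey: str) -> Dict[str, str]:
    """Lookup by rowkey (the old primary key): a direct point read."""
    return t.get(rowkey, {})


def scan_where(t: Table, column: str, value: str) -> Tuple[List[str], int]:
    """Lookup by an 'indexed' field: with no secondary index, every row is read."""
    rows_read = 0
    hits = []
    for rowkey in sorted(t):  # full-table scan in rowkey order
        rows_read += 1
        if t[rowkey].get(column) == value:
            hits.append(rowkey)
    return hits, rows_read


print(get(table, "0002")["f:name"])                 # laptop B
print(scan_where(table, "f:province", "Zhejiang"))  # (['0001', '0002'], 3)
```

The `rows_read` counter always equals the table size, no matter how few rows match.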

At this point we can change our way of thinking and turn the data upside down, for example:

Figure Three

The value of each indexed field becomes a rowkey, and the primary keys and attribute values of the matching records are stored, in a fixed order, in the value under that rowkey. There is only one column family, which is the simplest layout. Each record in the value can be encoded as a fixed-length byte[], so any record in the set can be located quickly by its offset.
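A minimal sketch of the inverted layout just described, using Python's `struct` module to pack records into one byte string per rowkey. The record format (a 4-byte big-endian primary key plus a 16-byte null-padded name) is an assumption for illustration; the article only says the records are fixed length. Because every record has the same size, record `i` starts at byte `i * RECORD.size`, so it can be read without parsing the ones before it.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical fixed-length record: 4-byte big-endian primary key + 16-byte
# null-padded name (20 bytes total). Names longer than 16 bytes are truncated.
RECORD = struct.Struct(">I16s")


def encode(records: List[Tuple[int, str]]) -> bytes:
    """Concatenate fixed-length records into one cell value (a byte[])."""
    return b"".join(RECORD.pack(pk, name.encode("utf-8")) for pk, name in records)


def count(value: bytes) -> int:
    """Number of records in a cell value."""
    return len(value) // RECORD.size


def record_at(value: bytes, i: int) -> Tuple[int, str]:
    """Jump to the i-th record by offset i * RECORD.size without parsing the others."""
    pk, raw = RECORD.unpack_from(value, i * RECORD.size)
    return pk, raw.rstrip(b"\x00").decode("utf-8")


# Inverted table: rowkey = value of an indexed field, a single column family.
inverted: Dict[str, bytes] = {
    "Zhejiang": encode([(1, "phone A"), (2, "laptop B")]),
    "mobile phone": encode([(1, "phone A"), (3, "phone C")]),
}

value = inverted["Zhejiang"]
print(RECORD.size, count(value))  # 20 2
print(record_at(value, 1))        # (2, 'laptop B')
```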

However, this layout only suits queries on a single indexed field. To query several indexed fields at the same time, the Figure Three approach has to fetch every value involved: for example, querying "Zhejiang" and "mobile phone" means fetching two values, parsing the primary keys out of each, and intersecting them. If each record has hundreds of attributes, all of that attribute data is read just to extract the keys, which significantly hurts performance.
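The cost of that AND query can be sketched with the same hypothetical fixed-length encoding as before (redefined here so the snippet runs on its own). Both cell values are fetched in full and every record is decoded only to get its primary key, and then the key sets are intersected. The helper names `primary_keys` and `query_all` are illustrative; `bytes_read` makes the cost visible, and it grows with the size of each record even though only the keys are needed.

```python
import struct
from typing import Dict, List, Set, Tuple

# Same hypothetical fixed-length layout: 4-byte primary key + 16-byte name.
RECORD = struct.Struct(">I16s")


def encode(records: List[Tuple[int, str]]) -> bytes:
    return b"".join(RECORD.pack(pk, name.encode("utf-8")) for pk, name in records)


inverted: Dict[str, bytes] = {
    "Zhejiang": encode([(1, "phone A"), (2, "laptop B")]),
    "mobile phone": encode([(1, "phone A"), (3, "phone C")]),
}


def primary_keys(value: bytes) -> Set[int]:
    """Walk every record in the value just to pull out its primary key."""
    return {pk for pk, _name in RECORD.iter_unpack(value)}


def query_all(table: Dict[str, bytes], *index_values: str) -> Tuple[List[int], int]:
    """AND query over several indexed fields: fetch each whole value, intersect keys."""
    values = [table[v] for v in index_values]  # every full value is fetched
    bytes_read = sum(len(v) for v in values)   # includes all attribute bytes
    keys = set.intersection(*(primary_keys(v) for v in values))
    return sorted(keys), bytes_read


print(query_all(inverted, "Zhejiang", "mobile phone"))  # ([1], 80)
```

Here each record carries only 16 bytes of attributes; with hundreds of attributes per record, `bytes_read` would be far larger for the same one-key answer.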

The next change solves the multi-index query problem. We store the primary key fields and the attribute fields separately, in different column families. A multi-index query then only needs to fetch the data in column family 1 (the primary keys) and intersect it, and then go to column family 2 of the smallest candidate set to get the desired values. The fourth storage layout:

Figure Four

Why use two different column families rather than two columns in one column family?

The data files of a column-family database are organized by column family (in HBase, each column family has its own store and store files). When data is fetched, all of the column data in a column family is read, but at the intersection step we do not need the record details, so we put that part of the data into a separate column family.
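A minimal sketch of the query path for this two-column-family layout, again with a plain dict standing in for HBase. The family names `k` and `d`, the `name|price` detail encoding, and the helpers `get_family` and `multi_index_query` are hypothetical. Column family 1 is read for every index value, but column family 2 is read only once, here from the row whose key set is smallest, which is one reading of the article's "smallest candidate set". In the real HBase Java client the same restriction is expressed by calling `Get.addFamily` with the family name, so the other family's store is not touched.

```python
from typing import Dict, List

# Hypothetical model of the two-column-family layout:
#   table[rowkey][family][qualifier] = value
# rowkey = value of an indexed field; family "k" (column family 1) holds only
# primary keys, family "d" (column family 2) holds record details per key.
table: Dict[str, Dict[str, Dict[str, str]]] = {
    "Zhejiang": {
        "k": {"1": "", "2": ""},
        "d": {"1": "phone A|1999", "2": "laptop B|5999"},
    },
    "mobile phone": {
        "k": {"1": "", "3": ""},
        "d": {"1": "phone A|1999", "3": "phone C|2999"},
    },
}

reads: List[str] = []  # log of which rowkey:family pairs were fetched


def get_family(rowkey: str, family: str) -> Dict[str, str]:
    """Read one column family of one row (like restricting a Get to a family)."""
    reads.append(f"{rowkey}:{family}")
    return table[rowkey][family]


def multi_index_query(*index_values: str) -> Dict[str, str]:
    # Step 1: read only column family 1 for each index value and intersect the keys.
    key_sets = {v: set(get_family(v, "k")) for v in index_values}
    hits = set.intersection(*key_sets.values())
    # Step 2: read column family 2 once, from the row with the smallest key set.
    smallest = min(index_values, key=lambda v: len(key_sets[v]))
    details = get_family(smallest, "d")
    return {pk: details[pk] for pk in sorted(hits)}


result = multi_index_query("Zhejiang", "mobile phone")
print(result)  # {'1': 'phone A|1999'}
print(reads)   # ['Zhejiang:k', 'mobile phone:k', 'Zhejiang:d']
```

The `reads` log shows the point of the split: the detail family is fetched once, and only after the intersection has narrowed the result down.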

The next step is to extend column family 2: it stores more columns, which are used for various kinds of filtering and computation, for example:

Figure Five
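As a rough illustration of using extra column-family-2 columns for filtering and computation, the sketch below stores one column per record attribute under a qualifier pattern `<pk>:<attribute>`, then filters records by price and sums their sales. The qualifier pattern, the attributes `price` and `sales`, and the numbers are made up for illustration; the article does not say what the extra columns contain.

```python
from typing import Dict, List, Tuple

# Hypothetical extended column family 2 for one index row ("mobile phone"):
# qualifier "<pk>:<attribute>" -> value.
cf2: Dict[str, str] = {
    "1:name": "phone A", "1:price": "1999", "1:sales": "120",
    "3:name": "phone C", "3:price": "2999", "3:sales": "45",
    "4:name": "phone D", "4:price": "3999", "4:sales": "30",
}


def records(cf: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Group "<pk>:<attribute>" qualifiers back into one dict per record."""
    out: Dict[str, Dict[str, str]] = {}
    for qualifier, value in cf.items():
        pk, attr = qualifier.split(":", 1)
        out.setdefault(pk, {})[attr] = value
    return out


def filter_and_sum(cf: Dict[str, str], max_price: int) -> Tuple[List[str], int]:
    """Keep records with price <= max_price, then total their sales."""
    recs = records(cf)
    kept = sorted(pk for pk, r in recs.items() if int(r["price"]) <= max_price)
    total_sales = sum(int(recs[pk]["sales"]) for pk in kept)
    return kept, total_sales


print(filter_and_sum(cf2, 3000))  # (['1', '3'], 165)
```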

Eventually I felt that this approach looks more and more like a search engine...
