Selection and Schema Design of BigTable-Class NoSQL Databases

Published: 2012.04.16 14:20. Source: blog.

This article introduces selection strategies and schema design principles for BigTable/HBase-class NoSQL database systems.

Data size

BigTable-class database systems (HBase, Cassandra, etc.) are designed to meet the storage needs of massive data. "Massive" here means a single table storing terabytes or even petabytes of data, made up of anywhere from billions to hundreds of billions of rows. Speaking of data scale, in today's NoSQL market the four most popular systems are, in order, MongoDB, Redis, Cassandra, and HBase. Cassandra and HBase are both BigTable-class systems, and both have strong backing (from Facebook, Yahoo, Twitter, and others). So why is MongoDB the most popular? Is it because HBase is not good enough? I think the reason is simple: most companies have not reached the data scale of Facebook, Yahoo, and the like, and MongoDB is enough to meet their needs. MongoDB provides auto-sharding, a schema-less data model, and other features that address exactly the problems these companies run into with an RDBMS as their data grows.

Data model

The data model of BigTable-class database systems is also relatively simple and generally does not involve joins across multiple tables. At this scale, traditional RDBMS applications become increasingly limited, and the cost of maintenance and upgrades keeps rising. Moreover, because traditional RDBMSs are designed around shared storage, their scale-out capability is weak; building a distributed database out of shared-storage RDBMSs requires users to develop their own proxy layer. All of these problems push us toward BigTable-style NoSQL storage when facing massive data. For DBAs accustomed to designing RDBMS schemas, schema design for BigTable-class NoSQL systems calls for a different mindset. This article covers how to design table schemas in BigTable-class systems, and what needs attention when migrating traditional RDBMS applications to BigTable-class systems as data scale grows.

NoSQL databases put scalability first, which inevitably introduces some data redundancy: relationships that an RDBMS expresses across separate tables are realized through redundant copies of data. In addition, BigTable-class systems do not provide SQL's complex query expressions or its various optimization features; they provide only the ability to store very large amounts of data. As a result, as in Facebook's unified messaging system, a single row is often used to store all of one user's information, and in BigTable-class systems a single row can indeed hold a very large amount of data. Some time ago there were rumors on Weibo that Apple's Siri uses HBase in its back end; if that is true, I expect each user's personal-assistant information is also kept in one row. Interestingly, Apple guards this secret well and even creates a diversion: although it clearly uses HBase, its job postings say that knowledge of Cassandra and MongoDB is a plus.
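To make the one-row-per-user idea concrete, here is a minimal in-memory sketch of the data model, assuming a simplified view in which a table maps a row key to a set of "family:qualifier" columns. The class `WideRowTable`, the row key `user_42`, and the `profile`/`messages` families are hypothetical, and real BigTable-class systems also version every cell by timestamp, which this sketch leaves out.

```python
from collections import defaultdict


class WideRowTable:
    """Simplified BigTable-style table: row key -> {"family:qualifier": value}.

    Cell timestamps/versions are deliberately omitted in this sketch.
    """

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Cells are created on demand, so rows are sparse and can grow very wide.
        self.rows[row_key][column] = value

    def get_row(self, row_key):
        # One read by row key returns everything stored for that user,
        # with columns returned in sorted order.
        return dict(sorted(self.rows.get(row_key, {}).items()))


# Hypothetical layout: all of one user's data lives in a single row.
table = WideRowTable()
table.put("user_42", "profile:name", "Alice")
table.put("user_42", "profile:email", "alice@example.com")
for msg_id, text in [("m0001", "hi"), ("m0002", "lunch?"), ("m0003", "ok")]:
    # Each message becomes one more column in the same row.
    table.put("user_42", f"messages:{msg_id}", text)

row = table.get_row("user_42")
print(len(row))  # 5
```

Adding a message adds a column rather than a row, which is why such rows can become very wide.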

Another thing to watch in BigTable-class schema design is the behavior of column families. Because BigTable-class systems essentially store and access data by column family, the columns within one column family generally share the same data type. Values of the same type compress very well, which makes I/O between disk and memory efficient; this is a common advantage of all column-oriented storage systems. So when deciding what information to store in a row, we can place each attribute into a column family according to its data type. Since BigTable is a sparse table system, a particular attribute of one row may not appear in any other row, but attributes of the same data type (for example, int) almost certainly do appear in other rows; in actual storage, the attributes of the same column family are stored together.
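The sketch below illustrates only the storage-layout side of this: it takes a sparse table and groups its cells by column family, with each group sorted by row key, loosely mirroring how HBase keeps a separate store per column family. The function `split_by_family` and the sample data are hypothetical, and compression itself is not modeled.

```python
from collections import defaultdict


def split_by_family(rows):
    """Group a sparse table's cells by column family.

    rows: {row_key: {"family:qualifier": value}}
    Returns {family: [(row_key, qualifier, value), ...]}, each list sorted by
    (row_key, qualifier), so same-family (and typically same-type) values are
    physically adjacent.
    """
    stores = defaultdict(list)
    for row_key, cells in rows.items():
        for column, value in cells.items():
            family, qualifier = column.split(":", 1)
            stores[family].append((row_key, qualifier, value))
    return {
        family: sorted(cells, key=lambda cell: (cell[0], cell[1]))
        for family, cells in stores.items()
    }


# Hypothetical sparse table: "stats" holds integers, "info" holds strings,
# and not every row has every column.
table = {
    "user_02": {"info:name": "Bob", "stats:logins": 7},
    "user_01": {"info:name": "Alice", "info:city": "Hangzhou", "stats:logins": 3},
    "user_03": {"stats:logins": 1},
}
stores = split_by_family(table)
print(stores["stats"])
# [('user_01', 'logins', 3), ('user_02', 'logins', 7), ('user_03', 'logins', 1)]
```

Missing cells simply do not appear in any store, which is what makes a sparse table cheap to hold.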

Denormalization

A concept often mentioned in NoSQL data modeling is denormalization. Simply put, entities and the relationships between them, which an RDBMS keeps in separate tables, are stored in the same table in NoSQL. For example, normalized RDBMS modeling might use two tables: Student (studentid, studentname, tutor, courseid) and Course (courseid, coursename). In a BigTable-class NoSQL system there is only one table: Student (studentid, studentname, tutor, courseid, coursename). Information that a traditional RDBMS obtains by reading both tables and joining them can be retrieved in the NoSQL system with a single read.
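As a rough illustration of the Student/Course example, the sketch below performs the "join" once at write time by copying `coursename` into each student record, so that later lookups need only one read by row key. The sample rows and the `denormalize` helper are hypothetical.

```python
# Normalized, RDBMS-style tables (hypothetical sample data).
students = [
    {"studentid": "s1", "studentname": "Li", "tutor": "Wang", "courseid": "c1"},
    {"studentid": "s2", "studentname": "Zhao", "tutor": "Chen", "courseid": "c2"},
]
courses = {"c1": {"coursename": "Databases"}, "c2": {"coursename": "Networks"}}


def denormalize(students, courses):
    """Build one Student table keyed by studentid that also carries coursename.

    This is a join done once at write time instead of at every read.
    """
    table = {}
    for s in students:
        row = dict(s)
        row["coursename"] = courses[s["courseid"]]["coursename"]
        table[s["studentid"]] = row
    return table


student_table = denormalize(students, courses)
# A single read by row key now returns what previously required a join.
print(student_table["s2"]["coursename"])  # Networks
```

The price is redundancy: renaming a course means rewriting every student row that copies its name.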

Row Key

Another point to note in BigTable-class schema design is the natural ordering of rows. BigTable-class systems treat the row key as a string and keep rows sorted in lexicographic order of that string, and schema design can exploit this. For example, if an application frequently needs an index on an attribute or a combination of attributes, that attribute or combination can serve as the row key. This closely resembles an index or composite index in an RDBMS, except that in BigTable-class systems it exists naturally. Note that in HBase, when a combination of attributes is used as the row key, the components must be joined with a special separator character; "/" should not be used as the separator between attributes in the row key, but "_" can be.
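The following sketch illustrates composite row keys and the prefix scans that lexicographic ordering makes cheap. It assumes a hypothetical key layout of user ID, date, and order number joined with "_" as suggested above; `make_row_key` and `SortedRows` are illustrative names, not an HBase API. Because ordering is by string, numeric components are zero-padded here so that string order matches numeric order.

```python
import bisect


def make_row_key(*parts, sep="_"):
    """Join attribute values into one composite row key.

    Rejects components that contain the separator, since that would make
    the key ambiguous.
    """
    for part in parts:
        if sep in part:
            raise ValueError(f"component {part!r} contains separator {sep!r}")
    return sep.join(parts)


class SortedRows:
    """Rows kept in lexicographic row-key order, as BigTable-class stores do."""

    def __init__(self):
        self.keys = []     # sorted row keys
        self.values = {}   # row key -> value

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def scan_prefix(self, prefix):
        # All keys sharing a prefix are contiguous, so one range scan suffices.
        start = bisect.bisect_left(self.keys, prefix)
        out = []
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break
            out.append((key, self.values[key]))
        return out


rows = SortedRows()
rows.put(make_row_key("user42", "20120416", "0002"), "order B")
rows.put(make_row_key("user42", "20120415", "0001"), "order A")
rows.put(make_row_key("user7", "20120416", "0003"), "order C")
rows.put(make_row_key("user420", "20120101", "0004"), "order D")

# Include the separator in the prefix so "user420_..." is not matched.
print(rows.scan_prefix("user42_"))
# [('user42_20120415_0001', 'order A'), ('user42_20120416_0002', 'order B')]
```

Including the separator in the scan prefix matters: scanning for "user42" without it would also return the row belonging to "user420".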

Data consistency and transactions

Regarding data consistency: in a traditional RDBMS, each column can be constrained with NOT NULL, UNIQUE, or CHECK, and the RDBMS guarantees data consistency on the user's behalf. In BigTable-class systems, these requirements are not enforced at the database layer; they must be guaranteed by the application. Because the open-source HBase provides row-level consistency and row-level atomicity, and a row usually holds one user's information, the cost of maintaining consistency is relatively small. If the schema of a BigTable-class system is poorly designed and leads to complex data redundancy, however, the application layer pays a significant cost to keep the data consistent.
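Here is a minimal single-process sketch of what "guaranteed by the application" can look like: the constraints an RDBMS would enforce (NOT NULL, UNIQUE, CHECK) are re-implemented in application code before each write. The `UserStore` class, its `email`/`age` fields, and the age rule are hypothetical. Note that the `email_owner` map is itself redundant data; in a real distributed deployment, keeping it in sync with the user rows spans multiple rows and is not atomic without extra machinery.

```python
class ConstraintError(Exception):
    """Raised when an application-level constraint check fails."""


class UserStore:
    """One row per user, with constraints enforced by application code.

    Hypothetical schema: 'email' is NOT NULL and UNIQUE;
    'age', if present, must satisfy a CHECK-style range rule.
    """

    def __init__(self):
        self.rows = {}         # row key -> cells
        self.email_owner = {}  # email -> row key, maintained by the application

    def put_user(self, row_key, cells):
        email = cells.get("email")
        if not email:
            raise ConstraintError("email is NOT NULL")
        owner = self.email_owner.get(email)
        if owner is not None and owner != row_key:
            raise ConstraintError(f"email {email!r} already used by {owner}")
        age = cells.get("age")
        if age is not None and not (0 <= age < 150):
            raise ConstraintError("age out of range")
        # If the user changed email, release the old value.
        old_email = self.rows.get(row_key, {}).get("email")
        if old_email and old_email != email:
            del self.email_owner[old_email]
        self.rows[row_key] = dict(cells)
        self.email_owner[email] = row_key


store = UserStore()
store.put_user("u1", {"email": "alice@example.com", "age": 30})
try:
    store.put_user("u2", {"email": "alice@example.com"})
except ConstraintError as e:
    print("rejected:", e)  # rejected: email 'alice@example.com' already used by u1
```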

Transaction support in BigTable-class systems is a complicated topic. In short, HBase supports only row-level locks; to implement RDBMS-like transactional features, you have to combine HBase with ZooKeeper. This article does not go into detail; a later article will discuss Google's papers on Percolator and Megastore, which mainly address how to implement transactions on top of a NoSQL system and how to bridge NoSQL and SQL.
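To show what row-level locking alone does and does not give you, here is an in-memory sketch where each row has its own lock and the only atomic write is a compare-and-set on one row, loosely in the spirit of the check-and-put operation HBase clients offer. `RowLockedTable`, `check_and_put`, and `increment` are hypothetical names, not HBase APIs. A read-modify-write on a single row is safe, but nothing here makes updates to two different rows atomic together; that is the gap the ZooKeeper, Percolator, and Megastore approaches address.

```python
import threading
from collections import defaultdict


class RowLockedTable:
    """In-memory table whose only atomic unit is a single row."""

    def __init__(self):
        self._rows = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects creation of rows and row locks

    def _row(self, row_key):
        with self._guard:
            return self._locks[row_key], self._rows[row_key]

    def get(self, row_key, column):
        lock, row = self._row(row_key)
        with lock:
            return row.get(column)

    def check_and_put(self, row_key, column, expected, new_value):
        """Set column to new_value only if it currently equals expected.

        The check and the write happen under the same row lock, so no other
        writer to this row can slip in between them.
        """
        lock, row = self._row(row_key)
        with lock:
            if row.get(column) != expected:
                return False
            row[column] = new_value
            return True


def increment(table, row_key, column):
    """Optimistic read-modify-write built only on the single-row primitive."""
    while True:
        current = table.get(row_key, column)  # None if the cell is absent
        if table.check_and_put(row_key, column, current, (current or 0) + 1):
            return


table = RowLockedTable()


def worker():
    for _ in range(1000):
        increment(table, "user_42", "stats:logins")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(table.get("user_42", "stats:logins"))  # 4000
```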

Index

Indexing is an issue every database system has to consider. As can be seen from the BigTable paper, BigTable maintains a dedicated single-column index for each column and allows multi-column indexes to be created. These indexes are maintained automatically by BigTable, which also selects them automatically at query time; this is quite close to an RDBMS. The open-source HBase, apart from the automatically sorted row key acting as an index, provides only automatically maintained secondary indexes, and which index a query should use must be decided by the application layer. There are several ways to implement secondary indexes in HBase, and coprocessors seem to have recently become one of them; for how secondary indexes relate to joins, see http://kenwublog.com/hbase-secondary-index-and-join. HBase also allows Lucene indexes stored on a file system to be created and used; for combining HBase and Lucene, see http://www.infoq.com/articles/LuceneHbase.
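One common application-layer pattern for secondary indexes, sketched here under assumptions rather than as the way HBase itself implements them, is a second sorted table whose row key is "<value>_<main row key>". A lookup by the indexed attribute then becomes a prefix scan over the index followed by reads of the main rows, and the application decides when to use it. `IndexedUsers` and the `city` attribute are hypothetical, and city values are assumed not to contain "_". Because the main row and the index row are separate rows, a crash between the two writes could leave them inconsistent unless extra machinery is added.

```python
import bisect


class IndexedUsers:
    """Main table keyed by user id plus a hand-maintained index on city.

    Index row keys look like "<city>_<user_id>", so all users in one city
    are contiguous and can be found with a single prefix scan.
    """

    SEP = "_"

    def __init__(self):
        self.main = {}        # user_id -> cells
        self.index_keys = []  # sorted index row keys

    def _index_key(self, city, user_id):
        return f"{city}{self.SEP}{user_id}"

    def put(self, user_id, cells):
        old = self.main.get(user_id)
        if old is not None:
            # The application must remove the stale index entry itself.
            stale = self._index_key(old["city"], user_id)
            i = bisect.bisect_left(self.index_keys, stale)
            if i < len(self.index_keys) and self.index_keys[i] == stale:
                self.index_keys.pop(i)
        self.main[user_id] = dict(cells)
        bisect.insort(self.index_keys, self._index_key(cells["city"], user_id))

    def find_by_city(self, city):
        prefix = city + self.SEP
        start = bisect.bisect_left(self.index_keys, prefix)
        result = []
        for key in self.index_keys[start:]:
            if not key.startswith(prefix):
                break
            user_id = key[len(prefix):]
            result.append((user_id, self.main[user_id]))
        return result


users = IndexedUsers()
users.put("u1", {"name": "Alice", "city": "Beijing"})
users.put("u2", {"name": "Bob", "city": "Hangzhou"})
users.put("u3", {"name": "Carol", "city": "Beijing"})
users.put("u2", {"name": "Bob", "city": "Beijing"})  # Bob moves; index updated
print([user_id for user_id, _ in users.find_by_city("Beijing")])  # ['u1', 'u2', 'u3']
```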

The above covers how to design schemas for BigTable-class systems. The next article will illustrate these points with several examples, including Facebook's unified messaging system.
