Objective: understand the characteristics and implementation of HBase and how it supports queries over massive data sets
Characteristics and limitations of traditional relational databases
Traditional relational databases place a strong emphasis on transactions and require data integrity and security guarantees, which comes at the cost of availability and scalability. Under highly concurrent traffic their performance degrades, and Internet-scale traffic can easily cause downtime.
HBase
HBase is a column-oriented distributed storage system that scales better than a traditional row-based relational database. It supports high-performance concurrent reads and writes, and it transparently splits data so that the storage layer scales horizontally.
HBase organizes data mainly by row key (the primary key) and column family. Each column family holds multiple columns according to the attributes being stored, and columns are extensible: a new column can be added at any time.
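The layout described above can be pictured as a nested map. The following is a minimal in-memory sketch, not the HBase client API: the class `Table`, the family names `info` and `contact`, and the sample values are all hypothetical, and the sketch assumes column families are declared when the table is created while columns inside a family are not.

```python
from collections import defaultdict


class Table:
    """Toy table: row key -> column family -> column -> value."""

    def __init__(self, families):
        # Assumption of this sketch: families are declared up front.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Columns inside a family need no prior declaration.
        self.rows[row_key][family][column] = value

    def get(self, row_key, family, column):
        # Returns None when the row, family, or column has no stored value.
        return self.rows.get(row_key, {}).get(family, {}).get(column)


table = Table(families=["info", "contact"])
table.put("author-001", "info", "name", "Alice")
# A new column is introduced simply by writing to it.
table.put("author-001", "contact", "email", "alice@example.com")
```

Reading a column that was never written, such as `table.get("author-001", "contact", "phone")`, simply returns `None`; nothing was stored for it.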
Advantages and disadvantages of HBase
Advantages of HBase:
1 Columns can be added dynamically, and empty columns store no data, which saves storage space.
2 HBase splits data automatically, so the data store scales horizontally without manual intervention.
3 HBase supports highly concurrent read and write operations.
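Automatic splitting can be sketched roughly as follows. This is a toy model under stated assumptions, not HBase's actual split policy: the classes `Region` and `ToyTable` and the threshold `max_rows_per_region` are hypothetical, and real splits are driven by store size rather than row count.

```python
import bisect


class Region:
    """A contiguous range of row keys, starting at start_key."""

    def __init__(self, start_key, keys):
        self.start_key = start_key
        self.keys = keys  # kept sorted


class ToyTable:
    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region  # hypothetical split threshold
        self.regions = [Region("", [])]      # "" sorts before every other key

    def put(self, row_key):
        # Find the last region whose start key is <= row_key.
        starts = [r.start_key for r in self.regions]
        i = bisect.bisect_right(starts, row_key) - 1
        region = self.regions[i]
        pos = bisect.bisect_left(region.keys, row_key)
        if pos < len(region.keys) and region.keys[pos] == row_key:
            return  # key already present
        region.keys.insert(pos, row_key)
        if len(region.keys) > self.max_rows:
            # Split at the middle key; the caller never sees this happen.
            mid = len(region.keys) // 2
            left = Region(region.start_key, region.keys[:mid])
            right = Region(region.keys[mid], region.keys[mid:])
            self.regions[i:i + 1] = [left, right]


table = ToyTable()
for n in range(10):
    table.put(f"row{n:02d}")
```

The writer only ever calls `put`; the table grows from one region to several on its own, and each region could in principle be served by a different machine.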
Disadvantages of HBase:
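1 There is no native support for SQL-style conditional queries or secondary indexes; efficient access is by row key (or row key range) only.
2 Early versions do not support failover of the Master server, so when the Master goes down the storage system can no longer be managed. Later releases allow standby masters coordinated through ZooKeeper.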
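To illustrate why access is organized around the row key, here is a small sketch over a sorted list of keys. The composite key format `<user>#<date>` and the helper names `get`, `scan_prefix`, and `filter_by_value` are hypothetical; the point is only that key lookups and key-range scans can use the sort order, while a condition on a value has to visit every row.

```python
import bisect

# Hypothetical composite row keys of the form "<user>#<date>", kept sorted.
rows = sorted(
    [
        ("user1#2023-01-01", {"city": "Beijing"}),
        ("user1#2023-01-02", {"city": "Shanghai"}),
        ("user2#2023-01-01", {"city": "Beijing"}),
        ("user3#2023-01-05", {"city": "Shenzhen"}),
    ],
    key=lambda item: item[0],
)
keys = [k for k, _ in rows]


def get(row_key):
    """Point lookup by row key using binary search."""
    i = bisect.bisect_left(keys, row_key)
    if i < len(keys) and keys[i] == row_key:
        return rows[i][1]
    return None


def scan_prefix(prefix):
    """Range scan: jump to the first matching key, stop at the first miss."""
    start = bisect.bisect_left(keys, prefix)
    result = []
    for key, _ in rows[start:]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result


def filter_by_value(column, value):
    """Condition on a value: every row has to be examined."""
    return [key for key, cols in rows if cols.get(column) == value]
```

In practice this is why the conditions an application needs most often are usually encoded into the row key itself.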
Supplement
1. Data types: HBase has only one simple type, the uninterpreted byte string. All typing is left to the user; HBase just stores the bytes. A relational database has rich types and storage formats.
2. Data manipulation: HBase offers only simple operations such as insert, query, delete, and truncate. Tables are independent of one another and there are no complex relationships between them, whereas traditional databases usually provide a wide range of functions and join operations.
3. Storage model: HBase is a column-based store. Each column family is saved in its own set of files, separate from the files of other column families. A traditional relational database stores data by table structure, row by row.
4. Data maintenance: an HBase update is not really an update; it inserts a new version of the data, whereas a traditional database modifies the existing value in place.
5. Scalability: distributed databases such as HBase are designed for this purpose, so hardware can easily be added or removed, and fault tolerance is relatively high. Traditional databases typically need an additional middle tier to achieve something similar.
For the organizational structure of an HBase HTable, see the blog post at http://blog.csdn.net/lifuxiangcaohui/article/details/39894265
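Point 1 above means that encoding and decoding are the application's job. A minimal sketch of what that can look like, assuming a hypothetical `cell_store` dictionary in place of a real table and big-endian 8-byte integers as the chosen encoding:

```python
import struct


def encode_int(n):
    # Big-endian signed 64-bit integer: always 8 bytes.
    return struct.pack(">q", n)


def decode_int(raw):
    return struct.unpack(">q", raw)[0]


def encode_str(s):
    return s.encode("utf-8")


def decode_str(raw):
    return raw.decode("utf-8")


# Hypothetical stand-in for a table: (row key, column) -> raw bytes.
cell_store = {}
cell_store[("author-001", "info:age")] = encode_int(42)
cell_store[("author-001", "info:name")] = encode_str("Alice")

# The store cannot tell the two cells apart by type; the reader must know
# which decoder to apply to which column.
age = decode_int(cell_store[("author-001", "info:age")])
name = decode_str(cell_store[("author-001", "info:name")])
```

Picking the wrong decoder for a column yields garbage or an error, which is the price of leaving types to the user.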
Application scenarios for HBase
The following situations call for HBase:
Semi-structured or unstructured data
Data whose fields are not fixed or are loosely organized, and which is hard to capture in a single schema, is a good fit for HBase. In the example above, when business growth requires storing the author's email, phone, and address, an RDBMS needs downtime for maintenance, while HBase supports adding columns dynamically.
Very sparse records
The number of columns in an RDBMS row is fixed, and null columns waste storage space. As mentioned above, HBase does not store null columns, which saves space and improves read performance.
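The difference can be made concrete by counting slots. The sketch below uses made-up records and column names; it compares a fixed-schema layout, where every row reserves a slot for every column, with a sparse layout that keeps only the cells that actually have values.

```python
columns = ["name", "email", "phone", "address"]

# Made-up sparse records: most attributes are missing for most authors.
records = [
    {"id": "a1", "name": "Alice"},
    {"id": "a2", "name": "Bob", "email": "bob@example.com"},
    {"id": "a3", "name": "Carol", "address": "Shanghai"},
]

# Fixed schema: every row carries a slot for every column, null or not.
fixed_rows = [[r.get(c) for c in columns] for r in records]
fixed_slots = sum(len(row) for row in fixed_rows)
null_slots = sum(1 for row in fixed_rows for v in row if v is None)

# Sparse layout: only (row key, column) pairs that have a value are kept.
sparse_cells = {
    (r["id"], c): r[c] for r in records for c in columns if c in r
}
```

Here the fixed layout holds 12 slots of which 7 are null, while the sparse layout holds 5 cells. The saving is not free: in a sparse layout each cell has to carry its own key, so the benefit grows with how empty the table is.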
Multi-version data
As mentioned above, a value located by row key and column key can have any number of versions, so HBase is very convenient for data whose change history must be kept. For example, the author's address in the example above may change; the business generally needs only the most recent value, but sometimes it has to query historical values.
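A rough sketch of versioned cells, assuming integer timestamps supplied by the caller and a hypothetical `max_versions` limit; the class `VersionedStore` and its methods are illustrative names, not the HBase API.

```python
class VersionedStore:
    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # hypothetical retention limit
        self.cells = {}  # (row, column) -> [(timestamp, value)], newest first

    def put(self, row, column, value, timestamp):
        versions = self.cells.setdefault((row, column), [])
        # A write with an existing timestamp replaces that version.
        versions[:] = [tv for tv in versions if tv[0] != timestamp]
        # An "update" never modifies old data; it adds a newer version.
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]  # drop the oldest beyond the limit

    def get(self, row, column, as_of=None):
        """Latest value, or the latest value at or before as_of."""
        for ts, value in self.cells.get((row, column), []):
            if as_of is None or ts <= as_of:
                return value
        return None


store = VersionedStore()
store.put("author-001", "contact:address", "Beijing", timestamp=1)
store.put("author-001", "contact:address", "Shanghai", timestamp=2)
store.put("author-001", "contact:address", "Shenzhen", timestamp=3)

current = store.get("author-001", "contact:address")
earlier = store.get("author-001", "contact:address", as_of=2)
```

The usual read returns the newest address, while passing `as_of` retrieves what the address was at an earlier point in time.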
Very large data volume
When the data volume grows until a single RDBMS can no longer cope, the first step is a read/write splitting strategy: one master is dedicated to writes and several slaves handle reads, which multiplies the server cost. As the load increases further, the master can no longer cope either, and the database has to be split: unrelated data is deployed separately, some join queries can no longer be used, and a middle tier is needed. As the amount of data grows further still, individual tables become so large that queries become very slow, so the tables are split as well, for example by ID modulo into multiple tables to reduce the number of records per table. Anyone who has been through this knows how painful the process is. With HBase, you simply add machines: HBase scales horizontally on its own, and its seamless integration with Hadoop provides data reliability (HDFS) and high-performance analysis of massive data (MapReduce).
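One reason splitting tables by ID modulo is painful is that changing the number of tables relocates most records. A small sketch, with a hypothetical helper `shard_for` and made-up IDs from 1 to 1000:

```python
def shard_for(record_id, shard_count):
    """Pick the table for a record by taking its ID modulo the table count."""
    return record_id % shard_count


ids = range(1, 1001)

# Growing from 4 tables to 5: count the records whose table changes.
moved = sum(1 for i in ids if shard_for(i, 4) != shard_for(i, 5))
stayed = len(ids) - moved
```

With these numbers 800 of the 1000 records end up in a different table and only 200 stay put, so adding one table means migrating most of the data. Range-based splitting of sorted keys avoids this, because a split only divides one range in two.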
HBase application scenarios are also described at http://blog.csdn.net/yen_csdn/article/details/55657363
HBase: non-structured database vs. structured database