2013 is almost over, so let me summarize the major changes in HBase this year. The most influential event was the release of HBase 0.96, which was restructured into modules and brings many compelling features. Most of these features have been running for a long time in the internal clusters of companies such as Yahoo!, Facebook, Taobao, and Xiaomi, and can be considered reasonably stable and usable.
1. Compaction optimization
HBase's compaction has long been widely criticized, and many people complain about HBase because of it. But that flaw alone is no reason to write HBase off; instead, we should learn to tame compaction and adapt it to our own application scenarios, adjusting the compaction type and parameters to the workload and, typically, disabling major compactions during peak business hours.

In 0.96, the HBase community adopted a pluggable architecture to provide more compaction strategies for different scenarios, and reworked store management on the RegionServer side. The structure used to be a direct region -> store -> storefile hierarchy; to support more flexible StoreFile management and compaction strategies, the RS now uses a StoreEngine abstraction. A StoreEngine is composed of a StoreFlusher, a CompactionPolicy, a Compactor, and a StoreFileManager. If none is specified, the default is DefaultStoreEngine, whose four components are DefaultStoreFlusher, ExploringCompactionPolicy, DefaultCompactor, and DefaultStoreFileManager. Notice that since 0.96 the default compaction policy has changed from RatioBasedCompactionPolicy to ExploringCompactionPolicy.

Why the change? Start from compaction's optimization goal: compaction is about trading some disk IO now for fewer seeks later. In other words, the more files a compaction can merge, the better; and for the same number of files, the less IO it generates, the better. The candidate file list that best meets these criteria should be the one selected.
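Before turning to the policy differences below, here is a rough sketch of the per-workload tuning just mentioned. These are the standard 0.96-era compaction settings, normally placed in hbase-site.xml on the server side; the values shown are purely illustrative, not recommendations.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// A minimal sketch of compaction tuning knobs (illustrative values only).
public class CompactionTuning {
    public static Configuration tuned() {
        Configuration conf = HBaseConfiguration.create();
        // Disable time-triggered major compactions; run them manually off-peak instead.
        conf.setLong("hbase.hregion.majorcompaction", 0);
        // Minimum / maximum number of StoreFiles considered in one minor compaction.
        conf.setInt("hbase.hstore.compaction.min", 3);
        conf.setInt("hbase.hstore.compaction.max", 10);
        // The ratio used by the compaction policy when selecting files.
        conf.setFloat("hbase.hstore.compaction.ratio", 1.2f);
        return conf;
    }
}
```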
The main differences between RatioBasedCompactionPolicy and ExploringCompactionPolicy are:
RatioBasedCompactionPolicy simply traverses the StoreFile list from start to finish and selects the first sequence of files it encounters that satisfies the ratio condition. This is fine for the typical case of StoreFiles produced by a steady stream of memstore flushes, but not for bulk-loaded files, where it can get stuck in a local optimum.
ExploringCompactionPolicy, by contrast, keeps track of the best candidate found so far while scanning the whole list from beginning to end, and then selects the globally optimal file list.
The logic of both algorithms can be found in the corresponding applyCompactionPolicy() methods in the code. Development of other compaction policies is also very active, for example tier-based compaction (HBASE-6371, from Facebook) and stripe compaction (HBASE-7667).
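For intuition, here is a deliberately simplified sketch of the two selection strategies, not the actual applyCompactionPolicy() code: candidates are windows of StoreFile sizes, and "better" means more files, and among equals, less total IO.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch: ratio-based picks the FIRST qualifying window,
// exploring picks the BEST qualifying window overall.
public class CompactionSelectionSketch {

    // Ratio-based: return the first window that satisfies the ratio condition,
    // which may be only locally optimal (e.g. with bulk-loaded files).
    static List<Long> ratioBased(List<Long> sizes, int window, double ratio) {
        for (int i = 0; i + window <= sizes.size(); i++) {
            List<Long> cand = sizes.subList(i, i + window);
            if (satisfiesRatio(cand, ratio)) {
                return new ArrayList<Long>(cand);
            }
        }
        return new ArrayList<Long>();
    }

    // Exploring: examine ALL qualifying windows and keep the best one overall.
    static List<Long> exploring(List<Long> sizes, int minWindow, int maxWindow, double ratio) {
        List<Long> best = new ArrayList<Long>();
        long bestTotal = Long.MAX_VALUE;
        for (int w = minWindow; w <= maxWindow; w++) {
            for (int i = 0; i + w <= sizes.size(); i++) {
                List<Long> cand = sizes.subList(i, i + w);
                if (!satisfiesRatio(cand, ratio)) {
                    continue;
                }
                long total = sum(cand);
                if (cand.size() > best.size()
                        || (cand.size() == best.size() && total < bestTotal)) {
                    best = new ArrayList<Long>(cand);
                    bestTotal = total;
                }
            }
        }
        return best;
    }

    // Simplified ratio condition: no file is larger than ratio * (sum of the others).
    static boolean satisfiesRatio(List<Long> cand, double ratio) {
        long total = sum(cand);
        for (long s : cand) {
            if (s > (total - s) * ratio) {
                return false;
            }
        }
        return true;
    }

    static long sum(List<Long> cand) {
        long total = 0;
        for (long s : cand) {
            total += s;
        }
        return total;
    }
}
```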
Gripe: I think the reason HBase compaction causes so much trouble is that it lacks an overall IO load feedback and scheduling mechanism. Compaction reads data from HDFS and writes it back to HDFS, so it competes for IO with everything else on HDFS. With an IO resource management and scheduling mechanism, compactions could run when the HDFS load is light and be deferred when it is heavy. The same problem exists in Hadoop/HDFS: Hadoop's resource management currently covers only CPU and memory, with nothing for IO, so a job that, for example, writes a huge amount of data to HDFS because of a bug in its own code can severely degrade the read and write performance of other, well-behaved jobs.
More reading: HBase compaction; improvements to HBase compaction in 2013.
2. Mean Time To Recovery (MTTR) optimization
For HBase's online service today, the RegionServer is a single point of failure: if an RS goes down, its data is inaccessible until all the regions on that RS have been reassigned to other RSs. The main improvements to this process include:
HBASE-5844 and HBASE-5926: delete the znode corresponding to a region server / master on ZooKeeper when the process exits, so that the failure is noticed immediately instead of only after the RS/master znode's 30s session timeout.
HBASE-7006: distributed log replay. The WAL is read from HDFS and replayed directly onto the newly assigned RSs, instead of first writing intermediate recovered.edits files and then replaying them (a configuration sketch follows this list).
HBASE-7213/8631: the region server hosting HBase's meta table keeps two WALs, an ordinary one and one dedicated to the meta table's region, so that the meta table can be recovered first during recovery.
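As a minimal sketch of the MTTR-related settings, assuming the 0.96-era key names (these are cluster-side settings normally placed in hbase-site.xml; verify the keys against your version):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Hedged sketch of MTTR tuning; values are illustrative.
public class MttrTuning {
    public static Configuration tuned() {
        Configuration conf = HBaseConfiguration.create();
        // Turn on distributed log replay (HBASE-7006) instead of writing
        // intermediate recovered.edits files before replaying.
        conf.setBoolean("hbase.master.distributed.log.replay", true);
        // A shorter ZooKeeper session timeout narrows the window for detecting a dead RS
        // (trade-off: more risk of false positives under long GC pauses).
        conf.setInt("zookeeper.session.timeout", 30000);
        return conf;
    }
}
```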
3. Bucket Cache (L2 cache on HBase)
An HBase RegionServer's memory is split into two parts: the MemStore, mainly serving writes, and the BlockCache, mainly serving reads. The block cache hit rate has a large impact on HBase's read performance. The current default, LruBlockCache, manages the BlockCache directly with a JVM HashMap, which brings heap fragmentation and full GC problems.
HBASE-7404 introduces the bucket cache, which can live in memory or on an external device suited to fast random reads, such as an SSD. The cache can therefore be very large, which can significantly improve HBase's read performance. The essence of the bucket cache is to let HBase manage the cache memory itself instead of leaving it to the Java GC, a direction that has been hotly debated ever since HBase was born.
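A hedged sketch of enabling the bucket cache, assuming the hbase.bucketcache.* key names used by HBASE-7404; the values are illustrative only:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Hedged sketch of bucket cache (L2 cache) setup.
public class BucketCacheSetup {
    public static Configuration offheap() {
        Configuration conf = HBaseConfiguration.create();
        // Where the L2 cache lives: off-heap memory here; a file on an SSD
        // (e.g. "file:/ssd/bucketcache.data") is the other common choice.
        conf.set("hbase.bucketcache.ioengine", "offheap");
        // Cache size in MB (off-heap also needs a matching -XX:MaxDirectMemorySize).
        conf.setInt("hbase.bucketcache.size", 4096);
        return conf;
    }
}
```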
4. Java GC Improvements
MemStore-local allocation buffers (MSLAB) solve the full GC problem caused by memory fragmentation by allocating memory in pre-sized chunks. But under frequent updates, once a MemStore is flushed to the file system its chunks become unreferenced and trigger a lot of young GC. HBASE-8163 therefore proposes a MemStoreChunkPool: HBase manages a pool of chunks itself and reuses them rather than relying on the JVM GC. The essence of this ticket, again, is to have the HBase process manage memory allocation and reuse itself, no longer depending on the Java GC.
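A hedged configuration sketch; the chunk pool key name below is my reading of the HBASE-8163-era code and should be verified against your HBase version:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Hedged sketch of MSLAB / MemStoreChunkPool settings (illustrative values).
public class MslabTuning {
    public static Configuration tuned() {
        Configuration conf = HBaseConfiguration.create();
        // MemStore-local allocation buffers: allocate memstore cells from coarse chunks.
        conf.setBoolean("hbase.hregion.memstore.mslab.enabled", true);
        // Fraction of the global memstore size kept as a reusable chunk pool, so flushed
        // chunks are recycled instead of becoming garbage for the JVM GC to collect.
        conf.setFloat("hbase.hregion.memstore.chunkpool.maxsize", 0.5f);
        return conf;
    }
}
```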
5. HBase enterprise database features (secondary index, join, and transactions)
When it comes to HBase's enterprise-class database features, the first things that come to mind are secondary indexes, joins, and transactions. For now, however, these capabilities are provided by peripheral projects rather than by HBase itself.
Huawei's hindex is one of the better secondary-index implementations. The main idea is to create an index table whose region layout mirrors the main table, so that the index regions corresponding to a primary-table region live on the same RS. The index table is barred from splitting automatically or manually; it splits only when the main table splits.
How does this work? The index table is, in essence, just another HBase table, and an HBase table can only be looked up efficiently by rowkey, so the rowkey design of the index table is what matters. Index-table rowkey = start key of the primary-table region + index name (one primary table may have several indexes, all stored in the same index table) + value of the indexed column + rowkey of the primary-table row. This design guarantees that the index table's regions stay on the same RS as the corresponding primary-table regions, saving RPCs during queries. On every insert, a coprocessor also writes the corresponding entry into the index table; on every scan by a secondary-index column, a coprocessor first fetches the matching primary-table rowkeys from the index table and then performs the scan. Performance-wise, query performance improves dramatically while insert performance drops by roughly 10%.
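As a rough illustration of this rowkey layout (a sketch, not hindex's actual code; a real implementation would need fixed-width or delimited components so the keys sort correctly):

```java
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: index-table rowkey = regionStartKey + indexName + indexedValue + primaryRowKey,
// so that index entries land in the index-table region colocated with the primary region.
public class IndexRowKeySketch {
    static byte[] indexRowKey(byte[] regionStartKey, String indexName,
                              byte[] indexedValue, byte[] primaryRowKey) {
        return Bytes.add(
                Bytes.add(regionStartKey, Bytes.toBytes(indexName)),
                Bytes.add(indexedValue, primaryRowKey));
    }
}
```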
Phoenix also implements secondary indexes by building a separate index table.
Phoenix additionally implements joins involving a small table, using the classic approach of broadcasting the small table to all RSs, performing a hash join in coprocessors, and then merging the results. This feels somewhat superfluous, since HBase was designed precisely to avoid joins by denormalizing data into big tables. Now that joins are supported, one wonders which Salesforce business scenarios need them.
As for transaction support, the project getting the most attention is Yahoo!'s Omid, but enthusiasm for this feature does not seem particularly high.
6. Prefix tree compression
Because HBase stores each KeyValue as row/family/qualifier/timestamp/value, the row/family/qualifier parts are repeated prefixes, and storing every row in raw form wastes a lot of space. HBase 0.94 introduced data block encoding (HBASE-4218), which compresses the repeated row/family/qualifier prefixes of consecutively stored KeyValues and improves memory utilization; the supported encodings are FAST_DIFF, PREFIX, PREFIX_TREE, and DIFF. However, this feature only reduces the footprint through delta encoding/compression; it does not make data lookups any faster, and encoding/decoding even costs extra CPU.
HBASE-4676: PrefixTree compression stores the repeated row/family/qualifier parts as a prefix tree (trie). The trie can be rebuilt at decode time and each node's children are kept sorted, so looking up data within a DataBlock can approach binary-search efficiency. (See "Preliminary study and test of PREFIX_TREE compression".)
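A small sketch of enabling a data block encoding on a column family with the 0.96-era client API (the table and family names are placeholders):

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

// Build a table descriptor whose column family uses PREFIX_TREE encoding.
public class EncodingExample {
    public static HTableDescriptor withPrefixTree() {
        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("example_table"));
        HColumnDescriptor family = new HColumnDescriptor("cf");
        // PREFIX_TREE (HBASE-4676); FAST_DIFF, PREFIX and DIFF are the other options.
        family.setDataBlockEncoding(DataBlockEncoding.PREFIX_TREE);
        table.addFamily(family);
        return table;
    }
}
```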
7. Other changes
HBASE-5305: for better cross-version compatibility, Protocol Buffers were introduced as the serialization/deserialization engine for RPC (Hadoop RPC had earlier been rewritten entirely with PB). As the HBase servers are upgraded, some clients may still be running older versions, so the RPC layer needs to stay compatible across old and new versions.
HBASE-6055, HBASE-7290: HBase table snapshots. Creating a snapshot has no performance impact on the cluster; it only records the snapshot's metadata and copies no data. Snapshots can be used for backup and disaster recovery: for example, if an operation performed after a snapshot corrupts some tables, you can roll back to the point where the snapshot was taken instead of losing access to all the data. You can also take snapshots periodically and copy them to other clusters for scheduled offline processing.
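A hedged sketch of the snapshot workflow using the 0.96-era admin API (the table and snapshot names are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Take a snapshot and roll the table back to it.
public class SnapshotExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Metadata-only operation: no data is copied.
            admin.snapshot("example_snapshot", "example_table");
            // Restoring requires the table to be disabled first.
            admin.disableTable("example_table");
            admin.restoreSnapshot("example_snapshot");
            admin.enableTable("example_table");
        } finally {
            admin.close();
        }
    }
}
```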
HBASE-8015: namespaces. In 0.96 the catalog tables were reorganized: .META. is now hbase:meta, and a new hbase:namespace system table holds namespace metadata (the root table is gone, its role taken over by ZooKeeper). A namespace is similar to a database in an RDBMS and allows better rights management and security control. The meta information of HBase tables is itself stored as a region on a region server, so the meta region competes for resources with ordinary regions; to improve meta region performance, 360's HBase deployment introduced a dedicated meta server, a region server that hosts only the meta region.
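A short sketch of using namespaces with the 0.96-era admin API (the namespace and table names are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Create a namespace and a table inside it.
public class NamespaceExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // A namespace plays a role similar to a database in an RDBMS.
            admin.createNamespace(NamespaceDescriptor.create("analytics").build());
            HTableDescriptor table =
                    new HTableDescriptor(TableName.valueOf("analytics", "clicks"));
            table.addFamily(new HColumnDescriptor("cf"));
            admin.createTable(table);
        } finally {
            admin.close();
        }
    }
}
```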
HBASE-5229: multi-row transactions within a single region. In one operation, all writes to rows in the same region are executed after the row locks of every row involved have been acquired (locks are taken in rowkey order to prevent deadlock).
HBASE-4811: reverse scan. When people asked how to scan backwards in HBase, the usual answer used to be to maintain a second table with the keys stored in reverse order; both LevelDB and Cassandra support reverse scans natively. HBase's reverse scan is about 30% slower than a forward scan, which is comparable to LevelDB.
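A minimal sketch of the client API (as it appears in the releases that shipped HBASE-4811; row names are placeholders):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Build a reverse scan: rows come back in descending rowkey order from the start row.
public class ReverseScanExample {
    public static Scan reverseScan() {
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("row-9999")); // scanning "backwards" from here
        scan.setReversed(true);
        return scan;
    }
}
```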
Hoya: HBase on YARN. It lets you deploy several HBase instances, with different versions and configurations, on a single YARN cluster; see the project on GitHub.
Looking ahead to 2014, HBase will release version 1.0, with better multi-tenancy support and cell-level ACL control.
8. Summary
People from Cloudera/Hortonworks/Yahoo!/Facebook focus on a wide range of systems and performance concerns.
People from Salesforce/Huawei seem more concerned with enterprise-class features; after all, the customers they face are in telecom, finance, securities, and other deep-pocketed industries.
Domestic companies such as Alibaba/Xiaomi/360 pay more attention to system performance, stability, and operations-related topics; the domestic Internet industry cares most about how to use HBase to solve business problems.
More and more companies are building their HBase clusters in the cloud; for example, all of Pinterest's HBase clusters run on AWS. The startup environment abroad is enviable: with AWS, they do not have to spend much of their resources on infrastructure.
The traditional HBase use case is online storage and real-time data serving. For example, Alipay uses HBase to store users' historical transaction records to serve user queries, and China Unicom uses HBase to store users' browsing history for real-time lookups. HBase is now also evolving toward real-time data mining: for example, WibiData's open-source Kiji makes it easy to build real-time recommendation engines, real-time user segmentation, and real-time fraud detection on top of HBase.