HBase Performance Tuning


We often see articles boasting about how fast or how robust a product is, yet tests frequently fall short of the published numbers. The usual reason is that the tester does not really understand the product's internals or its performance-tuning options. This article, reposted from the blog of Ken Wu at Taobao, is the most complete HBase tuning write-up we have seen so far.

Original link: HBase Performance Tuning

Because the performance-tuning chapters of the official book are not indexed by configuration item, they are hard to use for quick reference. So I have reorganized the original text around configuration items and added some understanding of my own; if there are errors, please correct me.

Configuration Optimizations

zookeeper.session.timeout

Default value: 3 minutes (180000 ms)

Description: The timeout for the session between a RegionServer and ZooKeeper. When the timeout expires, the RegionServer is removed from the cluster's server list, HMaster receives the removal notification, and the regions that server was responsible for are rebalanced onto the other, surviving RegionServers.

Tuning: This timeout determines whether a RegionServer can fail over promptly. Setting it to 1 minute or less shortens the failover delay that is otherwise spent waiting for the timeout to expire.

Note, however, that for some online applications the time from RegionServer failure to recovery is itself very short (a network blip, a crash that operations staff can fix quickly), and lowering the timeout can do more harm than good. Once a RegionServer is formally removed from the cluster, HMaster starts rebalancing (having the other RegionServers recover from the WAL entries recorded by the failed machine). When the failed RegionServer is then restored by manual intervention, that rebalancing was pointless work that leaves the load unevenly distributed and burdens the other RegionServers, especially in deployments where regions are assigned to fixed servers.
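
A hbase-site.xml sketch of the one-minute setting discussed above:

    <property>
      <name>zookeeper.session.timeout</name>
      <!-- 1 minute; the default is 180000 (3 minutes) -->
      <value>60000</value>
    </property>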

hbase.regionserver.handler.count

Default value: 10

Description: The number of I/O threads a RegionServer uses to handle requests.

Tuning: Tuning this parameter is closely tied to memory.

Use fewer I/O threads for scenarios where a single request consumes a lot of memory (a batch of single puts, or scans with a large cache; both are "big" requests) and for RegionServers that are short on memory.
Use more I/O threads for scenarios with low per-request memory consumption and a very high required TPS. When setting this value, the primary thing to watch is memory.

It is important to note that if a server hosts only a few regions and a large number of requests fall on one region, the read-write lock taken when the quickly filled memstore triggers a flush will drag down global TPS; more I/O threads is not always better.

With RPC-level logging enabled, you can monitor the memory consumption and GC behavior of each request, and adjust the number of I/O threads based on the results of repeated load tests.

Here is a case study, Hadoop and HBase Optimization for Read Intensive Search Applications, in which the author sets the number of I/O threads to 100 on an SSD machine, for reference only.
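
A hbase-site.xml sketch raising the thread count for a high-TPS, small-request workload (30 is an illustrative value, not a recommendation from the original):

    <property>
      <name>hbase.regionserver.handler.count</name>
      <!-- default is 10; raise for many small requests, lower for big puts or big scans -->
      <value>30</value>
    </property>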

hbase.hregion.max.filesize

Default value: 256 MB

Description: The maximum size of a single region on the current RegionServer. When a single region exceeds this value, it is automatically split into two smaller regions.

Tuning: Small regions are friendly to split and compaction, because splitting or compacting the storefiles in a small region is fast and consumes little memory. The drawback is that split and compaction become very frequent.

In particular, a large number of small regions constantly splitting and compacting will make cluster response times fluctuate wildly; too many regions are not only a management headache but can even trigger HBase bugs.
Anything under roughly 512 MB counts as a small region.

Large regions are not suited to frequent split and compaction, because a single compact or split causes a long pause, with a heavy impact on the application's read and write performance. In addition, a large region implies larger storefiles, so compaction also puts more pressure on memory.
Of course, large regions have their place. If your workload has a quiet period at some point in the day, doing compact and split then not only gets the work done, but also keeps read and write performance smooth most of the time.

Since split and compaction affect performance so much, is there any way to avoid them?
Compaction cannot be avoided, but splitting can be changed from automatic to manual.
By raising this parameter to a value that will never be reached, such as 100 GB, you can effectively disable automatic splitting (the RegionServer will not split a region that has not reached 100 GB).
Together with the RegionSplitter tool, you can then split manually whenever a split is needed.
Manual splitting is far more flexible and stable than automatic splitting, and contrary to what you might expect, the management cost barely increases; it is recommended for online real-time systems.

On the memory side, small regions are flexible about the memstore size setting. Large regions are not: too large a memstore increases the application's I/O wait during flushes, while too small a memstore produces too many storefiles and hurts read performance.
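
A hbase-site.xml sketch of the manual-split setup just described (the 100 GB figure is from the text; subsequent splits are driven by hand with the RegionSplitter utility):

    <property>
      <name>hbase.hregion.max.filesize</name>
      <!-- 100 GB in bytes: effectively disables automatic splitting -->
      <value>107374182400</value>
    </property>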

hbase.regionserver.global.memstore.upperLimit/lowerLimit

Default values: 0.4 / 0.35

upperLimit description: The parameter hbase.hregion.memstore.flush.size flushes a single memstore once it reaches the specified size. But a RegionServer may host hundreds or thousands of memstores, and even if no individual memstore reaches flush.size, their total can still exhaust the JVM heap. upperLimit caps the total memory that all memstores together may occupy.
When the combined memstore usage on a RegionServer reaches 40% of the heap, HBase blocks all updates and flushes the memstores to reclaim all memstore-occupied memory.

lowerLimit description: Same idea as upperLimit, but when global memstore memory reaches 35%, it does not flush all memstores; it picks a few memstores with the largest memory footprints and flushes those individually, although updates are still blocked. lowerLimit is a remedy applied before a global flush causes performance to collapse. Why "collapse"? Imagine the memstores needing a full flush that takes a long time, during which no read or write requests can be accepted; the impact on the HBase cluster would be severe.

Tuning: This is a heap-memory protection parameter, and the default value already suits most scenarios. It is only adjusted for specific optimizations, for example lowering it in a read-intensive application to enlarge the read cache and leave more memory for other modules.
What effect does this parameter have on users in practice?

For example, with a 10 GB heap, 100 regions, and 64 MB per memstore, assume each region has only one memstore. When the 100 memstores are on average about 50% full, the lowerLimit is reached. Suppose that at this moment many write requests are still arriving at the other memstores; in the large regions that have not yet been flushed, the total may then exceed the upperLimit, at which point all regions are blocked and a global flush is triggered.

That said, unless your heap is very small or your workload is mostly reads, I see no need to tune this parameter.
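
A hbase-site.xml sketch lowering both limits for a read-intensive deployment (the 0.30/0.25 figures are illustrative, not from the original text):

    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.30</value>
    </property>
    <property>
      <name>hbase.regionserver.global.memstore.lowerLimit</name>
      <value>0.25</value>
    </property>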

hfile.block.cache.size

Default value: 0.2

Description: The percentage of the heap occupied by the storefile read cache; 0.2 means 20%. This value directly affects read performance.

Tuning: The bigger the better, within reason. If reads far outnumber writes, raising it to 0.4-0.5 is no problem. If reads and writes are balanced, use about 0.3. If writes outnumber reads, decisively keep the default. When setting this value, also consider hbase.regionserver.global.memstore.upperLimit, the maximum percentage of heap the memstores may occupy: one parameter affects reads, the other writes. If the two values add up to more than 80-90%, there is a risk of OOM, so set them carefully.
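
A hbase-site.xml sketch for a read-heavy deployment (the value is illustrative; together with the memstore upperLimit of 0.30 sketched above, the total is 0.7, safely under the 80% bound):

    <property>
      <name>hfile.block.cache.size</name>
      <!-- 40% of heap for the block cache; keep cache + memstore upperLimit under ~0.8 -->
      <value>0.4</value>
    </property>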

hbase.hstore.blockingStoreFiles

Default value: 7

Description: If any store (one column family) has more than 7 storefiles awaiting compaction, all write requests to the region are blocked and a flush is forced, limiting how fast the number of storefiles can grow.

Tuning: Blocking write requests hurts the performance of the affected region, so a good choice is to set this value to the maximum number of storefiles a single region can produce, since the memstore keeps generating new storefiles while compaction runs. That maximum can be estimated as region size / memstore size. If you have set the region size to effectively unlimited (for manual splitting), estimate the largest number of storefiles a region could plausibly produce.
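
A sketch of that arithmetic as a hbase-site.xml entry, using hypothetical sizes (1 GB regions and 64 MB memstores give 1024 / 64 = 16):

    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <!-- region size / memstore size = 1024 MB / 64 MB = 16 -->
      <value>16</value>
    </property>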

hbase.hregion.memstore.block.multiplier

Default value: 2

Description: When the memstore of a region exceeds twice the size of a single memstore.size, all requests to the region are blocked and a flush is triggered to free memory. Although we cap the memstore size at, say, 64 MB, imagine that at 63.9 MB a 100 MB write arrives: the memstore instantly balloons far beyond the expected memstore.size. This parameter's job is to block all requests once the memstore grows past multiplier x memstore.size, containing the risk before it spreads.

Tuning: The default value of this parameter is fairly safe. If you expect your normal workload (exceptional bursts aside) to keep write volume under control, keep the default. If write volume regularly spikes to several times the normal level, raise this multiplier and adjust the other memory parameters, such as hfile.block.cache.size and hbase.regionserver.global.memstore.upperLimit/lowerLimit, to reserve more headroom and keep the HBase server from running out of memory.
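
The default behavior in concrete numbers, as a hbase-site.xml sketch (the 64 MB memstore.size is this era's default):

    <property>
      <name>hbase.hregion.memstore.block.multiplier</name>
      <!-- with a 64 MB memstore.size, writes block once a region's memstore passes 64 MB x 2 = 128 MB -->
      <value>2</value>
    </property>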

Other

Enable LZO compression

Compared with HBase's default GZIP, LZO is faster, while GZIP achieves a higher compression ratio; see Using LZO Compression. For developers who want to improve HBase's read and write performance, LZO is the better choice. For developers who care a great deal about storage space, the default is recommended.
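
A minimal Java sketch of enabling LZO on a column family, assuming an 0.90-era client API (the compression enum's package varies across HBase versions, and the table and family names are hypothetical):

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.io.hfile.Compression;

    public class LzoTableSketch {
        public static void main(String[] args) {
            // Build a descriptor for a table with one LZO-compressed family.
            // LZO must also be installed natively on every node; see Using LZO Compression.
            HTableDescriptor desc = new HTableDescriptor("mytable");
            HColumnDescriptor family = new HColumnDescriptor("cf");
            family.setCompressionType(Compression.Algorithm.LZO);
            desc.addFamily(family);
            // Pass desc to HBaseAdmin.createTable(...) to actually create the table.
        }
    }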

Don't define too many column families in one table

HBase currently cannot handle tables with more than two or three column families well. When one CF is flushed, its neighboring CFs are also flushed by a ripple effect, so the system ends up generating more I/O.

Bulk Import

Before you bulk import data into HBase, you can balance the write load by pre-creating regions. See Table Creation: Pre-Creating Regions.
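
A sketch of pre-creating regions at table-creation time, assuming the 0.9x-era HBaseAdmin API (the table name, family, key format, and region count are all hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitSketch {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
            HTableDescriptor desc = new HTableDescriptor("mytable");
            desc.addFamily(new HColumnDescriptor("cf"));
            // Create the table pre-split into 10 regions, with split keys spaced
            // evenly between the two boundary keys, so the import load spreads out.
            admin.createTable(desc, Bytes.toBytes("row0000000000"),
                    Bytes.toBytes("row9999999999"), 10);
        }
    }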

Avoid CMS concurrent mode failure

HBase uses the CMS garbage collector. By default, CMS collection is triggered when old-generation occupancy reaches 90%, a percentage set by the -XX:CMSInitiatingOccupancyFraction=N parameter. Concurrent mode failure occurs in a scenario like this:

When old-generation occupancy reaches 90%, CMS starts concurrent garbage collection, while at the same time the young generation keeps rapidly promoting objects into the old generation. If the old generation fills up before the CMS concurrent marking finishes, tragedy strikes: with no memory available, CMS has to pause marking and trigger a stop-the-world pause of the entire JVM (all threads suspended), then clean up all garbage objects in single-threaded copying mode. This process takes a very long time. To avoid concurrent mode failure, we should let the GC trigger at something below 90% occupancy,

by setting -XX:CMSInitiatingOccupancyFraction=N.

This percentage can be calculated simply: if hfile.block.cache.size and hbase.regionserver.global.memstore.upperLimit add up to 60% of the heap (the defaults), you can set it to 70-80; roughly 10% above their sum is about right.
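
For example, in hbase-env.sh (a sketch; 70 matches the arithmetic above, and -XX:+UseCMSInitiatingOccupancyOnly makes the JVM honor the threshold instead of treating it as a hint):

    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
        -XX:CMSInitiatingOccupancyFraction=70 \
        -XX:+UseCMSInitiatingOccupancyOnly"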

HBase Client Optimization

AutoFlush

Set setAutoFlush on HTable to false to enable client-side batching of writes: puts are sent to the server only when the client's write buffer fills up. The default is true.
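
A sketch under the 0.9x-era client API (the table and column names are hypothetical); note the explicit flushCommits() at the end:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AutoFlushSketch {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "mytable");
            table.setAutoFlush(false);   // buffer puts on the client side
            for (int i = 0; i < 10000; i++) {
                Put put = new Put(Bytes.toBytes("row" + i));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
                table.put(put);          // sent in batches as the write buffer fills
            }
            table.flushCommits();        // push whatever is still buffered
            table.close();
        }
    }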

Scan Caching

Scanner caching determines how many rows a scan fetches from the server at a time.
The default value is 1, fetching only one row per round trip. (See the combined example after "Close ResultScanners" below.)

Scan Attribute Selection

It is recommended to specify on the scan only the column families you actually need, to reduce network traffic; otherwise, a scan returns all data in each row (all column families) by default. (See the combined example below.)

Close ResultScanners

After you have finished fetching data through a scan, remember to close the ResultScanner; otherwise the RegionServer may run into problems (the corresponding server-side resources are never freed).
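
A sketch combining the three scan recommendations above (caching, column-family selection, and closing the scanner), assuming the same era's client API with hypothetical names:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanSketch {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "mytable");
            Scan scan = new Scan();
            scan.setCaching(500);                  // fetch 500 rows per round trip, not 1
            scan.addFamily(Bytes.toBytes("cf"));   // return only the family we need
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result result : scanner) {
                    // process result ...
                }
            } finally {
                scanner.close();                   // always release server-side resources
            }
            table.close();
        }
    }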

Optimal Loading of Row Keys

When you scan a table for nothing but the row keys (no column family, qualifier, value, or timestamp), add a FilterList with the MUST_PASS_ALL operator to the Scan instance, and put a FirstKeyOnlyFilter and a KeyOnlyFilter into the FilterList. This reduces network traffic.
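
A sketch of the filter setup just described (same era's API):

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.FilterList;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
    import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

    public class RowKeyOnlySketch {
        public static void main(String[] args) {
            Scan scan = new Scan();
            FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
            filters.addFilter(new FirstKeyOnlyFilter()); // at most one KeyValue per row
            filters.addFilter(new KeyOnlyFilter());      // strip values, keep only keys
            scan.setFilter(filters);
            // hand the scan to table.getScanner(...) as in the previous sketch
        }
    }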

Turn off WAL on Puts

When you put non-critical data, you can call writeToWAL(false) to squeeze out further write performance. writeToWAL(false) skips writing the WAL during the put. The risk is that if the RegionServer goes down, data that was just put may be lost beyond recovery.
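
A sketch, assuming the 0.90-era Put API (newer clients replace setWriteToWAL with setDurability; names are hypothetical):

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class NoWalPutSketch {
        public static void main(String[] args) {
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
            put.setWriteToWAL(false); // faster, but lost if the RegionServer dies before flush
            // table.put(put) as usual
        }
    }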

Enable Bloom Filter

Bloom filters trade space for time to improve read performance.
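
A sketch of enabling a row-level Bloom filter on a column family; the exact location of the BloomType enum varies by HBase version (shown roughly as in 0.92-era clients), and the family name is hypothetical:

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.regionserver.StoreFile;

    public class BloomFilterSketch {
        public static void main(String[] args) {
            HColumnDescriptor family = new HColumnDescriptor("cf");
            // ROW checks row keys only; ROWCOL also checks column qualifiers.
            family.setBloomFilterType(StoreFile.BloomType.ROW);
        }
    }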


Original link: http://blog.nosqlfan.com/html/2095.html
