HBase database performance tuning

Document directory
  • Configuration Optimization

Because the performance tuning section of the official book is not indexed by configuration item, it is hard to look things up quickly, so I have reorganized the original text around the configuration items and added some of my own understanding. If there is any error, please correct me.

Configuration Optimization

zookeeper.session.timeout

Default Value: 3 minutes (180000 ms)

Note: The connection timeout between the regionserver and ZooKeeper. When the timeout expires, the regionserver is removed from the RS cluster list by ZooKeeper. Once the HMaster receives the removal notification, it rebalances the regions this server was responsible for onto other surviving regionservers.

Optimization:

This timeout determines whether a failed regionserver can be failed over promptly. Setting it to 1 minute or lower reduces the failover delay that comes from waiting for the timeout to expire.

However, note that for some online applications the time from regionserver failure to recovery can be very short (transient network disconnections, crashes, and similar faults can be handled quickly by operations staff), and shortening the timeout may cost more than it gains. The reason is that once the regionserver is formally removed from the RS cluster, the HMaster begins rebalancing (so that other RSes recover from the WAL logs recorded by the failed machine). When the failed RS is then brought back manually, this rebalancing turns out to be pointless; it leaves the load unevenly distributed and puts extra burden on the RSes, especially in scenarios where regions are assigned to fixed servers.
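As a rough illustration, here is a minimal sketch of lowering the timeout to 1 minute. The property really belongs in hbase-site.xml on each regionserver; the Java Configuration calls are used here only to show the key and an example value.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SessionTimeoutTuning {
    public static void main(String[] args) {
        // In a real deployment this value lives in hbase-site.xml on every
        // regionserver; setting it programmatically here only illustrates
        // the property name and an example value.
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("zookeeper.session.timeout", 60000); // 1 minute instead of the 3-minute default

        System.out.println("zookeeper.session.timeout = "
                + conf.getInt("zookeeper.session.timeout", -1) + " ms");
    }
}
```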

hbase.regionserver.handler.count

Default Value: 10

Note: The number of I/O threads the regionserver uses to process requests.

Optimization:

The optimization of this parameter is closely related to the memory.

A small number of I/O threads suits "big put" scenarios where handling a single request consumes a lot of memory (a large single put, or a scan configured with a big cache), or where the regionserver's memory is relatively tight.

A large number of I/O threads suits scenarios where a single request consumes little memory and a high TPS is required. When setting this value, the main reference is memory monitoring.

Note that if the server has only a few regions and a large number of requests land on the same region, the read/write locking caused by the memstore quickly filling up and triggering flushes will limit the global TPS; it is not the case that a higher number of I/O threads is always better.

Enable RPC-level logging during stress testing while monitoring the memory consumption and GC behavior of each request, and then settle on the number of I/O threads through several rounds of stress-test results.

Here is a case study of Hadoop and HBase optimization for a read-intensive search application: the author sets the number of I/O threads to 100 on SSD machines, for reference only.
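As a rough aid to that sizing exercise, here is a small back-of-the-envelope sketch. The heap size, per-request payload, and candidate handler count are made-up numbers, not recommendations; only the relationship between them matters.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HandlerCountSizing {
    public static void main(String[] args) {
        // Hypothetical numbers for a rough sanity check of handler count vs. heap.
        long heapBytes       = 8L * 1024 * 1024 * 1024; // regionserver heap
        long perRequestBytes = 2L * 1024 * 1024;        // expected payload of one big put / scan batch
        int  handlerCount    = 100;                     // candidate hbase.regionserver.handler.count

        long worstCaseRequestMem = (long) handlerCount * perRequestBytes;
        System.out.printf("Worst-case in-flight request memory: %d MB (%.1f%% of heap)%n",
                worstCaseRequestMem >> 20, 100.0 * worstCaseRequestMem / heapBytes);

        // Only after the stress tests described above look healthy would the value
        // go into hbase-site.xml; shown here purely for illustration.
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.regionserver.handler.count", handlerCount);
    }
}
```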

hbase.hregion.max.filesize

Default Value: 256 MB

Note: The maximum size of a single region on the regionserver. When a single region exceeds this value, it is automatically split into smaller regions.

Optimization:

Small regions are friendly to splits and compactions, because splitting the storefiles within a region or compacting a small region is fast and consumes little memory. The downside is that splits and compactions become frequent.

In particular, a large number of small regions constantly splitting and compacting can cause great fluctuations in the cluster's response time, and having too many regions not only makes management troublesome but can even trigger some HBase bugs.

Generally, anything below 512 MB counts as a small region.

Large regions are less suited to frequent splits and compactions, because a single compaction or split can cause a long pause, which has a great impact on the application's read/write performance. In addition, a large region means large storefiles, and compacting them is also a challenge for memory.

Of course, large regions also have their place: if your application has a time window with low traffic, compactions and splits can complete successfully during that window, ensuring stable read/write performance for most of the remaining time.

Since splits and compactions affect performance, is there a way to get rid of them?

Compactions cannot be avoided, but splits can be switched from automatic to manual.

By raising this parameter to a value that is practically unreachable, such as 100 GB, auto-split can be indirectly disabled (the regionserver will not split a region that has not reached 100 GB).

Then, with the help of the RegionSplitter tool, perform a manual split whenever a split is actually needed.

Manual split is far more flexible and stable than automatic split, and the management cost is not much higher. It is recommended for online real-time systems.
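A minimal sketch of this approach might look like the following. The table name "my_table" and the split key are placeholders, the very large max filesize would normally be set in hbase-site.xml rather than in code, and the Admin.split call is the one available in the 1.x/2.x client API.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class ManualSplitSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Server-side setting (normally in hbase-site.xml): make the threshold
        // effectively unreachable so regions are never auto-split.
        conf.setLong("hbase.hregion.max.filesize", 100L * 1024 * 1024 * 1024); // 100 GB

        // When a split really is needed, request it explicitly at a chosen row key.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            admin.split(TableName.valueOf("my_table"), Bytes.toBytes("row-50000"));
        }
    }
}
```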

In terms of memory, small regions leave you flexible in choosing the memstore size. For large regions, a memstore that is too large or too small is problematic either way: if it is too large, the application's I/O wait rises during flushes; if it is too small, read performance suffers because of too many storefiles.

hbase.regionserver.global.memstore.upperLimit/lowerLimit

Default Value: 0.4 / 0.35

Description of upperLimit: hbase.hregion.memstore.flush.size flushes a single memstore once it reaches the specified size. However, a regionserver may host hundreds or thousands of memstores, and the JVM heap can run out before any single memstore reaches flush.size. This parameter limits the total memory occupied by all memstores.

When the total memory occupied by all memstores on the regionserver reaches 40% of the heap, HBase forcibly blocks all updates and flushes these memstores to release the memory they occupy.

Description of lowerLimit: The same as upperLimit, except that when global memstore memory reaches 35%, it does not flush all memstores; instead it finds some memstores with larger memory usage and flushes just those (updates are still blocked, of course). lowerLimit is a remedy applied before a global flush causes performance to plummet. Why a plummet? Imagine the memstores needing a long stretch of time to do a full flush, during which no read or write requests can be served; the impact on the HBase cluster's performance would be huge.

Optimization:

This is a heap memory protection parameter, and the default value is suitable for most scenarios. It is generally adjusted together with certain specialized optimizations; for example, in read-intensive applications you can lower it to enlarge the read cache and free up more memory for other modules.

What is the impact of this parameter on users?

For example, suppose a regionserver has 10 GB of heap, 100 regions, and 64 MB per memstore. Assuming each region has only one memstore, the lowerLimit threshold is reached when the 100 memstores are, on average, about 50% full. Suppose that at this point many write requests are still arriving at the other memstores; before the large memstores have finished flushing, the total may already exceed the upperLimit, at which point all regions are blocked and a global flush is triggered.
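To make the arithmetic above concrete, here is a small back-of-the-envelope sketch using the same (hypothetical) numbers:

```java
public class MemstoreLimitMath {
    public static void main(String[] args) {
        // Numbers from the example above: a hypothetical cluster, not a recommendation.
        long heap      = 10L * 1024 * 1024 * 1024; // 10 GB regionserver heap
        long flushSize = 64L * 1024 * 1024;        // hbase.hregion.memstore.flush.size
        int  memstores = 100;                      // 100 regions, one memstore each

        long lowerLimit  = (long) (heap * 0.35);   // selective flushing starts here (~3.5 GB)
        long upperLimit  = (long) (heap * 0.40);   // updates blocked, global flush (~4 GB)
        long totalIfFull = memstores * flushSize;  // total if every memstore reached flush.size (100 x 64 MB)

        double avgFillAtLower = (double) lowerLimit / totalIfFull; // ~0.56, i.e. roughly half full
        System.out.printf("lowerLimit=%.2f GB, upperLimit=%.2f GB, avg memstore fill at lowerLimit=%.0f%%%n",
                lowerLimit / (1024.0 * 1024 * 1024),
                upperLimit / (1024.0 * 1024 * 1024),
                avgFillAtLower * 100);
    }
}
```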

However, unless your memory is very small or most of your workload is reads, I don't think you need to tune this parameter.

hfile.block.cache.size

Default Value: 0.2

Note: The fraction of the heap allocated to the storefile read cache (block cache); 0.2 means 20%. This value directly affects data read performance.

Optimization:

Of course, for reads, the larger the better; if you write far less than you read, it is fine to raise it to 0.4-0.5. If reads and writes are roughly balanced, use about 0.3. If you write more than you read, keep the default. When setting this value, also refer to hbase.regionserver.global.memstore.upperLimit, which is the maximum fraction of the heap that memstores may occupy: one of the two parameters affects reads and the other affects writes. If their sum exceeds 80-90%, there is a risk of OOM.
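For instance, a tiny sanity check on that rule of thumb, using illustrative candidate values for a read-heavy workload (not recommended settings):

```java
public class HeapBudgetCheck {
    public static void main(String[] args) {
        // Candidate values for a read-heavy workload (illustrative only).
        double blockCacheSize     = 0.45; // hfile.block.cache.size
        double memstoreUpperLimit = 0.30; // hbase.regionserver.global.memstore.upperLimit

        double combined = blockCacheSize + memstoreUpperLimit;
        if (combined > 0.8) {
            System.out.printf("Combined fraction %.2f leaves too little heap headroom: OOM risk%n", combined);
        } else {
            System.out.printf("Combined fraction %.2f leaves %.0f%% of the heap for everything else%n",
                    combined, (1 - combined) * 100);
        }
    }
}
```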

hbase.hstore.blockingStoreFiles

Default Value: 7

Note: During compaction, if a store (column family) contains more than 7 storefiles that need to be merged, all write requests are blocked while flushing proceeds, to limit rapid growth in the number of storefiles.

Optimization:

Blocking write requests hurts the performance of the current region, so setting this value to the maximum number of storefiles a single region can hold is a good choice: that is, the largest number of storefiles the memstore can keep generating while a compaction is in progress. This maximum can be estimated as region size / memstore size. If you set the region size to an effectively unlimited value (as above), you need to estimate the maximum number of storefiles a region might generate yourself.
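A small sketch of that estimate, with illustrative sizes rather than recommended ones:

```java
public class BlockingStoreFilesEstimate {
    public static void main(String[] args) {
        // Illustrative sizes; substitute your own region and memstore settings.
        long regionSize   = 4L * 1024 * 1024 * 1024; // hbase.hregion.max.filesize, e.g. 4 GB
        long memstoreSize = 64L * 1024 * 1024;       // hbase.hregion.memstore.flush.size, e.g. 64 MB

        // Rough upper bound on storefiles a region can accumulate: each flush writes
        // one storefile of roughly memstore size, so about regionSize / memstoreSize
        // files fit before the region would normally have been split.
        long maxStoreFiles = regionSize / memstoreSize; // 64 in this example
        System.out.println("Candidate hbase.hstore.blockingStoreFiles ~ " + maxStoreFiles);
    }
}
```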

hbase.hregion.memstore.block.multiplier

Default Value: 2

Note: When the memstore of a region exceeds twice the size of a single memstore.size, all requests to that region are blocked so it can flush and release memory. Although we set a memstore size, say 64 MB, imagine that when the memstore has reached 63.9 MB a large put arrives; the memstore's size will instantly soar well past the expected memstore.size. The purpose of this parameter is to block all requests once the memstore grows beyond this multiple of memstore.size, so as to keep the risk from escalating further.

Optimization:

The default value of this parameter is fairly reliable. If you expect that your normal workload (exceptional cases aside) will not see sudden write bursts, or that the write volume is under control, keep the default. If, however, your write request volume regularly surges to several times the normal level, you should increase this multiplier and adjust the other parameter values, such as hfile.block.cache.size and hbase.regionserver.global.memstore.upperLimit/lowerLimit, to reserve more memory and prevent the HBase server from going OOM.
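As a quick illustration of what raising the multiplier means in practice (the values below are made up):

```java
public class MemstoreBlockThreshold {
    public static void main(String[] args) {
        // Illustrative values; the real ones come from hbase-site.xml.
        long flushSize  = 64L * 1024 * 1024; // hbase.hregion.memstore.flush.size
        int  multiplier = 4;                 // a raised hbase.hregion.memstore.block.multiplier
        long blockAt    = multiplier * flushSize;

        System.out.printf("A region's updates are blocked once its memstore reaches %d MB%n",
                blockAt >> 20);
        // Raising the multiplier means more heap can be pinned by memstores during a
        // write burst, so the block cache and global memstore limits should be rechecked too.
    }
}
```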
