HBase Trillion-Level Storage Performance Optimization Summary


The main HBase cluster has been running stably in production for about a year and a half. The largest single table has grown to more than 7,200 regions, and a considerable amount of new data is loaded every day. Over that time my understanding of HBase has gone from ignorance to something close to maturity. To cope with the pressure of the business data, the storage layer was upgraded from the initial single-machine, multi-threaded loader to a distributed storage system with a disaster-tolerance mechanism, and an alarm system was built to catch cluster problems early. The optimization notes below (for version 0.94) summarize the past two years of HBase work.


Server Side

1. hbase.regionserver.handler.count: the number of RPC handler threads. The default is 10; 100 is recommended for production. Bigger is not always better: when requests carry large payloads (for example scans or puts of several MB), too many handlers consume too much memory, which can cause frequent GC or even an OutOfMemoryError.
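All of the server-side parameters in this section are set in hbase-site.xml on the cluster nodes and generally need a (rolling) restart to take effect. As a minimal sketch, the handler-count recommendation above would look like this:

    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>100</value>
    </property>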


2. hbase.master.distributed.log.splitting: the default is true; false is recommended. This turns off HBase's distributed log splitting, so when logs need to be replayed, the splitting is done by the master.


3. hbase.regionserver.hlog.splitlog.writer.threads: the default is 3; 10 is recommended. This is the number of threads used for log splitting.


4. hbase.snapshot.enabled: the snapshot feature; the default is false (disabled). Setting it to true is recommended. For critical tables in particular, taking periodic snapshot backups is a good choice.


5. hbase.hregion.max.filesize: the default is 10 GB. If any store file in a column family exceeds this value, the region is split in two. Because a split briefly takes the region offline (usually for less than 5 s), it is better to split manually on a schedule to reduce the impact on the business side, and the value can be raised to 60 GB (a sketch of triggering manual splits follows item 6).


6. hbase.hregion.majorcompaction: the interval between automatic major compactions of a region; the default is 1 day. Setting it to 0 is recommended, which disables automatic major compaction. A major compaction rewrites all the store files in a store into a single store file and removes data marked as deleted during the merge; on a production cluster it can last for hours. To reduce the impact on the business, run major compactions manually, or via scripts or the API, during off-peak hours.
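Since items 5 and 6 both replace automatic behaviour with scheduled maintenance, here is a minimal sketch of triggering a split and a major compaction from the 0.94 Java API; the table name is made up, and in practice this would run from a cron job during off-peak hours:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class ManualMaintenance {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
            HBaseAdmin admin = new HBaseAdmin(conf);
            try {
                admin.split("my_table");        // request splits of the table's oversized regions
                admin.majorCompact("my_table"); // queue a major compaction; it runs asynchronously
            } finally {
                admin.close();
            }
        }
    }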



7. hbase.hregion.memstore.flush.size: the default is 128 MB (the unit is bytes). Once a memstore exceeds this value it is flushed. If the RegionServer JVM has plenty of memory (16 GB or more), it can be raised to 256 MB.


8. hbase.hregion.memstore.block.multiplier: the default is 2. If a memstore's size exceeds hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier, writes to that memstore are blocked. To avoid blocking, 5 is recommended; if set too large there is a risk of OOM. If messages like "Blocking updates for '<threadName>' on region <regionName>: memstore size <X> is >= than blocking <Y> size" appear in the RegionServer log, this value needs to be adjusted.


9. hbase.hstore.compaction.min: the default is 3. If the number of store files in any store exceeds this value, the default (minor) compaction is triggered. It can be set to 5-8, and the store files can then be consolidated in the manually scheduled major compactions instead. This reduces the number of compactions, although each one takes longer. The older name of this parameter is hbase.hstore.compactionThreshold.


10. hbase.hstore.compaction.max: the default is 10, the maximum number of store files merged in a single compaction, to avoid OOM.


11. hbase.hstore.blockingStoreFiles: the default is 7. If any store (other than the .META. table's store) has more store files than this, memstore flushes for that region are delayed (the region is re-queued in the flush queue) until a split or compaction reduces the count or hbase.hstore.blockingWaitTime (90 s by default) elapses, which in turn blocks writes. Setting it to 30 avoids memstores not being flushed in time. When the RegionServer log contains many messages like "Region <regionName> has too many store files; delaying flush up to 90000ms", this value needs to be adjusted.


12. hbase.regionserver.global.memstore.upperLimit: the default is 0.4, the upper bound on the fraction of RegionServer heap that all memstores together may occupy. When this limit is reached, the regions most in need of flushing are flushed, across the whole RegionServer, until the total drops back below the limit. Keep the default.


13. hbase.regionserver.global.memstore.lowerLimit: the default is 0.35. Keep the default.


14. hbase.regionserver.thread.compaction.small: the default is 1, the number of threads the RegionServer uses for minor compactions; it can be set to 5.


15. hbase.regionserver.thread.compaction.large: the default is 1, the number of threads the RegionServer uses for major compactions; it can be set to 8.


16. hbase.regionserver.lease.period: the default is 60000 (60 s), the lease timeout for client connections to a RegionServer. The client must report back within this period, otherwise it is considered dead. It is best to adjust this to the actual business situation.


17. hfile.block.cache.size: the default is 0.25, the fraction of RegionServer heap used for the block cache. It can be increased for read-heavy workloads. Note that the sum of hbase.regionserver.global.memstore.upperLimit and hfile.block.cache.size must be less than 0.8.


18. dfs.socket.timeout: the default is 60000 (60 s). Set it according to the exceptions actually observed in RegionServer log monitoring; for example, we set it to 900000. This parameter must also be changed in hdfs-site.xml.


19. dfs.datanode.socket.write.timeout: the default is 480000 (480 s). During compactions, DataNode write timeouts such as "480000 millis timeout while waiting for channel to be ready for write" can occur. This parameter must also be changed in hdfs-site.xml.


JVM and garbage collection parameters (set in hbase-env.sh):

export HBASE_REGIONSERVER_OPTS="-Xms36g -Xmx36g -Xmn1g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=15 -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/data/logs/gc-$(hostname)-hbase.log"


Because our servers have a lot of memory (96 GB), we give some RegionServer JVMs up to 64 GB of heap, and so far there has not been a single full GC. HBase puts real effort into controlling memory use, for example in its various BlockCache implementations; interested readers can look at the source code.



Client Side

1. hbase.client.write.buffer: the default is 2 MB, the client write buffer size in bytes; 5 MB is recommended. The more memory available, the larger it can be, of course. We also tested write performance at 10 MB, but it was not as good as 5 MB.
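For reference, a minimal write-path sketch for the 0.94 Java client that applies the buffer size above; the table, family, and row names are made up for illustration:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedWriteExample {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "my_table");
            table.setAutoFlush(false);                 // buffer puts client-side instead of one RPC per row
            table.setWriteBufferSize(5 * 1024 * 1024); // 5 MB, the value recommended above
            try {
                Put put = new Put(Bytes.toBytes("row-0001"));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
                table.put(put);                        // queued in the client write buffer
                table.flushCommits();                  // push buffered puts to the RegionServers
            } finally {
                table.close();                         // close() also flushes any remaining buffered puts
            }
        }
    }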

2. hbase.client.pause: the default is 1 s. If you want low read/write latency, setting it to 200 (ms) is recommended. This value is used for retries after failures, region lookups, and so on.

3. hbase.client.retries.number: the default is 10, the maximum number of client retries; it can be set to 11. Combined with the pause above, this bounds the total retry time (the 71 s figure comes from the default 1 s pause multiplied by the client's exponential backoff factors, roughly 1+1+1+2+2+4+4+8+16+32).

4. hbase.ipc.client.tcpnodelay: the default is false; setting it to true is recommended, which turns off message buffering (TCP_NODELAY).

5. hbase.client.scanner.caching: the scan cache (rows fetched per RPC); the default is 1. To avoid using too much memory on the client and the RegionServer, 1000 is generally reasonable; if individual rows are large, set it smaller. Typically set it to the number of rows a single business query needs.

If the scanned data will not help later queries, call setCacheBlocks(false) on the Scan so the block cache is not polluted.

6. Close tables when you are done with them, and close scanners.

7. Limit the scan range: specify the column families or columns to query, and specify the start row and stop row.

8. Using filters can greatly reduce network traffic.
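Putting items 5 through 8 together, a minimal scan sketch for the 0.94 Java client; the table, family, filter, and row-range values are made up for illustration:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BoundedScanExample {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "my_table");
            Scan scan = new Scan();
            scan.setCaching(1000);                        // item 5: rows fetched per RPC
            scan.setCacheBlocks(false);                   // one-off scan, do not pollute the block cache
            scan.addFamily(Bytes.toBytes("cf"));          // item 7: only the needed column family
            scan.setStartRow(Bytes.toBytes("user_0001")); // item 7: bound the row range
            scan.setStopRow(Bytes.toBytes("user_0002"));
            scan.setFilter(new PrefixFilter(Bytes.toBytes("user_0001"))); // item 8: filter server-side
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    // process the row here
                }
            } finally {
                scanner.close();                          // item 6: always close the scanner ...
                table.close();                            // ... and the table
            }
        }
    }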

9. Use Java multithreading for loading and querying, and control the timeouts. I will share the code of my standalone multithreaded HBase loader separately.

10. Notes on table creation (a creation sketch follows this list):

Turn on compression

Design the rowkey sensibly

Pre-split the table into regions

Enable Bloom filters
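A minimal table-creation sketch that applies these points with the 0.94 API. The table name, split keys, and the choice of GZ compression are illustrative assumptions (SNAPPY needs native libraries installed), and rowkey design itself is application-specific, so it is not shown:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.hfile.Compression;
    import org.apache.hadoop.hbase.regionserver.StoreFile;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreateTableExample {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
            HColumnDescriptor cf = new HColumnDescriptor("cf");
            cf.setCompressionType(Compression.Algorithm.GZ);  // turn on compression
            cf.setBloomFilterType(StoreFile.BloomType.ROW);   // enable a row-level Bloom filter
            HTableDescriptor desc = new HTableDescriptor("my_table");
            desc.addFamily(cf);
            byte[][] splitKeys = {                            // pre-split so early writes spread across RegionServers
                Bytes.toBytes("2"), Bytes.toBytes("4"),
                Bytes.toBytes("6"), Bytes.toBytes("8")
            };
            admin.createTable(desc, splitKeys);
            admin.close();
        }
    }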


Zookeeper Tuning

1. zookeeper.session.timeout: the default is 3 minutes. Do not set it too short, to avoid session timeouts that make HBase stop serving; our online production environment uses 1 minute. If it is too long, then when a RegionServer dies, ZK also has to wait out this timeout (a patch has since fixed this), which prevents the master from migrating its regions promptly.

2. Number of ZooKeeper nodes: 5 or 7 nodes are recommended. Give each ZooKeeper about 4 GB of memory, preferably with a dedicated disk.

3. hbase.zookeeper.property.maxClientCnxns: ZooKeeper's maximum number of client connections; the default is 300 and there is no need to adjust it.

4. Set the operating system swappiness to 0. Otherwise swap is used when physical memory runs short, GC takes far longer, and once the ZooKeeper session times out, RegionServers are falsely reported as down.


HDFS Tuning

1. dfs.name.dir: the NameNode data directory. Several directories can be configured, on different disks plus an NFS remote file system, so that the NameNode metadata has multiple backups.

2. dfs.namenode.handler.count: the number of RPC handler threads on the NameNode; the default is 10 and it can be set to 60.

3. dfs.datanode.handler.count: the number of RPC handler threads on a DataNode; the default is 3 and it can be set to 30.

4. dfs.datanode.max.xcievers: the maximum number of files a DataNode can serve at the same time; the default is 256 and it can be set to 8192.
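These HDFS parameters (together with dfs.socket.timeout and dfs.datanode.socket.write.timeout from the server-side section) are set in hdfs-site.xml. A minimal sketch using the values suggested above:

    <property>
      <name>dfs.namenode.handler.count</name>
      <value>60</value>
    </property>
    <property>
      <name>dfs.datanode.handler.count</name>
      <value>30</value>
    </property>
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>8192</value>
    </property>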


Other

Column family names, column qualifiers, and the rowkey are stored with every cell in the HFiles, so keep them as short as possible when designing the table schema.

The number of regions on a RegionServer should not exceed 1000. Too many regions mean too many memstores, which may lead to memory exhaustion and also lengthen major compactions.


Please credit the original link when reprinting: http://blog.csdn.net/odailidong/article/details/41794403


