Installation guide: a fully distributed HBase cluster using GlusterFS as the distributed file system, with an external ZooKeeper


[X] Prerequisites
  • Server list:

    192.168.1.84 hbase84    # HBase master

    192.168.1.85 hbase85    # HBase region server, ZooKeeper

    192.168.1.86 hbase86    # HBase region server, ZooKeeper

    192.168.1.87 hbase87    # HBase region server, ZooKeeper

  • JDK

    It is recommended to install Oracle (Sun) JDK 1.7.

  • SSH

    Passwordless SSH login must be configured between all nodes; one way to set it up is sketched below.
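    A minimal sketch, assuming the cluster runs under a single user and the hostnames above resolve; the commands are the standard OpenSSH tools:

    # on hbase84: generate a key once, then copy it to every node (including hbase84 itself)
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    for h in hbase84 hbase85 hbase86 hbase87; do
        ssh-copy-id "$h"
    done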

  • NTP

    The clocks of the cluster nodes must be consistent. A slight skew is tolerable, but a large skew causes strange behavior, so run NTP or a similar service to keep time in sync.
    If you run into a strange fault, checking that the system time is correct is a good first step; a quick check is sketched below.
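    A quick check and a one-off sync, assuming the ntpdate tool is installed; pool.ntp.org is only an example server, substitute your own NTP source:

    # compare the local time on every node
    for h in hbase84 hbase85 hbase86 hbase87; do ssh "$h" date; done

    # force a one-off sync on a node that has drifted
    ntpdate pool.ntp.org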

  • Ulimit and Nproc

    HBase is a database that uses a large number of file handles at the same time. Most Linux systems default to a limit of 1024, which is not sufficient, so modify the /etc/security/limits.conf file as follows:

    *               soft    nproc   16384
    *               hard    nproc   16384
    *               soft    nofile  65536
    *               hard    nofile  65536
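    After logging in again, a quick way to confirm the new limits took effect (standard shell built-ins):

    ulimit -n    # should report 65536 (nofile)
    ulimit -u    # should report 16384 (nproc)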
  • ZooKeeper

  • First install ZooKeeper on the three nodes hbase85, hbase86 and hbase87; a sample zoo.cfg is sketched below.

  • Then start ZooKeeper on hbase85, hbase86 and hbase87:

    /opt/app/zookeeper/bin/zkServer.sh start
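    A minimal zoo.cfg sketch for this three-node ensemble. The dataDir path is an assumption (adjust it to your layout), the ports are the ZooKeeper defaults, and each node also needs a matching myid file (1-3) inside dataDir:

    # /opt/app/zookeeper/conf/zoo.cfg
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper    # assumed data directory
    clientPort=2181               # must match hbase.zookeeper.property.clientPort below
    server.1=hbase85:2888:3888
    server.2=hbase86:2888:3888
    server.3=hbase87:2888:3888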

  • HBase also requires a running distributed file system: HDFS or, as in this guide, GlusterFS.

    This guide uses GlusterFS as the distributed file system; creating and mounting a volume is sketched below.
    The mount directory on each node is /mnt/gfs_v3/hbase.
    GlusterFS is the core of the Gluster scale-out storage solution: an open-source distributed file system with strong scale-out capability that supports petabytes of storage and thousands of clients.
    GlusterFS aggregates physically distributed storage resources over TCP/IP or InfiniBand RDMA networks and manages the data under a single global namespace.
    It is based on a stackable user-space design and provides good performance for a variety of data loads.
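    A sketch of one way to create a replicated volume and mount it; the volume name gfs_v3 matches the mount path above, but the brick path /data/brick/gfs_v3 and the replica count of 3 are illustrative assumptions, not part of the original guide:

    # on hbase85, for example: probe the other peers, then create and start the volume
    gluster peer probe hbase86
    gluster peer probe hbase87
    gluster volume create gfs_v3 replica 3 \
        hbase85:/data/brick/gfs_v3 hbase86:/data/brick/gfs_v3 hbase87:/data/brick/gfs_v3
    gluster volume start gfs_v3

    # on every HBase node
    mkdir -p /mnt/gfs_v3
    mount -t glusterfs hbase85:/gfs_v3 /mnt/gfs_v3
    mkdir -p /mnt/gfs_v3/hbase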

[X] Modify the /etc/hosts file on 192.168.1.84, 192.168.1.85, 192.168.1.86 and 192.168.1.87, adding the following at the end of the file:
192.168.1.84 hbase84
192.168.1.85 hbase85
192.168.1.86 hbase86
192.168.1.87 hbase87
[X] Copy hbase-0.98.8-hadoop2-bin.tar.gz to the /opt directory of each node, and then do the following:
cd /opt
tar zxvf ./hbase-0.98.8-hadoop2-bin.tar.gz
[X] Modify the conf/hbase-env.sh file, adding the following at the end of the file:
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
export HBASE_PID_DIR=/var/hadoop/pids
export HBASE_MANAGES_ZK=false
export HBASE_HEAPSIZE=1000
[X] Modify the conf/hbase-site.xml file so that it reads:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///mnt/gfs_v3/hbase</value>
    <!-- <value>hdfs://m1:9000/hbase</value> -->
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>192.168.1.85,192.168.1.86,192.168.1.87</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
[X] Modify the conf/regionservers file to list the slave (region server) nodes of the HBase cluster, as follows:
hbase85
hbase86
hbase87
[X] Add execute permissions to the script files
chmod -R +x /opt/hbase-0.98.8-hadoop2/bin/*
chmod -R +x /opt/hbase-0.98.8-hadoop2/conf/*.sh
[X] Start HBase
/opt/hbase-0.98.8-hadoop2/bin/start-hbase.sh
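To verify that the cluster came up, you can check the master and region servers from the HBase shell; a quick sanity check, assuming the install path used above:

/opt/hbase-0.98.8-hadoop2/bin/hbase shell
# at the shell prompt:
#   status 'simple'    # lists the live region servers and their load
#   list               # lists the existing tables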
[X] Stop HBase
/opt/hbase-0.98.8-hadoop2/bin/stop-hbase.sh
Appendix: some basic HTable concepts

Row Key
  • The row key is the primary key. HBase does not support conditional queries or ORDER BY; records can only be read by scanning a row key (or a row key range) or by scanning the full table. Row keys therefore need to be designed around the business so that the storage's sorted order (tables are sorted lexicographically by row key, e.g. 1, 10, 100, 11, 2) can be exploited to improve performance.

  • Avoid using monotonically increasing numbers or timestamps as the row key.

  • If the row key is an integer, storing it in binary form saves more space than storing it as a string.

  • Keep the row key reasonably short, because the row key is stored again in every cell.

  • If you need to pre-split the table into more than one region, it is best to define the split points yourself; see the sketch below.
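A minimal sketch of creating a pre-split table in the HBase shell, reusing the blog table and its two column families from the example below; the split points are purely illustrative:

    # in the HBase shell: create the table with explicit split points
    create 'blog', 'article', 'author', SPLITS => ['10', '20', '30']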

Column Family (Family of columns)
  • Column families are declared at table-creation time; each column family is a storage unit. In the example above, the HBase table blog has been designed with two column families: article and author.

  • Use as few column families as possible, preferably no more than 3. Each column family is kept in its own HFiles, but flush and compaction operate on a whole region, so when the data of one column family needs to be flushed, the other column families are flushed as well even if they hold little data. This creates a lot of unnecessary I/O.

  • When using multiple column families, make sure the amount of data in each column family is of a similar order of magnitude. If two column families differ too much in size, scanning the smaller one becomes inefficient.

  • Put data that is frequently queried and data that is rarely queried into different column families.

  • Because column family and column names are stored in every HBase cell, keep them as short as possible; for example, replace mycolumnfamily:mycolumnqualifier with f:q.

Column (columns)
  • Each column in HBase belongs to a column family and is prefixed with the column family name; for example, the columns article:title and article:content belong to the article column family, while author:name and author:nickname belong to the author column family.

  • Columns can be added dynamically without recreating the table. The columns of the same column family are stored together in one storage unit and sorted by column key, so columns with the same I/O characteristics should be placed in the same column family to improve performance. Also note that columns can be added and removed at will, which is very different from a traditional relational database and makes HBase well suited to unstructured data.

Timestamp
  • HBase identifies a cell by row and column; the value at that position may have multiple versions, sorted in reverse chronological order, so the most recent data comes first and the latest version is returned by default on a query. In the example above, the author:nickname value for row key = 1 has two versions: 1317180070811 corresponding to "one leaf crossing" and 1317180718830 corresponding to "yedu" (in business terms, the nickname was changed to yedu at some point, but the old value is still stored). The timestamp defaults to the current system time (in milliseconds), or you can specify the value when writing the data.

Value
  • Each value is uniquely indexed by four keys: tablename + rowkey + columnkey + timestamp => value. In the example above, the value uniquely indexed by {tablename='blog', rowkey='1', columnname='author:nickname', timestamp='1317180718830'} is "yedu".

Storage type

  • TableName is a string.
  • RowKey and ColumnName are binary values (Java type byte[]).
  • Timestamp is a 64-bit integer (Java type long).
  • Value is a byte array (Java type byte[]).
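The data model above can be exercised directly from the HBase shell. A small sketch that mirrors the blog example in this appendix, using the timestamps quoted above (the author column family must be created with VERSIONS > 1 for both versions to be retained):

    # write two versions of the same cell, with explicit timestamps
    put 'blog', '1', 'author:nickname', 'one leaf crossing', 1317180070811
    put 'blog', '1', 'author:nickname', 'yedu', 1317180718830

    # read back: the latest version is returned by default
    get 'blog', '1', 'author:nickname'

    # ask for both versions explicitly
    get 'blog', '1', {COLUMN => 'author:nickname', VERSIONS => 2}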

HBase configuration optimizations

zookeeper.session.timeout

Default value: 3 minutes (180000 ms)

Description: the session timeout between a RegionServer and ZooKeeper. When the timeout expires, the RegionServer is removed from the RS cluster list and the HMaster is notified of the removal; the regions owned by that server are then rebalanced so that other live RegionServers take them over.

Tuning: this timeout determines whether a RegionServer can fail over in a timely manner. Setting it to 1 minute or less reduces the failover delay caused by waiting for the timeout to expire.
However, note that for some online applications the time from RegionServer downtime to recovery is itself very short (network flaps, crashes and other faults that operations can fix quickly), and lowering the timeout may not be worth it: once the RegionServer is formally removed from the RS cluster, the HMaster starts rebalancing (letting other RSs recover from the WAL logs written by the failed machine). When the failed RS comes back after manual intervention, that rebalance is pointless and only makes the load uneven, putting extra burden on the other RSs. This is especially true in scenarios where regions are assigned in a fixed way.

hbase.regionserver.handler.count

Default value: 10

Description: the number of I/O threads the RegionServer uses to handle requests.

Tuning:
Tuning this parameter is closely tied to memory.
Use fewer I/O threads in scenarios where a single request consumes a lot of memory (bulk single puts, or scans with a large cache; in short, big puts) or where the RegionServer is already memory-intensive.
Use more I/O threads in scenarios where a single request consumes little memory and the TPS requirement is very high. When setting this value, monitor memory as the primary reference.
Note that if the server has only a few regions and a large number of requests fall on one region, the quickly filled memstore triggers flushes whose read-write locking affects global TPS, so a higher I/O thread count is not always better.
During load testing, enable RPC-level logging so that the memory consumption and GC behavior of each request can be monitored, and adjust the number of I/O threads based on repeated test results.
There is a case study, "Hadoop and HBase optimization for Read intensive Search applications", in which the author sets the number of I/O threads to 100 on SSD machines; it is for reference only.

hbase.hregion.max.filesize

Default value: 256 MB

Description: the maximum storage space of a single region on the current RegionServer; when a single region exceeds this value, the region is automatically split into smaller regions.

Tuning:
Small regions are friendly to split and compaction, because splitting or compacting the StoreFiles of a small region is fast and uses little memory. The drawback is that splits and compactions become very frequent. In particular, a large number of small regions constantly splitting and compacting makes the cluster's response time fluctuate; having too many regions is not only troublesome to manage but can even trigger HBase bugs. Anything around 512 MB or less generally counts as a small region.
Large regions are not suited to frequent splits and compactions, because a single compaction or split causes a long pause, which has a large impact on the application's read and write performance. In addition, a large region implies large StoreFiles, so compaction also becomes a challenge for memory.
Of course, large regions have their place. If the application has a period of low traffic, doing compactions and splits in that window gets both done while keeping read and write performance stable the rest of the time.
Since splits and compactions affect performance so much, is there a way to avoid them?
Compactions are unavoidable, but splitting can be changed from automatic to manual.
By raising this parameter to a value that is effectively unreachable, such as 100 GB, automatic splits can be disabled indirectly (the RegionServer will not split regions that have not reached 100 GB).
Combined with the RegionSplitter tool, you can then split manually whenever a split is needed; a sketch follows below.
Manual splitting is far more flexible and stable than automatic splitting and, contrary to what one might expect, does not increase the management cost; it is recommended for online real-time systems.
As for memory, small regions give more flexibility in setting the memstore size. With large regions, a memstore that is too big increases the application's I/O wait during flushes, while one that is too small hurts read performance because of the large number of StoreFiles.
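A sketch of the manual-split approach described above, using the HBase shell's split command as one way to trigger a split by hand (the text also mentions the RegionSplitter utility). The 100 GB figure mirrors the example in the text; the table name and split key are illustrative:

    <!-- conf/hbase-site.xml: make automatic splits effectively unreachable -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>107374182400</value>   <!-- 100 GB -->
    </property>

    # later, split manually from the HBase shell when a split is actually needed
    split 'blog'                 # split every region of the table at its midpoint
    split 'blog', 'rowkey-20'    # or split at an explicit row key (illustrative)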

hbase.regionserver.global.memstore.upperLimit / lowerLimit

Default value: 0.4 / 0.35

upperLimit description: the related parameter hbase.hregion.memstore.flush.size controls flushing at the region level: when the total size of all memstores in a single region exceeds that value, all memstores of the region are flushed. RegionServer flushes are handled asynchronously by queueing the requests, in a producer-consumer pattern. The problem is that when the queue cannot keep up, a large backlog of requests builds up, memory spikes, and in the worst case an OOM is triggered.
The upperLimit parameter protects against excessive total memory use: when the combined memory of all region memstores in a RegionServer reaches 40% of the heap, HBase blocks all update requests and flushes the regions to release the memory held by the memstores.

lowerLimit description: same idea as upperLimit, but when the memory occupied by all region memstores reaches 35% of the heap it does not flush all memstores. Instead it finds the region whose memstores occupy the most memory and flushes that region alone, while updates are still blocked. lowerLimit acts as a remedy that kicks in before performance is hurt by force-flushing every region. In the logs it shows up as "** Flush thread woke up with memory above low water."

Tuning: this is a heap-memory protection parameter, and the default values already fit most scenarios.
Adjusting it affects both reads and writes. If the write pressure often exceeds this threshold, shrink the read cache hfile.block.cache.size and raise the threshold, or, if there is plenty of heap headroom, raise the threshold without touching the read cache size.
If the threshold is not reached even under high pressure, you are advised to lower it moderately and re-run the load test to make sure it is not triggered too often; if there is still plenty of heap headroom, increase hfile.block.cache.size to improve read performance.
There is one more possibility: hbase.hregion.memstore.flush.size stays unchanged, but the RS hosts too many regions; keep in mind that the number of regions directly affects the memory footprint.

hfile.block.cache.size

Default value: 0.2

Description: the percentage of the heap occupied by the StoreFile read cache; 0.2 means 20%. This value directly affects read performance.

Tuning: bigger is of course better for reads. If writes are much rarer than reads, raising it to 0.4-0.5 is not a problem. If reads and writes are fairly balanced, use about 0.3. If writes outnumber reads, keep the default. When setting this value, also consider hbase.regionserver.global.memstore.upperLimit, the maximum percentage of the heap that memstores may use: one parameter affects reads and the other affects writes. If the two values add up to more than 80-90%, there is a risk of OOM, so set them carefully.

hbase.hstore.blockingStoreFiles

Default value: 7

Description: if, at flush time, a store (column family) in a region already holds more than 7 StoreFiles, all write requests are blocked and a compaction is triggered to reduce the number of StoreFiles.

Tuning: blocking write requests severely affects the response time of the current RegionServer, but too many StoreFiles also hurts read performance. From a practical point of view, to obtain smoother response times you can set this value very high, effectively to infinity. If the response time can tolerate large peaks and troughs, keep the default or adjust it to your own scenario.

hbase.hregion.memstore.block.multiplier

Default value: 2

Description: when a region's memstores occupy more than hbase.hregion.memstore.flush.size multiplied by this factor (twice that size by default), all requests are blocked and the region is flushed to free memory.
Although we set a total memstore size for the region, say 64 MB, imagine that at 63.9 MB a 200 MB write arrives: the memstore size instantly balloons to several times the expected hbase.hregion.memstore.flush.size. This parameter blocks all requests once the memstore grows beyond twice hbase.hregion.memstore.flush.size, containing the risk before it spreads further.

Tuning: the default value of this parameter is fairly reliable. If you expect that your normal scenario (excluding anomalies) will not see sudden write bursts, keep the default. If, under normal conditions, your write requests often grow to several times the usual volume, you should increase this multiplier and adjust other parameter values, such as hfile.block.cache.size and hbase.regionserver.global.memstore.upperLimit/lowerLimit, to reserve more memory and prevent the HBase server from going OOM.

hbase.hregion.memstore.mslab.enabled

Default value: true

Description: reduces full GCs caused by memory fragmentation and thereby improves overall performance.

Tuning: see http://kenwublog.com/avoid-full-gc-in-hbase-using-arena-allocation
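To tie the section together, a sketch of how a few of the settings discussed above could appear in conf/hbase-site.xml; the values shown are just the defaults or the examples mentioned in the text, not a recommendation for any particular workload:

    <property>
      <name>zookeeper.session.timeout</name>
      <value>60000</value>      <!-- 1 minute, to shorten failover (see above) -->
    </property>
    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>10</value>         <!-- default; raise only with memory headroom -->
    </property>
    <property>
      <name>hfile.block.cache.size</name>
      <value>0.3</value>        <!-- suggestion above for a balanced read/write load -->
    </property>
    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.4</value>        <!-- default; cache size + upperLimit well under 0.8 -->
    </property>
    <property>
      <name>hbase.hregion.memstore.mslab.enabled</name>
      <value>true</value>       <!-- default; reduces full GCs from fragmentation -->
    </property>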
