A performance optimization strategy for HBase database retrieval

Source: Internet
Author: User

Introduction to HBase Data Tables

HBase is a distributed, column-oriented, open-source database used mainly to store unstructured data. Its design is based on Bigtable, Google's proprietary database.

HDFS provides the underlying storage for HBase, MapReduce provides its computing power, and ZooKeeper provides service coordination and failover. Pig and Hive provide high-level language support for HBase, making statistics such as multi-table joins possible, and Sqoop provides data import from an RDBMS.

HBase does not support WHERE conditions or ORDER BY queries. It only supports lookups by row key and by row-key range, although conditional filtering is possible through the filter APIs that HBase provides.

The row key is the unique identifier of a data row, and rows must be accessed through it. There are currently three access methods: access by a single row key, access by a row-key range, and a full-table scan. Rows are stored sorted by row key. Keys are compared byte by byte, with larger values placed later, so integers stored as strings sort as 1, 10, 100, 11, 12, 2, 20, ..., 906, ....
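The ordering described above can be reproduced in plain Java. The following is a minimal sketch, not HBase code: the class and method names are hypothetical, and it assumes keys are the UTF-8 bytes of decimal strings. The `pad` helper shows one common way to make byte order agree with numeric order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrates byte-wise lexicographic ordering of row keys (hypothetical helper). */
final class RowKeyOrder {

    /** Compares two keys byte by byte, treating each byte as unsigned. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // A key that is a prefix of another key sorts first.
        return a.length - b.length;
    }

    /** Returns a copy of the keys sorted the way their UTF-8 bytes compare. */
    static List<String> sortAsRowKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((x, y) -> compareKeys(
                x.getBytes(StandardCharsets.UTF_8),
                y.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    /** Left-pads a non-negative number with zeros so byte order matches numeric order. */
    static String pad(long value, int width) {
        return String.format("%0" + width + "d", value);
    }
}
```

Sorting the keys "1", "2", "10", "11", "12", "20", "100" with `sortAsRowKeys` gives the order quoted in the text. Padding them to a fixed width first (for example `pad(2, 3)` gives "002") restores numeric order.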

A column family is part of the table schema and is defined when the table is created. Every column belongs to a column family, and column names are prefixed with the family name in the form "columnfamily:qualifier". Access control and disk and memory usage accounting are performed at the column-family level.

A cell is the storage unit identified by a row and a column. Its value is stored as an untyped byte array.

A timestamp is a 64-bit integer that indexes the different versions of a cell. Versions are sorted in reverse chronological order, so the latest version comes first.

An HBase table is divided horizontally, by row, into N regions. Each table starts with a single region; as the data volume grows, a region automatically splits in two. Different regions are distributed across different servers, but a single region is never split across servers.

A region is divided by column family into Stores. A Store is the smallest storage unit and holds the data of one column family. Each Store consists of an in-memory MemStore and HFiles persisted to disk.

Figure 1 is an example of an HBase data table that distributes data across multiple node machines.

Figure 1. HBase Data Representation Example


HBase API Usage Example

Much like the JDBC library for relational databases, the HBase client package provides a set of APIs that let users work with an HBase database quickly. They cover creating and deleting tables, adding fields, storing data, reading data, and so on. Listing 1 shows a utility class written by the author that wraps table management, reading data, storing data, and exporting data.

Listing 1. HBase API utility class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HBaseUtil {

  private Configuration conf = null;
  private HBaseAdmin admin = null;

  protected HBaseUtil(Configuration conf) throws IOException {
    this.conf = conf;
    this.admin = new HBaseAdmin(conf);
  }

  public boolean existsTable(String table) throws IOException {
    return admin.tableExists(table);
  }

  // Creates a table with the given column families, pre-split at splitKeys if they are given.
  public void createTable(String table, byte[][] splitKeys, String... colfams)
      throws IOException {
    HTableDescriptor desc = new HTableDescriptor(table);
    for (String cf : colfams) {
      HColumnDescriptor coldef = new HColumnDescriptor(cf);
      desc.addFamily(coldef);
    }
    if (splitKeys != null) {
      admin.createTable(desc, splitKeys);
    } else {
      admin.createTable(desc);
    }
  }

  public void disableTable(String table) throws IOException {
    admin.disableTable(table);
  }

  // A table has to be disabled before it can be deleted.
  public void dropTable(String table) throws IOException {
    if (existsTable(table)) {
      disableTable(table);
      admin.deleteTable(table);
    }
  }

  // Fills rows startRow..endRow with numCols test columns per column family.
  // pad and random are kept in the signature but are not used in this listing.
  public void fillTable(String table, int startRow, int endRow, int numCols,
      int pad, boolean setTimestamp, boolean random, String... colfams)
      throws IOException {
    HTable tbl = new HTable(conf, table);
    for (int row = startRow; row <= endRow; row++) {
      for (int col = 0; col < numCols; col++) {
        // Row and column numbers are appended so that every key is distinct.
        Put put = new Put(Bytes.toBytes("row-" + row));
        for (String cf : colfams) {
          String colName = "col-" + col;
          String val = "val-" + row + "." + col;
          if (setTimestamp) {
            put.add(Bytes.toBytes(cf), Bytes.toBytes(colName), col, Bytes.toBytes(val));
          } else {
            put.add(Bytes.toBytes(cf), Bytes.toBytes(colName), Bytes.toBytes(val));
          }
        }
        tbl.put(put);
      }
    }
    tbl.close();
  }

  // Writes a single cell.
  public void put(String table, String row, String fam, String qual, String val)
      throws IOException {
    HTable tbl = new HTable(conf, table);
    Put put = new Put(Bytes.toBytes(row));
    put.add(Bytes.toBytes(fam), Bytes.toBytes(qual), Bytes.toBytes(val));
    tbl.put(put);
    tbl.close();
  }

  // Writes a single cell with an explicit timestamp.
  public void put(String table, String row, String fam, String qual, long ts, String val)
      throws IOException {
    HTable tbl = new HTable(conf, table);
    Put put = new Put(Bytes.toBytes(row));
    put.add(Bytes.toBytes(fam), Bytes.toBytes(qual), ts, Bytes.toBytes(val));
    tbl.put(put);
    tbl.close();
  }

  // Writes every family/qualifier combination for each row. If ts or vals is
  // shorter than quals, its last element is reused.
  public void put(String table, String[] rows, String[] fams, String[] quals,
      long[] ts, String[] vals) throws IOException {
    HTable tbl = new HTable(conf, table);
    for (String row : rows) {
      Put put = new Put(Bytes.toBytes(row));
      for (String fam : fams) {
        int v = 0;
        for (String qual : quals) {
          String val = vals[v < vals.length ? v : vals.length - 1];
          long t = ts[v < ts.length ? v : ts.length - 1];
          put.add(Bytes.toBytes(fam), Bytes.toBytes(qual), t, Bytes.toBytes(val));
          v++;
        }
      }
      tbl.put(put);
    }
    tbl.close();
  }

  // Prints all versions of the requested cells.
  public void dump(String table, String[] rows, String[] fams, String[] quals)
      throws IOException {
    HTable tbl = new HTable(conf, table);
    List<Get> gets = new ArrayList<Get>();
    for (String row : rows) {
      Get get = new Get(Bytes.toBytes(row));
      get.setMaxVersions();
      if (fams != null) {
        for (String fam : fams) {
          for (String qual : quals) {
            get.addColumn(Bytes.toBytes(fam), Bytes.toBytes(qual));
          }
        }
      }
      gets.add(get);
    }
    Result[] results = tbl.get(gets);
    for (Result result : results) {
      for (KeyValue kv : result.raw()) {
        System.out.println("KV: " + kv + ", Value: " + Bytes.toString(kv.getValue()));
      }
    }
    tbl.close();
  }

  // Scans a whole table with the given caching and batch settings and counts
  // the Results. The table name is passed in so that the scan has a table to
  // open. The RPC counter is left out because nothing in this listing
  // increments it.
  public void scan(String table, int caching, int batch) throws IOException {
    HTable tbl = new HTable(conf, table);
    int results = 0;
    Scan scan = new Scan();
    scan.setCaching(caching); // Results fetched per RPC
    scan.setBatch(batch);     // maximum number of cells per Result
    ResultScanner scanner = tbl.getScanner(scan);
    for (Result result : scanner) {
      results++;
    }
    scanner.close();
    tbl.close();
    System.out.println("Caching: " + caching + ", Batch: " + batch + ", Results: " + results);
  }
}

The table management APIs are provided by HBaseAdmin. The rest of this section explains how a Scan is carried out.

HBase table data is organized in several tiers: HRegion -> HStore -> [HFile, HFile, ..., MemStore].

In HBase, a table can have multiple column families, and during a Scan the data of each column family (Store) is read through a StoreScanner object. The data of each Store consists of an in-memory MemStore and HFiles on disk, so the corresponding StoreScanner uses one MemStoreScanner and N StoreFileScanners to do the actual reading.

Therefore, reading a row of data requires the following steps:

1. Read each Store in sequence

2. For each Store, merge the relevant HFiles and the in-memory MemStore that belong to it

Both steps are done through a heap. A RegionScanner reads through a heap made up of its StoreScanners, held in the RegionScanner member variable KeyValueHeap storeHeap. Each StoreScanner in turn has its own heap, whose elements are the StoreFileScanners and the MemStoreScanner for the HFiles and MemStore it contains. The heap here is a priority queue: it keeps the sources ordered by their current KeyValue, so the scanner can always return the smallest next KeyValue across all sources efficiently.

seekScanners() is then called to seek each of these StoreFileScanners and the MemStoreScanner. A seek targets a KeyValue: it positions the scanner on the specified KeyValue, or, if that KeyValue does not exist, on the next KeyValue after it.
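The heap-based merge and the seek semantics can be sketched with a standard `PriorityQueue`. This is a toy model under stated assumptions, not the HBase implementation: `SourceScanner` and `scanFrom` are hypothetical names, keys are plain ASCII strings, and each source is assumed to hold sorted, unique keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of merging several sorted sources through a heap. */
final class HeapMergeSketch {

    /** A sorted source of keys; stands in for one MemStore or HFile scanner. */
    static final class SourceScanner {
        private final List<String> sortedKeys;
        private int pos;

        SourceScanner(List<String> sortedKeys) {
            this.sortedKeys = sortedKeys;
        }

        /** Positions the cursor on the target key, or on the next larger key if it is absent. */
        void seek(String target) {
            int idx = Collections.binarySearch(sortedKeys, target);
            pos = idx >= 0 ? idx : -idx - 1;
        }

        /** Returns the current key without advancing, or null when exhausted. */
        String peek() {
            return pos < sortedKeys.size() ? sortedKeys.get(pos) : null;
        }

        /** Returns the current key and advances the cursor. */
        String next() {
            return sortedKeys.get(pos++);
        }
    }

    /** Seeks every source to startKey, then emits all remaining keys in sorted order. */
    static List<String> scanFrom(String startKey, List<SourceScanner> sources) {
        PriorityQueue<SourceScanner> heap =
                new PriorityQueue<>((a, b) -> a.peek().compareTo(b.peek()));
        for (SourceScanner s : sources) {
            s.seek(startKey);
            if (s.peek() != null) {
                heap.add(s);
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            SourceScanner top = heap.poll(); // source holding the smallest current key
            out.add(top.next());
            if (top.peek() != null) {
                heap.add(top);               // re-insert it under its new current key
            }
        }
        return out;
    }
}
```

Each step costs a heap operation over the number of sources rather than a comparison against every source, which is why a heap suits merging one MemStore with many HFiles.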

Common methods of the Scan class:

Scan.addFamily() / Scan.addColumn(): specifies the column family or column to retrieve. If neither is called, all columns are returned.

Scan.setMaxVersions(): specifies the maximum number of versions. Called without arguments, it retrieves all versions. If setMaxVersions() is not called at all, only the latest version is retrieved.

Scan.setTimeRange(): specifies the minimum and maximum timestamps. Only cells within this range are returned.

Scan.setTimeStamp(): specifies a single timestamp.

Scan.setFilter(): specifies a filter that removes unwanted data.

Scan.setStartRow(): specifies the start row. If it is not called, the scan starts from the beginning of the table.

Scan.setStopRow(): specifies the stop row (exclusive).

Scan.setCaching(): specifies the number of rows fetched from the server per request (affects the number of RPCs).

Scan.setBatch(): specifies the maximum number of cells returned per Result. It prevents OutOfMemory errors caused by rows that hold too much data. By default there is no limit.
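To see how the caching and batch settings interact, a rough cost model helps. The sketch below is a simplification under stated assumptions: every row has the same number of cells, each fetch carries at most `caching` Results, and the extra round trips for opening and closing the scanner are ignored. The class and method names are hypothetical.

```java
/** Back-of-the-envelope model of how caching and batch shape a scan. */
final class ScanCostModel {

    /** Number of Results: each row is cut into chunks of at most `batch` cells. */
    static long results(long rows, long cellsPerRow, long batch) {
        long chunksPerRow = (cellsPerRow + batch - 1) / batch; // ceiling division
        return rows * chunksPerRow;
    }

    /** Data-fetching round trips: each one carries at most `caching` Results. */
    static long fetchRoundTrips(long results, long caching) {
        return (results + caching - 1) / caching; // ceiling division
    }
}
```

For a table of 10 rows with 20 cells each, a batch of 1 yields 200 Results and a batch of 10 yields 20. Fetching those 20 Results with a caching value of 10 takes 2 round trips in this model, against 20 with a caching value of 1.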


HBase Data Table Optimization

HBase is a highly reliable, high-performance, column-oriented, scalable, distributed database, but read and write performance degrades when concurrency is too high or the data volume is large. The following methods can be used to improve HBase retrieval speed step by step.

Pre-partitioning

By default, a single region is created automatically when an HBase table is created. During import, all HBase clients write to this one region until it grows large enough to split. One way to speed up bulk writes is to pre-create some empty regions, so that data written to HBase is load-balanced across the cluster according to the region boundaries.
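Pre-creating regions requires a set of split keys. The following sketch generates evenly spaced boundaries, assuming (hypothetically) that row keys begin with a two-character lowercase hex prefix from 00 to ff. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

/** Generates evenly spaced split keys for row keys with a two-character hex prefix. */
final class SplitKeys {

    /** Returns regionCount - 1 boundaries that cut the prefix range 00..ff into even slices. */
    static byte[][] hexPrefixSplits(int regionCount) {
        if (regionCount < 2 || regionCount > 256) {
            throw new IllegalArgumentException("regionCount must be between 2 and 256");
        }
        byte[][] splits = new byte[regionCount - 1][];
        for (int i = 1; i < regionCount; i++) {
            int boundary = i * 256 / regionCount;
            splits[i - 1] = String.format("%02x", boundary).getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }
}
```

For four regions this yields the boundaries "40", "80", and "c0". An array like this could be passed as the splitKeys argument of createTable in Listing 1. It only balances the load if the real row keys are spread evenly over the prefix range.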

Rowkey optimization

Row keys in HBase are stored in lexicographic order, so row key design should make full use of this sorting: store data that is often read together in adjacent rows, and keep data that is likely to be accessed soon together.

In addition, if row keys are generated incrementally, it is better not to write them in their natural order. Reversing the row key spreads the keys roughly evenly, which balances load across RegionServers; otherwise all new data tends to pile up on a single RegionServer. This can also be combined with table pre-splitting.
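One simple way to reverse an incrementing key is shown below. It is a sketch under the assumption that IDs are non-negative numbers padded to a fixed width; the names `toRowKey` and `toId` are hypothetical.

```java
/** Turns sequential numeric IDs into row keys whose leading character changes fastest. */
final class ReversedKey {

    /** Zero-pads the ID to a fixed width, then reverses the digits. */
    static String toRowKey(long id, int width) {
        String padded = String.format("%0" + width + "d", id);
        return new StringBuilder(padded).reverse().toString();
    }

    /** Recovers the original ID from a reversed row key. */
    static long toId(String rowKey) {
        return Long.parseLong(new StringBuilder(rowKey).reverse().toString());
    }
}
```

With a width of 8, the consecutive IDs 1001 and 1002 become "10010000" and "20010000", so they start with different characters and can land in different regions. The trade-off is that a range scan over consecutive IDs is no longer a contiguous row-key range.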

Reduce the number of column families

Do not define too many column families in a single table. Currently HBase does not handle tables with more than two or three column families well, because when one column family is flushed, its neighboring column families are flushed as well, which generates more I/O.

Cache Policy (setInMemory)

When creating a table, you can call HColumnDescriptor.setInMemory(true) to keep the table in the RegionServer cache, which gives its blocks priority in the block cache so that reads are more likely to hit the cache.

Setting the storage life cycle

When you create a table, you can set the storage lifetime of its data with HColumnDescriptor.setTimeToLive(int timeToLive). Expired data is deleted automatically.

Disk Configuration

Each RegionServer manages 10 to 1,000 regions of 1 to 2 GB each, so each server needs at least 10 GB and at most 1000 × 2 GB = 2 TB. With 3 replicas, that is 6 TB. One option is three 2 TB disks; another is twelve 500 GB disks. If bandwidth is sufficient, the second option provides greater throughput, finer-grained redundancy, and faster recovery of a single disk.
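The capacity arithmetic above can be written as a one-line helper for trying other values. This is only a sketch of the same multiplication; the class and method names are hypothetical.

```java
/** Reproduces the disk sizing arithmetic from the text. */
final class StorageEstimate {

    /** Raw disk space in GB for the given region count, region size, and replica count. */
    static long rawDiskGb(int regions, int regionSizeGb, int replicas) {
        return (long) regions * regionSizeGb * replicas;
    }
}
```

`rawDiskGb(1000, 2, 3)` gives 6000 GB, which matches the 6 TB upper bound, and `rawDiskGb(10, 1, 1)` gives the 10 GB lower bound before replication.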

Allocate appropriate memory to the RegionServer service

The more the better, as long as other services are not affected. For example, add the following line to hbase-env.sh in HBase's conf directory:

export HBASE_REGIONSERVER_OPTS="-Xmx16000m $HBASE_REGIONSERVER_OPTS"

Here 16000m is the amount of memory allocated to the RegionServer.

Number of Data Replicas

More replicas improve read performance and reduce write performance, and the number of replicas also affects availability. There are two ways to configure it. One is to copy hdfs-site.xml into HBase's conf directory and add or modify the dfs.replication item to the desired number of replicas; this takes effect for all HBase user tables. The other is to modify the HBase code so that HBase supports setting the number of replicas per column family; the number is then set when the table is created (default 3) and applies only to that column family.

WAL (write-ahead log)

The WAL can be switched off, in which case HBase does not write the log before writing data. It is on by default. Turning it off improves performance, but if the system fails (the RegionServer handling the insert goes down), data may be lost. When writing through the Java API, the WAL is configured per Put instance by calling Put.setWriteToWAL(boolean).

Bulk Write

HBase's Put supports both single inserts and bulk inserts. Bulk writes are generally faster because they save network round trips. When calling the Java API, the client collects the Puts in a list and then calls HTable's put(List<Put>) method to write them in bulk.
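When the number of Puts is large, it is common to send them in chunks of bounded size rather than in one huge list. The generic helper below sketches that chunking in plain Java. The name `partition` is hypothetical, and a suitable batch size is an assumption to be tuned per workload.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a list of pending writes into bounded batches. */
final class Batches {

    /** Returns consecutive chunks of at most batchSize elements, preserving order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }
}
```

Each chunk would then be handed to HTable's put(List<Put>) in turn, which bounds client memory while still saving round trips.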

Amount of Data the Client Pulls from the Server at Once

Configuring the client to pull more data per request reduces the time it takes to fetch the data, but it consumes more client memory. There are three places to configure this:

1) Set hbase.client.scanner.caching in HBase's conf configuration file;

2) Call HTable.setScannerCaching(int scannerCaching);

3) Call Scan.setCaching(int caching). The three are listed in increasing order of priority.
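The precedence among the three settings can be sketched as a simple fallback chain. This models only the rule stated above and is not how the client is implemented. The names are hypothetical, and null stands for "not set at that level".

```java
/** Models which scanner caching value wins when several levels are configured. */
final class CachingPrecedence {

    /** Returns the most specific value: Scan level, then table level, then configuration. */
    static int effectiveCaching(Integer scanLevel, Integer tableLevel, int configLevel) {
        if (scanLevel != null) {
            return scanLevel;
        }
        if (tableLevel != null) {
            return tableLevel;
        }
        return configLevel;
    }
}
```

For example, with a configured value of 1, a table-level value of 50, and a Scan-level value of 500, the effective caching is 500.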

Number of RegionServer Request-Handling I/O Threads

Fewer I/O threads suit scenarios where a single request consumes a lot of memory (large Puts: bulk Puts and Scans with a large cache both count as large Puts) or where RegionServer memory is tight.

More I/O threads suit scenarios where a single request consumes little memory and a very high number of transactions per second (TPS) is required. When setting this value, use memory monitoring as the main reference.

The configuration item is hbase.regionserver.handler.count in hbase-site.xml.

Region Size Setting

The configuration item is hbase.hregion.max.filesize in hbase-site.xml. The default size is 256 MB.

This is the maximum storage size of a single region on a RegionServer. When a region exceeds this value, it is automatically split into smaller regions. Small regions are friendly to split and compaction, because splitting a region or compacting the StoreFiles in a small region is fast and uses little memory. The disadvantage is that splits and compactions become very frequent. A large number of small regions that constantly split and compact causes large fluctuations in cluster response time, and too many regions not only make management harder but can even trigger some HBase bugs. Generally, 512 MB or less counts as a small region. Large regions are less suited to frequent split and compaction, because a single compaction or split causes a longer pause and has a very large impact on the application's read and write performance.

In addition, large regions mean larger StoreFiles, and compacting them is also a challenge for memory. If your application has a period of low traffic, running compactions and splits during that period lets them complete while keeping read and write performance smooth for most of the time. Compaction is unavoidable, but split can be switched from automatic to manual. Setting this parameter to a value that is hard to reach, such as 100 GB, indirectly disables automatic splitting (a RegionServer does not split a region that has not reached 100 GB). When a split is needed, run it manually with the RegionSplitter tool. Manual splitting is much more flexible and stable than automatic splitting and adds little management cost, so it is recommended for online real-time systems. In terms of memory, small regions allow more flexibility in sizing the MemStore. With large regions the MemStore should be neither too large nor too small: too large increases the application's I/O wait during a flush, and too small hurts read performance because too many StoreFiles are produced.

HBase Configuration

It is recommended that HBase servers have at least 32 GB of memory. Table 1 lists the memory recommended for each role, based on hands-on testing.

Table 1. HBase-related service configuration information
Module      Type of service    Memory requirement
HDFS        HDFS NameNode      16GB
            HDFS DataNode      2GB
HBase       HMaster            2GB
            HRegionServer      16GB
ZooKeeper   ZooKeeper          4GB

It is recommended to set a larger region size for HBase; 2 GB is suggested, because a RegionServer handles a small number of large regions faster than a large number of small regions. Unimportant data can be placed in a separate column family when the table is created, with the number of replicas for that column family set to 2 (the default is 3). This keeps two copies while saving space and improving write performance, at the cost of slightly lower availability than with 3 replicas and lower read performance than with the default number of replicas.

Practical cases

The project requires that data stored in an HBase table can be deleted. The row key of the data consists of the task ID (the data is generated by tasks) plus a 16-digit random number, and task information is maintained in a separate table. Figure 2 shows the data removal flow.

Figure 2. Data removal flowchart
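With a row key that starts with the task ID, all rows of one task form a contiguous key range, which is what makes deletion by task possible. The sketch below shows one way to build such keys and the matching scan range. It rests on assumptions the article does not state: task IDs have a fixed length, the random part is encoded as a 16-character zero-padded decimal string, and all names are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Builds task-prefixed row keys and the key range covering one task. */
final class TaskRowKey {

    /** Exclusive upper bound for a 16-digit decimal suffix: 10^16. */
    private static final long SUFFIX_BOUND = 10_000_000_000_000_000L;

    /** Returns the task ID followed by a zero-padded 16-digit random suffix. */
    static String newRowKey(String taskId) {
        long suffix = ThreadLocalRandom.current().nextLong(SUFFIX_BOUND);
        return taskId + String.format("%016d", suffix);
    }

    /** Inclusive start row for scanning every row of the task. */
    static String startRow(String taskId) {
        return taskId;
    }

    /** Exclusive stop row: ':' follows '9' in ASCII, so it sorts after every digit suffix. */
    static String stopRow(String taskId) {
        return taskId + ":";
    }
}
```

A scan from `startRow(taskId)` to `stopRow(taskId)` would then cover exactly the rows of that task, provided the fixed-length assumption holds.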

The original design deleted a task's data from HBase, located by task ID, at the same time as the task itself was deleted. However, when there is a lot of data in HBase, the deletion takes a long time, and the high disk I/O causes data reads and writes to time out.

The HBase logs showed that while data was being deleted, HBase was running a major compaction. The purpose of a major compaction is to merge files and purge deleted, expired, and redundant versions of data. During a major compaction HBase merges the StoreFiles in a region. The operation lasts a long time and makes the whole region unreadable, so all queries against these regions eventually time out.

Solving the major compaction problem requires looking at the source code. The HBase source shows that a RegionServer starts a CompactionChecker thread at boot, which periodically checks whether a compaction is needed, as shown in Figure 3.

Figure 3. CompactionChecker thread code

isMajorCompaction decides whether to run a major compaction according to the hbase.hregion.majorcompaction parameter. If hbase.hregion.majorcompaction is 0, it returns false. We therefore set hbase.hregion.majorcompaction to 0 in the configuration file to disable HBase's periodic major compaction, and ran major compactions through our own timing mechanism, in the early morning when HBase is not busy. The timer can be a script started by Linux cron or a Java Timer schedule. In the actual project we used Quartz, with the schedule given in a configuration file so that the major compaction start time is easy to change.

After this change, we found that a compaction still ran after data was deleted. This path enters the needsCompaction = true branch, which is triggered when (storefiles.size() - filesCompacting.size()) > minFilesToCompact. In addition, when the number of files to be compacted equals the number of files in the Store, the minor compaction is automatically upgraded to a major compaction. Compaction cannot simply be banned, however, because deleted data would then never be cleaned up and query efficiency would eventually suffer.
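The conditions described above can be restated as three small predicates. This is a paraphrase of the text for illustration, not the HBase source code; the class and method names are hypothetical.

```java
/** Restates the compaction conditions described in the text as plain predicates. */
final class CompactionCheck {

    /** A compaction is needed when enough store files are not already being compacted. */
    static boolean needsCompaction(int storeFiles, int filesCompacting, int minFilesToCompact) {
        return (storeFiles - filesCompacting) > minFilesToCompact;
    }

    /** Periodic major compaction is considered only when the configured interval is non-zero. */
    static boolean periodicMajorEnabled(long majorCompactionInterval) {
        return majorCompactionInterval != 0;
    }

    /** A minor compaction that selects every file in the Store becomes a major compaction. */
    static boolean promotedToMajor(int filesSelected, int storeFiles) {
        return filesSelected == storeFiles;
    }
}
```

This makes the observed behavior easier to follow: setting the interval to 0 turns off only the periodic check, while the file-count condition can still trigger a compaction that is promoted to a major one.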

Based on this analysis, we had to rethink the data deletion process. From the user's point of view, it is enough that a deleted task no longer appears in retrieval results. So only the task record needs to be deleted right away; the data associated with the task does not. When the system is idle, the data in the HBase table is deleted periodically and a major compaction is run on the regions to clean up the deleted data. Changing the task deletion process in this way meets the project requirements and does not require any change to the HBase configuration.

Figure 4. Data removal Process Comparison Chart
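The revised flow can be modeled as two steps: hide the task immediately, and purge its data later. The following is a toy in-memory sketch of that idea under stated assumptions. The class and its methods are hypothetical, and the real system would keep the task records in a table and run the purge from a scheduled off-peak job.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Toy model of the revised flow: hide the task now, purge its data later. */
final class DeferredDeletion {
    private final Set<String> visibleTasks = new HashSet<>();
    private final Queue<String> pendingPurge = new ArrayDeque<>();

    void addTask(String taskId) {
        visibleTasks.add(taskId);
    }

    /** Fast path: only the task record is removed, so queries stop returning the task. */
    void deleteTask(String taskId) {
        if (visibleTasks.remove(taskId)) {
            pendingPurge.add(taskId);
        }
    }

    boolean isVisible(String taskId) {
        return visibleTasks.contains(taskId);
    }

    /** Slow path for an off-peak job: returns the tasks whose rows should now be deleted. */
    List<String> drainForPurge() {
        List<String> batch = new ArrayList<>(pendingPurge);
        pendingPurge.clear();
        return batch;
    }
}
```

The off-peak job would delete the rows for each drained task ID and then trigger the major compaction, keeping the heavy I/O away from peak query hours.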

Retrieval, query, and deletion of data in HBase tables are closely related, and finding the root cause of a retrieval performance bottleneck and its final solution requires looking at the HBase source code.

This article has been published in the IBM Developer Forum: http://www.ibm.com/developerworks/cn/java/j-lo-HBase/index.html
Copyright belongs to the IBM Developer Forum; please cite the source when reprinting.

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
