HBase Data Writing Test

Last Update: 2018-12-04 Source: Internet

Author: User


Test Environment

Test hardware: 4-core Intel Core i5 processor, 8 GB of memory, 1 TB hard disk, Gigabit network

Test software: Ubuntu 12.10 (64-bit), Hadoop 0.20.205, HBase 0.90.5

Test setup: one master (NameNode) and three RegionServers (DataNodes); ten million records of about 15 KB each are written to the HBase cluster.

Test Results

The first and last columns show the time taken to insert the same data into HBase and into HDFS, respectively. The gap is large: inserting into HBase takes about 10 times as long as writing to HDFS.
Because HBase insert performance is so much worse than that of HDFS, I investigated why HBase writes are so slow. Roughly, an insert works like this: the client first locates the region that should hold the row and the RegionServer that hosts it, then sends the data directly to that RegionServer. The RegionServer places the data in the region and eventually persists it to HDFS as HFiles, which are made up of data blocks and are not necessarily stored on the RegionServer's local disk. For a more detailed description of the process, see the blog post http://jiajun.iteye.com/blog/899632
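To make the routing step concrete, here is a minimal JDK-only sketch. RegionRouter and the server names rs1, rs2 and rs3 are hypothetical, not HBase API. It assumes each region is described only by its start key, matching the way HBase partitions a table into sorted, contiguous key ranges; the real client looks region locations up in its catalog tables and caches them, which the sketch leaves out.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of client-side region routing: each region serves the row
// keys from its start key up to (but not including) the next region's start key.
final class RegionRouter {
    // region start key -> RegionServer hosting that region (names are illustrative)
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // A row belongs to the region with the greatest start key <= the row key.
    public String serverFor(String rowKey) {
        Map.Entry<String, String> region = regionsByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalArgumentException("no region covers row " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionRouter router = new RegionRouter();
        router.addRegion("", "rs1"); // the first region starts at the empty key
        router.addRegion("3333333", "rs2");
        router.addRegion("6666666", "rs3");
        System.out.println(router.serverFor("0000042")); // rs1
        System.out.println(router.serverFor("5000000")); // rs2
        System.out.println(router.serverFor("9999999")); // rs3
    }
}
```

Once the client knows the region boundaries, routing a put is a sorted-map lookup, which is why the client can talk to the RegionServer directly for each write.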
One factor that affects HBase write performance is the client-side write buffer used when inserting data with the Put class. By default, auto-flush is on, so every put is sent from the client to the RegionServer in its own RPC. With ten million records, that many round trips inevitably add up. HBase therefore provides a client-side write buffer: puts are sent only when the buffer fills up, which reduces the number of write requests.
- First, turn off auto-flush: setAutoFlush(false).
- Then set the write buffer size (2 MB by default) with setWriteBufferSize(), or set the hbase.client.write.buffer property in hbase-site.xml.
- In the results, a 20 MB buffer improved the write time, but a much larger buffer made writes slower (why?).
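The effect of the buffer on the number of round trips can be estimated with a small model. WriteBufferModel is a hypothetical class, not HBase code: it assumes every record is about 15 KB, ignores per-put overhead, and treats each flush as a single RPC (the real client groups a flushed batch by RegionServer and may send several requests). With 1,000 puts and the default 2 MB buffer, auto-flush sends 1,000 requests, while the buffered client flushes only 8 times.

```java
// Hypothetical model of the HBase client write buffer. It only counts flushes:
// each flush is treated as one RPC, whereas the real client groups a flushed
// batch by RegionServer and may therefore send several requests per flush.
final class WriteBufferModel {
    private final boolean autoFlush;
    private final long writeBufferSize;
    private long bufferedBytes = 0;
    private int bufferedPuts = 0;
    private long rpcCount = 0;

    public WriteBufferModel(boolean autoFlush, long writeBufferSize) {
        this.autoFlush = autoFlush;
        this.writeBufferSize = writeBufferSize;
    }

    // Stands in for table.put(p): with auto-flush on, every put is sent at once;
    // otherwise puts accumulate until the buffered size reaches the buffer size.
    public void put(int putSizeBytes) {
        bufferedBytes += putSizeBytes;
        bufferedPuts++;
        if (autoFlush || bufferedBytes >= writeBufferSize) {
            flushCommits();
        }
    }

    // Stands in for table.flushCommits(): sends whatever is still buffered.
    public void flushCommits() {
        if (bufferedPuts > 0) {
            rpcCount++;
            bufferedBytes = 0;
            bufferedPuts = 0;
        }
    }

    public long rpcCount() {
        return rpcCount;
    }

    public static void main(String[] args) {
        long twoMb = 2L * 1024 * 1024; // the default buffer size
        WriteBufferModel autoFlushing = new WriteBufferModel(true, twoMb);
        WriteBufferModel buffered = new WriteBufferModel(false, twoMb);
        for (int i = 0; i < 1000; i++) {
            autoFlushing.put(15_000); // about 15 KB per record, as in the test
            buffered.put(15_000);
        }
        autoFlushing.flushCommits();
        buffered.flushCommits();
        System.out.println(autoFlushing.rpcCount()); // 1000
        System.out.println(buffered.rpcCount());     // 8
    }
}
```

Under the same assumptions, the ten million rows of the test would need about 71,000 flushes instead of ten million separate requests.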
Another factor is the WAL (write-ahead log). Each region has a MemStore that buffers incoming data in memory, sorts it, and then flushes it to an HFile; sorting in memory and writing sequentially saves time by reducing disk seeks. Because data held only in memory would be lost if the server failed, each write is also recorded in the WAL for recovery. Since this adds a log write to every put, I disabled the WAL (setWriteToWAL(false)) and ran the test again. It helped, but not by much, so the WAL is not the main write bottleneck.
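The write path described above can be illustrated with a toy model. MemStoreSketch is hypothetical and greatly simplified: the WAL is an in-memory list instead of a log file on HDFS, the flush threshold counts cells rather than bytes, and an "HFile" is just a sorted list of row keys. It shows the two ideas in the paragraph: data is sorted in memory and written out in order, and the WAL is an extra append per write that disabling it skips.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, heavily simplified model of one region's write path:
// optionally log each write to the WAL, keep it sorted in the MemStore, and
// flush the MemStore as one sorted "HFile" once it holds flushThreshold cells.
final class MemStoreSketch {
    private final boolean writeToWal;
    private final int flushThreshold; // counted in cells here, not bytes
    private final List<String> wal = new ArrayList<>();
    private final TreeMap<String, byte[]> memstore = new TreeMap<>();
    private final List<List<String>> hfiles = new ArrayList<>();

    public MemStoreSketch(boolean writeToWal, int flushThreshold) {
        this.writeToWal = writeToWal;
        this.flushThreshold = flushThreshold;
    }

    public void put(String rowKey, byte[] value) {
        if (writeToWal) {
            wal.add(rowKey); // stands in for a sequential append to a log file
        }
        memstore.put(rowKey, value); // TreeMap keeps the rows sorted in memory
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    // Writes the sorted MemStore contents out as one new "HFile".
    public void flush() {
        if (!memstore.isEmpty()) {
            hfiles.add(new ArrayList<>(memstore.keySet()));
            memstore.clear();
        }
    }

    public int walSize() {
        return wal.size();
    }

    public List<List<String>> hfiles() {
        return Collections.unmodifiableList(hfiles);
    }

    public static void main(String[] args) {
        MemStoreSketch region = new MemStoreSketch(false, 3); // WAL disabled
        for (String row : new String[] {"0000005", "0000001", "0000003", "0000002"}) {
            region.put(row, new byte[12]);
        }
        region.flush();
        System.out.println(region.hfiles());  // [[0000001, 0000003, 0000005], [0000002]]
        System.out.println(region.walSize()); // 0
    }
}
```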
Because HBase is designed for convenient querying and fast reads, it keeps data sorted as it is written, which is the reason behind its compaction (merging) and region-splitting mechanisms. HBase also lets you pre-allocate (pre-split) regions to improve write performance, similar in spirit to a pool: the regions are created up front and used directly, instead of being produced by splits during the load. Without pre-splitting, the ten million records ended up in 900 regions, so I pre-allocated 150 regions when creating the table. (Pre-allocating all 900 made table creation take too long and raised an exception that I have not solved yet.) With 150 pre-split regions, the write time improved considerably, to roughly half of the original. Pre-allocating all 900 regions would likely save even more time.
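Pre-splitting means choosing region boundaries before any data arrives. The sketch below assumes the 7-digit zero-padded row keys used by the test program and spaces the boundaries evenly over 0000000 to 9999999; PreSplit is a hypothetical helper. HBase's own createTable(desc, startKey, endKey, numRegions), which the test program calls, derives its boundaries from the byte values of the start and end keys, so its split points will not match these exactly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: evenly spaced split keys for row keys that are 7-digit,
// zero-padded numbers (the format produced by intToString() in the test program).
final class PreSplit {
    // Returns numRegions - 1 split keys, which divide the key space into numRegions regions.
    public static List<String> splitKeys(long minKey, long maxKey, int numRegions) {
        List<String> keys = new ArrayList<>();
        long span = maxKey - minKey + 1;
        for (int k = 1; k < numRegions; k++) {
            long boundary = minKey + span * k / numRegions;
            keys.add(String.format(Locale.ROOT, "%07d", boundary));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = splitKeys(0, 9_999_999, 150);
        System.out.println(keys.size());   // 149 split keys, i.e. 150 regions
        System.out.println(keys.get(0));   // 0066666
        System.out.println(keys.get(148)); // 9933333
    }
}
```

The resulting strings could be converted with Bytes.toBytes into the byte[][] splitKeys argument of HBaseAdmin.createTable(desc, splitKeys), the overload that the commented-out admin.createTable(htd,splits) line in the test program appears to refer to.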

PS: Within limits, the larger the write buffer, the shorter the write time (for example, 8 MB).

PPS: The write buffer should not be set excessively large (many MB). Besides pre-splitting regions, writing with multiple threads is also very effective: in my tests, 40 threads were about 8 times faster than a single thread.
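A multi-threaded loader can be structured as below. This is a sketch under assumptions, not the code used for the measurement above: ParallelLoad is hypothetical, the key range is split into one contiguous slice per thread, and the writer is a plain Consumer<String> standing in for an HTable. Each thread asks the factory for its own writer because HTable instances are not thread-safe and should not be shared between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of a multi-threaded load: rows [0, totalRows) are split into
// one contiguous slice per thread, and every thread obtains its own writer from the
// factory (with HBase, one HTable per thread).
final class ParallelLoad {
    public static void load(int totalRows, int threads, Supplier<Consumer<String>> writerFactory)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                int from = (int) ((long) totalRows * t / threads);
                int to = (int) ((long) totalRows * (t + 1) / threads);
                tasks.add(pool.submit(() -> {
                    Consumer<String> writer = writerFactory.get(); // one writer per thread
                    for (int i = from; i < to; i++) {
                        // same 7-digit key format as intToString()
                        writer.accept(String.format(Locale.ROOT, "%07d", i));
                    }
                }));
            }
            for (Future<?> task : tasks) {
                task.get(); // rethrows (wrapped) any exception from a worker
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Set<String> written = ConcurrentHashMap.newKeySet();
        load(1000, 40, () -> written::add);
        System.out.println(written.size()); // 1000
    }
}
```

With HBase, the factory would presumably create a new HTable per thread with auto-flush disabled, and each thread would call flushCommits() after its last put.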

Test Program

```java
package GIS.Update;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TestUpdate {

    // Writes 10,000,000 records into a single HDFS file and prints the elapsed time in ms.
    public static void testHDFS() throws IOException {
        String str = "hdfs://cloudgis4:9000/usr/tmp/";
        Path path = new Path(str);
        Configuration conf = new Configuration();
        conf.addResource(new Path("/usr/local/hadoop/conf/hdfs-site.xml"));
        FileSystem hdfs = path.getFileSystem(conf);
        hdfs.setReplication(path, (short) 4);
        FSDataOutputStream fsDataOut = hdfs.create(new Path(str + "zzz"));
        long begin = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            // For a like-for-like comparison, use the same payload as testHBase():
            // byte[] kkk = new byte[10000 + i / 1000];
            byte[] kkk = new byte[12];
            fsDataOut.write(kkk);
        }
        fsDataOut.close();
        long end = System.currentTimeMillis();
        System.out.println("hdfs:" + (end - begin));
    }

    // Creates a pre-split table and writes 10,000,000 rows through the client write buffer.
    public static void testHBase() throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.addResource(new Path("/usr/local/hbase/conf/hbase-site.xml"));
        //conf.addResource("/usr/local/hbase/conf/hdfs-site.xml");
        HBaseAdmin admin = new HBaseAdmin(conf);
        String tableName = "qq";
        String familyName = "imageFamily";
        String columnName = "imageColumn";
        HTableDescriptor htd = new HTableDescriptor(tableName);
        HColumnDescriptor hdc = new HColumnDescriptor(familyName);
        htd.addFamily(hdc);

        // Pre-split the table into 150 regions over the row-key range 0000000..9999999.
        long before = System.currentTimeMillis();
        //admin.createTable(htd,splits);
        admin.createTable(htd, Bytes.toBytes("0000000"), Bytes.toBytes("9999999"), 150);
        long after = System.currentTimeMillis();
        System.out.println("createTable:" + (after - before));

        HTable table = new HTable(conf, htd.getName());
        table.setAutoFlush(false); // buffer puts on the client instead of one RPC per put
        //table.setWriteBufferSize(209715200); // 200 MB; the default is 2 MB
        System.out.println(table.getWriteBufferSize());

        long begin = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            byte[] kkk = new byte[10000 + i / 1000]; // 10,000 to 19,999 bytes per row
            //byte[] kkk = new byte[12];
            Put p1 = new Put(Bytes.toBytes(intToString(i)));
            p1.setWriteToWAL(false); // skip the write-ahead log for this put
            p1.add(Bytes.toBytes(familyName), Bytes.toBytes(columnName), kkk);
            table.put(p1);
        }
        // Send the puts still in the buffer before stopping the clock.
        table.flushCommits();
        long end = System.currentTimeMillis();
        table.close();
        System.out.println("HBase:" + (end - begin));
    }

    // Left-pads x with zeros to 7 digits so that row keys sort in numeric order.
    static public String intToString(int x) {
        String result = String.valueOf(x);
        int size = result.length();
        while (size < 7) {
            size++;
            result = "0" + result;
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        testHBase(); // call testHDFS() instead to run the HDFS benchmark
    }
}
```
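intToString() zero-pads row keys to seven digits. The reason is that HBase sorts row keys by their bytes, lexicographically, so unpadded numbers such as "10" would sort before "2". The JDK-only snippet below (RowKeyOrder is a hypothetical class) demonstrates the difference and uses String.format as a shorter equivalent of intToString() for keys from 0 to 9,999,999.

```java
import java.util.Locale;
import java.util.TreeSet;

// Shows why the test program zero-pads row keys: row keys are compared as bytes,
// i.e. lexicographically, so "10" sorts before "2" unless the numbers are padded.
final class RowKeyOrder {
    // Same result as intToString() in the test program for 0 <= x < 10,000,000.
    public static String padded(int x) {
        return String.format(Locale.ROOT, "%07d", x);
    }

    public static void main(String[] args) {
        // For ASCII digits, Java String order matches byte order.
        TreeSet<String> plain = new TreeSet<>();
        TreeSet<String> zeroPadded = new TreeSet<>();
        for (int id : new int[] {2, 10, 1, 100}) {
            plain.add(String.valueOf(id));
            zeroPadded.add(padded(id));
        }
        System.out.println(plain);      // [1, 10, 100, 2]
        System.out.println(zeroPadded); // [0000001, 0000002, 0000010, 0000100]
    }
}
```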

This article is an English version of an article that was originally published in Chinese on aliyun.com and is provided for information purposes only.
