Test Environment
Tested hardware: 4-core i5 processor, 8 GB RAM, 1 TB hard disk, Gigabit network
Test software: Ubuntu 12.10 64-bit, Hadoop 0.20.205, HBase 0.90.5
Test setup: one master (NameNode) and three RegionServers (DataNodes); tens of millions of records (about 15 KB each) are written to the HBase cluster.
Test Results
- The first and last columns show the same data inserted into HBase and into HDFS, respectively. The gap is large: inserting the data into HBase takes about 10 times as long as writing it to HDFS.
- Because inserting data into HBase is so much slower than writing it to HDFS, I looked into why HBase write performance is so poor. Roughly, an HBase insert works like this: the client first finds out which RegionServer hosts the region that should receive the data (in HBase 0.90 this lookup goes through ZooKeeper and the -ROOT- and .META. catalog tables rather than through the master), and then talks directly to that RegionServer to insert the data. The RegionServer determines which data block the data goes into (a region is made up of data blocks) and stores the data in HDFS as HFiles, so the data is not necessarily stored on the RegionServer's local disk. For the detailed process, see the blog post http://jiajun.iteye.com/blog/899632
- One factor that affects HBase write performance is the client-side write buffer used when inserting data with the Put class. By default, every Put is sent from the client to the RegionServer in its own RPC. With tens of millions of records, that many round trips inevitably cost a lot of time. HBase therefore provides a write buffer on the client: Puts are sent only when the buffer fills up, which reduces the number of write requests.
- First, turn off auto-flush: setAutoFlush(false).
- Then set the write buffer size (2 MB by default), either with setWriteBufferSize() or through the hbase.client.write.buffer property in hbase-site.xml.
- The results above show that setting the buffer to 20 MB does shorten the write time, but making it larger still makes the write time longer (why?).
- Another factor is the WAL (write-ahead log). Each region has a MemStore that holds incoming data in memory and sorts it before flushing it to an HFile, which reduces disk seeks; because that data sits in memory, every write is also recorded in the WAL so it can be recovered after a failure. I therefore disabled the WAL (setWriteToWAL(false)) and ran the test again. It helped, but not much, which suggests the WAL is not the main write bottleneck.
- To make queries convenient and reads fast, HBase keeps data sorted as it is written, which is why it merges (compacts) and splits regions. To improve write performance, HBase lets you pre-allocate (pre-split) regions, similar to a pool: you create the regions up front and use them directly instead of waiting for splits. Without pre-splitting, the tens of millions of records ended up in about 900 regions, so I pre-allocated 150 regions (pre-allocating 900 made table creation take too long and threw an exception that I have not resolved yet). This cut the write time considerably, roughly in half. Pre-allocating all 900 regions should save even more time.
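The effect of setAutoFlush(false) and the write buffer size can be illustrated with a toy model that only counts client-to-RegionServer round trips. This is a simplified sketch, not HBase code: `BufferedWriterModel` and its methods are hypothetical, each flush is counted as one round trip (a real flush is grouped into one request per RegionServer), and buffered data is measured by raw record size rather than by each Put's heap size as HBase does.

```java
// Toy model of HBase's client-side write buffer; class and method names are hypothetical.
final class BufferedWriterModel {
    private final boolean autoFlush;       // plays the role of setAutoFlush(...)
    private final long bufferLimitBytes;   // plays the role of hbase.client.write.buffer
    private long pendingBytes = 0;
    private int pendingPuts = 0;
    private int rpcCount = 0;              // simulated client-to-RegionServer round trips

    BufferedWriterModel(boolean autoFlush, long bufferLimitBytes) {
        this.autoFlush = autoFlush;
        this.bufferLimitBytes = bufferLimitBytes;
    }

    void put(int recordBytes) {
        pendingPuts++;
        pendingBytes += recordBytes;
        // Auto-flush sends every put immediately; otherwise puts wait until the buffer is full.
        if (autoFlush || pendingBytes >= bufferLimitBytes) {
            flushCommits();
        }
    }

    void flushCommits() {
        if (pendingPuts == 0) return;
        rpcCount++;              // all pending puts travel in one batch
        pendingPuts = 0;
        pendingBytes = 0;
    }

    // Writes `records` puts of `recordBytes` each and returns the number of round trips.
    static int simulate(boolean autoFlush, long bufferLimitBytes, int records, int recordBytes) {
        BufferedWriterModel w = new BufferedWriterModel(autoFlush, bufferLimitBytes);
        for (int i = 0; i < records; i++) {
            w.put(recordBytes);
        }
        w.flushCommits();        // push out whatever is still buffered
        return w.rpcCount;
    }

    public static void main(String[] args) {
        int records = 10_000, recordBytes = 15_000;   // ~15 KB rows, as in the test
        System.out.println("auto-flush:   " + simulate(true, 2L << 20, records, recordBytes));
        System.out.println("2 MB buffer:  " + simulate(false, 2L << 20, records, recordBytes));
        System.out.println("20 MB buffer: " + simulate(false, 20L << 20, records, recordBytes));
    }
}
```

For 10,000 rows of 15,000 bytes, the model needs 10,000 round trips with auto-flush, 72 with a 2 MB buffer and 8 with a 20 MB buffer. It only captures the RPC count, so it says nothing about why a much larger buffer was slower in the test.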
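The MemStore and WAL interplay described in the results can be sketched as a toy model of a single region. It is not HBase code: `WalMemStoreModel` and its methods are hypothetical, the "WAL" is an in-memory list standing in for a log file in HDFS, and clearing the log right after a flush is a simplification of HBase's log rolling. It shows why setWriteToWAL(false) saves work on every put and what it gives up: edits that were still only in the MemStore cannot be replayed after a crash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of one region's write path; not HBase code, all names are hypothetical.
final class WalMemStoreModel {
    private final int flushThreshold;                                   // rows kept in memory before a flush
    private final List<String[]> wal = new ArrayList<>();               // stands in for the log file in HDFS
    private final List<TreeMap<String, String>> hfiles = new ArrayList<>(); // flushed, sorted, immutable
    private TreeMap<String, String> memstore = new TreeMap<>();         // in memory, sorted by row key

    WalMemStoreModel(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String row, String value, boolean writeToWal) {
        if (writeToWal) {
            wal.add(new String[] {row, value});   // log the edit first so it can survive a crash
        }
        memstore.put(row, value);                 // sorted in memory; no random disk write
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    // Writes the already-sorted MemStore out as a new "HFile" in one sequential pass.
    void flush() {
        hfiles.add(memstore);
        memstore = new TreeMap<>();
        wal.clear();   // simplification: edits persisted in an HFile no longer need the log
    }

    // Simulated RegionServer crash: memory is lost, then un-flushed edits are replayed from the WAL.
    void crashAndRecover() {
        memstore = new TreeMap<>();
        for (String[] edit : wal) {
            memstore.put(edit[0], edit[1]);
        }
    }

    // Number of distinct rows currently readable (MemStore plus flushed files).
    int rowCount() {
        Set<String> rows = new TreeSet<>(memstore.keySet());
        for (TreeMap<String, String> hfile : hfiles) {
            rows.addAll(hfile.keySet());
        }
        return rows.size();
    }

    // Writes 150 rows with a flush threshold of 100, crashes, and counts the surviving rows.
    static int rowsAfterCrash(boolean writeToWal) {
        WalMemStoreModel region = new WalMemStoreModel(100);
        for (int i = 0; i < 150; i++) {
            region.put(String.format("%07d", i), "v" + i, writeToWal);
        }
        region.crashAndRecover();
        return region.rowCount();
    }

    public static void main(String[] args) {
        System.out.println("with WAL:    " + rowsAfterCrash(true));
        System.out.println("without WAL: " + rowsAfterCrash(false));
    }
}
```

In this run the first 100 rows are flushed to an "HFile" either way; with the WAL all 150 rows survive the crash, without it the 50 rows still in the MemStore are lost.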
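Pre-splitting gives the table a fixed set of region start keys up front, and each put is then routed to the region whose key range contains its row key. The sketch below illustrates both steps for the 7-digit zero-padded row keys used by the test program. Note that it swaps in a hypothetical splitting rule: `HBaseAdmin.createTable(htd, startKey, endKey, numRegions)` in the test program computes its own split points from the key bytes, whereas here the points are spaced evenly over the decimal key range. `PreSplitSketch` and its methods are illustrative names only.

```java
import java.util.TreeMap;

// Hypothetical sketch of pre-splitting and row routing; not HBaseAdmin's algorithm.
final class PreSplitSketch {

    // Same 7-digit zero-padding as intToString() in the test program.
    static String rowKey(int i) {
        return String.format("%07d", i);
    }

    // Start keys of regions 1..numRegions-1 for numeric row keys in [0, keySpace).
    // Region 0 starts at the empty key, so every row has a region.
    static String[] splitKeys(int numRegions, int keySpace) {
        String[] splits = new String[numRegions - 1];
        for (int r = 1; r < numRegions; r++) {
            splits[r - 1] = rowKey((int) ((long) keySpace * r / numRegions));
        }
        return splits;
    }

    // Maps each region's start key to the region's index.
    static TreeMap<String, Integer> regionIndex(String[] splits) {
        TreeMap<String, Integer> regions = new TreeMap<>();
        regions.put("", 0);
        for (int i = 0; i < splits.length; i++) {
            regions.put(splits[i], i + 1);
        }
        return regions;
    }

    // A row belongs to the region with the greatest start key <= the row key.
    static int regionFor(TreeMap<String, Integer> regions, String row) {
        return regions.floorEntry(row).getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> regions = regionIndex(splitKeys(150, 10_000_000));
        System.out.println("regions: " + regions.size());
        for (int i : new int[] {0, 66_666, 5_000_000, 9_999_999}) {
            System.out.println(rowKey(i) + " -> region " + regionFor(regions, rowKey(i)));
        }
    }
}
```

Because every key has the same width, string order matches numeric order, which is what lets a sorted map (and HBase's sorted row keys) route rows correctly.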
PS: When writing data, the larger the buffer is set, the shorter the write time (e.g., 8 bytes).
PPS: The write buffer should not be set to too many megabytes. Besides pre-allocating regions, writing with multiple threads is also very effective: in my tests, 40 threads were about 8 times faster than a single thread.
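A multi-threaded writer like the one in the 40-thread test could be organized as below. This is a minimal sketch under stated assumptions: the row range is split into contiguous chunks, one per task on a fixed thread pool, and `ParallelWriteSketch` and `RangeWriter` are hypothetical names. The dummy worker in `main` only adds up row sizes so the sketch runs without a cluster; a real worker would open its own HTable, because the HBase 0.90 HTable is not thread-safe for updates and must not be shared between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of splitting a bulk write across a fixed thread pool; names are hypothetical.
final class ParallelWriteSketch {

    // One worker's job: write rows [from, to) and return how many were written.
    // A real worker would open its own HTable instead of sharing one.
    interface RangeWriter {
        long write(int from, int to) throws Exception;
    }

    static long writeInParallel(int total, int nThreads, RangeWriter writer) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            int chunk = (total + nThreads - 1) / nThreads;   // ceiling division
            for (int start = 0; start < total; start += chunk) {
                final int from = start;
                final int to = Math.min(total, start + chunk);
                Callable<Long> task = () -> writer.write(from, to);
                results.add(pool.submit(task));
            }
            long written = 0;
            for (Future<Long> f : results) {
                written += f.get();   // waits for the worker and rethrows its failure
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicLong bytes = new AtomicLong();
        // Dummy worker that only adds up row sizes; replace the loop body with real puts.
        long rows = writeInParallel(1_000_000, 40, (from, to) -> {
            for (int i = from; i < to; i++) {
                bytes.addAndGet(10_000 + i / 1000);
            }
            return to - from;
        });
        System.out.println(rows + " rows, " + bytes.get() + " bytes");
    }
}
```

Contiguous chunks keep each thread's row keys in order; with a pre-split table, different threads then tend to write to different regions at the same time.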
Test program
```java
package GIS.Update;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TestUpdate {

    // Writes 10 million records to a single HDFS file and prints the elapsed time (ms).
    public static void testHDFS() throws IOException {
        String str = "hdfs://cloudgis4:9000/usr/tmp/";
        Path path = new Path(str);
        Configuration conf = new Configuration();
        conf.addResource(new Path("/usr/local/hadoop/conf/hdfs-site.xml"));
        FileSystem hdfs = path.getFileSystem(conf);
        hdfs.setReplication(path, (short) 4);
        FSDataOutputStream fsDataOut = hdfs.create(new Path(str + "zzz"));
        long begin = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            // byte[] kkk = new byte[10000 + i / 1000];
            byte[] kkk = new byte[12];
            fsDataOut.write(kkk);
        }
        fsDataOut.close();
        long end = System.currentTimeMillis();
        System.out.println("hdfs:" + (end - begin));
    }

    // Creates a pre-split table, writes 10 million rows to it, and prints the
    // table-creation time and the write time (ms).
    public static void testHBase() throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.addResource(new Path("/usr/local/hbase/conf/hbase-site.xml"));
        HBaseAdmin admin = new HBaseAdmin(conf);
        String tableName = "qq";
        String familyName = "imageFamily";
        String columnName = "imageColumn";
        HTableDescriptor htd = new HTableDescriptor(tableName);
        htd.addFamily(new HColumnDescriptor(familyName));

        // Pre-split the table into 150 regions between row keys "0000000" and "9999999".
        long before = System.currentTimeMillis();
        admin.createTable(htd, Bytes.toBytes("0000000"), Bytes.toBytes("9999999"), 150);
        long after = System.currentTimeMillis();
        System.out.println(after - before);

        HTable table = new HTable(conf, htd.getName());
        table.setAutoFlush(false);                 // buffer puts on the client side
        // table.setWriteBufferSize(209715200);    // 200 MB; the default is 2 MB
        System.out.println(table.getWriteBufferSize());

        long begin = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            byte[] kkk = new byte[10000 + i / 1000];   // 10,000 to 19,999 bytes per row
            // byte[] kkk = new byte[12];
            Put p1 = new Put(Bytes.toBytes(intToString(i)));
            p1.setWriteToWAL(false);               // skip the write-ahead log
            p1.add(Bytes.toBytes(familyName), Bytes.toBytes(columnName), kkk);
            table.put(p1);
        }
        table.flushCommits();   // send the last buffered puts before stopping the clock
        long end = System.currentTimeMillis();
        table.close();
        System.out.println("HBase:" + (end - begin));
    }

    // Zero-pads x to 7 digits so that row keys sort in numeric order.
    public static String intToString(int x) {
        String result = String.valueOf(x);
        int size = result.length();
        while (size < 7) {
            size++;
            result = "0" + result;
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        testHBase();
    }
}
```