Data Bulk Import HBase


Test data (file datas):

1001    lilei   13800001111
1002    Lily    13800001112
1003    Lucy    13800001113
1004    Meimei  13800001114

The data is bulk imported by using a MapReduce job to generate HFile files, which are then loaded into HBase with the completebulkload tool.
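Conceptually, each input record is turned into one HBase row whose cells live under a single column family. As a dependency-free sketch (plain Java, no HBase on the classpath; the four-field layout with an age column mirrors the mapper shown later, and the sample record is hypothetical since the test data above omits an age), the mapping from a tab-separated line to column-to-value pairs looks like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordToCells {

    // Maps one tab-separated record (id, name, age, phone) to
    // "family:qualifier" -> value pairs under the "info" family.
    // Returns an empty map for records that do not have four fields.
    static Map<String, String> toCells(String line) {
        String[] f = line.split("\t");
        Map<String, String> cells = new LinkedHashMap<>();
        if (f.length == 4) {
            cells.put("info:name", f[1]);
            cells.put("info:age", f[2]);
            cells.put("info:phone", f[3]);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Hypothetical record; f[0] ("1001") would become the row key.
        Map<String, String> cells = toCells("1001\tlilei\t20\t13800001111");
        System.out.println("row=1001 " + cells);
    }
}
```

In the real job, the row key and each of these pairs are emitted as KeyValue objects rather than map entries, but the field-to-column layout is the same.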

1. First create the target table in HBase:

hbase> create 'student', {NAME => 'info'}

The Maven pom.xml configuration file is as follows:

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0</version>
</dependency>

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.0</version>
</dependency>

<!-- hbase -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.0.0</version>
</dependency>

<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.0.0</version>
</dependency>

Write the MapReduce code as follows:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Created: March 2, 2016
 * Generates HFiles from tab-separated text so they can be bulk loaded into HBase.
 */
public class CreateHfileByMapReduce {

    public static class MyBulkMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split on tabs; modify according to the actual input format.
            String[] split = value.toString().split("\t");
            if (split.length == 4) {
                byte[] rowKey = split[0].getBytes();
                ImmutableBytesWritable imRowKey = new ImmutableBytesWritable(rowKey);
                context.write(imRowKey, new KeyValue(rowKey, Bytes.toBytes("info"),
                        Bytes.toBytes("name"), Bytes.toBytes(split[1])));
                context.write(imRowKey, new KeyValue(rowKey, Bytes.toBytes("info"),
                        Bytes.toBytes("age"), Bytes.toBytes(split[2])));
                context.write(imRowKey, new KeyValue(rowKey, Bytes.toBytes("info"),
                        Bytes.toBytes("phone"), Bytes.toBytes(split[3])));
            }
        }
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("Usage: CreateHfileByMapReduce <table_name> <data_input_path> <hfile_output_path>");
            System.exit(2);
        }
        String tableName = args[0];
        String inputPath = args[1];
        String outputPath = args[2];
        // String tableName = "student";
        // String inputPath = "hdfs://node2:9000/datas";
        // String outputPath = "hdfs://node2:9000/user/output";
        Configuration conf = HBaseConfiguration.create();
        try {
            HTable htable = new HTable(conf, tableName);
            Job job = Job.getInstance(conf, "CreateHfileByMapReduce");
            job.setJarByClass(CreateHfileByMapReduce.class);
            job.setMapperClass(MyBulkMapper.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(KeyValue.class);
            // Configures the HFile output format, total-order partitioner,
            // and sort reducer from the table's region boundaries.
            HFileOutputFormat.configureIncrementalLoad(job, htable);
            FileInputFormat.addInputPath(job, new Path(inputPath));
            FileOutputFormat.setOutputPath(job, new Path(outputPath));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Note: use Maven's assembly plugin to generate a fat jar (that is, one that bundles the dependent ZooKeeper and HBase jars into the MapReduce jar); otherwise, the user must manually add the ZooKeeper and HBase configuration files and related jars to the Hadoop classpath.
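A minimal sketch of that assembly-plugin configuration in pom.xml, using the standard jar-with-dependencies descriptor (the main class shown matches the one used later; adjust it to your package layout):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
    <archive>
      <manifest>
        <mainClass>cn.bd.batch.mr.CreateHfileByMapReduce</mainClass>
      </manifest>
    </archive>
  </configuration>
</plugin>
```

Running mvn package assembly:single then produces the self-contained jar alongside the regular artifact.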

The final jar is bulk.jar and the main class is cn.bd.batch.mr.CreateHfileByMapReduce. First generate the HFiles, then incrementally load them into HBase:

sudo -u hdfs hadoop jar <xxoo>.jar <MainClass> <table_name> <data_input_path> <hfile_output_path>
sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <hfile_output_path> <table_name>

hadoop jar bulk.jar cn.bd.batch.mr.CreateHfileByMapReduce student /datas /user/output

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/output student

Reference: http://www.cnblogs.com/mumuxinfei/p/3823367.html
