Test data (file datas, one tab-separated record per line):

1001	lilei	13800001111
1002	Lily	13800001112
1003	Lucy	13800001113
1004	Meimei	13800001114
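The mapper shown later in this post splits each input line on a tab and checks for four columns, so the split length and column mapping must be adjusted to the actual file layout. A minimal sketch of that split logic, using a hypothetical four-column record (the age value 18 is an assumption, not part of the sample data above):

```java
public class SplitDemo {
    public static void main(String[] args) {
        // Hypothetical four-column record: rowkey, name, age, phone
        String line = "1001\tlilei\t18\t13800001111";
        String[] split = line.split("\t");
        if (split.length == 4) {
            System.out.println("rowkey=" + split[0]); // rowkey=1001
            System.out.println("name=" + split[1]);   // name=lilei
            System.out.println("age=" + split[2]);    // age=18
            System.out.println("phone=" + split[3]);  // phone=13800001111
        }
    }
}
```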
Bulk import of data using MapReduce: a MapReduce job first generates HFile files, which are then loaded into HBase with the completebulkload tool.
1. First create the target table in HBase:
hbase> create 'student', {NAME => 'info'}
The Maven pom.xml configuration file is as follows:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.6.0</version>
</dependency>
<!-- hbase -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>1.0.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-server</artifactId>
  <version>1.0.0</version>
</dependency>
Write the MapReduce code as follows:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Created: March 2, 2016
 * Generates HFiles from tab-separated text data for bulk loading into HBase.
 */
public class CreateHfileByMapReduce {

    public static class MyBulkMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Modify the separator and column layout according to the actual data format
            String[] split = value.toString().split("\t");
            if (split.length == 4) {
                byte[] rowkey = split[0].getBytes();
                ImmutableBytesWritable imrowkey = new ImmutableBytesWritable(rowkey);
                context.write(imrowkey, new KeyValue(rowkey, Bytes.toBytes("info"),
                        Bytes.toBytes("name"), Bytes.toBytes(split[1])));
                context.write(imrowkey, new KeyValue(rowkey, Bytes.toBytes("info"),
                        Bytes.toBytes("age"), Bytes.toBytes(split[2])));
                context.write(imrowkey, new KeyValue(rowkey, Bytes.toBytes("info"),
                        Bytes.toBytes("phone"), Bytes.toBytes(split[3])));
            }
        }
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("Usage: CreateHfileByMapReduce <table_name> <data_input_path> <hfile_output_path>");
            System.exit(2);
        }
        String tableName = args[0];
        String inputPath = args[1];
        String outputPath = args[2];
        /* String tableName = "student";
           String inputPath = "hdfs://node2:9000/datas";
           String outputPath = "hdfs://node2:9000/user/output"; */
        HTable htable = null;
        Configuration conf = HBaseConfiguration.create();
        try {
            htable = new HTable(conf, tableName);
            Job job = Job.getInstance(conf, "CreateHfileByMapReduce");
            job.setJarByClass(CreateHfileByMapReduce.class);
            job.setMapperClass(MyBulkMapper.class);
            job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.TextInputFormat.class);
            // Configures partitioning, sorting, and HFileOutputFormat to match the table's regions
            HFileOutputFormat.configureIncrementalLoad(job, htable);
            FileInputFormat.addInputPath(job, new Path(inputPath));
            FileOutputFormat.setOutputPath(job, new Path(outputPath));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Note: use Maven's assembly plugin to generate a fat jar (that is, the dependent ZooKeeper and HBase jars are all packed into the MapReduce jar); otherwise the user must configure everything statically, adding the ZooKeeper and HBase configuration files and related jars to the Hadoop classpath.
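A minimal sketch of an assembly-plugin configuration that produces such a fat jar (the plugin version and the mainClass value are assumptions; adjust them to your build):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.6</version>
  <configuration>
    <!-- jar-with-dependencies bundles all transitive dependencies -->
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
    <archive>
      <manifest>
        <mainClass>cn.bd.batch.mr.CreateHfileByMapReduce</mainClass>
      </manifest>
    </archive>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>single</goal></goals>
    </execution>
  </executions>
</plugin>
```

Running mvn package would then produce a *-jar-with-dependencies.jar alongside the regular artifact.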
The final jar package is bulk.jar and the main class is cn.bd.batch.mr.CreateHfileByMapReduce; it generates the HFiles, which are then incrementally hot-loaded into HBase:
sudo -u hdfs hadoop jar <xxoo>.jar <MainClass> <table_name> <data_input_path> <hfile_output_path>
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <hfile_output_path> <table_name>

For example:
hadoop jar bulk.jar cn.bd.batch.mr.CreateHfileByMapReduce student /datas /user/output
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/output student
This article references: http://www.cnblogs.com/mumuxinfei/p/3823367.html