The fastest way to import data into HBase is to skip the WAL and directly generate the underlying HFile files.
(Environment: CentOS 6.5, Hadoop 2.6.0, HBase 0.98.9)
1. Shell mode
1.1 ImportTsv direct import
Command: bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv
Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>
Test:
1.1.1 Create the table in HBase
create 'testImport1', 'cf'
1.1.2 Prepare the data file sample1.csv and upload it to HDFS, with the content:
1,"Tom"
2,"Sam"
3,"Jerry"
4,"Marry"
5,"John"
1.1.3 Import using the ImportTsv command
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf testImport1 /sample1.csv
1.1.4 Results
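A quick way to confirm the import is to scan the table. The following is only a minimal Java sketch for that check; the ZooKeeper quorum value is an assumption, reused from the API example in section 2.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class VerifyImport {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Assumption: same ZooKeeper quorum as in the BulkLoadJob example in section 2.
        conf.set("hbase.zookeeper.quorum", "msg801,msg802,msg803");
        HTable table = new HTable(conf, "testImport1");
        try {
            // Print every row written by ImportTsv.
            ResultScanner scanner = table.getScanner(new Scan());
            for (Result result : scanner) {
                System.out.println(result);
            }
            scanner.close();
        } finally {
            table.close();
        }
    }
}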
1.2 Generate HFile files with ImportTsv, then load them into HBase with completebulkload
1.2.1 Using the same source data, create a new table
create 'testImport2', 'cf'
1.2.2 Generate the HFile files with the command
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=hfile_tmp -Dimporttsv.columns=HBASE_ROW_KEY,cf testImport2 /sample1.csv
1.2.3 Intermediate results on HDFS
1.2.4 Load the HFile files into HBase with the command
hadoop jar lib/hbase-server-0.98.9-hadoop2.jar completebulkload hfile_tmp testImport2
1.2.5 Results
Note: 1. If you get a missing-class error, add the HBase jars to Hadoop's classpath. 2. This command is essentially an HDFS mv operation and does not start a MapReduce job.
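The same load step can also be driven from Java with LoadIncrementalHFiles instead of the completebulkload tool. Below is a minimal sketch, assuming the HFiles were written to hfile_tmp (as in step 1.2.2) and the target table is testImport2.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoadOnly {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Assumption: hfile_tmp is the -Dimporttsv.bulk.output directory from step 1.2.2.
        Path hfileDir = new Path("hfile_tmp");
        HTable table = new HTable(conf, "testImport2");
        try {
            // Moves the HFiles into the table's region directories
            // (the HDFS mv operation mentioned in the note above).
            new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
        } finally {
            table.close();
        }
    }
}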
2. API code mode
Writing code is a little more flexible, and many things can be customized.
Here is the code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;

public class BulkLoadJob {
    static Logger logger = LoggerFactory.getLogger(BulkLoadJob.class);

    public static class BulkLoadMap extends
            Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] valueStrSplit = value.toString().split("\t");
            String hKey = valueStrSplit[0];
            String family = valueStrSplit[1].split(":")[0];
            String column = valueStrSplit[1].split(":")[1];
            String hValue = valueStrSplit[2];
            final byte[] rowKey = Bytes.toBytes(hKey);
            final ImmutableBytesWritable HKey = new ImmutableBytesWritable(rowKey);
            // Put hPut = new Put(rowKey);
            // byte[] cell = Bytes.toBytes(hValue);
            // hPut.add(Bytes.toBytes(family), Bytes.toBytes(column), cell);
            KeyValue kv = new KeyValue(rowKey, Bytes.toBytes(family),
                    Bytes.toBytes(column), Bytes.toBytes(hValue));
            context.write(HKey, kv);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.property.clientPort", "2182");
        conf.set("hbase.zookeeper.quorum", "msg801,msg802,msg803");
        conf.set("hbase.master", "msg801:60000");
        String[] dfsArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        String inputPath = dfsArgs[0];
        System.out.println("source: " + dfsArgs[0]);
        String outputPath = dfsArgs[1];
        System.out.println("dest: " + dfsArgs[1]);
        HTable hTable = null;
        try {
            Job job = Job.getInstance(conf, "Test Import HFile & Bulkload");
            job.setJarByClass(BulkLoadJob.class);
            job.setMapperClass(BulkLoadJob.BulkLoadMap.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(KeyValue.class);
            // speculation
            job.setSpeculativeExecution(false);
            job.setReduceSpeculativeExecution(false);
            // in/out format
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(HFileOutputFormat2.class);

            FileInputFormat.setInputPaths(job, inputPath);
            FileOutputFormat.setOutputPath(job, new Path(outputPath));

            hTable = new HTable(conf, dfsArgs[2]);
            HFileOutputFormat2.configureIncrementalLoad(job, hTable);

            if (job.waitForCompletion(true)) {
                FsShell shell = new FsShell(conf);
                try {
                    shell.run(new String[] { "-chmod", "-R", "777", dfsArgs[1] });
                } catch (Exception e) {
                    logger.error("Couldn't change the file permissions", e);
                    throw new IOException(e);
                }
                // load into the HBase table; two ways are possible
                LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
                // way one
                String[] loadArgs = { outputPath, dfsArgs[2] };
                loader.run(loadArgs);
                // way two
                // loader.doBulkLoad(new Path(outputPath), hTable);
            } else {
                logger.error("loading failed.");
                System.exit(1);
            }
        } catch (IllegalArgumentException e) {
            e.printStackTrace();
        } finally {
            if (hTable != null) {
                hTable.close();
            }
        }
    }
}
2.1 Create a new table
create 'testImport3', 'fm1', 'fm2'
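If you prefer to stay in code, the same table can also be created through the admin API; a minimal sketch for the 0.98 client follows.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTestImport3 {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Same layout as the shell command above: two column families, fm1 and fm2.
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("testImport3"));
            desc.addFamily(new HColumnDescriptor("fm1"));
            desc.addFamily(new HColumnDescriptor("fm2"));
            admin.createTable(desc);
        } finally {
            admin.close();
        }
    }
}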
2.2 Create sample2.csv and upload it to HDFS, with the content (a small parsing sketch follows the sample lines):
key1 fm1:col1 value1
key1 fm1:col2 value2
key1 fm2:col1 value3
key4 fm1:col1 value4
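As mentioned above, each line of sample2.csv is split by the mapper into a row key, a family:qualifier pair, and a value. The standalone sketch below (hypothetical class name ParseDemo) just makes that decomposition explicit; it assumes tab-separated fields, matching the split("\t") in BulkLoadMap.
public class ParseDemo {
    public static void main(String[] args) {
        // Assumption: fields are tab-separated, as expected by BulkLoadMap.map().
        String line = "key1\tfm1:col1\tvalue1";
        String[] parts = line.split("\t");
        String rowKey = parts[0];                  // "key1"
        String family = parts[1].split(":")[0];    // "fm1"
        String qualifier = parts[1].split(":")[1]; // "col1"
        String value = parts[2];                   // "value1"
        System.out.println(rowKey + " -> " + family + ":" + qualifier + " = " + value);
    }
}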
Use the command:
hadoop jar BulkLoadJob.jar hdfs://msg/sample2.csv hdfs://msg/hfileout testImport3
Note: 1. You can use either KeyValue or Put in the mapper. 2. Pay attention to the classpath of the jar packages. 3. If Hadoop runs in HA mode, you must use the HA nameservice name: for example, our active NameNode is msg801, but the HA nameservice is msg, so HDFS paths must use hdfs://msg instead of hdfs://msg801:9000 (why?).
The specific error is:
IllegalArgumentException: Wrong FS: hdfs://msg801:9000/hfileout/fm2/bbab9d883a574d518cdcb304d1e681e9, expected: hdfs://msg
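One way to avoid this mismatch is to make sure every path handed to the job, and every FileSystem lookup, uses the nameservice URI rather than a single NameNode address. A minimal sketch, assuming msg is the HA nameservice configured in hdfs-site.xml:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaPathCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: "msg" is the HA nameservice; the client resolves the active NameNode itself.
        FileSystem fs = FileSystem.get(URI.create("hdfs://msg"), conf);
        // Using hdfs://msg801:9000/... here instead can trigger the "Wrong FS" error above,
        // because the job expects paths on the nameservice-based filesystem.
        System.out.println(fs.exists(new Path("hdfs://msg/hfileout")));
    }
}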