ImportTsv & bulk load: fast import of HBase data

Source: Internet
Author: User

The fastest way to import data is to skip the WAL and generate the underlying HFile files directly.

(Environment: CentOS 6.5, Hadoop 2.6.0, HBase 0.98.9)

1. Shell mode

1.1 ImportTsv direct import

Command: bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv

Usage: ImportTsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>

Test:

1.1.1 Create the table in HBase

create 'testImport1', 'cf'

1.1.2 Prepare the data file sample1.csv and upload it to HDFS (an example upload command follows the data). Its content is:

1, "Tom"
2, "Sam"
3, "Jerry"
4, "Marry"
5, "John

1.1.3 Import using the ImportTsv command

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf testImport1 /sample1.csv

1.1.4 Results
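To check the imported rows from the HBase shell (a quick verification, not part of the original steps):

scan 'testImport1'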

1.2 Generate HFiles with ImportTsv, then load them into HBase with completebulkload

1.2.1 Use the same source data and create a new table

create 'testImport2', 'cf'

1.2.2 Use the command to generate the HFiles

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=hfile_tmp -Dimporttsv.columns=HBASE_ROW_KEY,cf testImport2 /sample1.csv

1.2.3 Intermediate results on HDFS
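The generated HFiles can be listed directly (a sketch; hfile_tmp is resolved relative to the running user's HDFS home directory unless an absolute path is given):

hadoop fs -ls -R hfile_tmp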

1.2.4 Use the command to load the HFiles into HBase

hadoop jar lib/hbase-server-0.98.9-hadoop2.jar completebulkload hfile_tmp testImport2

1.2.5 Results

Note: 1. If a missing-class (missing jar) error appears, add the HBase jars to Hadoop's classpath; 2. This command is essentially an HDFS mv operation and does not start a MapReduce job.
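For note 1, a common way to do this (a sketch, assuming the command is run from the HBase installation directory) is to export the HBase classpath before re-running the hadoop jar command above:

export HADOOP_CLASSPATH=`bin/hbase classpath`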

2. API code mode

Code is a little more flexible, and many things can be customized.

The code is pasted directly below:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;

public class BulkLoadJob {
    static Logger logger = LoggerFactory.getLogger(BulkLoadJob.class);

    public static class BulkLoadMap extends
            Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // each input line: rowkey <TAB> family:qualifier <TAB> value
            String[] valueStrSplit = value.toString().split("\t");
            String hKey = valueStrSplit[0];
            String family = valueStrSplit[1].split(":")[0];
            String column = valueStrSplit[1].split(":")[1];
            String hValue = valueStrSplit[2];
            final byte[] rowKey = Bytes.toBytes(hKey);
            final ImmutableBytesWritable HKey = new ImmutableBytesWritable(rowKey);
            // Put hPut = new Put(rowKey);
            // byte[] cell = Bytes.toBytes(hValue);
            // hPut.add(Bytes.toBytes(family), Bytes.toBytes(column), cell);
            KeyValue kv = new KeyValue(rowKey, Bytes.toBytes(family),
                    Bytes.toBytes(column), Bytes.toBytes(hValue));
            context.write(HKey, kv);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.property.clientPort", "2182");
        conf.set("hbase.zookeeper.quorum", "msg801,msg802,msg803");
        conf.set("hbase.master", "msg801:60000");
        String[] dfsArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        String inputPath = dfsArgs[0];
        System.out.println("source: " + dfsArgs[0]);
        String outputPath = dfsArgs[1];
        System.out.println("dest: " + dfsArgs[1]);
        HTable hTable = null;
        try {
            Job job = Job.getInstance(conf, "Test Import HFile & Bulkload");
            job.setJarByClass(BulkLoadJob.class);
            job.setMapperClass(BulkLoadJob.BulkLoadMap.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(KeyValue.class);
            // speculation
            job.setSpeculativeExecution(false);
            job.setReduceSpeculativeExecution(false);
            // in/out format
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(HFileOutputFormat2.class);

            FileInputFormat.setInputPaths(job, inputPath);
            FileOutputFormat.setOutputPath(job, new Path(outputPath));

            hTable = new HTable(conf, dfsArgs[2]);
            HFileOutputFormat2.configureIncrementalLoad(job, hTable);

            if (job.waitForCompletion(true)) {
                FsShell shell = new FsShell(conf);
                try {
                    shell.run(new String[] { "-chmod", "-R", "777", dfsArgs[1] });
                } catch (Exception e) {
                    logger.error("Couldn't change the file permissions", e);
                    throw new IOException(e);
                }
                // load the generated HFiles into the HBase table
                LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
                // two ways are possible
                // way one
                String[] loadArgs = { outputPath, dfsArgs[2] };
                loader.run(loadArgs);
                // way two
                // loader.doBulkLoad(new Path(outputPath), hTable);
            } else {
                logger.error("loading failed.");
                System.exit(1);
            }
        } catch (IllegalArgumentException e) {
            e.printStackTrace();
        } finally {
            if (hTable != null) {
                hTable.close();
            }
        }
    }
}
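Before the run command in 2.2, the class has to be packaged as BulkLoadJob.jar. A minimal packaging sketch (assuming a single-file source and that bin/hbase classpath resolves the HBase and Hadoop dependencies):

javac -cp "$(bin/hbase classpath)" BulkLoadJob.java
jar cf BulkLoadJob.jar BulkLoadJob*.class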


2.1 Create a new table

create 'testImport3', 'fm1', 'fm2'

2.2 Create sample2.csv and upload it to HDFS (an example upload command follows the data). Its content is:
key1 fm1:col1 value1
key1 fm1:col2 value2
key1 fm2:col1 value3
key4 fm1:col1 value4
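Although the file is named .csv, the mapper above splits each line on tab characters, so the three fields should be tab-separated. A minimal upload sketch matching the path used in the command below:

hadoop fs -put sample2.csv hdfs://msg/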

Use the command:

hadoop jar BulkLoadJob.jar hdfs://msg/sample2.csv hdfs://msg/hfileout testImport3

Note: 1. In the mapper you can use either KeyValue or Put; 2. Pay attention to the classpath of the jar packages; 3. If Hadoop runs in HA mode, you must use the HA nameservice in the path: for example, our active NameNode is named msg801, but the HA nameservice is msg, so the HDFS path must be hdfs://msg rather than hdfs://msg801:9000 (why?).

The specific error is:

IllegalArgumentException: Wrong FS: hdfs://msg801:9000/hfileout/fm2/bbab9d883a574d518cdcb304d1e681e9, expected: hdfs://msg
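To confirm which name the path should use (a sketch; these configuration keys assume a standard HDFS HA setup):

hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.nameservices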
