ImportTsv & bulk load: fast import of HBase data

Source: Internet
Author: User

The fastest way to import data is to skip the WAL and generate the underlying HFile files directly.

(Environment: CentOS 6.5, Hadoop 2.6.0, HBase 0.98.9)

1. Shell mode

1.1 ImportTsv direct import

Command: bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv

Usage: ImportTsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>

Test:

1.1.1 Create the table in HBase

create 'testImport1', 'cf'

1.1.2 Prepare the data file sample1.csv and upload it to HDFS (an example upload command follows the data). Its content is:

1, "Tom"
2, "Sam"
3, "Jerry"
4, "Marry"
5, "John

1.1.3 Import using the ImportTsv command

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf testImport1 /sample1.csv

1.1.4 Results
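To check the imported rows from the HBase shell (a quick verification, not part of the original steps):

scan 'testImport1'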

1.2 Generate HFiles with ImportTsv, then load them into HBase with completebulkload

1.2.1 Use the same source data and create a new table

create 'testImport2', 'cf'

1.2.2 Use the command to generate the HFiles

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=hfile_tmp -Dimporttsv.columns=HBASE_ROW_KEY,cf testImport2 /sample1.csv

1.2.3 Intermediate results on HDFS
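The generated HFiles can be listed directly (a sketch; hfile_tmp is resolved relative to the running user's HDFS home directory unless an absolute path is given):

hadoop fs -ls -R hfile_tmp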

1.2.4 Use the command to load the HFiles into HBase

hadoop jar lib/hbase-server-0.98.9-hadoop2.jar completebulkload hfile_tmp testImport2

1.2.5 Results

Note: 1. If a missing-class (missing jar) error appears, add the HBase jars to Hadoop's classpath; 2. This command is essentially an HDFS mv operation and does not start a MapReduce job.
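For note 1, a common way to do this (a sketch, assuming the command is run from the HBase installation directory) is to export the HBase classpath before re-running the hadoop jar command above:

export HADOOP_CLASSPATH=`bin/hbase classpath`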

2. API code mode

Code is a little more flexible, and many things can be customized.

The code is pasted directly below:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;

public class BulkLoadJob {
    static Logger logger = LoggerFactory.getLogger(BulkLoadJob.class);

    public static class BulkLoadMap extends
            Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // each input line: rowkey <TAB> family:qualifier <TAB> value
            String[] valueStrSplit = value.toString().split("\t");
            String hKey = valueStrSplit[0];
            String family = valueStrSplit[1].split(":")[0];
            String column = valueStrSplit[1].split(":")[1];
            String hValue = valueStrSplit[2];
            final byte[] rowKey = Bytes.toBytes(hKey);
            final ImmutableBytesWritable HKey = new ImmutableBytesWritable(rowKey);
            // Put hPut = new Put(rowKey);
            // byte[] cell = Bytes.toBytes(hValue);
            // hPut.add(Bytes.toBytes(family), Bytes.toBytes(column), cell);
            KeyValue kv = new KeyValue(rowKey, Bytes.toBytes(family),
                    Bytes.toBytes(column), Bytes.toBytes(hValue));
            context.write(HKey, kv);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.property.clientPort", "2182");
        conf.set("hbase.zookeeper.quorum", "msg801,msg802,msg803");
        conf.set("hbase.master", "msg801:60000");
        String[] dfsArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        String inputPath = dfsArgs[0];
        System.out.println("source: " + dfsArgs[0]);
        String outputPath = dfsArgs[1];
        System.out.println("dest: " + dfsArgs[1]);
        HTable hTable = null;
        try {
            Job job = Job.getInstance(conf, "Test Import HFile & Bulkload");
            job.setJarByClass(BulkLoadJob.class);
            job.setMapperClass(BulkLoadJob.BulkLoadMap.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(KeyValue.class);
            // speculation
            job.setSpeculativeExecution(false);
            job.setReduceSpeculativeExecution(false);
            // in/out format
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(HFileOutputFormat2.class);

            FileInputFormat.setInputPaths(job, inputPath);
            FileOutputFormat.setOutputPath(job, new Path(outputPath));

            hTable = new HTable(conf, dfsArgs[2]);
            HFileOutputFormat2.configureIncrementalLoad(job, hTable);

            if (job.waitForCompletion(true)) {
                FsShell shell = new FsShell(conf);
                try {
                    shell.run(new String[] { "-chmod", "-R", "777", dfsArgs[1] });
                } catch (Exception e) {
                    logger.error("Couldn't change the file permissions", e);
                    throw new IOException(e);
                }
                // load the generated HFiles into the HBase table
                LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
                // two ways are possible
                // way one
                String[] loadArgs = { outputPath, dfsArgs[2] };
                loader.run(loadArgs);
                // way two
                // loader.doBulkLoad(new Path(outputPath), hTable);
            } else {
                logger.error("loading failed.");
                System.exit(1);
            }
        } catch (IllegalArgumentException e) {
            e.printStackTrace();
        } finally {
            if (hTable != null) {
                hTable.close();
            }
        }
    }
}
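Before the run command in 2.2, the class has to be packaged as BulkLoadJob.jar. A minimal packaging sketch (assuming a single-file source and that bin/hbase classpath resolves the HBase and Hadoop dependencies):

javac -cp "$(bin/hbase classpath)" BulkLoadJob.java
jar cf BulkLoadJob.jar BulkLoadJob*.class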


2.1 Create a new table

create 'testImport3', 'fm1', 'fm2'

2.2 Create sample2.csv and upload it to HDFS (an example upload command follows the data). Its content is:
key1 fm1:col1 value1
key1 fm1:col2 value2
key1 fm2:col1 value3
key4 fm1:col1 value4
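Although the file is named .csv, the mapper above splits each line on tab characters, so the three fields should be tab-separated. A minimal upload sketch matching the path used in the command below:

hadoop fs -put sample2.csv hdfs://msg/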

Use the command:

hadoop jar BulkLoadJob.jar hdfs://msg/sample2.csv hdfs://msg/hfileout testImport3

Note: 1. In the mapper you can use either KeyValue or Put; 2. Pay attention to the classpath of the jar packages; 3. If Hadoop runs in HA mode, you must use the HA nameservice in the path: for example, our active NameNode is named msg801, but the HA nameservice is msg, so the HDFS path must be hdfs://msg rather than hdfs://msg801:9000 (why?).

The specific error is:

IllegalArgumentException: Wrong FS: hdfs://msg801:9000/hfileout/fm2/bbab9d883a574d518cdcb304d1e681e9, expected: hdfs://msg
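To confirm which name the path should use (a sketch; these configuration keys assume a standard HDFS HA setup):

hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.nameservices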
