I've been asked about this a lot lately, so I'm writing up a summary.
There are 2 basic scenarios for importing Hive data into HBase:
1. Create the table in HBase, then build an external table over it in Hive, so that data written through Hive also shows up in HBase
2. Have MapReduce read the Hive data and write it to HBase (via the API or bulkload)
1. Hive External Table
Create an HBase table
(1) Create a table classes with one column family, user
create 'classes', 'user'
(2) View the structure of the table
hbase(main):005:0> describe 'classes'
DESCRIPTION                                                                      ENABLED
 'classes', {NAME => 'user', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', true
 REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS =>
 '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}
(3) Add 2 rows of data
put 'classes', '001', 'user:name', 'jack'
put 'classes', '001', 'user:age', '20'
put 'classes', '002', 'user:name', 'liza'
put 'classes', '002', 'user:age', '18'
(4) View data in classes
hbase(main):016:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
(5) Create an external Hive table and query it to validate
create external table classes(id int, name string, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,user:name,user:age")
TBLPROPERTIES ("hbase.table.name" = "classes");

select * from classes;
OK
1    jack    20
2    liza    18
(6) Add data to HBase
put 'classes', '003', 'user:age', '1820183291839132'

hbase(main):025:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
 003   column=user:age, timestamp=1404981476497, value=1820183291839132
(7) Query in Hive to see the new data
select * from classes;
OK
1    jack    20
2    liza    18
3    NULL    NULL    -- name is NULL because row 003 has no user:name cell; age is NULL because the value exceeds the int range
(8) Further verification
put 'classes', '004', 'user:name', 'test'
put 'classes', '004', 'user:age', '1820183291839112312'   -- already larger than int

hbase(main):030:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
 003   column=user:age, timestamp=1404981476497, value=1820183291839132
 004   column=user:age, timestamp=1404981558125, value=1820183291839112312
 004   column=user:name, timestamp=1404981551508, value=test

select * from classes;
1    jack    20
2    liza    18
3    NULL    NULL
4    test    NULL    -- the over-int age again comes back as NULL

put 'classes', '005', 'user:age', '1231342'

hbase(main):034:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
 003   column=user:age, timestamp=1404981476497, value=1820183291839132
 004   column=user:age, timestamp=1404981558125, value=1820183291839112312
 004   column=user:name, timestamp=1404981551508, value=test
 005   column=user:age, timestamp=1404981720600, value=1231342

select * from classes;
1    jack    20
2    liza    18
3    NULL    NULL
4    test    NULL
5    NULL    1231342
Notes:
1. Empty cells in HBase are filled in as NULL in Hive
2. Fields that don't match up between Hive and HBase (wrong type, or values out of range) also come back as NULL
3. For values stored as raw bytes, append #b to the column in hbase.columns.mapping (e.g. user:age#b) when creating the Hive table; see:
http://stackoverflow.com/questions/12909118/number-type-value-in-hbase-not-recognized-by-hive
http://www.aboutyun.com/thread-8023-1-1.html
4. An entire HBase column family can be mapped to a Hive map; see:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
2. MapReduce Writes to HBase
There are 2 common ways for MR to write to HBase: one is to call the HBase API directly and write with Table/Put; the other is to have MR generate HFiles and then bulkload them into HBase, which is the recommended approach when the data volume is very large.
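To make the first method concrete, here is a minimal sketch of a direct write, using the same pre-2.0 HBase client API as the code later in this post; the class name, row key '006', and its values are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DirectWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Write one row into the 'classes' table from section 1
        HTable table = new HTable(conf, "classes");
        Put put = new Put(Bytes.toBytes("006"));  // hypothetical row key
        put.add(Bytes.toBytes("user"), Bytes.toBytes("name"), Bytes.toBytes("mike"));
        put.add(Bytes.toBytes("user"), Bytes.toBytes("age"), Bytes.toBytes("25"));
        table.put(put);
        table.close();
    }
}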
Notes:
1. You may need to read some values out of the Hive file path (e.g. partition keys); the example in point 3 parses them from the input split path with a regex
2. How to handle map and list values from Hive
Hive has 8 main delimiters, \001 through \008; the defaults are ^A (\001) for fields, ^B (\002) for collection items, and ^C (\003) for map keys.
A list stored in Hive therefore sits on disk as jerrick\002liza\002tom\002jerry, and a map as jerrick\00323\002liza\00318\002tom\0030 (i.e. jerrick:23, liza:18, tom:0, with \002 between entries and \003 between key and value).
So a little handling is needed when reading in MR; for a map, for example: "{" + mapkey.replace("\002", ",").replace("\003", ":") + "}", which is then parsed as JSON and saved to HBase after toString(). A standalone sketch follows.
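Here is that conversion in isolation (the sample value and class name are made up; this assumes the org.json library, which the mapper below also uses):

import org.json.JSONObject;

public class HiveMapField {
    public static void main(String[] args) {
        // A Hive map<string,string> field as it appears on disk:
        // \002 between entries, \003 between key and value
        String raw = "jerrick\00323\002liza\00318\002tom\0030";
        String jsonText = "{" + raw.replace("\002", ",").replace("\003", ":") + "}";
        // org.json's parser tolerates unquoted keys and values
        JSONObject json = new JSONObject(jsonText);
        // e.g. {"jerrick":23,"liza":18,"tom":0} (key order not guaranteed)
        System.out.println(json.toString());
    }
}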
3. A simple example. The code has many limitations; use it for reference only!
public void map(LongWritable key, Text value,
        Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue>.Context context) {
    // e.g. /user/hive/warehouse/snapshot.db/stat_all_info/stat_date=20150820/softid=201/000000_0
    String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString();

    // Parse stat_date and softid out of the path (reg, stat_date, softid, rowMap,
    // rowkey, valueKeys and logger are class fields not shown here)
    Pattern pattern = Pattern.compile(reg);
    Matcher matcher = pattern.matcher(filePathString);
    while (matcher.find()) {
        stat_date = matcher.group(1);
        softid = matcher.group(2);
    }
    rowMap.put("stat_date", stat_date);
    rowMap.put("softid", softid);

    // Hive's default field delimiter is \001
    String[] vals = value.toString().split("\001");

    try {
        Configuration conf = context.getConfiguration();
        String cf = conf.get("hbase.table.cf", HBASE_TABLE_COLUME_FAMILY);

        String arow = rowkey;
        for (int index = 10; index < vals.length; index++) {
            byte[] row = Bytes.toBytes(arow);
            ImmutableBytesWritable k = new ImmutableBytesWritable(row);
            KeyValue kv;
            if (index == vals.length - 1) {
                // The last field is a dict (Hive map): convert it to JSON
                logger.info("d is :" + vals[index]);
                logger.info("d is :" + "{" + vals[index].replace("\002", ",").replace("\003", ":") + "}");
                JSONObject json = new JSONObject("{" + vals[index].replace("\002", ",").replace("\003", ":") + "}");
                kv = new KeyValue(row, cf.getBytes(), Bytes.toBytes(valueKeys[index]), Bytes.toBytes(json.toString()));
            } else {
                kv = new KeyValue(row, cf.getBytes(), Bytes.toBytes(valueKeys[index]), Bytes.toBytes(vals[index]));
            }
            context.write(k, kv);
        }
    } catch (Exception e1) {
        context.getCounter("offile2hbase", "map error").increment(1);
        logger.info("map error:" + e1.toString());
    }
    context.getCounter("offile2hbase", "map total").increment(1);
}
4. Bulkload
int jobResult = job.waitForCompletion(true) ? 0 : 1;
logger.info("jobResult=" + jobResult);
Boolean bulkLoadHFileToHbase = Boolean.valueOf(conf.getBoolean("hbase.table.hfile.bulkload", false));
if ((jobResult == 0) && (bulkLoadHFileToHbase.booleanValue())) {
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(outputDir, htable);
}
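For context, a sketch of the driver setup that would run before the snippet above; the class names, paths, and table name here are placeholders, and the exact configureIncrementalLoad signature varies across HBase versions:

Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "hive2hbase-bulkload");
job.setJarByClass(Hive2HBase.class);               // hypothetical driver class
job.setMapperClass(HiveToHFileMapper.class);       // the mapper from point 3
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(KeyValue.class);

FileInputFormat.addInputPath(job, new Path("/user/hive/warehouse/snapshot.db/stat_all_info"));
Path outputDir = new Path("/tmp/hive2hbase_hfiles");  // where the HFiles land
FileOutputFormat.setOutputPath(job, outputDir);

// Sets the TotalOrderPartitioner and sorting reducer so the generated
// HFiles line up with the target table's regions
HTable htable = new HTable(conf, "stat_all_info");
HFileOutputFormat2.configureIncrementalLoad(job, htable);

Once doBulkLoad completes, the HFiles are moved directly into the table's region directories, so the load bypasses the RegionServers' write path entirely.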