A detailed explanation of the 2 ways Hive data is imported into HBase


I've been asked this question a lot lately, so I'm writing up a summary.

There are 2 basic scenarios for importing Hive data into HBase:

1. Create the table in HBase, then build an external table over it in Hive, so that when data is written through Hive, HBase is updated as well

2. Read the Hive data with MapReduce and write it to HBase (via the API or Bulkload)

1. Hive External Table

Create an HBase table

(1) Create a table classes with 1 column family user

create 'classes', 'user'

(2) View the structure of the table

hbase(main):005:0> describe 'classes'
DESCRIPTION                                                                ENABLED
 'classes', {NAME => 'user', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER    true
 => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION =>
 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

(3) Add 2 rows of data

put 'classes', '001', 'user:name', 'jack'
put 'classes', '001', 'user:age', '20'
put 'classes', '002', 'user:name', 'liza'
put 'classes', '002', 'user:age', '18'

(4) View data in classes

hbase(main):016:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza

(5) Create an external Hive table and query it to validate

CREATE EXTERNAL TABLE classes (id int, name string, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,user:name,user:age")
TBLPROPERTIES ("hbase.table.name" = "classes");

select * from classes;
OK
1       jack    20
2       liza    18

(6) Add data to HBase

put 'classes', '003', 'user:age', '1820183291839132'

hbase(main):025:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
 003   column=user:age, timestamp=1404981476497, value=1820183291839132

(7) Query in Hive to see the new data

select * from classes;
OK
1       jack    20
2       liza    18
3       NULL    NULL
-- Row 003 shows NULL for name because it has no name cell, and NULL for age because the value exceeds the maximum of int.

(8) As further verification

put 'classes', '004', 'user:name', 'test'
put 'classes', '004', 'user:age', '1820183291839112312'    -- already beyond the int range

hbase(main):030:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
 003   column=user:age, timestamp=1404981476497, value=1820183291839132
 004   column=user:age, timestamp=1404981558125, value=1820183291839112312
 004   column=user:name, timestamp=1404981551508, value=test

select * from classes;
1       jack    20
2       liza    18
3       NULL    NULL
4       test    NULL    -- a value beyond the int range also comes back as NULL

put 'classes', '005', 'user:age', '1231342'

hbase(main):034:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
 003   column=user:age, timestamp=1404981476497, value=1820183291839132
 004   column=user:age, timestamp=1404981558125, value=1820183291839112312
 004   column=user:name, timestamp=1404981551508, value=test
 005   column=user:age, timestamp=1404981720600, value=1231342

select * from classes;
1       jack    20
2       liza    18
3       NULL    NULL
4       test    NULL
5       NULL    1231342


Points to note:

1. A cell that is empty in HBase shows up as NULL in Hive

2. Fields that do not match between Hive and HBase are filled with NULL

3. For values that HBase stores as binary Bytes, append #b to the column mapping when creating the Hive table (see the sketch after the links below)

http://stackoverflow.com/questions/12909118/number-type-value-in-hbase-not-recognized-by-hive

http://www.aboutyun.com/thread-8023-1-1.html
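As an illustration of note 3, here is a minimal sketch, assuming a table classes_b (an assumed name) over the same classes table as above; the #b suffix tells Hive to decode user:age as binary Bytes rather than as a string:

CREATE EXTERNAL TABLE classes_b (id int, name string, age bigint)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,user:name,user:age#b")
TBLPROPERTIES ("hbase.table.name" = "classes");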

4. Mapping an HBase CF (column family) to a Hive map (see the sketch after the link below)

https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
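A minimal sketch of note 4, following the HBaseIntegration wiki above (classes_map is an assumed name): mapping the whole user column family with "user:" turns each qualifier into a map key and each cell value into the corresponding map value:

CREATE EXTERNAL TABLE classes_map (id int, user_data map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,user:")
TBLPROPERTIES ("hbase.table.name" = "classes");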


2. MapReduce Writing to HBase

There are 2 common ways for MR to write to HBase: 1. call the HBase API directly and write with Table/Put; 2. have the MR job generate HFiles and then Bulkload them into HBase, which is the recommended approach when the amount of data is very large.

Points to note:

1. What to do if you need to read some values out of the Hive file path (e.g. partition values; see the map() example below)

2. How to handle map and list types stored in Hive

There are 8 main separators in Hive, namely \001 through \008.

The defaults are ^A = \001, ^B = \002, ^C = \003.

A list stored in Hive has the underlying data format jerrick,liza,tom,jerry (elements joined by \002), and a map has the format jerrick:23,liza:18,tom:0 (\003 between key and value, \002 between entries).

So it is easy to handle when reading in MR; a map, for example, needs: "{" + mapKey.replace("\002", ",").replace("\003", ":") + "}", which converts it to JSON; after toString() it is saved to HBase. The sketch below shows how a table declares these delimiters.
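For reference, a minimal sketch of a table definition that produces such data (the table name and columns here are assumptions for illustration; \001, \002, and \003 are Hive's defaults even when not spelled out):

CREATE TABLE stat_all_info_sketch (
  uid string,
  dict map<string,int>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\002'
  MAP KEYS TERMINATED BY '\003';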

3. A simple example. The code has many limitations; for reference only!

// Fields such as reg, stat_date, softid, rowMap, rowKey, valueKeys, and logger are defined elsewhere in the class.
public void map(LongWritable key, Text value,
        Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue>.Context context) {
    // e.g. /user/hive/warehouse/snapshot.db/stat_all_info/stat_date=20150820/softid=201/000000_0
    String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString();

    // Parse stat_date and softid out of the partition directories in the path
    Pattern pattern = Pattern.compile(reg);
    Matcher matcher = pattern.matcher(filePathString);
    while (matcher.find()) {
        stat_date = matcher.group(1);
        softid = matcher.group(2);
    }
    rowMap.put("stat_date", stat_date);
    rowMap.put("softid", softid);

    // Hive's default field separator is \001
    String[] vals = value.toString().split("\001");
    try {
        Configuration conf = context.getConfiguration();
        String cf = conf.get("hbase.table.cf", HBASE_TABLE_COLUMN_FAMILY);

        String aRow = rowKey;
        for (int index = 10; index < vals.length; index++) {
            byte[] row = Bytes.toBytes(aRow);
            ImmutableBytesWritable k = new ImmutableBytesWritable(row);
            KeyValue kv;
            if (index == vals.length - 1) {
                // The last field is the dict (a Hive map): convert \002/\003 to JSON
                logger.info("d is : " + vals[index]);
                logger.info("d is : {" + vals[index].replace("\002", ",").replace("\003", ":") + "}");
                JSONObject json = new JSONObject("{" + vals[index].replace("\002", ",").replace("\003", ":") + "}");
                kv = new KeyValue(row, cf.getBytes(), Bytes.toBytes(valueKeys[index]), Bytes.toBytes(json.toString()));
            } else {
                kv = new KeyValue(row, cf.getBytes(), Bytes.toBytes(valueKeys[index]), Bytes.toBytes(vals[index]));
            }
            context.write(k, kv);
        }
    } catch (Exception e1) {
        context.getCounter("offile2hbase", "map error").increment(1);
        logger.info("map error: " + e1.toString());
    }
    context.getCounter("offile2hbase", "map total").increment(1);
}

4. Bulkload

int jobResult = job.waitForCompletion(true) ? 0 : 1;
logger.info("jobResult=" + jobResult);
// Only bulkload the generated HFiles if the job succeeded and bulkload is enabled
boolean bulkLoadHFileToHbase = conf.getBoolean("hbase.table.hfile.bulkload", false);
if (jobResult == 0 && bulkLoadHFileToHbase) {
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(outputDir, htable);
}



