I've been asked about this a lot lately, so I'm writing up a summary.
There are 2 basic scenarios for importing Hive data into HBase:
1. Create the table in HBase, then build an external table over it in Hive, so that data written through Hive also shows up in HBase
2. Have MapReduce read the Hive data and write it to HBase (via the API or bulkload)
1. Hive External Table
Create an HBase table
(1) Create a table classes with one column family, user
create 'classes', 'user'
(2) View the structure of the table
hbase(main):005:0> describe 'classes'
DESCRIPTION                                                                      ENABLED
 'classes', {NAME => 'user', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', true
 REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS =>
 '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}
(3) Add 2 rows of data
put 'classes', '001', 'user:name', 'jack'
put 'classes', '001', 'user:age', '20'
put 'classes', '002', 'user:name', 'liza'
put 'classes', '002', 'user:age', '18'
(4) View data in classes
hbase(main):016:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
(5) Create an external Hive table and query it to validate
create external table classes(id int, name string, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,user:name,user:age")
TBLPROPERTIES ("hbase.table.name" = "classes");

select * from classes;
OK
1    jack    20
2    liza    18
(6) Add data to HBase
put 'classes', '003', 'user:age', '1820183291839132'

hbase(main):025:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
 003   column=user:age, timestamp=1404981476497, value=1820183291839132
(7) Query in Hive to see the new data
select * from classes;
OK
1    jack    20
2    liza    18
3    NULL    NULL    -- name is NULL because row 003 has no user:name cell; age is NULL because the value exceeds the int range
(8) Further verification
put 'classes', '004', 'user:name', 'test'
put 'classes', '004', 'user:age', '1820183291839112312'   -- already larger than int

hbase(main):030:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
 003   column=user:age, timestamp=1404981476497, value=1820183291839132
 004   column=user:age, timestamp=1404981558125, value=1820183291839112312
 004   column=user:name, timestamp=1404981551508, value=test

select * from classes;
1    jack    20
2    liza    18
3    NULL    NULL
4    test    NULL    -- the over-int age again comes back as NULL

put 'classes', '005', 'user:age', '1231342'

hbase(main):034:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
 003   column=user:age, timestamp=1404981476497, value=1820183291839132
 004   column=user:age, timestamp=1404981558125, value=1820183291839112312
 004   column=user:name, timestamp=1404981551508, value=test
 005   column=user:age, timestamp=1404981720600, value=1231342

select * from classes;
1    jack    20
2    liza    18
3    NULL    NULL
4    test    NULL
5    NULL    1231342
Notes:
1. Empty cells in HBase are filled in as NULL in Hive
2. Fields that don't match up between Hive and HBase (wrong type, or values out of range) also come back as NULL
3. For values stored as raw bytes, append #b to the column in hbase.columns.mapping (e.g. user:age#b) when creating the Hive table; see:
http://stackoverflow.com/questions/12909118/number-type-value-in-hbase-not-recognized-by-hive
http://www.aboutyun.com/thread-8023-1-1.html
4. An entire HBase column family can be mapped to a Hive map; see:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
2. MapReduce Writes to HBase
There are 2 common ways for MR to write to HBase: one is to call the HBase API directly and write with Table/Put; the other is to have MR generate HFiles and then bulkload them into HBase, which is the recommended approach when the data volume is very large.
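To make the first method concrete, here is a minimal sketch of a direct write, using the same pre-2.0 HBase client API as the code later in this post; the class name, row key '006', and its values are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DirectWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Write one row into the 'classes' table from section 1
        HTable table = new HTable(conf, "classes");
        Put put = new Put(Bytes.toBytes("006"));  // hypothetical row key
        put.add(Bytes.toBytes("user"), Bytes.toBytes("name"), Bytes.toBytes("mike"));
        put.add(Bytes.toBytes("user"), Bytes.toBytes("age"), Bytes.toBytes("25"));
        table.put(put);
        table.close();
    }
}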
Notes:
1. You may need to read some values out of the Hive file path (e.g. partition keys); the example in point 3 parses them from the input split path with a regex
2. How to handle map and list values from Hive
Hive has 8 main delimiters, \001 through \008; the defaults are ^A (\001) for fields, ^B (\002) for collection items, and ^C (\003) for map keys.
A list stored in Hive therefore sits on disk as jerrick\002liza\002tom\002jerry, and a map as jerrick\00323\002liza\00318\002tom\0030 (i.e. jerrick:23, liza:18, tom:0, with \002 between entries and \003 between key and value).
So a little handling is needed when reading in MR; for a map, for example: "{" + mapkey.replace("\002", ",").replace("\003", ":") + "}", which is then parsed as JSON and saved to HBase after toString(). A standalone sketch follows.
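Here is that conversion in isolation (the sample value and class name are made up; this assumes the org.json library, which the mapper below also uses):

import org.json.JSONObject;

public class HiveMapField {
    public static void main(String[] args) {
        // A Hive map<string,string> field as it appears on disk:
        // \002 between entries, \003 between key and value
        String raw = "jerrick\00323\002liza\00318\002tom\0030";
        String jsonText = "{" + raw.replace("\002", ",").replace("\003", ":") + "}";
        // org.json's parser tolerates unquoted keys and values
        JSONObject json = new JSONObject(jsonText);
        // e.g. {"jerrick":23,"liza":18,"tom":0} (key order not guaranteed)
        System.out.println(json.toString());
    }
}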
3. A simple example. The code has many limitations; use it for reference only!
public void map(LongWritable key, Text value,
        Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue>.Context context) {
    // e.g. /user/hive/warehouse/snapshot.db/stat_all_info/stat_date=20150820/softid=201/000000_0
    String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString();

    // Parse stat_date and softid out of the path (reg, stat_date, softid, rowMap,
    // rowkey, valueKeys and logger are class fields not shown here)
    Pattern pattern = Pattern.compile(reg);
    Matcher matcher = pattern.matcher(filePathString);
    while (matcher.find()) {
        stat_date = matcher.group(1);
        softid = matcher.group(2);
    }
    rowMap.put("stat_date", stat_date);
    rowMap.put("softid", softid);

    // Hive's default field delimiter is \001
    String[] vals = value.toString().split("\001");

    try {
        Configuration conf = context.getConfiguration();
        String cf = conf.get("hbase.table.cf", HBASE_TABLE_COLUME_FAMILY);

        String arow = rowkey;
        for (int index = 10; index < vals.length; index++) {
            byte[] row = Bytes.toBytes(arow);
            ImmutableBytesWritable k = new ImmutableBytesWritable(row);
            KeyValue kv;
            if (index == vals.length - 1) {
                // The last field is a dict (Hive map): convert it to JSON
                logger.info("d is :" + vals[index]);
                logger.info("d is :" + "{" + vals[index].replace("\002", ",").replace("\003", ":") + "}");
                JSONObject json = new JSONObject("{" + vals[index].replace("\002", ",").replace("\003", ":") + "}");
                kv = new KeyValue(row, cf.getBytes(), Bytes.toBytes(valueKeys[index]), Bytes.toBytes(json.toString()));
            } else {
                kv = new KeyValue(row, cf.getBytes(), Bytes.toBytes(valueKeys[index]), Bytes.toBytes(vals[index]));
            }
            context.write(k, kv);
        }
    } catch (Exception e1) {
        context.getCounter("offile2hbase", "map error").increment(1);
        logger.info("map error:" + e1.toString());
    }
    context.getCounter("offile2hbase", "map total").increment(1);
}
4. Bulkload
int jobResult = job.waitForCompletion(true) ? 0 : 1;
logger.info("jobResult=" + jobResult);
Boolean bulkLoadHFileToHbase = Boolean.valueOf(conf.getBoolean("hbase.table.hfile.bulkload", false));
if ((jobResult == 0) && (bulkLoadHFileToHbase.booleanValue())) {
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(outputDir, htable);
}
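For context, a sketch of the driver setup that would run before the snippet above; the class names, paths, and table name here are placeholders, and the exact configureIncrementalLoad signature varies across HBase versions:

Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "hive2hbase-bulkload");
job.setJarByClass(Hive2HBase.class);               // hypothetical driver class
job.setMapperClass(HiveToHFileMapper.class);       // the mapper from point 3
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(KeyValue.class);

FileInputFormat.addInputPath(job, new Path("/user/hive/warehouse/snapshot.db/stat_all_info"));
Path outputDir = new Path("/tmp/hive2hbase_hfiles");  // where the HFiles land
FileOutputFormat.setOutputPath(job, outputDir);

// Sets the TotalOrderPartitioner and sorting reducer so the generated
// HFiles line up with the target table's regions
HTable htable = new HTable(conf, "stat_all_info");
HFileOutputFormat2.configureIncrementalLoad(job, htable);

Once doBulkLoad completes, the HFiles are moved directly into the table's region directories, so the load bypasses the RegionServers' write path entirely.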