Source: http://www.cnblogs.com/cssdongl. Please credit the source when reprinting.
We all know that when writing to a known HBase table from MapReduce or Spark, you simply declare the target table in the driver class:

job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
In MapReduce, the mapper or reducer then writes Puts directly to the context; in Spark, you build an RDD of (ImmutableBytesWritable, Put) pairs and call saveAsHadoopDataset on the PairRDDFunctions.
But some requirements call for writing the processed input into more than one HBase table, or into a table whose name is not known in advance and must be constructed from a field in the data. Because the table name is unknown, TableOutputFormat.OUTPUT_TABLE cannot be set. Even so, the requirement is easy to implement; below I summarize the MapReduce and Spark implementations separately (you will find they end up in the same place).
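The core idea in both cases is the same: derive the target table name from a field of each record. A minimal pure-Java sketch of such a naming scheme (the "appId_uid" rowkey layout and the "_clickCounts" suffix are illustrative assumptions that mirror the Spark example later in this post, not a fixed API):

```java
public class TableNameResolver {
    // Derive the target HBase table name from a rowkey of the form "appId_uid":
    // the appId prefix selects the table, and a fixed suffix (e.g. "_clickCounts")
    // completes the name. Purely illustrative naming scheme.
    public static String resolveTableName(String rowKey, String suffix) {
        String appId = rowKey.split("_")[0];
        return appId + suffix;
    }

    public static void main(String[] args) {
        System.out.println(resolveTableName("app42_user7", "_clickCounts"));
        // prints "app42_clickCounts"
    }
}
```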
I. Writing multiple HBase tables from MapReduce
Add the following code to the main method of the MR job:

job.setOutputFormatClass(MultiTableOutputFormat.class);
Then, in the mapper or reducer, construct the table name from the relevant field and write the (tableName, Put) pair to the context; this is enough to write to multiple HBase tables.
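A hedged sketch of such a mapper (the class name and the tab-separated input layout are hypothetical; the essential point is that the output key is an ImmutableBytesWritable wrapping the table name, which MultiTableOutputFormat uses to route each Put):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: routes each input line to an HBase table chosen per record.
public class MultiTableMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assume tab-separated input: tableName, rowKey, count (illustrative layout).
        String[] fields = value.toString().split("\t");
        String tableName = fields[0];
        Put put = new Put(Bytes.toBytes(fields[1]));
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("count"),
                Bytes.toBytes(fields[2]));
        // The key names the destination table; MultiTableOutputFormat routes the Put there.
        context.write(new ImmutableBytesWritable(Bytes.toBytes(tableName)), put);
    }
}
```

This sketch requires the Hadoop and HBase client jars on the classpath and a running cluster to actually execute.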
II. Writing multiple HBase tables from Spark
Here I use the Spark Streaming program I tested to write to several HBase tables. On to the code:
```scala
object SparkStreamingWriteToHbase {
  def main(args: Array[String]): Unit = {
    var masterUrl = "yarn-client"
    if (args.length > 0) {
      masterUrl = args(0)
    }
    val conf = new SparkConf().setAppName("Write to several tables of HBase").setMaster(masterUrl)
    val ssc = new StreamingContext(conf, Seconds(5))
    val topics = Set("app_events")
    val brokers = PropertiesUtil.getValue("broker_address")
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> brokers,
      "serializer.class" -> "kafka.serializer.StringEncoder")
    val hbaseTableSuffix = "_clickCounts"
    val hConf = HBaseConfiguration.create()
    val zookeeper = PropertiesUtil.getValue("zookeeper_address")
    hConf.set(HConstants.ZOOKEEPER_QUORUM, zookeeper)
    val jobConf = new JobConf(hConf, this.getClass)

    val kafkaDStreams = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    val appUserClicks = kafkaDStreams.flatMap { rdd =>
      val data = JSONObject.fromObject(rdd._2)
      Some(data)
    }.map { jsonLine =>
      val key = jsonLine.getString("appId") + "_" + jsonLine.getString("uid")
      val value = jsonLine.getString("click_count")
      (key, value)
    }

    val result = appUserClicks.map { item =>
      val rowKey = item._1
      val value = item._2
      convertToHbasePut(rowKey, value, hbaseTableSuffix)
    }

    result.foreachRDD { rdd =>
      rdd.saveAsNewAPIHadoopFile("", classOf[ImmutableBytesWritable], classOf[Put],
        classOf[MultiTableOutputFormat], jobConf)
    }

    ssc.start()
    ssc.awaitTermination()
  }

  def convertToHbasePut(key: String, value: String, tableNameSuffix: String): (ImmutableBytesWritable, Put) = {
    val rowKey = key
    val tableName = rowKey.split("_")(0) + tableNameSuffix
    val put = new Put(Bytes.toBytes(rowKey))
    put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("count"), Bytes.toBytes(value))
    (new ImmutableBytesWritable(Bytes.toBytes(tableName)), put)
  }
}
```
To describe it briefly: the Spark Streaming job reads JSON data from Kafka, and the appId field is used to construct the tableName that distinguishes the different HBase tables. Finally the RDD is written out to the HBase tables with saveAsNewAPIHadoopFile.
Stepping into saveAsNewAPIHadoopFile, you will find nothing different from the MapReduce configuration:
```scala
def saveAsNewAPIHadoopFile(
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
    conf: Configuration = self.context.hadoopConfiguration) {
  // Rename this as hadoopConf internally to avoid shadowing (see SPARK-2038).
  val hadoopConf = conf
  val job = new NewAPIHadoopJob(hadoopConf)
  job.setOutputKeyClass(keyClass)
  job.setOutputValueClass(valueClass)
  job.setOutputFormatClass(outputFormatClass)
  job.getConfiguration.set("mapred.output.dir", path)
  saveAsNewAPIHadoopDataset(job.getConfiguration)
}
```
This method's parameters are the output path (since we are writing to HBase, an empty string is passed in), plus outputKeyClass, outputValueClass, outputFormatClass, and the jobConf.
The outputFormatClass here must be MultiTableOutputFormat to guarantee writing to multiple tables. One more point to note: make sure the HBase tables you write to have been created first.
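MultiTableOutputFormat will not create missing tables for you, so the candidate tables need to exist up front. A hedged sketch using the HBase 1.x Admin API (the appId list, the table-name suffix, and the "info" column family are assumptions matching the examples above):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateTargetTables {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Pre-create one table per expected appId (illustrative list).
            for (String appId : new String[] {"app1", "app2", "app3"}) {
                TableName tableName = TableName.valueOf(appId + "_clickCounts");
                if (!admin.tableExists(tableName)) {
                    HTableDescriptor desc = new HTableDescriptor(tableName);
                    desc.addFamily(new HColumnDescriptor("info"));
                    admin.createTable(desc);
                }
            }
        }
    }
}
```

This needs the HBase client jars and a reachable cluster (hbase-site.xml on the classpath) to run.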
To summarize: whether from MapReduce or Spark, writing to multiple HBase tables comes down to the same thing: use MultiTableOutputFormat and emit the destination table name (wrapped in an ImmutableBytesWritable) as the output key alongside each Put.