After seven years of development, HBase finally released version 1.0.0 at the end of February this year. This release offers some exciting features and, without sacrificing stability, introduces a brand-new API. Although 1.0.0 remains compatible with the old API, you should familiarize yourself with the new API as early as possible, and understand how to use it together with the currently popular Spark to read and write data. Given how little material there is, at home or abroad, about the new HBase 1.0.0 API, this article was written.
This article is divided into two parts. The first part explains how to use the new HBase API to perform basic CRUD operations; the second explains how to write RDDs in Spark into HBase tables and, conversely, how to load HBase tables into Spark as RDDs.
Environment configuration
To avoid unnecessary hassles caused by version inconsistencies, both the API and the HBase environment used here are version 1.0.0. HBase runs in standalone mode; distributed mode is used in essentially the same way, only the HBaseConfiguration needs to be adjusted.
The development environment uses SBT to load the dependencies:
Name: = "Sparklearn" Version: = "1.0" scalaversion: = "2.10.4" librarydependencies + = "Org.apache.spark" percent "Spark-core"% "1 .3.0 "Librarydependencies + =" org.apache.hbase "%" hbase-client "%" 1.0.0 "librarydependencies + =" Org.apache.hbase "%" Hbase-common "%" 1.0.0 "librarydependencies + =" org.apache.hbase "%" hbase-server "%" 1.0.0 "
CRUD Operations for HBase
The new API adds Connection; HBaseAdmin becomes Admin and HTable becomes Table, and both Admin and Table can only be obtained from a Connection. Creating a Connection is a heavyweight operation, and because a Connection is thread-safe, it is recommended to use a singleton. Its factory method requires an HBaseConfiguration.
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("hbase.zookeeper.quorum", "master")

// Creating a Connection is heavyweight work; it is thread-safe and is the entry point for operating HBase
val conn = ConnectionFactory.createConnection(conf)
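Since a singleton Connection is recommended, a minimal sketch of one way to hold it in Scala follows (the object name HBaseConn is an illustrative assumption, not from the original):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}

// A Scala object is initialized only once, so it can serve as a simple singleton holder
object HBaseConn {
  private val conf = HBaseConfiguration.create()
  conf.set("hbase.zookeeper.property.clientPort", "2181")
  conf.set("hbase.zookeeper.quorum", "master")

  // Heavyweight to create, but thread-safe, so it can be shared by the whole application
  lazy val connection: Connection = ConnectionFactory.createConnection(conf)
}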
Create a table
Use Admin to create and delete tables:
val userTable = TableName.valueOf("user")

// Create the user table
val tableDescr = new HTableDescriptor(userTable)
tableDescr.addFamily(new HColumnDescriptor("basic".getBytes))
println("Creating table `user`.")
if (admin.tableExists(userTable)) {
  admin.disableTable(userTable)
  admin.deleteTable(userTable)
}
admin.createTable(tableDescr)
println("Done!")
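The admin handle used above is obtained from the Connection; a minimal sketch, assuming the conn created earlier:

// Admin can only be obtained from a Connection
val admin = conn.getAdmin
// ... use admin ...
// call admin.close() when it is no longer needed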
Insert, query, scan, delete operations
Operations on HBase first require creating an operation object such as Put, Get, or Delete, and then calling the corresponding method on the Table:
try {
  // Get the user table
  val table = conn.getTable(userTable)

  try {
    // Prepare to insert a row with key "id001"
    val p = new Put("id001".getBytes)
    // Specify column and value for the Put operation (the previous Put.add method was deprecated)
    p.addColumn("basic".getBytes, "name".getBytes, "wuchong".getBytes)
    // Submit
    table.put(p)

    // Query a row
    val g = new Get("id001".getBytes)
    val result = table.get(g)
    val value = Bytes.toString(result.getValue("basic".getBytes, "name".getBytes))
    println("GET id001: " + value)

    // Scan data
    val s = new Scan()
    s.addColumn("basic".getBytes, "name".getBytes)
    val scanner = table.getScanner(s)
    try {
      for (r <- scanner) {
        println("Found row: " + r)
        println("Found value: " + Bytes.toString(r.getValue("basic".getBytes, "name".getBytes)))
      }
    } finally {
      // Make sure the scanner is closed
      scanner.close()
    }

    // Delete a row, operating in a similar way to Put
    val d = new Delete("id001".getBytes)
    d.addColumn("basic".getBytes, "name".getBytes)
    table.delete(d)

  } finally {
    if (table != null) table.close()
  }

} finally {
  conn.close()
}
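For reference, the examples above assume roughly the following imports (an assumption, since the original does not list them); in particular, iterating over the ResultScanner with a Scala for-comprehension relies on the Java collection conversions:

import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put, Scan}
import org.apache.hadoop.hbase.util.Bytes
// Needed so that "for (r <- scanner)" can iterate over the Java ResultScanner
import scala.collection.JavaConversions._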
Spark Operations on HBase

Write to HBase
The first step is writing data to HBase, for which we need to use PairRDDFunctions.saveAsHadoopDataset. Because HBase is not a file system, the saveAsHadoopFile method is of no use here.
def saveAsHadoopDataset(conf: JobConf): Unit
Output the RDD to any Hadoop-supported storage system, using a Hadoop JobConf object for that storage system
This method takes a JobConf as its parameter, similar to a configuration object; it mainly needs to specify the output format and the output table name.
Step 1: We first need to create a JobConf.
// Define the HBase configuration
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("hbase.zookeeper.quorum", "master")

// Specify the output format and the output table name
val jobConf = new JobConf(conf, this.getClass)
jobConf.setOutputFormat(classOf[TableOutputFormat])
jobConf.set(TableOutputFormat.OUTPUT_TABLE, "user")
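One point worth noting as an assumption about the surrounding imports: saveAsHadoopDataset works with the old MapReduce API, so the JobConf and the TableOutputFormat referenced here are the ones from the mapred packages (which is presumably also why hbase-server appears among the SBT dependencies above, since that artifact ships the Table*Format classes in HBase 1.0.0). A minimal sketch of the relevant imports:

// saveAsHadoopDataset takes a JobConf (old MapReduce API), so the old-API classes are used here
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.mapred.JobConf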
Step 2: RDD-to-table schema mapping
A table schema in HBase generally looks like this:
row cf:col_1 cf:col_2
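For example, with the sample data used in the following steps, rows in the user table would look like this (an illustration based on the basic column family defined earlier):

row    basic:name    basic:age
1      lilei         14
2      hanmei        18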
In Spark we manipulate RDDs of tuples, such as (1, "lilei", 14) and (2, "hanmei", 18). We need to convert RDD[(uid: Int, name: String, age: Int)] into RDD[(ImmutableBytesWritable, Put)], so we define a convert function to do this conversion:
def convert(triple: (Int, String, Int)) = {
  val p = new Put(Bytes.toBytes(triple._1))
  p.addColumn(Bytes.toBytes("basic"), Bytes.toBytes("name"), Bytes.toBytes(triple._2))
  p.addColumn(Bytes.toBytes("basic"), Bytes.toBytes("age"), Bytes.toBytes(triple._3))
  (new ImmutableBytesWritable, p)
}
Step 3: Read the RDD data and convert it
// Read RDD data from somewhere and convert it
val rawData = List((1, "lilei", 14), (2, "hanmei", 18), (3, "someone", 38))
val localData = sc.parallelize(rawData).map(convert)
Step 4: Write to HBase using the saveAsHadoopDataset method
localData.saveAsHadoopDataset(jobConf)
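Putting the four steps together, a minimal end-to-end write job might look like the sketch below (the object name SparkWriteHBase and the SparkConf/SparkContext setup are illustrative assumptions, not taken from the original):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative object name; not part of the original article
object SparkWriteHBase {

  def convert(triple: (Int, String, Int)) = {
    val p = new Put(Bytes.toBytes(triple._1))
    p.addColumn(Bytes.toBytes("basic"), Bytes.toBytes("name"), Bytes.toBytes(triple._2))
    p.addColumn(Bytes.toBytes("basic"), Bytes.toBytes("age"), Bytes.toBytes(triple._3))
    (new ImmutableBytesWritable, p)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkWriteHBase"))

    // Step 1: HBase configuration plus a JobConf with the output format and table name
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("hbase.zookeeper.quorum", "master")
    val jobConf = new JobConf(conf, this.getClass)
    jobConf.setOutputFormat(classOf[TableOutputFormat])
    jobConf.set(TableOutputFormat.OUTPUT_TABLE, "user")

    // Steps 2-3: build the RDD and convert tuples into (ImmutableBytesWritable, Put)
    val rawData = List((1, "lilei", 14), (2, "hanmei", 18))
    val localData = sc.parallelize(rawData).map(convert)

    // Step 4: write to HBase
    localData.saveAsHadoopDataset(jobConf)

    sc.stop()
  }
}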
Read HBase
To read from HBase in Spark, we mainly use the newAPIHadoopRDD API provided by SparkContext to load the contents of a table into Spark as an RDD.
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("hbase.zookeeper.quorum", "master")

// Set the table name to query
conf.set(TableInputFormat.INPUT_TABLE, "user")

val usersRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])
val count = usersRDD.count()
println("Users RDD Count: " + count)
usersRDD.cache()

// Traverse the RDD and print each row
usersRDD.foreach { case (_, result) =>
  val key = Bytes.toInt(result.getRow)
  val name = Bytes.toString(result.getValue("basic".getBytes, "name".getBytes))
  val age = Bytes.toInt(result.getValue("basic".getBytes, "age".getBytes))
  println("Row key: " + key + " Name: " + name + " Age: " + age)
}
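Note that newAPIHadoopRDD expects the new MapReduce API, so the TableInputFormat used here is the one from the mapreduce package; the read example assumes imports along these lines (an assumption, since the original does not list them). Also, Bytes.toInt(result.getRow) only works because the write example stored the row key as an Int via Bytes.toBytes.

// newAPIHadoopRDD uses the new MapReduce API, hence the mapreduce package
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes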