Learn About Spark Streaming

Spark Streaming is a micro-batch stream processing framework built on top of Spark. HBase and Spark Streaming work well together, because HBase can provide the following alongside Spark Streaming:

A place to fetch reference data or profile data on the fly
A place to store counts or aggregates in a way that supports Spark Streaming's promise of exactly-once processing
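One common way to make stored aggregates safe under replay is to derive the row key from the batch time, so that reprocessing a batch overwrites the same rows rather than adding to them. The following is a minimal sketch of that idea using only the Scala standard library. The in-memory map is a stand-in for an HBase table, and the object name IdempotentCounts, its methods, and the "word:batchTime" key layout are all assumptions made for illustration, not part of the hbase-spark module.

```scala
import scala.collection.mutable

object IdempotentCounts {
  // In-memory stand-in for an HBase table: row key -> stored count (assumption).
  val table: mutable.Map[String, Long] = mutable.Map.empty

  // The row key combines the word and the batch time, so writing the same
  // batch twice overwrites the same rows instead of incrementing them.
  def writeBatch(batchTimeMs: Long, words: Seq[String]): Unit =
    words.groupBy(identity).foreach { case (word, occurrences) =>
      table.put(s"$word:$batchTimeMs", occurrences.size.toLong)
    }

  // The total for a word is the sum over all of its per-batch rows.
  def total(word: String): Long =
    table.collect { case (key, n) if key.startsWith(word + ":") => n }.sum
}
```

With this layout, a batch that is replayed after a failure leaves the totals unchanged, which is the property the stored aggregates need.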
The hbase-spark module's integration points with Spark Streaming are similar to its regular Spark integration points: the following commands can be called directly on a Spark Streaming DStream.

bulkPut
Sends Puts to HBase in a massively parallel fashion

bulkDelete
Sends Deletes to HBase in a massively parallel fashion

bulkGet
Sends Gets to HBase in a massively parallel fashion to create a new RDD

mapPartition
Runs a Spark map function with a Connection object, allowing full access to HBase

hBaseRDD
Simplifies a distributed scan to create an RDD

Example: bulkPut with DStreams

The following is an example of bulkPut using DStreams. It is very close in feel to the RDD bulk put.

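import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.spark.HBaseDStreamFunctions._
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Milliseconds, StreamingContext}
import scala.collection.mutable

val sc = new SparkContext("local", "test")
val config = HBaseConfiguration.create()

// HBaseContext broadcasts the configuration to the executors.
val hbaseContext = new HBaseContext(sc, config)
// Each micro-batch covers 200 ms.
val ssc = new StreamingContext(sc, Milliseconds(200))

val rdd1 = ...
val rdd2 = ...

// Each record is (row key, Array of (column family, qualifier, value)).
val queue = mutable.Queue[RDD[(Array[Byte], Array[(Array[Byte],
    Array[Byte], Array[Byte])])]]()

queue += rdd1
queue += rdd2

val dStream = ssc.queueStream(queue)

// tableName is assumed to be defined elsewhere as a String.
dStream.hbaseBulkPut(
  hbaseContext,
  TableName.valueOf(tableName),
  (putRecord) => {
    // Build one Put per record, adding every (family, qualifier, value) cell.
    val put = new Put(putRecord._1)
    putRecord._2.foreach((putValue) => put.addColumn(putValue._1, putValue._2, putValue._3))
    put
  })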
The hbaseBulkPut function takes three inputs: the hbaseContext, which carries the broadcast configuration information that links us to the HBase Connections in the executors; the name of the table we are putting data into; and a function that converts a record in the DStream into an HBase Put object.
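
The record shape used in the example above, a row key paired with an array of (column family, qualifier, value) triples, can be built from plain strings. The following is a minimal sketch using only the Scala standard library. The object name PutRecords and its helpers are hypothetical and not part of the hbase-spark module, and UTF-8 encoding of the strings is an assumption.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object PutRecords {
  // (column family, qualifier, value), all as raw bytes.
  type Cell = (Array[Byte], Array[Byte], Array[Byte])
  // (row key, cells), matching the element type of the queue in the example.
  type PutRecord = (Array[Byte], Array[Cell])

  // Encode a string as UTF-8 bytes (assumed encoding for this sketch).
  def bytes(s: String): Array[Byte] = s.getBytes(UTF_8)

  // Build one record from a row key and any number of string triples.
  def putRecord(rowKey: String, cells: (String, String, String)*): PutRecord =
    (bytes(rowKey), cells.map { case (f, q, v) => (bytes(f), bytes(q), bytes(v)) }.toArray)
}
```

A call such as PutRecords.putRecord("row1", ("cf", "a", "1"), ("cf", "b", "2")) yields one record with two cells. A collection of such records could then be turned into an RDD and added to the queue.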