Development Series: 03. Spark Streaming Custom Receivers

Source: Internet
Author: User

Spark Streaming can receive streaming data from any arbitrary data source beyond the ones for which it has built-in support (that is, beyond Flume, Kafka, files, sockets, etc.). This requires the developer to implement a receiver that is customized for receiving data from the concerned data source. This guide walks through the process of implementing a custom receiver and using it in a Spark Streaming application.

Implementing a custom Receiver

This starts with implementing a Receiver. A custom receiver must extend this abstract class and implement two methods: onStart(), the things to do to start receiving data, and onStop(), the things to do to stop receiving data.

Note that onStart() and onStop() must not block indefinitely. Typically, onStart() starts the threads responsible for receiving the data, and onStop() ensures that those receiving threads are stopped. The receiving threads can also use isStopped(), a Receiver method, to check whether they should stop receiving data.

Once the data is received, it can be stored inside Spark by calling store(data), which is a method provided by the Receiver class. There are a number of flavors of store() that allow you to store the received data record-at-a-time or as a whole collection of objects/serialized bytes.
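As a small sketch of those flavors (the receiver name and the push() helper below are illustrative, and assume the single-record, ArrayBuffer, and Iterator overloads of store()):

import scala.collection.mutable.ArrayBuffer

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Illustrative receiver showing different ways to hand records to Spark.
class BufferingReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart() { /* omitted: start a thread that calls push(...) */ }
  def onStop() { }

  // Hypothetical helper called from the receiving thread with a batch of records.
  // Each call below is an alternative way to store the batch, not a pipeline.
  private def push(records: Seq[String]) {
    // One record at a time
    records.foreach(record => store(record))

    // A whole buffer of records in a single call
    store(ArrayBuffer(records: _*))

    // An iterator over the records
    store(records.iterator)
  }
}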

Any exception in the receiving threads should be caught and handled properly to avoid silent failures of the receiver. restart(<exception>) will restart the receiver by asynchronously calling onStop() and then calling onStart() after a delay. stop(<exception>) will call onStop() and terminate the receiver. Also, reportError(<error>) reports an error message to the driver (visible in the logs and UI) without stopping or restarting the receiver.
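As a rough sketch of how these three calls differ in practice (the receiver name, the choice of exception types, and the readRecord() helper are invented for illustration):

import java.net.ConnectException

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Illustrative receiver showing when restart(), stop() and reportError() might be used.
class FaultHandlingReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart() {
    new Thread("Fault Handling Receiver") {
      override def run() { receive() }
    }.start()
  }

  def onStop() { }

  private def receive() {
    while (!isStopped) {
      try {
        store(readRecord())
      } catch {
        case e: ConnectException =>
          // Transient failure: asynchronously calls onStop(), then onStart() after a delay
          restart("Could not reach the data source, retrying", e)
          return
        case e: SecurityException =>
          // Unrecoverable failure: calls onStop() and terminates the receiver
          stop("Not authorized to read from the data source", e)
          return
        case t: Throwable =>
          // Report to the driver (visible in logs and UI) without stopping
          // or restarting the receiver; keep reading
          reportError("Error while receiving data", t)
      }
    }
  }

  // Hypothetical helper standing in for source-specific read logic
  private def readRecord(): String = ""
}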

The following is a custom receiver that receives a stream of text over a socket. It treats '\n'-delimited lines in the text stream as records and stores them with Spark. If the receiving thread has any error connecting or receiving, the receiver is restarted to make another attempt to connect.

Scala

 

import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket

import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class CustomReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) with Logging {

  def onStart() {
    // Start the thread that receives data over a connection
    new Thread("Socket Receiver") {
      override def run() { receive() }
    }.start()
  }

  def onStop() {
    // There is nothing much to do as the thread calling receive()
    // is designed to stop by itself if isStopped() returns false
  }

  /** Create a socket connection and receive data until receiver is stopped */
  private def receive() {
    var socket: Socket = null
    var userInput: String = null
    try {
      // Connect to host:port
      socket = new Socket(host, port)

      // Until stopped or connection broken continue reading
      val reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), "UTF-8"))
      userInput = reader.readLine()
      while (!isStopped && userInput != null) {
        store(userInput)
        userInput = reader.readLine()
      }
      reader.close()
      socket.close()

      // Restart in an attempt to connect again when server is active again
      restart("Trying to connect again")
    } catch {
      case e: java.net.ConnectException =>
        // restart if could not connect to server
        restart("Error connecting to " + host + ":" + port, e)
      case t: Throwable =>
        // restart if there is any other error
        restart("Error receiving data", t)
    }
  }
}

 

 

Using the custom receiver in a Spark Streaming application

The custom receiver can be used in a Spark Streaming application by using streamingContext.receiverStream(<instance of custom receiver>). This will create an input DStream using data received by the instance of the custom receiver, as shown below.

Scala

 

// Assuming ssc is the StreamingContext
val customReceiverStream = ssc.receiverStream(new CustomReceiver(host, port))
val words = customReceiverStream.flatMap(_.split(" "))
...
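For context, a minimal sketch of a complete driver around that snippet might look like the following; the application name, batch interval, host/port values, and the word-count logic are illustrative choices, not part of the original example.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CustomReceiverApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("CustomReceiverApp")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Pull '\n'-delimited lines from the custom receiver defined above.
    // For a quick local test, a text server such as `nc -lk 9999` can feed it.
    val lines = ssc.receiverStream(new CustomReceiver("localhost", 9999))

    // Simple word count over each batch
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()             // start receiving and processing
    ssc.awaitTermination()  // block until the streaming job is stopped
  }
}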

 

The full source code for this example is in CustomReceiver.scala.

 

Implementing and using a custom actor-based Receiver

Custom Akka actors can also be used to receive data. The ActorHelper trait can be applied to any Akka actor, which allows received data to be stored in Spark using the store(...) methods. The supervisor strategy of this actor can be configured to handle failures, etc.

 

import akka.actor.Actor
// ActorHelper is the helper trait provided by the Spark 1.x actor receiver API
import org.apache.spark.streaming.receiver.ActorHelper

class CustomActor extends Actor with ActorHelper {
  def receive = {
    case data: String => store(data)
  }
}

 

A new input stream can then be created with this custom actor:

 

// Assuming ssc is the StreamingContext
val lines = ssc.actorStream[String](Props(new CustomActor()), "CustomReceiver")

 

See ActorWordCount.scala for an end-to-end example.

 

