Contents of this issue:
- How the receiver start-up is designed
- Thorough analysis of the receiver start-up source code
When an application has multiple input sources, each source needs its own receiver, and as long as the cluster is alive we want every receiver to start successfully. If receivers are launched as the tasks of an ordinary job, however, any of those tasks may fail to run.
The naive design does exactly that: within the application, a single RDD is created in which each partition represents one receiver. When the job executes, each partition becomes a task on an executor, and each task, once running, actually starts the receiver it represents.
Pros: simple and ingenious; receivers are launched with nothing more than an ordinary job on Spark Core.
Cons: a task may fail. If any receiver fails while running, its task fails, which fails the whole job, and with it the application; one bad receiver takes down all the others.
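To make the flaw concrete, here is a minimal, Spark-free model of the naive design (all names are illustrative, not Spark's actual classes): the receiver ids play the role of RDD partitions, and because all of them start inside one "job", a single failure aborts everything.

```scala
// Spark-free model of the naive design: one job, one task per partition,
// each task starting one receiver. All names here are illustrative.
object NaiveReceiverJob {
  // `faulty` marks receivers whose start throws, like a crashing task.
  def runJob(receiverIds: Seq[Int], faulty: Set[Int]): Either[String, Seq[String]] =
    try {
      Right(receiverIds.map { id =>
        if (faulty.contains(id))
          throw new RuntimeException(s"receiver $id failed to start")
        s"receiver $id started"
      })
    } catch {
      // One failed task fails the whole job, taking every receiver with it.
      case e: RuntimeException => Left(s"job failed: ${e.getMessage}")
    }
}
```

With no faulty receiver the job starts all of them; with a single faulty receiver the whole job, and hence every receiver, is lost.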
Data input process, in the source code:
Receiver start-up process, in the source code:
The receiver instances are obtained from ReceiverInputDStream. ReceiverInputDStream lives on the driver side and is a top-level abstraction in Spark Streaming: a streaming job ultimately runs as RDDs, and each ReceiverInputDStream object represents one input stream, i.e. one data source.
A Receiver object is a logical-level description created on the driver; the receivers are then distributed to the worker nodes, where they run on the physical plane, on top of the executors in the cluster.
The receiver loops to receive all the data:
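In Spark's custom-receiver pattern, this loop lives in a thread spawned by Receiver.onStart() and pushes each record into Spark with store() until isStopped() turns true. Below is a Spark-free sketch of that loop shape, with store() replaced by a buffer append; LoopingReceiver and its members are stand-ins, not Spark API.

```scala
import scala.collection.mutable.ArrayBuffer

// Spark-free sketch of a receiver's receive loop. In Spark the loop runs
// in a thread started by Receiver.onStart() and calls store(record);
// here store() just appends to a local buffer so the shape is testable.
class LoopingReceiver(source: Iterator[String]) {
  @volatile private var stopped = false
  val stored = ArrayBuffer.empty[String]

  def stop(): Unit = stopped = true          // what onStop() would trigger
  def isStopped: Boolean = stopped

  private def store(record: String): Unit = stored += record

  // Loop to receive all data: keep reading until the source dries up
  // or the receiver is asked to stop.
  def receive(): Unit =
    while (!isStopped && source.hasNext) store(source.next())
}
```

Spawning the loop in its own thread matters because onStart() must return immediately; blocking in it would stall the receiver supervisor.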
Message handling in the ReceiverTracker's RPC endpoint for the data sources:
The endpoint then calls startReceiver:
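In newer Spark versions (1.5 and later), ReceiverTracker avoids the flaw of the naive design by submitting a separate single-task job for each receiver, so one receiver's crash cannot fail the others, and the tracker simply resubmits a job for the crashed receiver. A minimal Spark-free model of that isolation (names are illustrative, not Spark's actual classes):

```scala
import scala.util.{Failure, Success, Try}

// Spark-free model of the fixed design: each receiver is launched by its
// own "job", so failures are isolated per receiver instead of per job.
object ReceiverTrackerModel {
  // Run each starter independently; return (started, failed) receiver ids.
  // A failed id would be resubmitted by the tracker, not fail the app.
  def startAll(starters: Map[Int, () => Unit]): (Set[Int], Set[Int]) = {
    val results = starters.map { case (id, start) => id -> Try(start()) }
    (results.collect { case (id, Success(_)) => id }.toSet,
     results.collect { case (id, Failure(_)) => id }.toSet)
  }
}
```

Contrast this with the naive design: there, one crashing receiver returned a failure for the whole job; here, each receiver succeeds or fails on its own.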
Note:
- Source: Liaoliang (Spark release version customization course)
- Sina Weibo: http://www.weibo.com/ilovepains
From the series: Spark Streaming source code interpretation, a thorough study of and reflection on the full life cycle of receivers.