This Spark Streaming source-analysis section walks through the code path of streaming execution from the source-code perspective. Below is a brief analysis of the data-receiving and conversion phase, as preparation for the later analysis of backpressure.
The whole Spark Streaming pipeline is divided into two phases: the data-receiving/conversion phase and the job-generation/execution phase. The two phases are linked by the blocks produced in the data-receiving phase. The code analysis below covers the receiver-based data-receiving and conversion part.
The data-reception and conversion process can be divided into the following key steps:
The Receiver receives records from an external data stream and hands them to the BlockGenerator, which stores them in an ArrayBuffer. Before each record is stored, a rate-limit permit must be acquired; the maximum rate is specified by spark.streaming.receiver.maxRate, and since Spark 1.5 backpressure can compute this maximum ingestion rate automatically. Storing one record consumes one permit, and the call blocks if no permit can be acquired.
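The permit-per-record gating can be sketched as below. This is a hypothetical simplification (Spark itself delegates to Guava's RateLimiter); `SimpleRateLimiter`, `waitToPush`, and `refill` are illustrative names, with a semaphore standing in for the real token-bucket logic:

```scala
import java.util.concurrent.Semaphore
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: one permit per record; a timer would call
// refill() once per interval to restore the quota.
class SimpleRateLimiter(maxRate: Int) {
  private val permits = new Semaphore(maxRate)

  // Blocks until a permit is available, mirroring the blocking
  // behaviour described for the receiver's store path.
  def waitToPush(): Unit = permits.acquire()

  def refill(): Unit =
    permits.release(maxRate - permits.availablePermits())
}

val limiter = new SimpleRateLimiter(maxRate = 3)
val currentBuffer = ArrayBuffer[String]()

for (record <- Seq("a", "b", "c")) {
  limiter.waitToPush() // would block here once the quota is spent
  currentBuffer += record
}
```

With `maxRate = 3`, a fourth record in the same interval would block in `waitToPush()` until the next `refill()`, which is exactly how the receiver is throttled.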
A timer defined in the BlockGenerator fires at the configured interval, takes the data accumulated in the ArrayBuffer, packs it into a block, stores the block in blocksForPushing (a block queue backed by ArrayBlockingQueue), and clears the ArrayBuffer.
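The timer callback can be sketched as the buffer swap below. This is a simplified stand-in for the real `updateCurrentBuffer` logic; the `Block` case class and the 200 ms interval value are assumptions for illustration (Spark's blocks carry a StreamBlockId):

```scala
import java.util.concurrent.ArrayBlockingQueue
import scala.collection.mutable.ArrayBuffer

// Hypothetical simplified block record.
case class Block(id: Long, records: Seq[String])

val blockIntervalMs = 200L                  // assumed block interval
var currentBuffer = ArrayBuffer("r1", "r2") // data the receiver stored
val blocksForPushing = new ArrayBlockingQueue[Block](10)

// What each timer tick does: swap the buffer for a fresh one, wrap the
// old contents in a block, and enqueue it; the buffer is thereby cleared.
def updateCurrentBuffer(time: Long): Unit = {
  val filledBuffer = currentBuffer
  currentBuffer = ArrayBuffer[String]()
  if (filledBuffer.nonEmpty)
    blocksForPushing.put(Block(time - blockIntervalMs, filledBuffer.toSeq))
}

updateCurrentBuffer(1000L)
```

Swapping the buffer reference (rather than copying records one by one) keeps the timer tick cheap, so the receiver is blocked only briefly while the swap happens.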
The blockPushingThread in the BlockGenerator takes blocks out of this blocking queue and delivers them to the ReceiverSupervisor through the listener's onPushBlock callback.
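The pushing thread's drain loop can be sketched as below. The `BlockGeneratorListener` trait and the tuple-based block type here are simplified assumptions standing in for Spark's real listener interface and block classes:

```scala
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ArrayBuffer

// Hypothetical listener mirroring the shape of the onPushBlock callback.
trait BlockGeneratorListener {
  def onPushBlock(blockId: Long, data: Seq[String]): Unit
}

val blocksForPushing = new ArrayBlockingQueue[(Long, Seq[String])](10)
val delivered = ArrayBuffer[Long]()

// Stand-in for the ReceiverSupervisor side of the callback.
val listener = new BlockGeneratorListener {
  def onPushBlock(blockId: Long, data: Seq[String]): Unit =
    delivered.synchronized { delivered += blockId }
}

blocksForPushing.put((1L, Seq("a")))
blocksForPushing.put((2L, Seq("b")))

val stopped = new AtomicBoolean(false)

// The pushing thread's loop: poll with a timeout so shutdown is noticed,
// and hand each dequeued block to the listener.
val blockPushingThread = new Thread(() => {
  while (!stopped.get() || !blocksForPushing.isEmpty) {
    val block = blocksForPushing.poll(10, TimeUnit.MILLISECONDS)
    if (block != null) listener.onPushBlock(block._1, block._2)
  }
})
blockPushingThread.start()
Thread.sleep(100)
stopped.set(true)
blockPushingThread.join()
```

Polling with a timeout (instead of a blocking `take()`) lets the thread re-check the stop flag and still drain any blocks left in the queue before exiting.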
When the ReceiverSupervisor receives the message, it processes the data carried in it: it stores the data by calling the BlockManager, and reports the storage result to the ReceiverTracker.
After the ReceiverTracker receives the report, it records the block information in the unallocated-block queue (streamIdToUnallocatedBlockQueues), where it waits for the JobGenerator to allocate it to an RDD when a job is generated.
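The tracker-side bookkeeping can be sketched as a per-stream queue that is drained at batch time. `ReceivedBlockInfo`, `addBlock`, and `allocateBlocksToBatch` are simplified illustrative names, not Spark's exact signatures:

```scala
import scala.collection.mutable

// Hypothetical simplified block report from a ReceiverSupervisor.
case class ReceivedBlockInfo(streamId: Int, blockId: Long)

// One FIFO queue of unallocated blocks per input stream.
val streamIdToUnallocatedBlockQueues =
  mutable.HashMap[Int, mutable.Queue[ReceivedBlockInfo]]()

def addBlock(info: ReceivedBlockInfo): Unit =
  streamIdToUnallocatedBlockQueues
    .getOrElseUpdate(info.streamId, mutable.Queue[ReceivedBlockInfo]())
    .enqueue(info)

// Called at batch time: snapshot and clear every stream's queue so the
// drained blocks become the input of that batch's RDD.
def allocateBlocksToBatch(): Map[Int, Seq[ReceivedBlockInfo]] =
  streamIdToUnallocatedBlockQueues.map { case (streamId, queue) =>
    val blocks = queue.toList // snapshot in arrival order
    queue.clear()            // no longer "unallocated"
    streamId -> (blocks: Seq[ReceivedBlockInfo])
  }.toMap

addBlock(ReceivedBlockInfo(0, 1L))
addBlock(ReceivedBlockInfo(0, 2L))
val allocated = allocateBlocksToBatch()
```

Draining the whole queue at batch boundaries is what ties the two phases together: every block reported before the batch time ends up in exactly one batch's RDD.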
(Figure: Spark Streaming data-reception process)