To better understand the processing mechanism of the Spark Streaming sub-framework, you first need to understand its most basic concepts.
1. Discretized stream (DStream): Spark Streaming's abstraction for a continuous real-time data stream. Each real-time data stream being processed corresponds to one DStream instance in Spark Streaming.
2. Batch data: the first step of discretization. The real-time stream is split into batches, converting stream processing into batch processing of time-slice data. As time progresses, the results of these batch computations form a corresponding result data stream.
3. Time slice, or batch interval: the artificially defined standard for splitting the data flow, using time slices as the unit of division. The data in one time slice corresponds to one RDD instance.
4. Window length: the duration of stream data covered by one window. Must be a multiple of the batch interval.
5. Sliding interval: the time elapsed between one window and the next. Must also be a multiple of the batch interval.
6. Input DStream: a special DStream that connects Spark Streaming to an external data source to read data.
7. Receiver: runs long-term (possibly 7x24 hours) inside an executor. Each receiver is responsible for one input DStream (for example, a stream that reads Kafka messages). Each receiver, together with its input DStream, occupies one core/slot.
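The relationship between batch interval, window length, and sliding interval above can be sketched in plain Python, without Spark. This is a simplified model only: the event data, helper names, and interval values are all hypothetical, and real Spark Streaming windows are anchored to the current batch time rather than to time zero.

```python
# Hypothetical intervals (seconds). In Spark Streaming, window length and
# sliding interval must both be multiples of the batch interval.
BATCH_INTERVAL = 2
WINDOW_LENGTH = 6
SLIDING_INTERVAL = 4
assert WINDOW_LENGTH % BATCH_INTERVAL == 0
assert SLIDING_INTERVAL % BATCH_INTERVAL == 0

def discretize(events, batch_interval):
    """Split (timestamp, value) events into per-time-slice batches,
    mimicking how a continuous stream becomes a DStream of batches
    (each batch corresponding to one RDD)."""
    batches = {}
    for ts, value in events:
        batches.setdefault(ts // batch_interval, []).append(value)
    return batches

def windowed_counts(batches, batch_interval, window_length, sliding_interval):
    """For each window position, count the events in the batches the
    window covers. Window length and slide are expressed as whole
    numbers of batches, which is why the multiple rule matters."""
    batches_per_window = window_length // batch_interval
    batches_per_slide = sliding_interval // batch_interval
    if not batches:
        return []
    last = max(batches)
    results, start = [], 0
    while start <= last:
        covered = range(start, start + batches_per_window)
        count = sum(len(batches.get(i, [])) for i in covered)
        window_span = (start * batch_interval,
                       (start + batches_per_window) * batch_interval)
        results.append((window_span, count))
        start += batches_per_slide
    return results

# Hypothetical stream: (timestamp_in_seconds, payload)
events = [(0, "a"), (1, "b"), (3, "c"), (5, "d"), (7, "e"), (9, "f")]
batches = discretize(events, BATCH_INTERVAL)
print(batches)
# -> {0: ['a', 'b'], 1: ['c'], 2: ['d'], 3: ['e'], 4: ['f']}
print(windowed_counts(batches, BATCH_INTERVAL, WINDOW_LENGTH, SLIDING_INTERVAL))
# -> [((0, 6), 4), ((4, 10), 3), ((8, 14), 1)]
```

Note how consecutive windows overlap when the sliding interval is shorter than the window length: the batch covering seconds 4-6 is counted by both the first and the second window.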