Spark Streaming: Key Abstraction and Entry Point

Source: Internet
Author: User
Keywords: spark, spark streaming, spark streaming entrance

The key abstraction of Spark Streaming

The abstraction that Spark Streaming provides for transformation operations is called the DStream (discretized stream). Incoming data is first divided by time into batches, and each batch becomes an RDD: RDD@time1, RDD@time2, RDD@time3, …

Note that these RDDs do not all exist at the same time: by the time the RDD for a new time slice arrives, the slices before it have already been handed over to the transformation operations.
A stream transformation then works as follows. In word count, each time-sliced batch is a collection of lines, and the flatMap operation is applied to the lines of each batch in turn. The result of the transformation is a new DStream of words, but underneath the operation still runs on RDDs. Compared with an ordinary batch word count, the same transformation is simply applied repeatedly, once per time slice.
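The word count transformation described above can be sketched as a function from one DStream to another. This is a minimal sketch, assuming Spark Streaming is on the classpath; the names WordCountSketch, tokenize and wordCounts are hypothetical, and splitting on whitespace is an assumption about what counts as a word.

```scala
import org.apache.spark.streaming.dstream.DStream

object WordCountSketch {
  // Splits one line into words. Whitespace splitting is an assumption.
  def tokenize(line: String): Seq[String] =
    line.trim.split("\\s+").filter(_.nonEmpty).toSeq

  // Each call below is applied to the RDD of every batch interval in turn.
  def wordCounts(lines: DStream[String]): DStream[(String, Int)] = {
    val words = lines.flatMap(line => tokenize(line)) // lines -> words
    val pairs = words.map(word => (word, 1))          // word -> (word, 1)
    pairs.reduceByKey(_ + _)                          // counts within one batch
  }
}
```

The counts produced this way are per batch: reduceByKey combines values only within the RDD of a single time slice, not across slices.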



Spark Streaming back pressure mechanism
When data is received faster than it can be processed, unprocessed RDDs pile up, and this backlog triggers the back pressure mechanism.

To cope with data that arrives too quickly, a token bucket is used: tokens are generated into the bucket, and only incoming data that obtains a token can be packaged into an RDD. Controlling the token generation rate therefore controls the rate at which RDD@time batches are produced, which eases the backlog. When no token is available, the receiver blocks and waits for the next token to be generated.
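The token bucket idea can be illustrated in plain Scala. This is a minimal sketch of the general technique, not Spark's internal implementation; the class name TokenBucket, its parameters, and the injectable clock are all hypothetical and exist only for illustration.

```scala
// A hypothetical token bucket: tokens accrue at ratePerSecond up to capacity.
final class TokenBucket(
    ratePerSecond: Double,
    capacity: Double,
    clock: () => Long = () => System.nanoTime()) {
  require(ratePerSecond > 0 && capacity > 0, "rate and capacity must be positive")

  private var tokens: Double = capacity
  private var lastRefill: Long = clock()

  // Adds the tokens earned since the last refill, capped at capacity.
  private def refill(): Unit = {
    val now = clock()
    val elapsedSeconds = (now - lastRefill) / 1e9
    tokens = math.min(capacity, tokens + elapsedSeconds * ratePerSecond)
    lastRefill = now
  }

  // Takes one token if one is available; returns false otherwise.
  def tryAcquire(): Boolean = synchronized {
    refill()
    if (tokens >= 1.0) { tokens -= 1.0; true } else false
  }

  // Blocks the caller until a token becomes available.
  def acquire(): Unit =
    while (!tryAcquire()) Thread.sleep(1)
}
```

A receiver built on this sketch would call acquire() before storing each record, so lowering ratePerSecond slows down how fast data enters each batch. In Spark itself, back pressure is switched on with the spark.streaming.backpressure.enabled configuration property, and spark.streaming.receiver.maxRate sets an upper bound on each receiver's rate.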

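The entry point to Spark Streaming
The entry point is StreamingContext:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// master and appName are supplied by the caller
val conf = new SparkConf().setMaster(master).setAppName(appName)
val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batch interval
// The underlying SparkContext is available as ssc.sparkContext

// Alternatively (instead of the lines above), wrap an existing SparkContext:
// val sc = new SparkContext(conf)
// val ssc = new StreamingContext(sc, Seconds(1))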

After the context has been initialized:
1. Define the input sources by creating input DStreams.
2. Define the transformation and output operations on the DStreams.
3. Start receiving and processing data with streamingContext.start().
4. Wait for processing to terminate with streamingContext.awaitTermination().
5. Stop processing manually with streamingContext.stop().
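
The five steps above can be put together as follows. This is a minimal sketch, assuming Spark Streaming is on the classpath and that a text source is listening on localhost:9999; the host, the port, the local[2] master, the object name, and the 60-second time limit are all assumptions made for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingLifecycleSketch {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the receiver, one for processing (assumed setup)
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingLifecycleSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // 1. Input source: lines of text from a socket (assumed host and port)
    val lines = ssc.socketTextStream("localhost", 9999)

    // 2. Transformations, followed by an output operation
    val counts = lines
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.print()

    // 3. Start receiving and processing
    ssc.start()

    // 4. Wait for termination, here for at most 60 seconds (assumed limit);
    //    awaitTermination() would wait indefinitely instead
    val terminated = ssc.awaitTerminationOrTimeout(60000L)

    // 5. Stop manually if the job is still running after the time limit
    if (!terminated) ssc.stop(stopSparkContext = true, stopGracefully = true)
  }
}
```

Nothing is computed until start() is called: steps 1 and 2 only describe the pipeline, which is why the input sources and operations must be defined before starting the context.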
