Spark Streaming source code walkthrough: state management with updateStateByKey and mapWithState

Source: Internet
Author: User

Contents of this issue:

    • updateStateByKey explained
    • mapWithState explained

  

  Why Spark Streaming needs state management:

01, Spark Streaming divides its work by batch duration: each batch duration produces one job. Business requirements, however, often call for results over the last hour or the last week. Because that span covers far more data than a single batch duration, state has to be maintained across batches.

02, Spark Streaming offers several functions for state management. The two typical ones, updateStateByKey and mapWithState, carry out the core steps.
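Point 01 can be illustrated with a small model in plain Scala. This is only a sketch that does not use Spark: the names RunningTotals, perBatch and withState are made up for illustration, and each inner Seq stands for the records of one batch duration.

```scala
object RunningTotals {
  // One batch: the (key, value) records that arrived in one batch duration.
  type Batch = Seq[(String, Int)]

  // Stateless view: a batch is aggregated on its own and then forgotten.
  def perBatch(batch: Batch): Map[String, Int] =
    batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }

  // Stateful view: the totals of earlier batches are carried into the next one.
  def withState(batches: Seq[Batch]): Map[String, Int] =
    batches.foldLeft(Map.empty[String, Int]) { (state, batch) =>
      perBatch(batch).foldLeft(state) { case (acc, (k, v)) =>
        acc.updated(k, acc.getOrElse(k, 0) + v)
      }
    }
}
```

Without the carried-over map, a result that spans several batches cannot be produced, which is the reason state management is needed.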

  

First, updateStateByKey:

It updates the state held in the existing historical data according to the updateFunc function, and returns a DStream.
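The shape of such an update function can be sketched in plain Scala as follows. The object name UpdateFuncSketch and the choice of a running sum are assumptions made for illustration; only the function shape (new values for one key plus the previous state, returning the new state) follows updateStateByKey.

```scala
object UpdateFuncSketch {
  // Same shape as the function updateStateByKey expects:
  //   newValues: the values that arrived for one key in the current batch
  //   previous:  the state kept for that key so far (None if the key is new)
  // Returning None instead of Some(...) would drop the key from the state.
  val updateFunc: (Seq[Int], Option[Int]) => Option[Int] =
    (newValues, previous) => Some(previous.getOrElse(0) + newValues.sum)
}
```

In a real job, assuming a hypothetical `pairs` of type DStream[(String, Int)], the function would be applied with `pairs.updateStateByKey[Int](UpdateFuncSketch.updateFunc)`. A checkpoint directory has to be set on the StreamingContext for this to work.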

  

  

  

  In the end, the resulting DStream keeps generating data, batch after batch.

  How each RDD is generated and computed:

  For the incoming data, all records are collected by key (K).

Pros: The RDD is computed only when it is actually needed, and the computation itself is a cogroup.

Cons: Performance. Every batch has to scan all the data, which ends up in a CoGroupedRDD, so processing becomes slower and slower as the amount of data grows.
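The cost described under Cons can be made visible with a plain Scala model of one batch step. It is a simplification that does not use Spark or a real CoGroupedRDD, and the names CogroupUpdateSketch and step are hypothetical. The point is only that the update function runs for every key in the state, not just for the keys in the batch.

```scala
object CogroupUpdateSketch {
  // Running sum, in the same shape as an updateStateByKey update function.
  def updateFunc(newValues: Seq[Int], previous: Option[Int]): Option[Int] =
    Some(previous.getOrElse(0) + newValues.sum)

  // One batch step. Returns the new state and the number of keys for which
  // the update function had to be called.
  def step(state: Map[String, Int],
           batch: Seq[(String, Int)]): (Map[String, Int], Int) = {
    // Group the batch by key, as a cogroup would do for the new data.
    val grouped: Map[String, Seq[Int]] =
      batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    // A cogroup pairs up ALL keys from both sides, old state included.
    val allKeys = state.keySet ++ grouped.keySet
    val newState = allKeys.toSeq.flatMap { k =>
      updateFunc(grouped.getOrElse(k, Seq.empty[Int]), state.get(k)).map(v => k -> v)
    }.toMap
    (newState, allKeys.size)
  }
}
```

With three keys in the state and one record in the batch, the model calls the update function three times, so the work per batch grows with the size of the state rather than with the size of the batch.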

  

  

Second, mapWithState:

It returns a DStream. State is updated, and the historical state is maintained, per key (K). The update function, the timeout, the initial state and so on are supplied through StateSpec, which encapsulates the update function.

Updating or deleting state is comparable to working on records in a table: the key selects which record in the table is manipulated using the historical data, and State plays the role of the table name or index through which data is read and updated and the state is maintained.
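The table analogy can be sketched with a mapping function in plain Scala. SimpleState below is a hypothetical, much simplified stand-in for Spark's org.apache.spark.streaming.State, introduced only so that the sketch runs without Spark; the running sum is likewise just an example.

```scala
object MappingFuncSketch {
  // Hypothetical stand-in for Spark's State[S]: a handle to the state of
  // ONE key, offering read, update and delete like a row in a table.
  final class SimpleState[S](private var value: Option[S] = None) {
    def exists: Boolean = value.isDefined
    def getOption: Option[S] = value
    def update(s: S): Unit = { value = Some(s) }
    def remove(): Unit = { value = None }
  }

  // Same shape as a mapWithState mapping function:
  // (key, new value of this record, state handle) => mapped output.
  def mappingFunc(word: String, one: Option[Int],
                  state: SimpleState[Int]): (String, Int) = {
    val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
    state.update(sum) // write the new total back into the "table"
    (word, sum)
  }
}
```

With Spark, a function of the same shape written against State[Int] would be wrapped with `StateSpec.function(...)`, and the timeout and initial state would be configured on the resulting StateSpec through its `timeout` and `initialState` methods before it is passed to mapWithState.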

  

  

  

  

 Each partition is represented by a MapWithStateRDDRecord. Its data structure is a StateMap, which maintains the state per key (K).
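The per-partition record can be modeled roughly as follows. This is a sketch in plain Scala, not the Spark source: PartitionRecord is a hypothetical simplification of MapWithStateRDDRecord, and an ordinary mutable map stands in for StateMap.

```scala
import scala.collection.mutable

object PartitionRecordSketch {
  // Hypothetical simplification of one partition's record:
  // a key -> state map that lives across batches.
  final class PartitionRecord {
    val stateMap: mutable.Map[String, Int] = mutable.Map.empty[String, Int]

    // Apply one batch. Only the keys that occur in the batch are looked up
    // and updated; all other entries of stateMap are left untouched.
    def applyBatch(batch: Seq[(String, Int)]): Seq[(String, Int)] =
      batch.map { case (k, v) =>
        val sum = stateMap.getOrElse(k, 0) + v
        stateMap.update(k, sum)
        (k, sum)
      }
  }
}
```

In this model the work per batch depends on the number of records in the batch, not on the number of keys already held in the state map.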

  

  

  

  

    Note:

      • Data from: Liaoliang (Spark release version customization)
      • Sina Weibo:http://www.weibo.com/ilovepains
