Using the Spark Streaming operator reduceByKeyAndWindow

Source: Internet
Author: User

reduceByKeyAndWindow is also a lazy operator (a transformation). It computes over the data that falls within a time window, as shown in the following figure:

The screenshot is from the official website. In this example, each square represents 5 seconds. The dashed box covers 3 squares, i.e. 15 seconds; these 15 seconds are the window length. The window then moves from the dashed box to the solid box, a shift of 2 squares or 10 seconds; this is the slide interval, meaning the data in the window is computed every 10 seconds.
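To make the timing concrete, here is a small plain-Scala sketch (no Spark dependency) that lists which batches fall into each window. It assumes, as above, that each square is one 5-second batch, with a 15-second window and a 10-second slide; `WindowTimeline` and `batchesInWindow` are made-up names for illustration and only model the figure, not Spark's scheduler.

```scala
object WindowTimeline {
  val batchSec = 5   // one square in the figure (assumed batch interval)
  val windowSec = 15 // window length: 3 squares
  val slideSec = 10  // slide interval: 2 squares

  /** 1-based indices of the batches inside the window that ends at `endSec`. */
  def batchesInWindow(endSec: Int): List[Int] = {
    val lastBatch = endSec / batchSec
    // The window covers (endSec - windowSec, endSec], i.e. windowSec / batchSec batches.
    val firstBatch = math.max(lastBatch - windowSec / batchSec + 1, 1)
    (firstBatch to lastBatch).toList
  }

  def main(args: Array[String]): Unit = {
    // A windowed result is produced once every slideSec seconds;
    // the first window is only partly filled because no data exists before time 0.
    for (end <- slideSec to 40 by slideSec)
      println(s"window ending at ${end}s covers batches ${batchesInWindow(end).mkString(", ")}")
  }
}
```

Because the window (15 s) is 5 s longer than the slide (10 s), consecutive windows share exactly one batch, e.g. the windows ending at 20 s and 30 s both contain batch 4.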

For example, consider the following figure:

My understanding is this: if a word count is computed with the window function, the first window (the dashed one) yields (aa,1) (bb,3) (cc,1). Ten seconds later the window has moved to the solid-line position, and the words inside the solid window are counted, here giving (bb,1) (cc,2) (aa,1).
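The per-window word count can be modeled in plain Scala without Spark. This sketch assumes each square is one batch, a window of 3 batches and a slide of 2 batches. The batch contents are hypothetical, picked only so that the results reproduce the counts quoted above, since the actual data in the figure is not available; `WindowedWordCount` and `countWindow` are illustrative names, not Spark APIs.

```scala
object WindowedWordCount {
  val windowBatches = 3 // window length: 3 batches (15 s at 5 s per batch)
  val slideBatches = 2  // slide interval: 2 batches (10 s)

  // Hypothetical text received in batches 1 to 5 (one line per batch).
  val batches: Vector[String] = Vector("aa bb", "bb cc", "bb", "cc aa", "cc")

  /** Word counts for the window whose last batch is `lastBatch` (1-based). */
  def countWindow(lastBatch: Int): Map[String, Int] = {
    val first = math.max(lastBatch - windowBatches + 1, 1)
    val words = (first to lastBatch).flatMap(i => batches(i - 1).split(" ").toSeq)
    // Same steps as map((_, 1)) followed by reducing the values of each key with a + b.
    words.map((_, 1)).groupBy(_._1).map { case (w, pairs) => (w, pairs.map(_._2).reduce(_ + _)) }
  }

  def main(args: Array[String]): Unit = {
    // Windows end at batch 3 (dashed) and batch 5 (solid): the window slides by 2 batches.
    for (last <- windowBatches to batches.length by slideBatches)
      println(s"window ending at batch $last: ${countWindow(last)}")
  }
}
```

Batch 3 belongs to both windows, which is why its `bb` is counted in the dashed window and counted again in the solid one.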

The complete program:

Note: both the window length and the slide interval must be multiples of the Spark Streaming batch interval; otherwise an error is raised.
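As far as I know, Spark compares both durations with the slide duration of the parent DStream (for a stream read directly from a source such as `socketTextStream`, that is the batch interval) and throws an exception when the windowed stream is defined with a duration that is not a multiple. The plain-Scala sketch below mirrors that rule; `checkWindow` is a hypothetical helper that returns an `Either` instead of throwing.

```scala
object WindowCheck {
  /** Left(reason) if the durations would be rejected, Right(()) otherwise. All values in milliseconds. */
  def checkWindow(batchMs: Long, windowMs: Long, slideMs: Long): Either[String, Unit] =
    if (windowMs % batchMs != 0)
      Left(s"window length $windowMs ms is not a multiple of the batch interval $batchMs ms")
    else if (slideMs % batchMs != 0)
      Left(s"slide interval $slideMs ms is not a multiple of the batch interval $batchMs ms")
    else
      Right(())

  def main(args: Array[String]): Unit = {
    println(checkWindow(2000, 4000, 4000)) // 2 s batches, 4 s window, 4 s slide
    println(checkWindow(2000, 5000, 3000)) // 2 s batches, 5 s window, 3 s slide
  }
}
```

With a 2-second batch interval, `Seconds(4), Seconds(4)` passes, while a 5-second window with a 3-second slide would be rejected.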

package cn.lijie.kafka

import cn.lijie.MyLog
import org.apache.log4j.Level
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

/**
  * User: lijie
  * Date: 2017/8/8
  * Time: 14:04
  */
object SparkWindowDemo {

  // State update function for updateStateByKey: for each word, add the counts
  // from the current window (x._2) to the running total kept so far (x._3).
  val myFunc = (it: Iterator[(String, Seq[Int], Option[Int])]) => {
    it.map(x => (x._1, x._2.sum + x._3.getOrElse(0)))
  }

  def main(args: Array[String]): Unit = {
    MyLog.setLogLeavel(Level.WARN)
    val conf = new SparkConf().setMaster("local[2]").setAppName("Window")
    val sc = new SparkContext(conf)
    // Batch interval: 2 seconds
    val ssc = new StreamingContext(sc, Seconds(2))
    // updateStateByKey needs a checkpoint directory
    sc.setCheckpointDir("C:\\Users\\Administrator\\Desktop\\myck01")
    val ds = ssc.socketTextStream("", 9999) // fill in the host that serves text on port 9999
    // In reduceByKeyAndWindow, the first Seconds(...) is the window length and the
    // second is the slide interval (how often the window moves).
    // val re = ds.flatMap(_.split(" ")).map((_, 1)).reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(), Seconds(10))
    // When the window length equals the slide interval, each window simply covers its own
    // batches of data; chaining updateStateByKey then accumulates a running word count.
    // This is only an experiment.
    val re = ds.flatMap(_.split(" ")).map((_, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(4), Seconds(4))
      .updateStateByKey(myFunc, new HashPartitioner(sc.defaultParallelism), true)
    re.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
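To see what the final pipeline computes, here is a plain-Scala model of it with no Spark involved: non-overlapping 4-second windows over 2-second batches, whose per-window counts are folded into running totals by the same rule as `myFunc`. The input words are hypothetical, and `TumblingThenState` and `updateState` are illustrative names.

```scala
object TumblingThenState {
  // Hypothetical words received in four consecutive 2-second batches.
  val batches: Vector[Seq[String]] = Vector(
    Seq("aa", "bb"), Seq("bb"), // batches 1-2: first 4-second window
    Seq("cc"), Seq("aa", "cc")  // batches 3-4: second 4-second window
  )

  // Same rule as myFunc: add this window's counts for a key to the key's previous total.
  def updateState(newValues: Seq[Int], state: Option[Int]): Int =
    newValues.sum + state.getOrElse(0)

  /** Running totals after each window. */
  def run(): Vector[Map[String, Int]] = {
    // Window length = slide interval = 2 batches, so consecutive windows never overlap.
    val windows = batches.grouped(2).toVector
    windows.scanLeft(Map.empty[String, Int]) { (state, window) =>
      // Per-window count, as reduceByKeyAndWindow((a, b) => a + b, Seconds(4), Seconds(4)) produces.
      val counts = window.flatten.map((_, 1)).groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }
      // Keys absent from this window get no new values but keep their old total.
      (state.keySet ++ counts.keySet).map(w => w -> updateState(counts.get(w).toList, state.get(w))).toMap
    }.tail
  }

  def main(args: Array[String]): Unit =
    run().zipWithIndex.foreach { case (totals, i) => println(s"after window ${i + 1}: $totals") }
}
```

The second window contains no `bb`, so its total stays at 2. `myFunc` in the program behaves the same way for such keys, because it maps every entry of the iterator and never drops one.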