I have recently been studying Spark Streaming for stream processing. This article walks through a simple Spark Streaming program: a word count computed over a stream of text.
1. Dependent JAR packages
Set up the project as described in the article "Using Eclipse and IDEA to build the Scala + Spark development environment". In addition, pom.xml must declare the spark-streaming_2.10 dependency library.
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <!-- Spark -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.1.0</version>
  </dependency>
  <!-- HDFS -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.4</version>
    <scope>test</scope>
  </dependency>
</dependencies>
2. WordCount code example
The program reads text from a socket port, counts the words received in each 5-second batch, and prints the counts to the screen.
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext.toPairDStreamFunctions

/**
 * Spark Streaming example: count the number of occurrences of every word in the input.
 */
object StreamingWordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: NetworkWordCount <hostname> <port>")
      System.exit(1)
    }

    // Create the context with a 5 second batch size
    val sparkConf = new SparkConf().setAppName("NetworkWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Create a socket stream on target ip:port and count the
    // words in the input stream of \n delimited text
    val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
    // Split each line on single spaces to get the individual words
    val words = lines.flatMap(_.split(" "))
    // Pair each word with 1 and sum the pairs per word within the batch
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
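To see what each 5-second batch computes without starting a cluster, the same flatMap, map, and per-key sum can be mimicked with ordinary Scala collections. The following is only a rough sketch: the object name BatchWordCountSketch and the function countWords are made up for illustration, and groupBy plus sum stands in for Spark's reduceByKey.

```scala
object BatchWordCountSketch {
  // Counts the words in one batch of lines.
  // Plain-collections stand-in for flatMap -> map -> reduceByKey on a DStream.
  def countWords(batch: Seq[String]): Map[String, Int] =
    batch
      .flatMap(_.split(" "))          // split every line into words
      .map(word => (word, 1))         // pair each word with a count of 1
      .groupBy(_._1)                  // collect the pairs that share a word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Hypothetical contents of a single batch
    val batch = Seq("hello spark", "hello streaming")
    countWords(batch).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Each batch is counted independently, so a word seen in one 5-second interval does not add to its count in the next interval.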
3. Submitting the job and serving the socket port
socketTextStream connects to the given host and socket port and reads the text served there, so a process must be listening on that port and sending data.
(1) Submit the job:
$SPARK_HOME/bin/spark-submit --name streamingdemo --class StreamingWordCount ./sparktest-1.0-SNAPSHOT.jar localhost 1234
(2) Listen on the socket port with netcat and type text into this terminal to feed the stream:
nc -lk 1234
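If netcat is not available, a small test server can play the same role as the listener above. This is a minimal sketch under stated assumptions, not part of the original setup: the names LineServerSketch and serveOnce are hypothetical, the sample lines are invented, and the server sends a fixed set of lines to each client and then closes that connection.

```scala
import java.io.PrintWriter
import java.net.ServerSocket

object LineServerSketch {
  // Accepts one client and sends it every line, each terminated by \n.
  def serveOnce(server: ServerSocket, lines: Seq[String]): Unit = {
    val client = server.accept()
    try {
      val out = new PrintWriter(client.getOutputStream)
      lines.foreach(line => out.print(line + "\n"))
      out.flush()
    } finally {
      client.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Port 1234 matches the submit command; another port may be passed as the first argument
    val port = if (args.nonEmpty) args(0).toInt else 1234
    val server = new ServerSocket(port)
    try {
      // Serve the same sample lines to every client that connects
      while (true) serveOnce(server, Seq("hello spark", "hello streaming"))
    } finally {
      server.close()
    }
  }
}
```

Because the connection is closed after the sample lines are sent, this suits a quick smoke test rather than a long-running feed.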