Most of the Spark code you find online, and most of the books, are written in Scala. That is no surprise, since Spark itself is written in Scala. I have not studied Scala systematically, though, so I write my Spark programs in Java. Spark supports Java, and Scala runs on the JVM anyway, so let's go straight to the code.
This is the example from the official documentation, the classic big-data learning case: word count.
On Linux, open a terminal and run $ nc -lk 9999
Then run the following code
package com.tg.spark.stream;

import java.util.Arrays;

import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;

import scala.Tuple2;

/**
 * @author Soup High
 */
public class SparkStream {
    public static void main(String[] args) {
        // Create a local StreamingContext with 4 working threads and a batch interval of 1 second
        SparkConf conf = new SparkConf().setMaster("local[4]").setAppName("NetworkWordCount")
                .set("spark.testing.memory", "2147480000");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        System.out.println(jssc);

        // Create a DStream that will connect to hostname:port, like localhost:9999
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("master", 9999);
        // For the second case further below, read from an HDFS directory instead:
        // JavaDStream<String> lines = jssc.textFileStream("hdfs://master:9000/stream");

        // Split each line into words
        JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String x) {
                System.out.println(Arrays.asList(x.split(" ")).get(0));
                return Arrays.asList(x.split(" "));
            }
        });

        // Count each word in each batch
        JavaPairDStream<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) {
                return new Tuple2<String, Integer>(s, 1);
            }
        });
        System.out.println(pairs);

        JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        });

        // Print the first ten elements of each RDD generated in this DStream to the console
        wordCounts.print();
        // Also save each batch to HDFS as text files
        wordCounts.dstream().saveAsTextFiles("hdfs://master:9000/testfile/", "spark");
        // Note: this prints the DStream object itself, not a materialized count
        System.out.println(wordCounts.count());

        jssc.start();             // Start the computation
        jssc.awaitTermination();  // Wait for the computation to terminate
    }
}
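If you want to run it on a cluster instead of straight from the IDE, you can package the class into a jar and hand it to spark-submit. Here is a minimal sketch; the jar name and master URL below are placeholders, adjust them to your own build and cluster:

# package with mvn package / sbt package first; jar name and master are placeholders
$ spark-submit \
    --class com.tg.spark.stream.SparkStream \
    --master local[4] \
    spark-stream-wordcount.jar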
Then type hello world into the netcat terminal:
# TERMINAL 1:
# Running Netcat
$ nc -lk 9999
hello world
You can then see the counts in the console:
-------------------------------------------
Time: 1357008430000 ms
-------------------------------------------
(hello,1)
(world,1)
...
The files generated in real time by the computation can also be seen on HDFS, for example with the commands below.
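You can check the output of saveAsTextFiles with the HDFS shell. A small sketch, assuming the output prefix hdfs://master:9000/testfile/ used in the code above; the batch-timestamp path is a placeholder that will differ on every run:

# list the per-batch output directories written by saveAsTextFiles
# (each directory name combines the prefix, the batch time in milliseconds, and the "spark" suffix)
$ hdfs dfs -ls hdfs://master:9000/testfile/
# then look inside one of the listed directories, e.g. (path is a placeholder):
$ hdfs dfs -cat hdfs://master:9000/testfile/-<batch-time-ms>.spark/part-00000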
In the second case, the input data source is not a socket read with socketTextStream, but a directory on HDFS read with textFileStream.
package com.tg.spark.stream;

import java.util.Arrays;

import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;

import scala.Tuple2;

/**
 * @author Soup High
 */
public class SparkStream2 {
    public static void main(String[] args) {
        // Create a local StreamingContext with 4 working threads and a batch interval of 1 second
        SparkConf conf = new SparkConf().setMaster("local[4]").setAppName("NetworkWordCount")
                .set("spark.testing.memory", "2147480000");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        System.out.println(jssc);

        // This time the DStream reads new files from an HDFS directory instead of a socket
        JavaDStream<String> lines = jssc.textFileStream("hdfs://master:9000/stream");
        // JavaReceiverInputDStream<String> lines = jssc.socketTextStream("master", 9999);

        // Split each line into words
        JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String x) {
                System.out.println(Arrays.asList(x.split(" ")).get(0));
                return Arrays.asList(x.split(" "));
            }
        });

        // Count each word in each batch
        JavaPairDStream<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) {
                return new Tuple2<String, Integer>(s, 1);
            }
        });
        System.out.println(pairs);

        JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        });

        // Print the first ten elements of each RDD generated in this DStream to the console
        wordCounts.print();
        // Also save each batch to HDFS as text files
        wordCounts.dstream().saveAsTextFiles("hdfs://master:9000/testfile/", "spark");

        jssc.start();             // Start the computation
        jssc.awaitTermination();  // Wait for the computation to terminate
    }
}
Spark Streaming keeps monitoring that directory; as soon as a new file appears in it, the contents are read immediately. Run the program, then manually add a file to the directory (one way is shown in the commands below), and you will see the output.
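For example, copying a local text file into the monitored directory with the HDFS shell is enough to trigger a batch. A small sketch, assuming the hdfs://master:9000/stream directory from the code above; words.txt is just a placeholder for any local text file:

# create the monitored directory if it does not exist yet
$ hdfs dfs -mkdir -p hdfs://master:9000/stream
# copy a new file in; textFileStream only picks up files newly added to the directory
$ hdfs dfs -put words.txt hdfs://master:9000/stream/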
Writing this up took effort; if you repost it, please cite the source: http://blog.csdn.net/tanggao1314/article/details/51606721
References
Spark Programming Guide
Spark real-time stream computing, a Java example