Spark Real-Time Stream Computing: A Java Example


Almost all the Spark code you find online is Scala, and so are most of the books; no surprise, since Spark itself is written in Scala. But I have never studied Scala systematically, so I write my Spark programs in Java instead. Spark supports Java, and Scala runs on the JVM anyway. Enough said, straight to the code.

This is the example from the official documentation, the classic big data learning case: word count.
In a Linux terminal, start netcat as a simple data server:

$ nc -lk 9999

Then run the following code

package com.tg.spark.stream;

import java.util.Arrays;

import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;

import scala.Tuple2;

/**
 * Word count over a socket text stream.
 *
 * @author tanggao
 */
public class SparkStream {
    public static void main(String[] args) {
        // Create a local StreamingContext with 4 worker threads and a batch interval of 1 second
        SparkConf conf = new SparkConf()
                .setMaster("local[4]")
                .setAppName("NetworkWordCount")
                .set("spark.testing.memory", "2147480000");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Create a DStream that connects to hostname:port, here master:9999
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("master", 9999);
        // Alternative source, used in the second example below:
        // JavaDStream<String> lines = jssc.textFileStream("hdfs://master:9000/stream");

        // Split each line into words
        JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String x) {
                return Arrays.asList(x.split(" "));
            }
        });

        // Map each word to a (word, 1) pair
        JavaPairDStream<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) {
                return new Tuple2<String, Integer>(s, 1);
            }
        });

        // Sum the counts for each word in each batch
        JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        });

        // Print the first ten elements of each RDD generated in this DStream to the console
        wordCounts.print();
        // Also persist each batch's result to HDFS as text files named <prefix>-<time>.<suffix>
        wordCounts.dstream().saveAsTextFiles("hdfs://master:9000/testfile/", "spark");

        jssc.start();            // Start the computation
        jssc.awaitTermination(); // Wait for the computation to terminate
    }
}
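For reference, on Spark 2.x the same pipeline is usually written with Java 8 lambdas, and FlatMapFunction.call returns an Iterator rather than an Iterable. A minimal sketch of the transformation chain under that assumption (the surrounding setup is unchanged):

// Spark 2.x style: flatMap expects an Iterator, and lambdas replace the anonymous classes
JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(x.split(" ")).iterator());
JavaPairDStream<String, Integer> pairs = words.mapToPair(s -> new Tuple2<>(s, 1));
JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey((i1, i2) -> i1 + i2);
wordCounts.print();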

Then type hello world into the netcat terminal:

# TERMINAL 1: running netcat
$ nc -lk 9999
hello world

You can then see the result on the program's console:

-------------------------------------------
Time: 1357008430000 ms
-------------------------------------------
(hello,1)
(world,1)
...

The files written by each batch in real time can also be seen on HDFS.
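For instance, listing the output prefix from the saveAsTextFiles call above should show one directory per batch (the exact names include the batch timestamp):

$ hdfs dfs -ls hdfs://master:9000/testfile/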

In the second case, the input source is not a socket read via socketTextStream but a file directory on HDFS, read via textFileStream.

package com.tg.spark.stream;

import java.util.Arrays;

import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;

import scala.Tuple2;

/**
 * Word count over a monitored HDFS directory.
 *
 * @author tanggao
 */
public class SparkStream2 {
    public static void main(String[] args) {
        // Create a local StreamingContext with 4 worker threads and a batch interval of 1 second
        SparkConf conf = new SparkConf()
                .setMaster("local[4]")
                .setAppName("NetworkWordCount")
                .set("spark.testing.memory", "2147480000");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        // This time the source is a directory on HDFS: every new file that
        // appears under /stream is read into the stream
        JavaDStream<String> lines = jssc.textFileStream("hdfs://master:9000/stream");

        // Split each line into words
        JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String x) {
                return Arrays.asList(x.split(" "));
            }
        });

        // Map each word to a (word, 1) pair
        JavaPairDStream<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) {
                return new Tuple2<String, Integer>(s, 1);
            }
        });

        // Sum the counts for each word in each batch
        JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        });

        // Print the first ten elements of each RDD generated in this DStream to the console
        wordCounts.print();
        // Persist each batch's result to HDFS as text files
        wordCounts.dstream().saveAsTextFiles("hdfs://master:9000/testfile/", "spark");

        jssc.start();            // Start the computation
        jssc.awaitTermination(); // Wait for the computation to terminate
    }
}

The program now monitors that directory: as soon as a new file lands in it, the contents are read and fed into the stream. Run the program, then add a file to the directory, and you will see the output.
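One caveat: textFileStream only notices files that appear in the monitored directory atomically, so it is safer to write the file elsewhere on the same HDFS and then rename it in. A minimal sketch using the Hadoop FileSystem API (the /tmp/words.txt staging path is an assumption for illustration):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PushFile {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new URI("hdfs://master:9000"), new Configuration());
        // rename is atomic within HDFS, so the streaming job sees the file only once it is complete
        fs.rename(new Path("/tmp/words.txt"), new Path("/stream/words.txt"));
        fs.close();
    }
}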

Writing all this up is not easy; if you repost it, please cite the source: http://blog.csdn.net/tanggao1314/article/details/51606721

Reference
Spark Programming Guide
