This article shows:
1. How to use Spark Streaming to receive TCP data and filter it;
2. How to use Spark Streaming to receive Kafka data and do a WordCount.
The details are as follows.
First, use Maven to resolve the pom dependencies:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
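The spark-core, spark-hive, and spark-sql artifacts are marked provided because the cluster supplies them at runtime; only the streaming and Kafka integration code needs to be bundled. The execution commands below assume the build produces a fat jar (for example via the maven-assembly-plugin's jar-with-dependencies descriptor, whose configuration is not shown here), typically with:

$ mvn clean package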
1. Receive TCP data, filter it, and print the lines containing "error"
package com.xiaoju.dqa.realtime_streaming;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.Durations;

// Feed test data with: nc -lk 9999
public class SparkStreamingTCP {
    public static void main(String[] args) {
        // at least 2 local threads: one for the socket receiver, one for processing
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("streaming word count");
        // 1-second batch interval
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        JavaDStream<String> lines = jssc.socketTextStream("10.93.21.21", 9999);
        // keep only the lines that contain "error"
        JavaDStream<String> errorLines = lines.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String s) throws Exception {
                return s.contains("error");
            }
        });
        errorLines.print();
        jssc.start();
        jssc.awaitTermination();
    }
}
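As a side note, on Java 8 the anonymous Function can be collapsed into a lambda, since org.apache.spark.api.java.function.Function has a single abstract method; a minimal sketch of the same filter:

// equivalent filter written as a Java 8 lambda
JavaDStream<String> errorLines = lines.filter(s -> s.contains("error"));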
How to run:
$ spark-submit realtime-streaming-1.0-SNAPSHOT-jar-with-dependencies.jar
# in another window
$ nc -lk 9999
# type some input data
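For example, typing the following two lines into the nc window:

hello world
an error occurred here

only the second line contains "error", so it is the only one printed in the streaming job's console.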
2. Receive Kafka data and count words (WordCount)
package com.xiaoju.dqa.realtime_streaming;

import java.util.*;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.Durations;
import scala.Tuple2;

// Feed test data with: bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
public class SparkStreamingKafka {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("yarn-client").setAppName("streaming word count");
        // String topic = "offline_log_metrics";
        String topic = "test";
        int part = 1;
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.setLogLevel("WARN");
        // 10-second batch interval
        JavaStreamingContext jssc = new JavaStreamingContext(sc, Durations.seconds(10));

        // one entry per topic: topic name -> number of receiver threads
        Map<String, Integer> topicMap = new HashMap<String, Integer>();
        String[] topics = topic.split(";");
        for (int i = 0; i < topics.length; i++) {
            topicMap.put(topics[i], 1);
        }

        // create one receiver per partition and union them into a single DStream
        List<JavaPairReceiverInputDStream<String, String>> list =
                new ArrayList<JavaPairReceiverInputDStream<String, String>>();
        for (int i = 0; i < part; i++) {
            list.add(KafkaUtils.createStream(jssc, "10.93.21.21:2181", "bigdata_qa", topicMap));
        }
        JavaPairDStream<String, String> wordCountLines = list.get(0);
        for (int i = 1; i < list.size(); i++) {
            wordCountLines = wordCountLines.union(list.get(i));
        }

        JavaPairDStream<String, Integer> counts = wordCountLines.flatMap(
                new FlatMapFunction<Tuple2<String, String>, String>() {
                    // split each message (the tuple's value) into words
                    @Override
                    public Iterable<String> call(Tuple2<String, String> stringStringTuple2) {
                        // return an empty list on bad input; returning null would crash the job
                        List<String> list2 = new ArrayList<String>();
                        try {
                            if ("".equals(stringStringTuple2._2) || stringStringTuple2._2 == null) {
                                System.out.println("_2 is null");
                                throw new Exception("_2 is null");
                            }
                            list2 = Arrays.asList(stringStringTuple2._2.split(" "));
                        } catch (Exception ex) {
                            ex.printStackTrace();
                            System.out.println(ex.getMessage());
                        }
                        return list2;
                    }
                }).mapToPair(new PairFunction<String, String, Integer>() {
                    // map each word to a (word, 1) pair
                    public Tuple2<String, Integer> call(String s) throws Exception {
                        Tuple2<String, Integer> tuple2 = null;
                        try {
                            if (s == null || "".equals(s)) {
                                tuple2 = new Tuple2<String, Integer>(s, 0);
                                throw new Exception("s is null");
                            }
                            tuple2 = new Tuple2<String, Integer>(s, 1);
                        } catch (Exception ex) {
                            ex.printStackTrace();
                        }
                        return tuple2;
                    }
                }).reduceByKey(new Function2<Integer, Integer, Integer>() {
                    // sum the counts for each word
                    public Integer call(Integer x, Integer y) throws Exception {
                        return x + y;
                    }
                });

        counts.print();
        jssc.start();
        try {
            jssc.awaitTermination();
        } catch (Exception ex) {
            ex.printStackTrace();
        } finally {
            jssc.close();
        }
    }
}
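The code above uses the receiver-based KafkaUtils.createStream, which consumes through ZooKeeper (10.93.21.21:2181). spark-streaming-kafka_2.10 1.6.0 also provides a receiver-less "direct" approach that talks to the brokers and lets Spark track offsets itself; a minimal sketch that could replace the receiver loop inside main above, assuming a broker at localhost:9092 (substitute your own broker list):

import kafka.serializer.StringDecoder;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;

// direct stream: one RDD partition per Kafka partition, no receiver threads
Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", "localhost:9092"); // assumed broker address
Set<String> topicsSet = new HashSet<String>(Arrays.asList(topic.split(";")));
JavaPairInputDStream<String, String> directLines = KafkaUtils.createDirectStream(
        jssc,
        String.class, String.class,                 // key and value types
        StringDecoder.class, StringDecoder.class,   // key and value decoders
        kafkaParams, topicsSet);
// directLines can then feed the same flatMap/mapToPair/reduceByKey pipeline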
How to run:
$ spark-submit --queue=root.xxx realtime-streaming-1.0-SNAPSHOT-jar-with-dependencies.jar
# open another window and start the Kafka console producer
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# type some input data
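For instance, entering the line below in the producer window:

a a b

should print something like (a,2) and (b,1) for the next 10-second batch, since counts.print() renders each Tuple2 as (word,count).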