Earlier posts covered a simple Spark Streaming demo and a working Kafka example; combining the two is also a very common setup, and that is what this post records.
1. Related component version
First, confirm the versions. They differ from the previous examples, so they are worth recording: Java 8 (no Scala this time), Spark 2.0.0, and Kafka 0.10.
2. Maven dependency
The combined examples found online target different versions and did not run at all with this setup, so after some trial and error, here is the dependency that works.
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
The spark-streaming-kafka package most commonly found online is version 1.6.3, which I could not get to run under Spark 2, so I used the artifact built for Kafka 0.10 instead. Note that Spark 2.0 is built against Scala 2.11, so the artifactId must carry the _2.11 suffix matching the Scala version.
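For completeness, the project also needs the Spark Streaming artifact itself, which the original does not list; the following is a minimal sketch assuming the same Spark 2.0.0 / Scala 2.11 versions as above:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.0.0</version>
</dependency>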
3. SparkSteamingKafka class
Note that the package path is org.apache.spark.streaming.kafka010.xxx, which is where the imports below come from. The rest is explained in the comments.
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import scala.Tuple2;

public class SparkSteamingKafka {
    public static void main(String[] args) throws InterruptedException {
        String brokers = "master2:6667";
        String topics = "topic1";
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("streaming word count");
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.setLogLevel("WARN");
        JavaStreamingContext ssc = new JavaStreamingContext(sc, Durations.seconds(1));
        Collection<String> topicsSet = new HashSet<>(Arrays.asList(topics.split(",")));
        // Kafka-related parameters: these are necessary, and missing ones cause errors
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", brokers);
        kafkaParams.put("bootstrap.servers", brokers);
        kafkaParams.put("group.id", "group1");
        kafkaParams.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Topic partitions: start reading partition 0 of topic1 from offset 2
        Map<TopicPartition, Long> offsets = new HashMap<>();
        offsets.put(new TopicPartition("topic1", 0), 2L);
        // Obtain the Kafka data via KafkaUtils.createDirectStream(...);
        // the Kafka-related parameters are specified through kafkaParams
        JavaInputDStream<ConsumerRecord<Object, Object>> lines = KafkaUtils.createDirectStream(
                ssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.Subscribe(topicsSet, kafkaParams, offsets));
        // Same as the earlier word-count demo, except that each element of lines
        // is a ConsumerRecord, so its value() must be extracted first
        JavaPairDStream<String, Integer> counts = lines
                .flatMap(x -> Arrays.asList(x.value().toString().split(" ")).iterator())
                .mapToPair(x -> new Tuple2<String, Integer>(x, 1))
                .reduceByKey((x, y) -> x + y);
        counts.print();
        // Uncomment to print each full ConsumerRecord and inspect its structure:
        // lines.foreachRDD(rdd -> {
        //     rdd.foreach(x -> {
        //         System.out.println(x);
        //     });
        // });
        ssc.start();
        ssc.awaitTermination();
        ssc.close();
    }
}
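A side note on kafkaParams: bootstrap.servers, group.id, and the two deserializer entries are what the 0.10 consumer requires, while metadata.broker.list is a legacy key carried over from older examples. Depending on your setup, standard Kafka 0.10 consumer settings such as the following are often added as well; they are not in the original code, so treat them as a hedged suggestion:

        // Optional consumer settings (not in the original code, shown as a suggestion):
        kafkaParams.put("auto.offset.reset", "latest");    // where to start when no committed offset exists
        kafkaParams.put("enable.auto.commit", false);      // let the application control offset commits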
4. Run the test
Here I reuse the Kafka producer class from the earlier Kafka example to push data to the Kafka server. In my setup Kafka is deployed on the master2 node, while Spark 2 runs locally for the test (a sketch of the producer class is given at the end of this section).
UserKafkaProducer producerThread = new UserKafkaProducer(KafkaProperties.topic);
producerThread.start();
Then run the SparkSteamingKafka class from section 3, and you can see the word counts being printed, which means the run succeeded.
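The UserKafkaProducer class itself comes from the earlier Kafka example and is not reproduced in the original. For readers who do not have it, the following is a minimal hypothetical sketch of such a producer thread (the class name, broker address, message format, and send rate are all assumptions), using the standard Kafka 0.10 producer API:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical stand-in for the UserKafkaProducer referenced above
public class UserKafkaProducer extends Thread {
    private final String topic;
    private final KafkaProducer<String, String> producer;

    public UserKafkaProducer(String topic) {
        this.topic = topic;
        Properties props = new Properties();
        props.put("bootstrap.servers", "master2:6667");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    @Override
    public void run() {
        // Send one space-separated message per second so the word count has input
        for (int messageNo = 0; messageNo < 100; messageNo++) {
            producer.send(new ProducerRecord<>(topic, "hello world " + messageNo));
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                break;
            }
        }
        producer.close();
    }
}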