Java 8 Spark Streaming Combined with Kafka Programming (Spark 2.0 & Kafka 0.10)

There is already a simple Spark Streaming demo and a working Kafka example in earlier posts; combining the two is also a common use case, so it is recorded here.

1. Related component version
First, confirm the versions, because they differ from the earlier examples and are worth recording. Still no Scala here: Java 8, Spark 2.0.0, and Kafka 0.10.

2. Introducing the Maven package
I found several examples online that combine the two, but none matched my current versions and none ran successfully. After some exploration, here is the dependency that works.

<dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
      <version>2.0.0</version>
</dependency>

The spark-streaming-kafka package most easily found online is at version 1.6.3; I tried it but could not get it to run under Spark 2, so I switched to the corresponding kafka-0-10 package. Note that Spark 2.0 is built with Scala 2.11, so the artifactId must carry the _2.11 Scala version suffix.
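For reference, any companion Spark dependencies in the same pom.xml would carry the same _2.11 suffix. The following is a sketch under that assumption; the spark-core and spark-streaming entries are not from the original article:

<dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.0.0</version>
</dependency>
<dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>2.0.0</version>
</dependency>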

3. The SparkSteamingKafka class
Note that the classes come from the org.apache.spark.streaming.kafka010 package path, so the import statements below reflect that. Everything else is explained in the comments.

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import scala.Tuple2;

public class SparkSteamingKafka {
    public static void main(String[] args) throws InterruptedException {
        String brokers = "master2:6667";
        String topics = "topic1";
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("streaming word count");
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.setLogLevel("WARN");

        JavaStreamingContext ssc = new JavaStreamingContext(sc, Durations.seconds(1));
        Collection<String> topicsSet = new HashSet<>(Arrays.asList(topics.split(",")));

        // Kafka-related parameters; missing required ones will cause an error.
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", brokers);
        kafkaParams.put("bootstrap.servers", brokers);
        kafkaParams.put("group.id", "group1");
        kafkaParams.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Topic partitions and their starting offsets.
        Map<TopicPartition, Long> offsets = new HashMap<>();
        offsets.put(new TopicPartition("topic1", 0), 2L);

        // Obtain the Kafka data via KafkaUtils.createDirectStream(...);
        // the Kafka-related parameters are specified through kafkaParams.
        JavaInputDStream<ConsumerRecord<Object, Object>> lines = KafkaUtils.createDirectStream(
                ssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.Subscribe(topicsSet, kafkaParams, offsets));

        // Same as the earlier demo; just note that each element of lines
        // is itself a ConsumerRecord object.
        JavaPairDStream<String, Integer> counts = lines
                .flatMap(x -> Arrays.asList(x.value().toString().split(" ")).iterator())
                .mapToPair(x -> new Tuple2<String, Integer>(x, 1))
                .reduceByKey((x, y) -> x + y);
        counts.print();

        // You can print every record to inspect the ConsumerRecord structure:
        // lines.foreachRDD(rdd -> {
        //     rdd.foreach(x -> {
        //         System.out.println(x);
        //     });
        // });

        ssc.start();
        ssc.awaitTermination();
        ssc.close();
    }
}
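Two of the arguments deserve a quick note. LocationStrategies.PreferConsistent() distributes partitions evenly across the available executors and is the recommended default. Passing the offsets map to ConsumerStrategies.Subscribe makes the stream begin reading topic1, partition 0 at offset 2 on initial startup, rather than at the consumer group's committed position.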

4. Run the test
Here I reuse the Kafka producer class from the earlier Kafka example to push data to the Kafka server. In my setup, Kafka is deployed on the master2 node and Spark 2 runs locally for the test.

UserKafkaProducer producerThread = new UserKafkaProducer(KafkaProperties.TOPIC);
producerThread.start();
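The producer class itself comes from the earlier article and is not reproduced here. Purely as a hypothetical sketch of what such a thread could look like against the Kafka 0.10 producer API (the broker address, sleep interval, and message format below are assumptions, not the original code):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical reconstruction; the real UserKafkaProducer lives in the earlier article.
public class UserKafkaProducer extends Thread {
    private final String topic;
    private final KafkaProducer<String, String> producer;

    public UserKafkaProducer(String topic) {
        this.topic = topic;
        Properties props = new Properties();
        // Assumed broker address, matching the streaming job above.
        props.put("bootstrap.servers", "master2:6667");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    @Override
    public void run() {
        int messageNo = 1;
        while (!isInterrupted()) {
            // Assumed message format: space-separated words for the word count to split.
            producer.send(new ProducerRecord<>(topic, "message number " + messageNo));
            messageNo++;
            try {
                Thread.sleep(1000); // roughly one message per second
            } catch (InterruptedException e) {
                break;
            }
        }
        producer.close();
    }
}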

Then run the SparkSteamingKafka class from section 3, and you can see it consuming the data successfully.
