Java 8 Spark Streaming Combined with Kafka Programming (Spark 2.0 & Kafka 0.10)

There is already a simple Spark Streaming demo and a working Kafka example in earlier posts; combining the two is also a common use case, so it is recorded here.

1. Related component version
First, confirm the versions, because they differ from the earlier examples and are worth recording. Still no Scala here: Java 8, Spark 2.0.0, and Kafka 0.10.

2. Introducing the Maven package
I found several examples online that combine the two, but none matched my current versions and none ran successfully. After some exploration, here is the dependency that works.

<dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
      <version>2.0.0</version>
</dependency>

The spark-streaming-kafka package most easily found online is at version 1.6.3; I tried it but could not get it to run under Spark 2, so I switched to the corresponding kafka-0-10 package. Note that Spark 2.0 is built with Scala 2.11, so the artifactId must carry the _2.11 Scala version suffix.
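For reference, any companion Spark dependencies in the same pom.xml would carry the same _2.11 suffix. The following is a sketch under that assumption; the spark-core and spark-streaming entries are not from the original article:

<dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.0.0</version>
</dependency>
<dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>2.0.0</version>
</dependency>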

3. The SparkSteamingKafka class
Note that the classes come from the org.apache.spark.streaming.kafka010 package path, so the import statements below reflect that. Everything else is explained in the comments.

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import scala.Tuple2;

public class SparkSteamingKafka {
    public static void main(String[] args) throws InterruptedException {
        String brokers = "master2:6667";
        String topics = "topic1";
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("streaming word count");
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.setLogLevel("WARN");

        JavaStreamingContext ssc = new JavaStreamingContext(sc, Durations.seconds(1));
        Collection<String> topicsSet = new HashSet<>(Arrays.asList(topics.split(",")));

        // Kafka-related parameters; missing required ones will cause an error.
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", brokers);
        kafkaParams.put("bootstrap.servers", brokers);
        kafkaParams.put("group.id", "group1");
        kafkaParams.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Topic partitions and their starting offsets.
        Map<TopicPartition, Long> offsets = new HashMap<>();
        offsets.put(new TopicPartition("topic1", 0), 2L);

        // Obtain the Kafka data via KafkaUtils.createDirectStream(...);
        // the Kafka-related parameters are specified through kafkaParams.
        JavaInputDStream<ConsumerRecord<Object, Object>> lines = KafkaUtils.createDirectStream(
                ssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.Subscribe(topicsSet, kafkaParams, offsets));

        // Same as the earlier demo; just note that each element of lines
        // is itself a ConsumerRecord object.
        JavaPairDStream<String, Integer> counts = lines
                .flatMap(x -> Arrays.asList(x.value().toString().split(" ")).iterator())
                .mapToPair(x -> new Tuple2<String, Integer>(x, 1))
                .reduceByKey((x, y) -> x + y);
        counts.print();

        // You can print every record to inspect the ConsumerRecord structure:
        // lines.foreachRDD(rdd -> {
        //     rdd.foreach(x -> {
        //         System.out.println(x);
        //     });
        // });

        ssc.start();
        ssc.awaitTermination();
        ssc.close();
    }
}
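Two of the arguments deserve a quick note. LocationStrategies.PreferConsistent() distributes partitions evenly across the available executors and is the recommended default. Passing the offsets map to ConsumerStrategies.Subscribe makes the stream begin reading topic1, partition 0 at offset 2 on initial startup, rather than at the consumer group's committed position.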

4. Run the test
Here I reuse the Kafka producer class from the earlier Kafka example to push data to the Kafka server. In my setup, Kafka is deployed on the master2 node and Spark 2 runs locally for the test.

UserKafkaProducer producerThread = new UserKafkaProducer(KafkaProperties.TOPIC);
producerThread.start();
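The producer class itself comes from the earlier article and is not reproduced here. Purely as a hypothetical sketch of what such a thread could look like against the Kafka 0.10 producer API (the broker address, sleep interval, and message format below are assumptions, not the original code):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical reconstruction; the real UserKafkaProducer lives in the earlier article.
public class UserKafkaProducer extends Thread {
    private final String topic;
    private final KafkaProducer<String, String> producer;

    public UserKafkaProducer(String topic) {
        this.topic = topic;
        Properties props = new Properties();
        // Assumed broker address, matching the streaming job above.
        props.put("bootstrap.servers", "master2:6667");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    @Override
    public void run() {
        int messageNo = 1;
        while (!isInterrupted()) {
            // Assumed message format: space-separated words for the word count to split.
            producer.send(new ProducerRecord<>(topic, "message number " + messageNo));
            messageNo++;
            try {
                Thread.sleep(1000); // roughly one message per second
            } catch (InterruptedException e) {
                break;
            }
        }
        producer.close();
    }
}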

Then run the SparkSteamingKafka class from section 3, and you can see it consuming the data successfully.
