The Maven dependency is as follows:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
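If the project is built with sbt instead of Maven, the equivalent dependency line (assuming Scala 2.11, to match the _2.11 artifact suffix above) would be:

libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.0"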
The example code from the official website is as follows:
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// scalastyle:off println
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka010._

/**
 * Consumes messages from one or more topics in Kafka and does wordcount.
 * Usage: DirectKafkaWordCount <brokers> <topics>
 *   <brokers> is a list of one or more Kafka brokers
 *   <topics> is a list of one or more Kafka topics to consume from
 *
 * Example:
 *    $ bin/run-example streaming.DirectKafkaWordCount broker1-host:port,broker2-host:port \
 *      topic1,topic2
 */
object DirectKafkaWordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println(s"""
        |Usage: DirectKafkaWordCount <brokers> <topics>
        |  <brokers> is a list of one or more Kafka brokers
        |  <topics> is a list of one or more Kafka topics to consume from
        |
        """.stripMargin)
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(brokers, topics) = args

    // Create context with 2 second batch interval
    val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct Kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams))

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_.value)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
// scalastyle:on println
Running the above code produces the following error:
Exception in thread "main" org.apache.kafka.common.config.ConfigException: Missing required configuration "bootstrap.servers" which has no default value.
Workaround:
As the error message indicates, the required Kafka parameters were never set: the 0-10 integration is built on the new Kafka consumer API, which requires bootstrap.servers (along with the key/value deserializers and a group.id), so the old metadata.broker.list key from the 0.8 integration is not enough.
Revise the official example code as follows:
package cn.xdf.userprofile.stream

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

import scala.collection.mutable

object DirectKafka {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println(
        s"""
           |Usage: DirectKafka <brokers> <topics>
           |  <brokers> is a list of one or more Kafka brokers
           |  <topics> is a list of one or more Kafka topics to consume from
           |
        """.stripMargin)
      System.exit(1)
    }

    val Array(brokers, topics) = args

    val conf = new SparkConf()
      .setAppName("DirectKafka")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))

    val topicsSet = topics.split(",").toSet
    val kafkaParams = mutable.HashMap[String, String]()
    // The following parameters must be set, or the ConfigException above is thrown
    kafkaParams.put("bootstrap.servers", brokers)
    kafkaParams.put("group.id", "group1")
    kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams)
    )

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_.value)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
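Alternatively, closer to the style of the official Spark Streaming + Kafka 0.10 integration guide, the same parameters can be declared as an immutable Map[String, Object], passing the deserializer classes via classOf. This is a sketch of that variant, not part of the original code:

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> brokers,
  "group.id" -> "group1",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer]
)

Either form satisfies ConsumerStrategies.Subscribe, which accepts any collection.Map[String, Object].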
Run as follows:
Start Kafka
bin/kafka-server-start ./etc/kafka/server.properties &
[2018-10-22 11:24:14,748] INFO [GroupCoordinator 0]: Stabilized group group1 generation 1 (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:24:14,761] INFO [GroupCoordinator 0]: Assignment received from leader for group group1 for generation 1 (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:24:14,779] INFO Updated PartitionLeaderEpoch. New: {epoch:0, offset:0}, Current: {epoch:-1, offset:-1} for Partition: __consumer_offsets-40. Cache now contains 0 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-10-22 11:28:19,010] INFO [GroupCoordinator 0]: Preparing to rebalance group group1 with old generation 1 (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:28:19,013] INFO [GroupCoordinator 0]: Group group1 with generation 2 is now empty (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:29:29,424] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-10-22 11:39:29,414] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-10-22 11:49:29,414] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
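If the test topic does not exist yet, it can be created first. This command assumes the Apache Kafka 1.0 layout used for the producer below, with ZooKeeper on localhost:2181:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test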
Run Spark
/usr/local/spark-2.3.0/bin/spark-submit --class cn.xdf.userprofile.stream.DirectKafka --master yarn --driver-memory 2g --num-executors 1 --executor-memory 2g --executor-cores 1 userprofile2.0.jar localhost:9092 test
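Note that the revised code hard-codes .setMaster("local[2]"), and a master set directly on the SparkConf takes precedence over the --master yarn flag passed to spark-submit, so the job actually runs locally (the "executor driver" entries in the log below confirm this). To really run on YARN, remove the setMaster call and let spark-submit supply the master.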
2018-10-22 11:28:16 INFO DAGScheduler:54 - Submitting 1 missing tasks from ResultStage 483 (ShuffledRDD[604] at reduceByKey at DirectKafka.scala:46) (first task is for partitions Vector(1))
2018-10-22 11:28:16 INFO TaskSchedulerImpl:54 - Adding task set 483.0 with 1 tasks
2018-10-22 11:28:16 INFO TaskSetManager:54 - Starting task 0.0 in stage 483.0 (TID 362, localhost, executor driver, partition 1, PROCESS_LOCAL, 7649 bytes)
2018-10-22 11:28:16 INFO Executor:54 - Running task 0.0 in stage 483.0 (TID 362)
2018-10-22 11:28:16 INFO ShuffleBlockFetcherIterator:54 - Getting 0 non-empty blocks out of 1 blocks
2018-10-22 11:28:16 INFO ShuffleBlockFetcherIterator:54 - Started 0 remote fetches in 0 ms
2018-10-22 11:28:16 INFO Executor:54 - Finished task 0.0 in stage 483.0 (TID 362). 1091 bytes result sent to driver
2018-10-22 11:28:16 INFO TaskSetManager:54 - Finished task 0.0 in stage 483.0 (TID 362) in 4 ms on localhost (executor driver) (1/1)
2018-10-22 11:28:16 INFO TaskSchedulerImpl:54 - Removed TaskSet 483.0, whose tasks have all completed, from pool
2018-10-22 11:28:16 INFO DAGScheduler:54 - ResultStage 483 (print at DirectKafka.scala:47) finished in 0.008 s
2018-10-22 11:28:16 INFO DAGScheduler:54 - Job 241 finished: print at DirectKafka.scala:47, took 0.009993 s
-------------------------------------------
Time: 1540178896000 ms
-------------------------------------------
Start producer
# from the kafka_2.11-1.0.0 directory
bin/kafka-console-producer.sh --topic test --broker-list localhost:9092
> hello
> hello me
View results:
(hello,2)
(me,1)
(you,1)

2018-10-22 11:57:08 INFO JobScheduler:54 - Finished job streaming job 1540180628000 ms.0 from job set of time 1540180628000 ms
2018-10-22 11:57:08 INFO JobScheduler:54 - Total delay: 0.119 s for time 1540180628000 ms (execution: 0.072 s)
2018-10-22 11:57:08 INFO ShuffledRDD:54 - Removing RDD 154 from persistence list
2018-10-22 11:57:08 INFO MapPartitionsRDD:54 - Removing RDD 153 from persistence list
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 153
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 154
2018-10-22 11:57:08 INFO MapPartitionsRDD:54 - Removing RDD 152 from persistence list
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 152
2018-10-22 11:57:08 INFO MapPartitionsRDD:54 - Removing RDD 151 from persistence list
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 151
2018-10-22 11:57:08 INFO KafkaRDD:54 - Removing RDD 150 from persistence list
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 150
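To confirm, independently of Spark, that messages are reaching the topic, a console consumer can be attached (again assuming the Kafka 1.0 defaults used above):

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning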