Scala Spark Streaming Integration with Kafka (Spark 2.3, Kafka 0.10)


The Maven dependency is as follows:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
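
If the project is built with sbt rather than Maven, the equivalent dependency declaration would look roughly like this (a sketch assuming Scala 2.11 and Spark 2.3.0 to match the coordinates above; spark-streaming is marked provided because the cluster already ships it):

// build.sbt (sketch): spark-streaming comes from the cluster, the Kafka integration is bundled with the application
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "2.3.0" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.0"
)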

The example code from the official website is as follows:


/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka010._

/**
 * Consumes messages from one or more topics in Kafka and does wordcount.
 * Usage: DirectKafkaWordCount <brokers> <topics>
 *   <brokers> is a list of one or more Kafka brokers
 *   <topics> is a list of one or more Kafka topics to consume from
 *
 * Example:
 *    $ bin/run-example streaming.DirectKafkaWordCount broker1-host:port,broker2-host:port \
 *      topic1,topic2
 */
object DirectKafkaWordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println(s"""
        |Usage: DirectKafkaWordCount <brokers> <topics>
        |  <brokers> is a list of one or more Kafka brokers
        |  <topics> is a list of one or more Kafka topics to consume from
        |
        """.stripMargin)
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(brokers, topics) = args

    // Create context with 2 second batch interval
    val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct Kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams))

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_.value)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
// scalastyle:on println
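
Note that StreamingExamples.setStreamingLogLevels() is a helper that only exists inside Spark's examples package, so it is not available to your own project. A minimal substitute, assuming the log4j 1.x API bundled with Spark 2.3, is to lower the logger levels directly:

import org.apache.log4j.{Level, Logger}

// Quiet the Spark and Akka loggers so the word counts are easy to see in the console
Logger.getLogger("org").setLevel(Level.WARN)
Logger.getLogger("akka").setLevel(Level.WARN)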

Running the above code produces the following error:

Exception in thread "main" org.apache.kafka.common.config.ConfigException: Missing required configuration "bootstrap.servers" which has no default value.

Workaround:

As the error shows, the required Kafka parameters are not set. The 0-10 integration hands kafkaParams to the new Kafka consumer API, which requires bootstrap.servers (plus key and value deserializers) and ignores the old metadata.broker.list setting used by the 0-8 integration.

Revise the official example code as follows:

package cn.xdf.userprofile.stream

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

import scala.collection.mutable

object DirectKafka {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println(
        s"""
           |Usage: DirectKafkaWordCount <brokers> <topics>
           |  <brokers> is a list of one or more Kafka brokers
           |  <topics> is a list of one or more Kafka topics to consume from
           |
        """.stripMargin)
      System.exit(1)
    }
    val Array(brokers, topics) = args

    val conf = new SparkConf()
      .setAppName("DirectKafka")
      .setMaster("local[2]")

    val ssc = new StreamingContext(conf, Seconds(2))

    val topicsSet = topics.split(",").toSet
    val kafkaParams = mutable.HashMap[String, String]()
    // The following parameters must be set, otherwise the ConfigException above is thrown
    kafkaParams.put("bootstrap.servers", brokers)
    kafkaParams.put("group.id", "group1")
    kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams)
    )

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_.value)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
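
The mutable HashMap above is one way to pass the settings; the official spark-streaming-kafka-0-10 integration guide expresses the same parameters as an immutable Map[String, Object] and passes the deserializer classes directly. A sketch of that style (the auto.offset.reset and enable.auto.commit entries are extras taken from the guide, not from the code above):

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> brokers,
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "group1",
  "auto.offset.reset" -> "latest",                   // start from the latest offsets when no committed offset exists
  "enable.auto.commit" -> (false: java.lang.Boolean) // do not auto-commit offsets; commit manually if needed
)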

Run as follows:

Start Kafka

bin/kafka-server-start ./etc/kafka/server.properties &

[2018-10-22 11:24:14,748] INFO [GroupCoordinator 0]: Stabilized group group1 generation 1 (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:24:14,761] INFO [GroupCoordinator 0]: Assignment received from leader for group group1 for generation 1 (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:24:14,779] INFO Updated PartitionLeaderEpoch. New: {epoch:0, offset:0}, Current: {epoch:-1, offset:-1} for Partition: __consumer_offsets-40. Cache now contains 0 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-10-22 11:28:19,010] INFO [GroupCoordinator 0]: Preparing to rebalance group group1 with old generation 1 (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:28:19,013] INFO [GroupCoordinator 0]: Group group1 with generation 2 is now empty (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:29:29,424] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-10-22 11:39:29,414] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-10-22 11:49:29,414] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
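
If the test topic used below does not already exist, it can be created before submitting the job; a sketch assuming Kafka 1.0 with ZooKeeper on the default local port:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test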

Run Spark

/usr/local/spark-2.3.0/bin/spark-submit --class cn.xdf.userprofile.stream.DirectKafka --master yarn --driver-memory 2g --num-executors 1 --executor-memory 2g --executor-cores 1 userprofile2.0.jar localhost:9092 test

2018-10-22 11:28:16 INFO DAGScheduler:54 - Submitting 1 missing tasks from ResultStage 483 (ShuffledRDD[604] at reduceByKey at DirectKafka.scala:46) (first tasks are for partitions Vector(1))
2018-10-22 11:28:16 INFO TaskSchedulerImpl:54 - Adding task set 483.0 with 1 tasks
2018-10-22 11:28:16 INFO TaskSetManager:54 - Starting task 0.0 in stage 483.0 (TID 362, localhost, executor driver, partition 1, PROCESS_LOCAL, 7649 bytes)
2018-10-22 11:28:16 INFO Executor:54 - Running task 0.0 in stage 483.0 (TID 362)
2018-10-22 11:28:16 INFO ShuffleBlockFetcherIterator:54 - Getting 0 non-empty blocks out of 1 blocks
2018-10-22 11:28:16 INFO ShuffleBlockFetcherIterator:54 - Started 0 remote fetches in 0 ms
2018-10-22 11:28:16 INFO Executor:54 - Finished task 0.0 in stage 483.0 (TID 362). 1091 bytes result sent to driver
2018-10-22 11:28:16 INFO TaskSetManager:54 - Finished task 0.0 in stage 483.0 (TID 362) in 4 ms on localhost (executor driver) (1/1)
2018-10-22 11:28:16 INFO TaskSchedulerImpl:54 - Removed TaskSet 483.0, whose tasks have all completed, from pool
2018-10-22 11:28:16 INFO DAGScheduler:54 - ResultStage 483 (print at DirectKafka.scala:47) finished in 0.008 s
2018-10-22 11:28:16 INFO DAGScheduler:54 - Job 241 finished: print at DirectKafka.scala:47, took 0.009993 s
-------------------------------------------
Time: 1540178896000 ms
-------------------------------------------
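
If the spark-streaming-kafka-0-10 classes are not packaged inside userprofile2.0.jar (for example, when no assembly/shade plugin is used), they can be pulled in at submit time with the --packages flag; a sketch of the same submit command with that flag added:

/usr/local/spark-2.3.0/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.3.0 --class cn.xdf.userprofile.stream.DirectKafka --master yarn --driver-memory 2g --num-executors 1 --executor-memory 2g --executor-cores 1 userprofile2.0.jar localhost:9092 test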

Start producer

bin/kafka-console-producer.sh --topic test --broker-list localhost:9092

> hello

> hello me

View results:

(hello,2)
(me,1)
(you,1)
2018-10-22 11:57:08 INFO JobScheduler:54 - Finished job streaming job 1540180628000 ms.0 from job set of time 1540180628000 ms
2018-10-22 11:57:08 INFO JobScheduler:54 - Total delay: 0.119 s for time 1540180628000 ms (execution: 0.072 s)
2018-10-22 11:57:08 INFO ShuffledRDD:54 - Removing RDD 154 from persistence list
2018-10-22 11:57:08 INFO MapPartitionsRDD:54 - Removing RDD 153 from persistence list
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 153
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 154
2018-10-22 11:57:08 INFO MapPartitionsRDD:54 - Removing RDD 152 from persistence list
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 152
2018-10-22 11:57:08 INFO MapPartitionsRDD:54 - Removing RDD 151 from persistence list
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 151
2018-10-22 11:57:08 INFO KafkaRDD:54 - Removing RDD 150 from persistence list
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 150
