The Maven dependency is as follows:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
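If the project is built with sbt instead of Maven, the equivalent dependency line (assuming Scala 2.11, to match the _2.11 artifact suffix above) would be:

libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.0"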
The example code from the official website is as follows:
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// scalastyle:off println
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka010._

/**
 * Consumes messages from one or more topics in Kafka and does wordcount.
 * Usage: DirectKafkaWordCount <brokers> <topics>
 *   <brokers> is a list of one or more Kafka brokers
 *   <topics> is a list of one or more Kafka topics to consume from
 *
 * Example:
 *    $ bin/run-example streaming.DirectKafkaWordCount broker1-host:port,broker2-host:port \
 *      topic1,topic2
 */
object DirectKafkaWordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println(s"""
        |Usage: DirectKafkaWordCount <brokers> <topics>
        |  <brokers> is a list of one or more Kafka brokers
        |  <topics> is a list of one or more Kafka topics to consume from
        |
        """.stripMargin)
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(brokers, topics) = args

    // Create context with 2 second batch interval
    val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct Kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams))

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_.value)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
// scalastyle:on println
Running the above code produces the following error:
Exception in thread "main" org.apache.kafka.common.config.ConfigException: Missing required configuration "bootstrap.servers" which has no default value.
Workaround:
As the error message indicates, the required Kafka parameters were never set: the 0-10 integration is built on the new Kafka consumer API, which requires bootstrap.servers (along with the key/value deserializers and a group.id), so the old metadata.broker.list key from the 0.8 integration is not enough.
Revise the official example code as follows:
package cn.xdf.userprofile.stream

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

import scala.collection.mutable

object DirectKafka {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println(
        s"""
           |Usage: DirectKafka <brokers> <topics>
           |  <brokers> is a list of one or more Kafka brokers
           |  <topics> is a list of one or more Kafka topics to consume from
           |
        """.stripMargin)
      System.exit(1)
    }

    val Array(brokers, topics) = args

    val conf = new SparkConf()
      .setAppName("DirectKafka")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))

    val topicsSet = topics.split(",").toSet
    val kafkaParams = mutable.HashMap[String, String]()
    // The following parameters must be set, or the ConfigException above is thrown
    kafkaParams.put("bootstrap.servers", brokers)
    kafkaParams.put("group.id", "group1")
    kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams)
    )

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_.value)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
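Alternatively, closer to the style of the official Spark Streaming + Kafka 0.10 integration guide, the same parameters can be declared as an immutable Map[String, Object], passing the deserializer classes via classOf. This is a sketch of that variant, not part of the original code:

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> brokers,
  "group.id" -> "group1",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer]
)

Either form satisfies ConsumerStrategies.Subscribe, which accepts any collection.Map[String, Object].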
Run as follows:
Start Kafka
bin/kafka-server-start ./etc/kafka/server.properties &
[2018-10-22 11:24:14,748] INFO [GroupCoordinator 0]: Stabilized group group1 generation 1 (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:24:14,761] INFO [GroupCoordinator 0]: Assignment received from leader for group group1 for generation 1 (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:24:14,779] INFO Updated PartitionLeaderEpoch. New: {epoch:0, offset:0}, Current: {epoch:-1, offset:-1} for Partition: __consumer_offsets-40. Cache now contains 0 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-10-22 11:28:19,010] INFO [GroupCoordinator 0]: Preparing to rebalance group group1 with old generation 1 (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:28:19,013] INFO [GroupCoordinator 0]: Group group1 with generation 2 is now empty (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:29:29,424] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-10-22 11:39:29,414] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-10-22 11:49:29,414] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
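If the test topic does not exist yet, it can be created first. This command assumes the Apache Kafka 1.0 layout used for the producer below, with ZooKeeper on localhost:2181:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test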
Run Spark
/usr/local/spark-2.3.0/bin/spark-submit --class cn.xdf.userprofile.stream.DirectKafka --master yarn --driver-memory 2g --num-executors 1 --executor-memory 2g --executor-cores 1 userprofile2.0.jar localhost:9092 test
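Note that the revised code hard-codes .setMaster("local[2]"), and a master set directly on the SparkConf takes precedence over the --master yarn flag passed to spark-submit, so the job actually runs locally (the "executor driver" entries in the log below confirm this). To really run on YARN, remove the setMaster call and let spark-submit supply the master.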
2018-10-22 11:28:16 INFO DAGScheduler:54 - Submitting 1 missing tasks from ResultStage 483 (ShuffledRDD[604] at reduceByKey at DirectKafka.scala:46) (first task is for partitions Vector(1))
2018-10-22 11:28:16 INFO TaskSchedulerImpl:54 - Adding task set 483.0 with 1 tasks
2018-10-22 11:28:16 INFO TaskSetManager:54 - Starting task 0.0 in stage 483.0 (TID 362, localhost, executor driver, partition 1, PROCESS_LOCAL, 7649 bytes)
2018-10-22 11:28:16 INFO Executor:54 - Running task 0.0 in stage 483.0 (TID 362)
2018-10-22 11:28:16 INFO ShuffleBlockFetcherIterator:54 - Getting 0 non-empty blocks out of 1 blocks
2018-10-22 11:28:16 INFO ShuffleBlockFetcherIterator:54 - Started 0 remote fetches in 0 ms
2018-10-22 11:28:16 INFO Executor:54 - Finished task 0.0 in stage 483.0 (TID 362). 1091 bytes result sent to driver
2018-10-22 11:28:16 INFO TaskSetManager:54 - Finished task 0.0 in stage 483.0 (TID 362) in 4 ms on localhost (executor driver) (1/1)
2018-10-22 11:28:16 INFO TaskSchedulerImpl:54 - Removed TaskSet 483.0, whose tasks have all completed, from pool
2018-10-22 11:28:16 INFO DAGScheduler:54 - ResultStage 483 (print at DirectKafka.scala:47) finished in 0.008 s
2018-10-22 11:28:16 INFO DAGScheduler:54 - Job 241 finished: print at DirectKafka.scala:47, took 0.009993 s
-------------------------------------------
Time: 1540178896000 ms
-------------------------------------------
Start producer
# from the kafka_2.11-1.0.0 directory
bin/kafka-console-producer.sh --topic test --broker-list localhost:9092
> hello
> hello me
View results:
(hello,2)
(me,1)
(you,1)

2018-10-22 11:57:08 INFO JobScheduler:54 - Finished job streaming job 1540180628000 ms.0 from job set of time 1540180628000 ms
2018-10-22 11:57:08 INFO JobScheduler:54 - Total delay: 0.119 s for time 1540180628000 ms (execution: 0.072 s)
2018-10-22 11:57:08 INFO ShuffledRDD:54 - Removing RDD 154 from persistence list
2018-10-22 11:57:08 INFO MapPartitionsRDD:54 - Removing RDD 153 from persistence list
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 153
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 154
2018-10-22 11:57:08 INFO MapPartitionsRDD:54 - Removing RDD 152 from persistence list
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 152
2018-10-22 11:57:08 INFO MapPartitionsRDD:54 - Removing RDD 151 from persistence list
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 151
2018-10-22 11:57:08 INFO KafkaRDD:54 - Removing RDD 150 from persistence list
2018-10-22 11:57:08 INFO BlockManager:54 - Removing RDD 150
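To confirm, independently of Spark, that messages are reaching the topic, a console consumer can be attached (again assuming the Kafka 1.0 defaults used above):

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning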