Python code:
import time
from operator import add

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# A receiver-based stream occupies one thread, so run locally with at least two
sc = SparkContext(master="local[2]", appName="PythonSparkStreamingRokidDtSnCount")
ssc = StreamingContext(sc, 2)  # 2-second batch interval

zkQuorum = 'localhost:2181'
topic = {'rokid': 1}  # topic name -> number of consumer threads
groupid = "test-consumer-group"

lines = KafkaUtils.createStream(ssc, zkQuorum, groupid, topic)
# createStream yields (key, value) pairs; the value is already decoded to str
valuestr = lines.map(lambda x: x[1])
lines1 = valuestr.flatMap(lambda x: x.split("\n"))
# Each line is a dict literal; eval is only acceptable for trusted input
valuedict = lines1.map(lambda x: eval(x))
message = valuedict.map(lambda x: x["message"])
# Key is "<date>|<value of the second field>", paired with a count of 1
rdd2 = message.map(lambda x: (time.strftime("%Y-%m-%d", time.localtime(float(x.split("\u0001")[0].split("\u0002")[1]) / 1000)) + "|" + x.split("\u0001")[1].split("\u0002")[1], 1)).map(lambda x: (x[0], x[1]))
rdd3 = rdd2.reduceByKey(add)
rdd3.saveAsTextFiles("/tmp/wordcount")
rdd3.pprint()

ssc.start()
ssc.awaitTermination()
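The key-building lambda in the job above is dense, so here is a minimal plain-Python sketch of the same logic that can be tried without Spark or Kafka. It assumes each message holds fields separated by "\u0001", each field being a name and a value separated by "\u0002", with a millisecond timestamp in the first field and a four-digit year in the date. The helper names make_key and count_keys, the field names ts and sn, and the sample values are hypothetical and only serve the illustration.

```python
import time
from collections import Counter

FIELD_SEP = "\u0001"  # separates the fields of one message
KV_SEP = "\u0002"     # separates a field's name from its value


def make_key(message):
    """Build the "<date>|<value>" key from one message string."""
    fields = message.split(FIELD_SEP)
    # First field: timestamp in milliseconds since the epoch
    millis = float(fields[0].split(KV_SEP)[1])
    day = time.strftime("%Y-%m-%d", time.localtime(millis / 1000))
    # Second field: the value to count per day
    return day + "|" + fields[1].split(KV_SEP)[1]


def count_keys(messages):
    """Local stand-in for map(...) followed by reduceByKey(add)."""
    return Counter(make_key(m) for m in messages)


# Hypothetical sample messages (field names and values are made up)
sample = [
    "ts\u00021490000000000\u0001sn\u0002ABC123",
    "ts\u00021490000000000\u0001sn\u0002ABC123",
    "ts\u00021490000000000\u0001sn\u0002XYZ789",
]

counts = count_keys(sample)
for key, n in counts.items():
    print(key, n)
```

The date part of each key depends on the local time zone of the machine, because time.localtime is used, just as in the streaming job.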
Running the Spark Streaming job:
spark/bin/spark-submit --jars spark-streaming-kafka-0-8-assembly_2.11-2.1.0.jar ReadFromKafkaStreaming.py
The spark-streaming-kafka-0-8-assembly_2.11-2.1.0.jar file can be downloaded from the following website:
http://search.maven.org
This is intended as an introductory reference.
Python 3 + Spark 2.1 + Kafka 0.8 + Spark Streaming