Spark version is 1.0
Kafka version is 0.8
 
Let's take a look at Kafka's architecture; for more detail, please refer to the official documentation.
 
I have three machines for Kafka log collection:
A 192.168.1.1, the Kafka server (broker)
B 192.168.1.2, the producer
C 192.168.1.3, the consumer
 
First, execute the following command in the Kafka installation directory on A:

./bin/kafka-server-start.sh ./config/server.properties

(Kafka 0.8 depends on ZooKeeper, so the ensemble referenced in server.properties must already be running.) After starting Kafka, use netstat -npl to check that the default port 9092 is listening.
 
B is our nginx log-generation server, where the site's access log is appended in real time (here /www/nh-nginx02/access.log), so we can watch the requests currently hitting the site with tail -f. Avoid doing that if the site gets heavy traffic, since the output will scroll too fast to read.
 
We also have to deploy Kafka on B, assuming you do not want to write your own Kafka client (see the client API documentation for that route); we will use the console producer script that ships with Kafka.

Execute the following command to push the data into the cluster:

tail -n 0 -f /www/nh-nginx02/access.log | bin/kafka-console-producer.sh --broker-list 192.168.1.1:9092 --topic sb-nginx03
 
With that, the log lines are being pushed into Kafka as messages on the sb-nginx03 topic.
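If you would rather write your own producer than pipe tail into the console script, here is a minimal sketch using the Kafka 0.8 Scala producer API. The object name LogProducer is my own; note that, unlike tail -n 0 -f, this reads the existing file once rather than following it as it grows:

package test

import java.util.Properties
import scala.io.Source
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

object LogProducer {
  def main(args: Array[String]) {
    val props = new Properties()
    // Same broker and topic as the console command above.
    props.put("metadata.broker.list", "192.168.1.1:9092")
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    val producer = new Producer[String, String](new ProducerConfig(props))

    // Send each log line as one Kafka message.
    for (line <- Source.fromFile("/www/nh-nginx02/access.log").getLines())
      producer.send(new KeyedMessage[String, String]("sb-nginx03", line))

    producer.close()
  }
}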
 
Now, on C, let's pull the data as a consumer. Again, deploy Kafka there and execute:

bin/kafka-console-consumer.sh --zookeeper zoo02:2181 --topic sb-nginx03 --from-beginning
 
Parameters:
--zookeeper specifies the address and port of the ZooKeeper used by your cluster.
--topic must match the topic name we used when pushing from B.
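For reference, what the console consumer does can be sketched in Scala with the Kafka 0.8 high-level consumer API. The ZooKeeper address zoo02:2181 and group id my-test are assumptions carried over from the commands in this post, and auto.offset.reset=smallest only approximates --from-beginning (it applies when the group has no committed offsets yet):

package test

import java.util.Properties
import kafka.consumer.{Consumer, ConsumerConfig}

object LogConsumer {
  def main(args: Array[String]) {
    val props = new Properties()
    props.put("zookeeper.connect", "zoo02:2181") // assumed ZooKeeper address
    props.put("group.id", "my-test")
    props.put("auto.offset.reset", "smallest")   // roughly --from-beginning
    val connector = Consumer.create(new ConsumerConfig(props))

    // One stream (thread) for the topic; print every message as text.
    val streams = connector.createMessageStreams(Map("sb-nginx03" -> 1))
    for (stream <- streams("sb-nginx03"); msg <- stream)
      println(new String(msg.message()))
  }
}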
 
The above only covers the shell command line; how do we write the consumer with Spark? The following assumes you have downloaded the Spark 1.0 source and have an environment with sbt and Scala already set up.
 
The Scala code is as follows:
 
package test

import java.util.Properties

import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf

object KafkaTest {
  def main(args: Array[String]) {
    if (args.length < 5) {
      System.err.println("Usage: KafkaTest <zkQuorum> <group> <topics> <numThreads> <output>")
      System.exit(1)
    }
    val Array(zkQuorum, group, topics, numThreads, output) = args
    val sparkConf = new SparkConf().setAppName("KafkaTest")
    // One streaming batch every 2 seconds.
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    ssc.checkpoint("checkpoint")

    // Map each topic to the number of receiver threads.
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    // createStream yields (key, message) pairs; keep only the message.
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
    lines.saveAsTextFiles(output)
    ssc.start()
    ssc.awaitTermination()
  }
}
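The job above only dumps raw lines, so here is a variation showing how the same DStream can be transformed before output. This is a minimal sketch against the same Spark 1.0 / Kafka 0.8 APIs; the object name NginxStatusCount and the assumption that the topic carries nginx combined-format log lines (status code as the 9th space-separated field) are mine, not from the original setup:

package test

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka.KafkaUtils

object NginxStatusCount {
  def main(args: Array[String]) {
    if (args.length < 4) {
      System.err.println("Usage: NginxStatusCount <zkQuorum> <group> <topics> <numThreads>")
      System.exit(1)
    }
    val Array(zkQuorum, group, topics, numThreads) = args
    val ssc = new StreamingContext(new SparkConf().setAppName("NginxStatusCount"), Seconds(2))

    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)

    // Count requests per HTTP status code in each batch. In nginx's default
    // "combined" format the status is the 9th whitespace-separated field;
    // the filter guards against short or malformed lines.
    val statusCounts = lines
      .map(_.split(" "))
      .filter(_.length > 8)
      .map(fields => (fields(8), 1))
      .reduceByKey(_ + _)

    statusCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}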
Then compile:

mvn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.0.1 -DskipTests package
 
Then submit the Spark job:

./bin/spark-submit --master local[*] --class test.KafkaTest ./test/target/scala-2.10/spark-test-1.0.0-hadoop2.3.0-cdh5.0.1.jar zoo02 my-test sb-nginx03 1 hdfs://192.168.1.1:9100/tmp/spark-log.txt
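The trailing arguments line up with the usage string in the code: zoo02 is <zkQuorum>, my-test is <group>, sb-nginx03 is <topics>, 1 is <numThreads>, and the HDFS URL is <output>.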
 
The results land in HDFS: saveAsTextFiles writes each 2-second batch under the output prefix, in a directory suffixed with the batch timestamp.