Spark version is 1.0
Kafka version is 0.8
Let's take a look at Kafka's architecture diagram; for more information, please refer to the official documentation.
I have three machines here for Kafka log collection:
A: 192.168.1.1, the server (broker)
B: 192.168.1.2, the producer
C: 192.168.1.3, the consumer
First, execute the following command in the Kafka installation directory on A:

./bin/kafka-server-start.sh config/server.properties

Once Kafka has started, check with netstat -npl that the default port 9092 is listening.
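Note that Kafka 0.8 needs a running ZooKeeper before the broker will start; if you don't have a separate ensemble, the distribution bundles a single-node convenience script. A minimal sketch, assuming the default config files shipped with Kafka:

# Start the bundled single-node ZooKeeper (port 2181 by default)
./bin/zookeeper-server-start.sh config/zookeeper.properties &

# After starting the broker, confirm both ports are listening
netstat -npl | grep -E '2181|9092'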
B is our Nginx log server; the site's access log is written to access-nginx.log in real time, so we can watch the requests currently hitting the site with tail -f. (Avoid running tail -f if the site gets heavy traffic.)
We also need to deploy Kafka on B, unless you write your own Kafka client (see the client API documentation).
Execute the following command to push the data into the cluster:

tail -n 0 -f /www/nh-nginx02/access.log | bin/kafka-console-producer.sh --broker-list 192.168.1.1:9092 --topic sb-nginx03
This pushes the log, line by line, into Kafka as messages.
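Before attaching the real log stream, you can sanity-check the pipeline by hand. A quick sketch; the topic is created automatically on first use as long as auto.create.topics.enable is left at its Kafka 0.8 default of true:

# Send a single test message to the topic
echo "hello kafka" | bin/kafka-console-producer.sh --broker-list 192.168.1.1:9092 --topic sb-nginx03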
On C, now let's have the consumer pull the data. Again, deploy Kafka there and execute the command:

bin/kafka-console-consumer.sh --zookeeper <zookeeper-host:2181> --topic sb-nginx03 --from-beginning
Parameters:
--zookeeper specifies the address and port of ZooKeeper in your cluster.
--topic must match the topic name we used when pushing from B.
The above covers only the shell command line; so how do we write a consumer with Spark?
Assuming you've downloaded the Spark 1.0 source and already have an environment with sbt and Scala set up.
The Scala code is as follows:
package test

import java.util.Properties
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf

object KafkaTest {
  def main(args: Array[String]) {
    if (args.length < 5) {
      System.err.println("Usage: KafkaTest <zkQuorum> <group> <topics> <numThreads> <output>")
      System.exit(1)
    }
    val Array(zkQuorum, group, topics, numThreads, output) = args
    val sparkConf = new SparkConf().setAppName("KafkaTest")
    // Use a 2-second batch interval
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    ssc.checkpoint("checkpoint")

    // Map each topic to the number of consumer threads reading it
    val topicpMap = topics.split(",").map((_, numThreads.toInt)).toMap
    // createStream yields (key, message) pairs; keep only the message body
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicpMap).map(_._2)
    lines.saveAsTextFiles(output)
    ssc.start()
    ssc.awaitTermination()
  }
}
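Two things worth noting about this code: KafkaUtils.createStream returns a DStream of (key, message) pairs, which is why we keep only _._2; and saveAsTextFiles treats the output path as a prefix, writing each 2-second batch into its own directory named <output>-<timestamp in milliseconds>, so expect a series of directories rather than a single file.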
Then compile it:

mvn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.0.1 -DskipTests package
Then submit the Spark job:

./bin/spark-submit --master local[*] --class test.KafkaTest ./test/target/scala-2.10/spark-test-1.0.0-hadoop2.3.0-cdh5.0.1.jar zoo02 my-test sb-nginx03 1 hdfs://192.168.1.1:9100/tmp/spark-log.txt
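The five trailing arguments match the Usage string in the code: zoo02 is <zkQuorum> (the ZooKeeper quorum), my-test is <group> (the consumer group), sb-nginx03 is <topics>, 1 is <numThreads>, and hdfs://192.168.1.1:9100/tmp/spark-log.txt is <output>.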
As the job runs, the results are written to the HDFS output path.
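A quick way to verify the job is producing output (assuming the HDFS client is on your path):

# Each batch lands in its own timestamped directory under /tmp
hadoop fs -ls hdfs://192.168.1.1:9100/tmp/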