Spark reads Nginx web log messages from Kafka and writes them to HDFS


Spark version is 1.0
Kafka version is 0.8

Let's take a look at the Kafka architecture diagram; for more details, please refer to the official documentation.


I have three machines on my side for Kafka log collection:
A  192.168.1.1  Kafka server (broker)
B  192.168.1.2  producer
C  192.168.1.3  consumer

First, execute the following command in the Kafka installation directory on A:

bin/kafka-server-start.sh config/server.properties

After starting Kafka, use netstat -npl to check whether the default port 9092 is listening.
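For example, assuming netstat is available on A, a quick check might be:

netstat -npl | grep 9092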

B is our Nginx log server, where the website's access log is written to access.log in real time, so we can watch the requests the site is currently serving with tail -f. Do not run tail -f if your site gets very heavy traffic.
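If traffic is light, watching the log could look like this (using the log path from the producer command below):

tail -f /www/nh-nginx02/access.log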

We also need to deploy Kafka on B if we are not writing our own Kafka client (see the official client API documentation).

Execute the following command to push the data into the cluster:

tail -n 0 -f /www/nh-nginx02/access.log | bin/kafka-console-producer.sh --broker-list 192.168.1.1:9092 --topic sb-nginx03

This pushes the log lines into Kafka as messages.
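Note that this relies on the sb-nginx03 topic existing, or on Kafka's automatic topic creation (controlled by auto.create.topics.enable, which is on by default in 0.8). If your 0.8.x release ships kafka-topics.sh and you prefer to create the topic explicitly, a sketch (ZooKeeper address assumed) would be:

bin/kafka-topics.sh --create --zookeeper zoo02:2181 --replication-factor 1 --partitions 1 --topic sb-nginx03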

Now, on C, let's consume the data. Deploy Kafka there as well and execute the following command:

bin/kafka-console-consumer.sh --zookeeper zoo02:2181 --topic sb-nginx03 --from-beginning

Parameters:
--zookeeper specifies the address and port of the ZooKeeper ensemble for your cluster.
--topic must match the topic name we used when pushing from B.

The method above only works from the shell command line; now let's write the consumer with Spark. This assumes you have downloaded the Spark 1.0 source and already have an SBT/Scala build environment deployed.
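If you would rather build the consumer outside the Spark source tree with sbt, a minimal sketch of the build settings (versions assumed to match the Spark 1.0.0 / Scala 2.10 setup used here) might be:

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.0.0"
)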

The Scala code is as follows:

package test

import java.util.Properties

import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf

object KafkaTest {
  def main(args: Array[String]) {
    if (args.length < 5) {
      System.err.println("Usage: KafkaTest <zkQuorum> <group> <topics> <numThreads> <output>")
      System.exit(1)
    }
    val Array(zkQuorum, group, topics, numThreads, output) = args
    val sparkConf = new SparkConf().setAppName("KafkaTest")
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    ssc.checkpoint("checkpoint")

    // one receiver thread per topic, as given on the command line
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    // createStream yields (key, message) pairs; keep only the message (the log line)
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
    lines.saveAsTextFiles(output)
    ssc.start()
    ssc.awaitTermination()
  }
}
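A note on the two key calls above: KafkaUtils.createStream returns a stream of (key, message) pairs, which is why map(_._2) keeps only the message text (the raw log line), and saveAsTextFiles writes each micro-batch into its own directory named with the output prefix plus the batch time in milliseconds. If you want an extension on the generated files, the method also accepts an optional suffix, for example:

lines.saveAsTextFiles(output, "txt")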

Then compile:

mvn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.0.1 -DskipTests package

Then submit the Spark job:

./bin/spark-submit --master local[*] --class org.apache.spark.fccs.KafkaTest ./test/target/scala-2.10/spark-test-1.0.0-hadoop2.3.0-cdh5.0.1.jar zoo02 my-test sb-nginx03 1 hdfs://192.168.1.1:9100/tmp/spark-log.txt
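The five positional arguments after the jar map to the code's zkQuorum, group, topics, numThreads and output parameters: zoo02 is the ZooKeeper quorum, my-test the consumer group, sb-nginx03 the topic, 1 the number of receiver threads, and the hdfs:// URI the output prefix. Because the job calls saveAsTextFiles, every 2-second batch lands in its own directory named with that prefix plus the batch timestamp; assuming the HDFS client is configured on the submitting machine, you can list them with something like:

hdfs dfs -ls hdfs://192.168.1.1:9100/tmp/ | grep spark-log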

The results are as follows:
