Spark version is 1.0
Kafka version is 0.8
 
Let's take a look at Kafka's architecture; for more detail, please refer to the official documentation.
 
I have three machines for Kafka log collection:
A 192.168.1.1, the Kafka server (broker)
B 192.168.1.2, the producer
C 192.168.1.3, the consumer
 
First, execute the following command in the Kafka installation directory on A:

./bin/kafka-server-start.sh ./config/server.properties

(Kafka 0.8 depends on ZooKeeper, so the ensemble referenced in server.properties must already be running.) After starting Kafka, use netstat -npl to check that the default port 9092 is listening.
 
B is our nginx log-generation server, where the site's access log is appended in real time (here /www/nh-nginx02/access.log), so we can watch the requests currently hitting the site with tail -f. Avoid doing that if the site gets heavy traffic, since the output will scroll too fast to read.
 
We also have to deploy Kafka on B, assuming you do not want to write your own Kafka client (see the client API documentation for that route); we will use the console producer script that ships with Kafka.

Execute the following command to push the data into the cluster:

tail -n 0 -f /www/nh-nginx02/access.log | bin/kafka-console-producer.sh --broker-list 192.168.1.1:9092 --topic sb-nginx03
 
With that, the log lines are being pushed into Kafka as messages on the sb-nginx03 topic.
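If you would rather write your own producer than pipe tail into the console script, here is a minimal sketch using the Kafka 0.8 Scala producer API. The object name LogProducer is my own; note that, unlike tail -n 0 -f, this reads the existing file once rather than following it as it grows:

package test

import java.util.Properties
import scala.io.Source
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

object LogProducer {
  def main(args: Array[String]) {
    val props = new Properties()
    // Same broker and topic as the console command above.
    props.put("metadata.broker.list", "192.168.1.1:9092")
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    val producer = new Producer[String, String](new ProducerConfig(props))

    // Send each log line as one Kafka message.
    for (line <- Source.fromFile("/www/nh-nginx02/access.log").getLines())
      producer.send(new KeyedMessage[String, String]("sb-nginx03", line))

    producer.close()
  }
}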
 
Now, on C, let's pull the data as a consumer. Again, deploy Kafka there and execute:

bin/kafka-console-consumer.sh --zookeeper zoo02:2181 --topic sb-nginx03 --from-beginning
 
Parameters:
--zookeeper specifies the address and port of the ZooKeeper used by your cluster.
--topic must match the topic name we used when pushing from B.
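For reference, what the console consumer does can be sketched in Scala with the Kafka 0.8 high-level consumer API. The ZooKeeper address zoo02:2181 and group id my-test are assumptions carried over from the commands in this post, and auto.offset.reset=smallest only approximates --from-beginning (it applies when the group has no committed offsets yet):

package test

import java.util.Properties
import kafka.consumer.{Consumer, ConsumerConfig}

object LogConsumer {
  def main(args: Array[String]) {
    val props = new Properties()
    props.put("zookeeper.connect", "zoo02:2181") // assumed ZooKeeper address
    props.put("group.id", "my-test")
    props.put("auto.offset.reset", "smallest")   // roughly --from-beginning
    val connector = Consumer.create(new ConsumerConfig(props))

    // One stream (thread) for the topic; print every message as text.
    val streams = connector.createMessageStreams(Map("sb-nginx03" -> 1))
    for (stream <- streams("sb-nginx03"); msg <- stream)
      println(new String(msg.message()))
  }
}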
 
The above only covers the shell command line; how do we write the consumer with Spark? The following assumes you have downloaded the Spark 1.0 source and have an environment with sbt and Scala already set up.
 
The Scala code is as follows:
 
package test

import java.util.Properties

import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf

object KafkaTest {
  def main(args: Array[String]) {
    if (args.length < 5) {
      System.err.println("Usage: KafkaTest <zkQuorum> <group> <topics> <numThreads> <output>")
      System.exit(1)
    }
    val Array(zkQuorum, group, topics, numThreads, output) = args
    val sparkConf = new SparkConf().setAppName("KafkaTest")
    // One streaming batch every 2 seconds.
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    ssc.checkpoint("checkpoint")

    // Map each topic to the number of receiver threads.
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    // createStream yields (key, message) pairs; keep only the message.
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
    lines.saveAsTextFiles(output)
    ssc.start()
    ssc.awaitTermination()
  }
}
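The job above only dumps raw lines, so here is a variation showing how the same DStream can be transformed before output. This is a minimal sketch against the same Spark 1.0 / Kafka 0.8 APIs; the object name NginxStatusCount and the assumption that the topic carries nginx combined-format log lines (status code as the 9th space-separated field) are mine, not from the original setup:

package test

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka.KafkaUtils

object NginxStatusCount {
  def main(args: Array[String]) {
    if (args.length < 4) {
      System.err.println("Usage: NginxStatusCount <zkQuorum> <group> <topics> <numThreads>")
      System.exit(1)
    }
    val Array(zkQuorum, group, topics, numThreads) = args
    val ssc = new StreamingContext(new SparkConf().setAppName("NginxStatusCount"), Seconds(2))

    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)

    // Count requests per HTTP status code in each batch. In nginx's default
    // "combined" format the status is the 9th whitespace-separated field;
    // the filter guards against short or malformed lines.
    val statusCounts = lines
      .map(_.split(" "))
      .filter(_.length > 8)
      .map(fields => (fields(8), 1))
      .reduceByKey(_ + _)

    statusCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}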
Then compile:

mvn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.0.1 -DskipTests package
 
Then submit the Spark job:

./bin/spark-submit --master local[*] --class test.KafkaTest ./test/target/scala-2.10/spark-test-1.0.0-hadoop2.3.0-cdh5.0.1.jar zoo02 my-test sb-nginx03 1 hdfs://192.168.1.1:9100/tmp/spark-log.txt
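The trailing arguments line up with the usage string in the code: zoo02 is <zkQuorum>, my-test is <group>, sb-nginx03 is <topics>, 1 is <numThreads>, and the HDFS URL is <output>.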
 
The results land in HDFS: saveAsTextFiles writes each 2-second batch under the output prefix, in a directory suffixed with the batch timestamp.