Spark version is 1.0
Kafka version is 0.8
Let's take a look at Kafka's architecture diagram; for more information, please refer to the official documentation.
I have three machines here for Kafka log collection:
A: 192.168.1.1, the server (broker)
B: 192.168.1.2, the producer
C: 192.168.1.3, the consumer
First, execute the following command in the Kafka installation directory on A:

./bin/kafka-server-start.sh config/server.properties

Once Kafka has started, check with netstat -npl that the default port 9092 is listening.
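Note that Kafka 0.8 needs a running ZooKeeper before the broker will start; if you don't have a separate ensemble, the distribution bundles a single-node convenience script. A minimal sketch, assuming the default config files shipped with Kafka:

# Start the bundled single-node ZooKeeper (port 2181 by default)
./bin/zookeeper-server-start.sh config/zookeeper.properties &

# After starting the broker, confirm both ports are listening
netstat -npl | grep -E '2181|9092'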
B is our Nginx log server; the site's access log is written to access-nginx.log in real time, so we can watch the requests currently hitting the site with tail -f. (Avoid running tail -f if the site gets heavy traffic.)
We also need to deploy Kafka on B, unless you write your own Kafka client (see the client API documentation).
Execute the following command to push the data into the cluster:

tail -n 0 -f /www/nh-nginx02/access.log | bin/kafka-console-producer.sh --broker-list 192.168.1.1:9092 --topic sb-nginx03
This pushes the log, line by line, into Kafka as messages.
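Before attaching the real log stream, you can sanity-check the pipeline by hand. A quick sketch; the topic is created automatically on first use as long as auto.create.topics.enable is left at its Kafka 0.8 default of true:

# Send a single test message to the topic
echo "hello kafka" | bin/kafka-console-producer.sh --broker-list 192.168.1.1:9092 --topic sb-nginx03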
On C, now let's have the consumer pull the data. Again, deploy Kafka there and execute the command:

bin/kafka-console-consumer.sh --zookeeper <zookeeper-host:2181> --topic sb-nginx03 --from-beginning
Parameters:
--zookeeper specifies the address and port of ZooKeeper in your cluster.
--topic must match the topic name we used when pushing from B.
The above covers only the shell command line; so how do we write a consumer with Spark?
Assuming you've downloaded the Spark 1.0 source and already have an environment with sbt and Scala set up.
The Scala code is as follows:
package test

import java.util.Properties
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf

object KafkaTest {
  def main(args: Array[String]) {
    if (args.length < 5) {
      System.err.println("Usage: KafkaTest <zkQuorum> <group> <topics> <numThreads> <output>")
      System.exit(1)
    }
    val Array(zkQuorum, group, topics, numThreads, output) = args
    val sparkConf = new SparkConf().setAppName("KafkaTest")
    // Use a 2-second batch interval
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    ssc.checkpoint("checkpoint")

    // Map each topic to the number of consumer threads reading it
    val topicpMap = topics.split(",").map((_, numThreads.toInt)).toMap
    // createStream yields (key, message) pairs; keep only the message body
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicpMap).map(_._2)
    lines.saveAsTextFiles(output)
    ssc.start()
    ssc.awaitTermination()
  }
}
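Two things worth noting about this code: KafkaUtils.createStream returns a DStream of (key, message) pairs, which is why we keep only _._2; and saveAsTextFiles treats the output path as a prefix, writing each 2-second batch into its own directory named <output>-<timestamp in milliseconds>, so expect a series of directories rather than a single file.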
Then compile it:

mvn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.0.1 -DskipTests package
Then submit the Spark job:

./bin/spark-submit --master local[*] --class test.KafkaTest ./test/target/scala-2.10/spark-test-1.0.0-hadoop2.3.0-cdh5.0.1.jar zoo02 my-test sb-nginx03 1 hdfs://192.168.1.1:9100/tmp/spark-log.txt
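The five trailing arguments match the Usage string in the code: zoo02 is <zkQuorum> (the ZooKeeper quorum), my-test is <group> (the consumer group), sb-nginx03 is <topics>, 1 is <numThreads>, and hdfs://192.168.1.1:9100/tmp/spark-log.txt is <output>.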
As the job runs, the results are written to the HDFS output path.
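A quick way to verify the job is producing output (assuming the HDFS client is on your path):

# Each batch lands in its own timestamped directory under /tmp
hadoop fs -ls hdfs://192.168.1.1:9100/tmp/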