Flume combined with Spark test


Recently I have been experimenting with combining Flume and Kafka with Spark Streaming. Today I record a simple combination of Flume and Spark here, in the hope of saving readers some detours. Where the treatment is not thorough, I welcome advice from more experienced readers.


The experiment is relatively simple and is divided into two parts: first, sending data with avro-client; second, sending data with netcat.

First, the Spark program requires two Flume-related jar packages:

flume-ng-sdk-1.4.0, spark-streaming-flume_2.11-1.2.0
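If the project is built with sbt, equivalent dependency declarations might look roughly like the sketch below (the Scala version and the "provided" scope are assumptions; the article only names the two jars):

scalaVersion := "2.11.4"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming_2.11" % "1.2.0" % "provided",
  "org.apache.spark" % "spark-streaming-flume_2.11" % "1.2.0",
  "org.apache.flume" % "flume-ng-sdk" % "1.4.0"
)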


First, using avro-client to send data

1. Write the Spark program. Its function is to receive Flume events:


import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._

object FlumeEventTest {

  def main(args: Array[String]) {
    // Quiet down Spark and Jetty logging
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.apache.eclipse.jetty.server").setLevel(Level.OFF)

    val hostname = args(0)
    val port = args(1).toInt
    val batchInterval = Seconds(args(2).toInt)

    val sparkConf = new SparkConf().setAppName("FlumeEventCount").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Push-based receiver: Flume's avro sink delivers events to hostname:port
    val stream = FlumeUtils.createStream(ssc, hostname, port, StorageLevel.MEMORY_ONLY)

    // Report how many events arrived in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
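When events arrive, the print() call produces per-batch summaries of roughly this shape (the timestamp and count are illustrative):

-------------------------------------------
Time: 1431500000000 ms
-------------------------------------------
Received 50 flume events.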


2. Flume configuration file parameters:


a1.channels = c1
a1.sinks = k1
a1.sources = r1

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 9999

a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


Here, avro-client sends data to port 44444 of Flume, and Flume then forwards the data to Spark through port 9999.
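End to end, the data path is:

avro-client -> avro source r1 (localhost:44444) -> memory channel c1 -> avro sink k1 (localhost:9999) -> Spark (FlumeUtils.createStream)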


3. Run the Spark program:
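Assuming the program has been packaged into a jar named flume-event-test.jar (a hypothetical name), it could be submitted roughly as follows, with the two Flume jars on the classpath; the arguments are the hostname, the avro sink's port, and a batch interval in seconds (the 5-second value is an assumption):

bin/spark-submit --class FlumeEventTest --jars flume-ng-sdk-1.4.0.jar,spark-streaming-flume_2.11-1.2.0.jar flume-event-test.jar localhost 9999 5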



4. Start the Flume agent with the Flume configuration file:

../bin/flume-ng agent --conf conf --conf-file ./flume-conf.conf --name a1 -Dflume.root.logger=INFO,console


[Screenshot: Spark run output]


5. Use avro-client to send a file:

./flume-ng avro-client --conf conf -H localhost -p 44444 -F /opt/servicesclient/spark/spark/conf/spark-env.sh.template -Dflume.root.logger=DEBUG,console


[Screenshot: Flume agent output]

[Screenshot: Spark output]


Second, using netcat to send data


1. The Spark program is the same as above.

2. Configure Flume parameters

a1.channels = c1
a1.sinks = k1
a1.sources = r1

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 9999

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


Here, the netcat source is used, so telnet can serve as the Flume data source.


3. Run the Spark program, same as above.


4. Start the Flume agent with the Flume configuration file:

../bin/flume-ng agent --conf conf --conf-file ./flume-conf.conf --name a1 -Dflume.root.logger=INFO,console


Note: this run uses netcat as the Flume source. Observe how its behavior differs from the avro source used earlier: avro-client pushes the lines of a whole file at once, while the netcat source turns each line typed into an event as it arrives.


5. Send data using telnet
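With the agent running, a session might look roughly like this (the typed line is illustrative; Flume's netcat source acknowledges each received line with "OK"):

telnet localhost 44444
hello spark streaming
OK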



[Screenshot: Spark output]



This is just a simple demo. If you really use Flume to collect data in a project, with Kafka as a distributed message queue and Spark Streaming for real-time computation, you will need to study Flume and Spark Streaming in much more detail.


Some time ago I demonstrated several Spark Streaming examples in a training session for my department: text processing, network data processing, stateful operations, and window operations. When I have time over the next few days I will organize them and share them with everyone, along with two simple Spark MLlib demos: user classification based on K-means and a movie recommendation system based on collaborative filtering.



Today I watched Andrew Ng's machine learning course from Stanford. It is excellent, so I am sharing the link:

http://open.163.com/special/opencourse/machinelearning.html

This article is from the "One Step, One Step" blog. Please keep the original source: http://snglw.blog.51cto.com/5832405/1652508

