Flume combined with Spark test


Recently I have been experimenting with combining Flume and Kafka with Spark Streaming. Today I record a simple combination of Flume and Spark here, in the hope of saving readers some detours. Where the treatment is not thorough, I welcome advice from more experienced readers.


The experiment is relatively simple and is divided into two parts: first, sending data with avro-client; second, sending data with netcat.

First, the Spark program requires two Flume-related jar packages:

flume-ng-sdk-1.4.0, spark-streaming-flume_2.11-1.2.0
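If the project is built with sbt, equivalent dependency declarations might look roughly like the sketch below (the Scala version and the "provided" scope are assumptions; the article only names the two jars):

scalaVersion := "2.11.4"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming_2.11" % "1.2.0" % "provided",
  "org.apache.spark" % "spark-streaming-flume_2.11" % "1.2.0",
  "org.apache.flume" % "flume-ng-sdk" % "1.4.0"
)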


First, using avro-client to send data

1. Write the Spark program. Its function is to receive Flume events:


import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._

object FlumeEventTest {

  def main(args: Array[String]) {
    // Quiet down Spark and Jetty logging
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.apache.eclipse.jetty.server").setLevel(Level.OFF)

    val hostname = args(0)
    val port = args(1).toInt
    val batchInterval = Seconds(args(2).toInt)

    val sparkConf = new SparkConf().setAppName("FlumeEventCount").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Push-based receiver: Flume's avro sink delivers events to hostname:port
    val stream = FlumeUtils.createStream(ssc, hostname, port, StorageLevel.MEMORY_ONLY)

    // Report how many events arrived in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
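When events arrive, the print() call produces per-batch summaries of roughly this shape (the timestamp and count are illustrative):

-------------------------------------------
Time: 1431500000000 ms
-------------------------------------------
Received 50 flume events.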


2. Flume configuration file parameters:


a1.channels = c1
a1.sinks = k1
a1.sources = r1

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 9999

a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


Here, avro-client sends data to port 44444 of Flume, and Flume then forwards the data to Spark through port 9999.
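End to end, the data path is:

avro-client -> avro source r1 (localhost:44444) -> memory channel c1 -> avro sink k1 (localhost:9999) -> Spark (FlumeUtils.createStream)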


3. Run the Spark program:
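Assuming the program has been packaged into a jar named flume-event-test.jar (a hypothetical name), it could be submitted roughly as follows, with the two Flume jars on the classpath; the arguments are the hostname, the avro sink's port, and a batch interval in seconds (the 5-second value is an assumption):

bin/spark-submit --class FlumeEventTest --jars flume-ng-sdk-1.4.0.jar,spark-streaming-flume_2.11-1.2.0.jar flume-event-test.jar localhost 9999 5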



4. Start the Flume agent with the Flume configuration file:

../bin/flume-ng agent --conf conf --conf-file ./flume-conf.conf --name a1 -Dflume.root.logger=INFO,console


[Screenshot: Spark run output]


5. Use avro-client to send a file:

./flume-ng avro-client --conf conf -H localhost -p 44444 -F /opt/servicesclient/spark/spark/conf/spark-env.sh.template -Dflume.root.logger=DEBUG,console


[Screenshot: Flume agent output]

[Screenshot: Spark output]


Second, using netcat to send data


1. The Spark program is the same as above.

2. Configure Flume parameters

a1.channels = c1
a1.sinks = k1
a1.sources = r1

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 9999

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


Here, the netcat source is used, so telnet can serve as the Flume data source.


3. Run the Spark program, same as above.


4. Start the Flume agent with the Flume configuration file:

../bin/flume-ng agent --conf conf --conf-file ./flume-conf.conf --name a1 -Dflume.root.logger=INFO,console


Note: this run uses netcat as the Flume source. Observe how its behavior differs from the avro source used earlier: avro-client pushes the lines of a whole file at once, while the netcat source turns each line typed into an event as it arrives.


5. Send data using telnet
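With the agent running, a session might look roughly like this (the typed line is illustrative; Flume's netcat source acknowledges each received line with "OK"):

telnet localhost 44444
hello spark streaming
OK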



[Screenshot: Spark output]



This is just a simple demo. If you really use Flume to collect data in a project, with Kafka as a distributed message queue and Spark Streaming for real-time computation, you will need to study Flume and Spark Streaming in much more detail.


Some time ago I demonstrated several Spark Streaming examples in a training session for my department: text processing, network data processing, stateful operations, and window operations. When I have time over the next few days I will organize them and share them with everyone, along with two simple Spark MLlib demos: user classification based on K-means and a movie recommendation system based on collaborative filtering.



Today I watched Andrew Ng's machine learning course from Stanford. It is excellent, so I am sharing the link:

http://open.163.com/special/opencourse/machinelearning.html

This article is from the "One Step, One Step" blog. Please keep the original source: http://snglw.blog.51cto.com/5832405/1652508

