I have recently been experimenting with combining Flume and Kafka with Spark Streaming. Today I'll record a simple Flume + Spark combination here, to save readers some detours. There are surely places I haven't thought through, so suggestions from passing experts are very welcome.
The experiment is fairly simple and is divided into two parts: first, sending data with avro-client; second, sending data with netcat.
First, the Spark program requires two Flume-related jars:
flume-ng-sdk-1.4.0, spark-streaming-flume_2.11-1.2.0
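If you build the program with sbt rather than adding the jars by hand, the equivalent dependencies can be declared roughly as follows (a sketch only; the versions are taken from the jar names above, and `provided` assumes you run via spark-submit against an existing Spark installation):

```scala
// build.sbt -- dependency sketch matching the jars listed above
scalaVersion := "2.11.4"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming_2.11"       % "1.2.0" % "provided",
  "org.apache.spark" % "spark-streaming-flume_2.11" % "1.2.0",
  "org.apache.flume" % "flume-ng-sdk"               % "1.4.0"
)
```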
First, using avro-client to send data
1. Write the Spark program; its function is to receive Flume events:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._

object FlumeEventTest {
  def main(args: Array[String]) {
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    val hostname = args(0)
    val port = args(1).toInt
    val batchInterval = Seconds(args(2).toInt)

    val sparkConf = new SparkConf().setAppName("FlumeEventCount").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    val stream = FlumeUtils.createStream(ssc, hostname, port, StorageLevel.MEMORY_ONLY)
    stream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
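The program above only counts events per batch. If you also want to see the payloads, each element of the stream is a SparkFlumeEvent wrapping an Avro event whose body is a ByteBuffer. A minimal sketch (a line you could add before `ssc.start()`; shown here as a fragment, not a complete program):

```scala
// Sketch: decode each Flume event body as a UTF-8 string and print it.
// SparkFlumeEvent.event exposes the underlying AvroFlumeEvent.
stream.map(e => new String(e.event.getBody.array(), "UTF-8")).print()
```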
2. Flume configuration file parameters:
a1.channels = c1
a1.sinks = k1
a1.sources = r1

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 9999

a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
Here, avro-client sends data to Flume's port 44444, and Flume then forwards the data to Spark via port 9999.
3. Run the Spark program:
4. Start the Flume agent with the Flume configuration file:
../bin/flume-ng agent --conf conf --conf-file ./flume-conf.conf --name a1 -Dflume.root.logger=INFO,console
Spark output: (screenshot omitted)
5. Use avro-client to send a file:
./flume-ng avro-client --conf conf -H localhost -p 44444 -F /opt/servicesclient/spark/spark/conf/spark-env.sh.template -Dflume.root.logger=DEBUG,console
Flume agent output: (screenshot omitted)
Spark output: (screenshot omitted)
Second, using netcat to send data
1. The Spark program is the same as above.
2. Configure Flume parameters
a1.channels = c1
a1.sinks = k1
a1.sources = r1

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 9999

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
Here, telnet is used as the Flume data source.
3. Run the Spark program as above.
4. Start the Flume agent with the Flume configuration file:
../bin/flume-ng agent --conf conf --conf-file ./flume-conf.conf --name a1 -Dflume.root.logger=INFO,console
Note: netcat is used here as the Flume data source; note how the output differs from using avro as the source.
5. Send data using telnet
Spark output: (screenshot omitted)
This is a simple demo. If you really use Flume to collect data in a project, with Kafka as a distributed message queue and Spark Streaming for real-time computation, you will need to study both Flume and Spark Streaming in much more depth.
Some time ago I demonstrated several Spark Streaming examples for my department: text processing, network data processing, stateful operations, and window operations. I'll organize and share them when I have time, along with two simple Spark MLlib demos: K-means-based user classification and a movie recommendation system based on collaborative filtering.
I have also been watching Andrew Ng's machine learning course from Stanford recently; it's excellent, so I'm sharing the link:
Http://open.163.com/special/opencourse/machinelearning.html
This article is from the "One Step, One Step" blog; please keep the source: http://snglw.blog.51cto.com/5832405/1652508