Recently, after following Liaoliang's 2016 Big Data Spark "Mushroom Cloud" course, I needed to integrate Flume, Kafka, and Spark Streaming.
It felt hard to take on all at once, so I started simple. The idea: Flume produces data and forwards it to Spark Streaming. The Flume source is netcat (address: localhost, port 22222) and the sink is Avro (address: localhost, port 11111). The Spark Streaming side simply prints how many events it received.
One, configuration file
The Flume configuration file is example5.properties; its full contents are shown below.
Note: make sure to add the line a1.sinks.k1.avro.useLocalTimeStamp = true; otherwise Flume typically reports an error like:
"Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.NullPointerException: Expected timestamp"
Thanks to this post for the solution: http://blog.selfup.cn/1601.html
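As an aside (not something the original post used): another common fix for this class of missing-timestamp error is to have Flume stamp every event at the source with a timestamp interceptor. A minimal sketch, assuming the same agent and source names as the configuration below:

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp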
The full example5.properties:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.0.10
a1.sources.r1.port = 22222
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.0.10
a1.sinks.k1.port = 11111
a1.sinks.k1.avro.useLocalTimeStamp = true

Two, write the processing code

// Create a StreamingContext with 10-second batches
val ssc = new StreamingContext(sparkConf, Seconds(10))
val hostname = args(0)
val port = args(1).toInt
val storageLevel = StorageLevel.MEMORY_ONLY
val flumeStream = FlumeUtils.createStream(ssc, hostname, port, storageLevel)
flumeStream.count().map(cnt => "Received " + cnt + " flume events.").print()
// start the streaming computation
ssc.start()
// wait for the computation to finish, then exit
ssc.awaitTermination()
ssc.stop()

One pit here: the job kept reporting that FlumeUtils could not be found. It actually lives in spark-examples-1.6.1-hadoop2.6.0.jar, and I had already added that jar in the source code via setJars:

val sparkConf = new SparkConf().setAppName("AdClickedStreamingStats").setMaster("local[5]")
  .setJars(List(
    "/lib/spark-1.6.1/spark-streaming-kafka_2.10-1.6.1.jar",
    "/lib/kafka-0.10.0/kafka-clients-0.10.0.1.jar",
    "/lib/kafka-0.10.0/kafka_2.10-0.10.0.1.jar",
    "/lib/spark-1.6.1/spark-streaming_2.10-1.6.1.jar",
    "/lib/kafka-0.10.0/metrics-core-2.2.0.jar",
    "/lib/kafka-0.10.0/zkclient-0.8.jar",
    "/lib/spark-1.6.1/mysql-connector-java-5.1.13-bin.jar",
    "/lib/spark-1.6.1/spark-examples-1.6.1-hadoop2.6.0.jar",
    "/opt/spark-1.5.0-bin-hadoop2.6/sparkapps.jar"))

That did not help, so in the end I simply forced it through with --jars on spark-submit:

bin/spark-submit --class com.dt.spark.flume.SparkStreamingFlume \
  --jars /lib/spark-1.6.1/spark-examples-1.6.1-hadoop2.6.0.jar \
  --master local[5] sparkapps.jar 192.168.0.10 11111

The rest is straightforward.
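Putting the pieces together, here is a minimal, self-contained sketch of what the whole driver program might look like. It assumes Spark 1.6.x with the spark-streaming-flume classes on the classpath, and the package/object name simply mirrors the class used in the spark-submit command above.

package com.dt.spark.flume

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

// Minimal sketch: receive Avro events from Flume and print how many arrive per batch.
object SparkStreamingFlume {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: SparkStreamingFlume <hostname> <port>")
      System.exit(1)
    }
    val hostname = args(0)
    val port = args(1).toInt

    val sparkConf = new SparkConf().setAppName("SparkStreamingFlume").setMaster("local[5]")
    // 10-second batch interval, as in the snippet above
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // Spark acts as the Avro server here; Flume's avro sink connects to this host:port
    val flumeStream = FlumeUtils.createStream(ssc, hostname, port, StorageLevel.MEMORY_ONLY)
    flumeStream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}

(In a regular build, the cleaner route would be to declare spark-streaming-flume_2.10 as a dependency instead of dragging in the examples jar; the --jars trick above was just the quickest way to get unblocked.)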
Three, run test

1. Submit the Spark job first
Submit the job in Spark first, which creates the Avro listener on port 11111:
bin/spark-submit --class com.dt.spark.flume.SparkStreamingFlume \
  --jars /lib/spark-1.6.1/spark-examples-1.6.1-hadoop2.6.0.jar \
  --master local[5] sparkapps.jar 192.168.0.10 11111
Otherwise, Flume will not be able to connect to port 11111.
2. Start Flume
$ bin/flume-ng agent --conf conf --conf-file example5.properties --name a1 -Dflume.root.logger=INFO,console
Because the sink is Avro, Flume will send events to port 11111; it also starts listening on port 22222 for the netcat source.
There was a pit in the middle: an error like
Unable to create RPC client using hostname: 192.168.0.10, port: 11111
It turned out the command I had originally run was bin/flume-ng agent --conf conf --conf-file conf/example5.properties --name A1 -Dflume.root.logger=INFO,console,
where one of the names was wrong.
3. Trigger Data:
telnet localhost 22222
Type in a string, and the effect then shows up on the Flume console.
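Not part of the original post, but if you also want to see the telnet text on the Spark driver console (assuming the Spark 1.6 API, where each SparkFlumeEvent wraps an AvroFlumeEvent with an array-backed body), something like this can be added next to the count:

// Print each Flume event body as a UTF-8 string
flumeStream
  .map(sfe => new String(sfe.event.getBody.array(), "UTF-8"))
  .print()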