2016 Big Data Spark "Mushroom Cloud" Action: Integrating Flume with Spark Streaming

Source: Internet
Author: User

Recently, after following Liaoliang's 2016 Big Data Spark "Mushroom Cloud" action course, I needed to integrate Flume, Kafka, and Spark Streaming.

It felt difficult to get started at first, so I began with something simple. The idea: Flume produces data and forwards it to Spark Streaming. The Flume source is netcat (address: localhost, port 22222) and the sink is Avro (address: localhost, port 11111). The Spark Streaming job simply prints the number of events received.
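To make the flow concrete: netcat source (22222) → Flume memory channel → Avro sink → Spark Streaming listener (11111). The hand-off can be sketched with plain Python sockets — this is an illustration only, not Flume or Avro code, and the ports and names here are stand-ins:

```python
import socket
import threading

# Stand-in for the Spark Streaming side of the pipeline: a listener that
# accepts one connection and collects the lines it receives.
# (Illustration only -- the real Avro sink speaks Avro RPC, not raw text.)
srv = socket.socket()
srv.bind(("localhost", 0))   # the article uses port 11111; 0 = any free port
srv.listen(1)
port = srv.getsockname()[1]

received = []

def downstream():
    conn, _ = srv.accept()
    for line in conn.makefile():
        received.append(line.strip())
    conn.close()

t = threading.Thread(target=downstream)
t.start()

# Stand-in for the Flume agent: take one "netcat source" line and forward
# it to the downstream listener, as the Avro sink would.
client = socket.socket()
client.connect(("localhost", port))
client.sendall(b"hello flume\n")
client.close()

t.join()
srv.close()
print("Received %d flume events." % len(received))
```

The point of the sketch is only the ordering: the downstream listener must exist before the producer connects, which is exactly the constraint the real setup runs into below.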


1. Configuration file

The Flume configuration file is as follows (example5.properties).

Note: add the line a1.sinks.k1.avro.useLocalTimeStamp = true; otherwise you will generally get an error like:

Unable to deliver event. Exception follows.

org.apache.flume.EventDeliveryException: java.lang.NullPointerException: Expected timestamp


Thanks to the author who offered the solution:

http://blog.selfup.cn/1601.html

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.0.10
a1.sources.r1.port = 22222
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.0.10
a1.sinks.k1.port = 11111
a1.sinks.k1.avro.useLocalTimeStamp = true

2. Write the processing code

// Create a StreamingContext with a 10-second batch interval
val ssc = new StreamingContext(sparkConf, Seconds(10))
val hostname = args(0)
val port = args(1).toInt
val storageLevel = StorageLevel.MEMORY_ONLY
val flumeStream = FlumeUtils.createStream(ssc, hostname, port, storageLevel)
flumeStream.count().map(cnt => "Received " + cnt + " flume events.").print()
// Start the computation
ssc.start()
// Wait for the computation to terminate, then exit
ssc.awaitTermination()
ssc.stop()

One pit here: the job kept reporting that FlumeUtils could not be found. It actually lives in spark-examples-1.6.1-hadoop2.6.0.jar. I tried adding the jar in the source code via setJars, but that did not work:

val sparkConf = new SparkConf().setAppName("AdClickedStreamingStats")
  .setMaster("local[5]")
  .setJars(List(
    "/lib/spark-1.6.1/spark-streaming-kafka_2.10-1.6.1.jar",
    "/lib/kafka-0.10.0/kafka-clients-0.10.0.1.jar",
    "/lib/kafka-0.10.0/kafka_2.10-0.10.0.1.jar",
    "/lib/spark-1.6.1/spark-streaming_2.10-1.6.1.jar",
    "/lib/kafka-0.10.0/metrics-core-2.2.0.jar",
    "/lib/kafka-0.10.0/zkclient-0.8.jar",
    "/lib/spark-1.6.1/mysql-connector-java-5.1.13-bin.jar",
    "/lib/spark-1.6.1/spark-examples-1.6.1-hadoop2.6.0.jar",
    "/opt/spark-1.5.0-bin-hadoop2.6/sparkapps.jar"))

In the end I gave in and passed the jar on the command line instead:

bin/spark-submit --class com.dt.spark.flume.SparkStreamingFlume \
  --jars /lib/spark-1.6.1/spark-examples-1.6.1-hadoop2.6.0.jar \
  --master local[5] sparkapps.jar 192.168.0.10 11111

The rest is easy.

3. Run the test
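The count().map(...).print() line in the processing code emits one summary string per micro-batch. Its per-batch logic can be sketched in plain Python — here each batch is modeled as a list of events, which is an assumption for illustration, not Spark's actual DStream API:

```python
# Per-batch logic of flumeStream.count().map(...).print(), sketched in
# plain Python. Each micro-batch is modeled as a simple list of events.
batches = [["a", "b", "c"], [], ["d"]]

def describe(batch):
    # count() -> number of events in the batch; map(...) -> readable string
    return "Received %d flume events." % len(batch)

messages = [describe(b) for b in batches]
for m in messages:
    print(m)
```

With a 10-second batch interval, the real job prints one such line every 10 seconds, including "Received 0 flume events." for empty batches.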

1. Submit the Spark job first

Submitting the job in Spark creates the listener on port 11111:

bin/spark-submit --class com.dt.spark.flume.SparkStreamingFlume \
  --jars /lib/spark-1.6.1/spark-examples-1.6.1-hadoop2.6.0.jar \
  --master local[5] sparkapps.jar 192.168.0.10 11111

Otherwise, Flume will not be able to connect to port 11111.
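The start order matters because the Avro sink makes an outbound connection at startup, and connecting to a port with no listener fails immediately. A minimal plain-socket illustration (not Flume code; an ephemeral port stands in for the article's 11111 so the demo is self-contained):

```python
import socket

# Reserve a free port, then close it so nothing is listening there --
# mimicking the state before the Spark job has been submitted.
srv = socket.socket()
srv.bind(("localhost", 0))
port = srv.getsockname()[1]
srv.close()

# Connecting out (as the Avro sink does) now fails with a refusal.
try:
    c = socket.create_connection(("localhost", port), timeout=1)
    c.close()
    outcome = "connected"
except ConnectionRefusedError:
    outcome = "connection refused"

print(outcome)
```

This refusal is what surfaces in Flume as the "Unable to create RPC client" error discussed below when the sink cannot reach its target.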


2. Start Flume

$ bin/flume-ng agent --conf conf --conf-file example5.properties --name a1 -Dflume.root.logger=INFO,console

Because the sink is Avro, Flume connects out to port 11111 and then starts listening on port 22222.

There was another pit in the middle, an error like:

Unable to create RPC client using hostname: 192.168.0.10, port: 11111

The original command was bin/flume-ng agent --conf conf --conf-file conf/example5.properties --name A1 -Dflume.root.logger=INFO,console

The agent name was wrong: A1 instead of a1.
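The reason the agent name matters: Flume only loads properties whose key prefix matches the --name argument, so --name A1 against keys prefixed a1. yields an empty configuration. A rough sketch of that prefix filtering (illustrative only, not Flume's actual parser):

```python
# Illustrative sketch of why --name must match the key prefix in the
# properties file. (This is not Flume's actual configuration parser.)
config_text = """\
a1.sources = r1
a1.channels = c1
a1.sinks = k1
"""

def keys_for_agent(text, agent):
    # Keep only keys belonging to the named agent, as Flume does.
    prefix = agent + "."
    return [line.split("=")[0].strip()
            for line in text.splitlines()
            if line.startswith(prefix)]

print(keys_for_agent(config_text, "a1"))  # finds the three components
print(keys_for_agent(config_text, "A1"))  # empty: wrong agent name
```

With the wrong name, the agent starts with no sources, channels, or sinks, so nothing ever connects to port 11111.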


3. Trigger data

telnet localhost 22222

Type a string, and the effect appears on the Flume console.


