Summary of integrating Spark Streaming and Flume in a CDH environment


The integration itself is actually quite simple, and there is already a tutorial online.

See http://blog.csdn.net/fighting_one_piece/article/details/40667035; I used the first integration approach described there. When you actually do it, all kinds of problems come up. It took me roughly from 5:00 in the morning of 2014.12.17 until 18:30 that evening. Summed up, it is actually very simple, but it took a long time. Ah! A fall into the pit, a gain in your wit. The problems were these:

Problem 1: You need to reference a number of packages, and they must be packed into your jar. Because the job runs in Spark on YARN mode, the cluster cannot find the dependencies if they are not bundled. Where to find them? Go straight to search.maven.org. (A build definition sketch follows the Spark code below.)

Problem 2: Because the job runs on the YARN cluster, the receiver should listen on localhost only. If you specify an IP instead and the receiver is not scheduled onto the node with that IP, it will fail because nothing ends up listening at that address.

Problem 3: For a Flume agent started by CDH, you have to run find / -name flume.conf, pick the most recent hit (the configuration file generated by Cloudera Manager), and start Flume with that file.

Problem 4: Do not test on the cluster directly; test on a single node first. A single-node test will surface all kinds of problems. Solve them, then move on to the cluster.

Problem 5: Pay close attention to versions! The Spark version in CDH 5.2 is 1.1.0, while the plugin I had been using was version 1.1.1! This alone cost me from noon onward. Again: a fall into the pit, a gain in your wit!

The Spark code is as follows:
package com.hark

import java.io.File

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

/**
 * Created by Administrator on 2014-12-16.
 */
object SparkStreamingFlumeTest {
  def main(args: Array[String]) {
    //println("harkhark")

    // Windows workaround: point hadoop.home.dir at the working directory
    // and make sure a bin/winutils.exe exists there
    val path = new File(".").getCanonicalPath()
    System.getProperties().put("hadoop.home.dir", path)
    new File("./bin").mkdirs()
    new File("./bin/winutils.exe").createNewFile()

    // local[2] variant was for single-machine testing; the cluster run
    // gets its master from spark-submit
    //val sparkConf = new SparkConf().setAppName("HdfsWordCount").setMaster("local[2]")
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")

    // Create the context. The batch interval digit was lost when this post
    // was scraped; any positive interval such as Seconds(10) works here.
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    //val hostname = "127.0.0.1"
    val hostname = "localhost"
    val port = 2345
    val storageLevel = StorageLevel.MEMORY_ONLY
    val flumeStream = FlumeUtils.createStream(ssc, hostname, port, storageLevel)

    flumeStream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
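For Problem 1 and Problem 5, the cleanest fix is to declare the Flume plugin at exactly the cluster's Spark version and let the build bundle it into the fat jar. Below is a minimal build.sbt sketch, assuming sbt with the sbt-assembly plugin; the project name and Scala version are illustrative, not from the original post:

// build.sbt -- minimal sketch; assumes sbt-assembly builds the fat jar
name := "sparktest"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // marked "provided": CDH 5.2 already ships Spark 1.1.0 on the cluster
  "org.apache.spark" %% "spark-core" % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided",
  // NOT "provided": this one must be packed into the jar (Problem 1),
  // and it must be 1.1.0 to match the cluster, not 1.1.1 (Problem 5)
  "org.apache.spark" %% "spark-streaming-flume" % "1.1.0"
)

Build with sbt assembly and submit the resulting jar as /opt/spark/sparktest.jar.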

  

The Flume configuration file is as follows:
# Paste flume.conf here. Example:
# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1

# For each source, channel, and sink, set
# standard properties.
tier1.sources.source1.type = exec
tier1.sources.source1.command = tail -f /opt/data/test3/123
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type = memory
#tier1.sinks.sink1.type = logger
tier1.sinks.sink1.type = avro
tier1.sinks.sink1.hostname = localhost
tier1.sinks.sink1.port = 2345
tier1.sinks.sink1.channel = channel1

# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100

The Spark start command is as follows:
spark-submit --driver-memory 512m --executor-memory 512m --executor-cores 1 --num-executors 3 --class com.hark.SparkStreamingFlumeTest --deploy-mode cluster --master yarn /opt/spark/sparktest.jar
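With the job submitted and running, you can check that the receiver is really listening on port 2345 before starting the agent, by pushing one event at it directly. Note that with hostname "localhost" the receiver binds to localhost on whichever node it was scheduled to (the crux of Problem 2), so run this on that node. A minimal smoke-test sketch, assuming the flume-ng-sdk jar is on the classpath; the object name is made up for illustration:

import java.nio.charset.Charset

import org.apache.flume.api.RpcClientFactory
import org.apache.flume.event.EventBuilder

// Smoke test: connects to the Avro listener that FlumeUtils.createStream()
// opens on localhost:2345 and sends a single event.
object SendTestEvent {
  def main(args: Array[String]) {
    val client = RpcClientFactory.getDefaultInstance("localhost", 2345)
    try {
      client.append(EventBuilder.withBody("hello flume", Charset.forName("UTF-8")))
    } finally {
      client.close()
    }
  }
}

If the job's next batch prints "Received 1 flume events.", the receiver side works and any remaining issue is in the agent.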

The Flume start command is as follows:
flume-ng agent --conf /opt/cloudera-manager/run/cloudera-scm-agent/process/585-flume-agent --conf-file /opt/cloudera-manager/run/cloudera-scm-agent/process/585-flume-agent/flume.conf --name tier1 -Dflume.root.logger=INFO,console
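Once both sides are up, the end-to-end check is simply to append lines to the file that the exec source tails; each appended line becomes one Flume event, so the streaming job should report them in its next batch. A throwaway sketch (the path comes from the config above; echoing into the file by hand works just as well):

import java.io.{FileWriter, PrintWriter}

// Appends a few lines to the file tailed by the exec source.
object AppendTestLines {
  def main(args: Array[String]) {
    val out = new PrintWriter(new FileWriter("/opt/data/test3/123", true))
    try {
      (1 to 10).foreach(i => out.println("test event " + i))
    } finally {
      out.close()
    }
  }
}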

