Integrating Spark Streaming with Flume is actually quite simple, and there are tutorials online.
http://blog.csdn.net/fighting_one_piece/article/details/40667035 — see here. I used the first integration approach described there. In practice I ran into all kinds of problems. It took from roughly 5:00 in the morning of 2014-12-17 until 18:30 that evening. Summed up it is actually very simple, but it took a long time. Ah well — each fall makes you wiser. The problems were:

Problem 1: You need to reference a variety of packages, and they must be bundled into your jar. Because this runs in Spark on YARN mode, if they are not bundled, the cluster cannot find the dependencies. Where to find them? Go straight to search.maven.org.

Problem 2: Because Spark runs on the YARN cluster, the receiver must listen on localhost only. If you specify a fixed IP instead, any node that does not own that IP will fail to bind the listener.

Problem 3: For Flume started under CDH, run find / -name flume.conf, look through the results, and pick the most recent one — the configuration file generated by Cloudera Manager. Then start Flume with that file.

Problem 4: Do not test on the cluster directly; test on a single node first. Single-node testing will surface all kinds of problems. Solve them, then go test on the cluster.

Problem 5: Be sure to pay attention to versions! The Spark version in CDH 5.2 is 1.1.0, but the plugin I had been using was version 1.1.1! I struggled with this from noon onward. Once again: each fall makes you wiser!
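As a sketch of how problems 1 and 5 might be handled together in an sbt build: the group and artifact names below are the standard Spark ones on search.maven.org, and pinning everything to 1.1.0 is an assumption matched to CDH 5.2's Spark:

```scala
// build.sbt fragment (a sketch, not the exact build used in this post).
// spark-core and spark-streaming are marked "provided": the YARN cluster
// already ships them, so they stay out of the fat jar.
// spark-streaming-flume is NOT provided — it must travel inside your jar,
// or the executors on the cluster will not find it (problem 1).
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"            % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming"       % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming-flume" % "1.1.0"
)
```

Keeping the plugin version identical to the cluster's Spark version (1.1.0 here, not 1.1.1) is exactly the trap described in problem 5.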
The Spark code is as follows:
package com.hark

import java.io.File
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

/**
 * Created by Administrator on 2014-12-16.
 */
object SparkStreamingFlumeTest {
  def main(args: Array[String]) {
    //println("harkhark")

    // Windows workaround so Hadoop finds winutils.exe when testing locally
    val path = new File(".").getCanonicalPath()
    System.getProperties().put("hadoop.home.dir", path)
    new File("./bin").mkdirs()
    new File("./bin/winutils.exe").createNewFile()

    //val sparkConf = new SparkConf().setAppName("HdfsWordCount").setMaster("local[2]")
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")

    // Create the context
    // (the batch interval was lost in formatting; 10 seconds is an assumption)
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    //val hostname = "127.0.0.1"
    val hostname = "localhost"
    val port = 2345
    val storageLevel = StorageLevel.MEMORY_ONLY
    val flumeStream = FlumeUtils.createStream(ssc, hostname, port, storageLevel)

    flumeStream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
The flume configuration file is as follows:
# Paste flume.conf here. Example:
# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources  = source1
tier1.channels = channel1
tier1.sinks    = sink1

# For each source, channel, and sink, set
# standard properties.
tier1.sources.source1.type     = exec
tier1.sources.source1.command  = tail -f /opt/data/test3/123
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type   = memory
#tier1.sinks.sink1.type        = logger
tier1.sinks.sink1.type         = avro
tier1.sinks.sink1.hostname     = localhost
tier1.sinks.sink1.port         = 2345
tier1.sinks.sink1.channel      = channel1

# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100
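To exercise the exec source above, append lines to the tailed file and watch them arrive as Flume events in the Spark job's `Received N flume events.` output. The config tails /opt/data/test3/123; the sketch below defaults to a /tmp path (an assumption for easy testing) and can be pointed at the real file via LOGFILE:

```shell
# Append a test line to the file that the exec source tails.
# LOGFILE defaults to a /tmp path for testing; set it to /opt/data/test3/123
# to feed the actual pipeline from the config above.
LOGFILE="${LOGFILE:-/tmp/flume-exec-demo.log}"
mkdir -p "$(dirname "$LOGFILE")"
echo "flume test event $(date +%s)" >> "$LOGFILE"
# Show what tail -f would have just picked up
tail -n 1 "$LOGFILE"
```

Each appended line should show up in the Spark Streaming batch counts once Flume and the Spark job are both running.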
The Spark Start command is as follows:
spark-submit --driver-memory 512m --executor-memory 512m --executor-cores 1 --num-executors 3 --class com.hark.SparkStreamingFlumeTest --deploy-mode cluster --master yarn /opt/spark/sparktest.jar
The Flume Start command is as follows:
flume-ng agent --conf /opt/cloudera-manager/run/cloudera-scm-agent/process/585-flume-agent --conf-file /opt/cloudera-manager/run/cloudera-scm-agent/process/585-flume-agent/flume.conf --name tier1 -Dflume.root.logger=INFO,console
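The start command above points at a config that Cloudera Manager regenerates under a numbered process directory (585-flume-agent here), so the number changes between restarts. A hedged way to locate the newest copy, per problem 3 above (SEARCH_ROOT and its default are assumptions; adjust to your CM install):

```shell
# Find the most recently generated flume.conf under the CM process directory.
# SEARCH_ROOT is overridable so the same one-liner works on any layout.
SEARCH_ROOT="${SEARCH_ROOT:-/opt/cloudera-manager/run/cloudera-scm-agent/process}"
find "$SEARCH_ROOT" -name flume.conf -exec ls -t {} + 2>/dev/null | head -n 1 || true
```

`ls -t` sorts the matches newest-first, so the first line is the config CM wrote last — the one to pass to `--conf-file`.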
Summary of the integration of Spark Streaming and Flume in a CDH environment