Integrating Spark Streaming with Flume is actually quite simple, and there are tutorials online.
http://blog.csdn.net/fighting_one_piece/article/details/40667035 — see here. I used the first integration approach described there. In practice I ran into all kinds of problems. It took from roughly 5:00 in the morning of 2014-12-17 until 18:30 that evening. Summed up it is actually very simple, but it took a long time. Ah well — each fall makes you wiser. The problems were:

Problem 1: You need to reference a variety of packages, and they must be bundled into your jar. Because this runs in Spark on YARN mode, if they are not bundled, the cluster cannot find the dependencies. Where to find them? Go straight to search.maven.org.

Problem 2: Because Spark runs on the YARN cluster, the receiver must listen on localhost only. If you specify a fixed IP instead, any node that does not own that IP will fail to bind the listener.

Problem 3: For Flume started under CDH, run find / -name flume.conf, look through the results, and pick the most recent one — the configuration file generated by Cloudera Manager. Then start Flume with that file.

Problem 4: Do not test on the cluster directly; test on a single node first. Single-node testing will surface all kinds of problems. Solve them, then go test on the cluster.

Problem 5: Be sure to pay attention to versions! The Spark version in CDH 5.2 is 1.1.0, but the plugin I had been using was version 1.1.1! I struggled with this from noon onward. Once again: each fall makes you wiser!
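As a sketch of how problems 1 and 5 might be handled together in an sbt build: the group and artifact names below are the standard Spark ones on search.maven.org, and pinning everything to 1.1.0 is an assumption matched to CDH 5.2's Spark:

```scala
// build.sbt fragment (a sketch, not the exact build used in this post).
// spark-core and spark-streaming are marked "provided": the YARN cluster
// already ships them, so they stay out of the fat jar.
// spark-streaming-flume is NOT provided — it must travel inside your jar,
// or the executors on the cluster will not find it (problem 1).
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"            % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming"       % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming-flume" % "1.1.0"
)
```

Keeping the plugin version identical to the cluster's Spark version (1.1.0 here, not 1.1.1) is exactly the trap described in problem 5.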
The Spark code is as follows:
package com.hark

import java.io.File
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

/**
 * Created by Administrator on 2014-12-16.
 */
object SparkStreamingFlumeTest {
  def main(args: Array[String]) {
    //println("harkhark")

    // Windows workaround so Hadoop finds winutils.exe when testing locally
    val path = new File(".").getCanonicalPath()
    System.getProperties().put("hadoop.home.dir", path)
    new File("./bin").mkdirs()
    new File("./bin/winutils.exe").createNewFile()

    //val sparkConf = new SparkConf().setAppName("HdfsWordCount").setMaster("local[2]")
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")

    // Create the context
    // (the batch interval was lost in formatting; 10 seconds is an assumption)
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    //val hostname = "127.0.0.1"
    val hostname = "localhost"
    val port = 2345
    val storageLevel = StorageLevel.MEMORY_ONLY
    val flumeStream = FlumeUtils.createStream(ssc, hostname, port, storageLevel)

    flumeStream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
The flume configuration file is as follows:
# Paste flume.conf here. Example:
# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources  = source1
tier1.channels = channel1
tier1.sinks    = sink1

# For each source, channel, and sink, set
# standard properties.
tier1.sources.source1.type     = exec
tier1.sources.source1.command  = tail -f /opt/data/test3/123
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type   = memory
#tier1.sinks.sink1.type        = logger
tier1.sinks.sink1.type         = avro
tier1.sinks.sink1.hostname     = localhost
tier1.sinks.sink1.port         = 2345
tier1.sinks.sink1.channel      = channel1

# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100
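To exercise the exec source above, append lines to the tailed file and watch them arrive as Flume events in the Spark job's `Received N flume events.` output. The config tails /opt/data/test3/123; the sketch below defaults to a /tmp path (an assumption for easy testing) and can be pointed at the real file via LOGFILE:

```shell
# Append a test line to the file that the exec source tails.
# LOGFILE defaults to a /tmp path for testing; set it to /opt/data/test3/123
# to feed the actual pipeline from the config above.
LOGFILE="${LOGFILE:-/tmp/flume-exec-demo.log}"
mkdir -p "$(dirname "$LOGFILE")"
echo "flume test event $(date +%s)" >> "$LOGFILE"
# Show what tail -f would have just picked up
tail -n 1 "$LOGFILE"
```

Each appended line should show up in the Spark Streaming batch counts once Flume and the Spark job are both running.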
The Spark Start command is as follows:
spark-submit --driver-memory 512m --executor-memory 512m --executor-cores 1 --num-executors 3 --class com.hark.SparkStreamingFlumeTest --deploy-mode cluster --master yarn /opt/spark/sparktest.jar
The Flume Start command is as follows:
flume-ng agent --conf /opt/cloudera-manager/run/cloudera-scm-agent/process/585-flume-agent --conf-file /opt/cloudera-manager/run/cloudera-scm-agent/process/585-flume-agent/flume.conf --name tier1 -Dflume.root.logger=INFO,console
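The start command above points at a config that Cloudera Manager regenerates under a numbered process directory (585-flume-agent here), so the number changes between restarts. A hedged way to locate the newest copy, per problem 3 above (SEARCH_ROOT and its default are assumptions; adjust to your CM install):

```shell
# Find the most recently generated flume.conf under the CM process directory.
# SEARCH_ROOT is overridable so the same one-liner works on any layout.
SEARCH_ROOT="${SEARCH_ROOT:-/opt/cloudera-manager/run/cloudera-scm-agent/process}"
find "$SEARCH_ROOT" -name flume.conf -exec ls -t {} + 2>/dev/null | head -n 1 || true
```

`ls -t` sorts the matches newest-first, so the first line is the config CM wrote last — the one to pass to `--conf-file`.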
Summary of the integration of Spark Streaming and Flume in a CDH environment