1. Installation and configuration of Flume
1.1 Configure JAVA_HOME by modifying the /opt/cdh/flume-1.5.0-cdh5.3.6/conf/flume-env.sh file
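A minimal sketch of the edit, assuming the JDK is installed under /usr/java/jdk1.7.0_67 (the path is illustrative; point it at your own JDK location):
# flume-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67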
1.2 Configuring HDFS Integration
1.2.1 Add the following HDFS jar packages to the /opt/cdh/flume-1.5.0-cdh5.3.6/lib directory (one way to copy them is sketched after the list)
commons-configuration-1.6.jar
hadoop-common-2.5.0-cdh5.3.6.jar
hadoop-hdfs-2.5.0-cdh5.3.6.jar
hadoop-auth-2.5.0-cdh5.3.6.jar
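A minimal sketch, assuming the jars can be taken from a local CDH Hadoop installation under /opt/cdh/hadoop-2.5.0-cdh5.3.6 (the source paths are illustrative and may differ in your layout):
cp /opt/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/commons-configuration-1.6.jar /opt/cdh/flume-1.5.0-cdh5.3.6/lib/
cp /opt/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.6.jar /opt/cdh/flume-1.5.0-cdh5.3.6/lib/
cp /opt/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/hdfs/hadoop-hdfs-2.5.0-cdh5.3.6.jar /opt/cdh/flume-1.5.0-cdh5.3.6/lib/
cp /opt/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/hadoop-auth-2.5.0-cdh5.3.6.jar /opt/cdh/flume-1.5.0-cdh5.3.6/lib/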
1.3 View the Flume version: bin/flume-ng version
2. Write the first agent: a netcat source, a memory channel, and a logger sink
2.1 Writing /opt/cdh/flume-1.5.0-cdh5.3.6/conf/a1-conf.properties
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

# Define the agent's three components: source, channel, sink
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Define the source
a1.sources.s1.type = netcat
a1.sources.s1.bind = life-hadoop.life.com
a1.sources.s1.port = 44444

# Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Define the sink
a1.sinks.k1.type = logger

# Define the relationship among the three
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
2.2 Installing Telnet
sudo rpm -ivh xinetd-2.3.14-38.el6.x86_64.rpm telnet-0.17-47.el6_3.1.x86_64.rpm telnet-server-0.17-47.el6_3.1.x86_64.rpm
sudo /etc/rc.d/init.d/xinetd restart
2.3 Start the agent
bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/a1-conf.properties -Dflume.root.logger=DEBUG,console
2.4 Connect to the agent with telnet and test
telnet life-hadoop.life.com 44444
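Any line typed in the telnet session should show up as an event in the agent's console log. Assuming you typed 'hello', the logger sink prints roughly the following (output abridged; the exact format varies by version):
Event: { headers:{} body: 68 65 6C 6C 6F    hello }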
3. Write a second agent to collect Hive logs into HDFS in real time
3.1 Writing /opt/cdh/flume-1.5.0-cdh5.3.6/conf/hive-tail-conf.properties
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a2'
# Collect Hive logs into the HDFS file system in real time

# Define the agent's three components: source, channel, sink
a2.sources = s2
a2.channels = c2
a2.sinks = k2

# Define the source
a2.sources.s2.type = exec
a2.sources.s2.command = tail -f /opt/cdh/hive-0.13.1-cdh5.3.6/logs/hive.log

# Define the channel
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Define the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://life-hadoop.life.com:8020/user/yanglin/flume/hive-tail
# Number of events flushed to HDFS per batch, default: 100
a2.sinks.k2.hdfs.batchSize = 10
# Change the file type, default: SequenceFile
a2.sinks.k2.hdfs.fileType = DataStream
# Change the write format of the file, default: Writable
a2.sinks.k2.hdfs.writeFormat = Text

# Define the relationship among the three
a2.sources.s2.channels = c2
a2.sinks.k2.channel = c2
3.2 Start the Flume agent to begin collection
bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/hive-tail-conf.properties -Dflume.root.logger=DEBUG,console
3.3 Start the Hive client and watch for changes under the /user/yanglin/flume/hive-tail directory in HDFS
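For example, you can list the sink's output directory from the Hadoop installation (the working directory is an assumption; any HDFS client on the path works):
bin/hdfs dfs -ls /user/yanglin/flume/hive-tail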
3.4 For a Hadoop cluster configured with HA, we need to copy core-site.xml and hdfs-site.xml into the conf directory of the Flume installation
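A minimal sketch, assuming the Hadoop configuration lives under /opt/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop (an illustrative path):
cp /opt/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop/core-site.xml /opt/cdh/flume-1.5.0-cdh5.3.6/conf/
cp /opt/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop/hdfs-site.xml /opt/cdh/flume-1.5.0-cdh5.3.6/conf/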
3.5 If you need different directories created in HDFS automatically based on time, use date escape sequences such as %y%m%d in a2.sinks.k2.hdfs.path
hdfs://life-hadoop.life.com:8020/user/yanglin/flume/hive-tail-time/%y%m%d
You must also set: a2.sinks.k2.hdfs.useLocalTimeStamp = true
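With this set, events land in date-named subdirectories, which you can confirm with (assuming an HDFS client on the path):
bin/hdfs dfs -ls /user/yanglin/flume/hive-tail-time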
4. Write a third agent: use a spooling directory source to monitor the files in a directory in real time and collect qualifying files into HDFS
4.1 Writing /opt/cdh/flume-1.5.0-cdh5.3.6/conf/spooling-conf.properties
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a3'
# Watch the specified directory for file changes in real time and
# collect qualifying files into the HDFS file system

# Define the agent's three components: source, channel, sink
a3.sources = s3
a3.channels = c3
a3.sinks = k3

# Define the source
a3.sources.s3.type = spooldir
a3.sources.s3.spoolDir = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/logs
# Suffix appended to a file once collection completes, default: .COMPLETED
a3.sources.s3.fileSuffix = .delete
# Files in the directory matching this pattern are not collected; by default everything is collected
a3.sources.s3.ignorePattern = ^(.)*\\.log$

# Define the channel
a3.channels.c3.type = file
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100
a3.channels.c3.checkpointDir = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/checkpoint
a3.channels.c3.dataDirs = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/data

# Define the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://life-hadoop.life.com:8020/user/yanglin/flume/spooling-logs/%y%m%d
# Number of events flushed to HDFS per batch, default: 100
a3.sinks.k3.hdfs.batchSize = 10
# Change the file type, default: SequenceFile
a3.sinks.k3.hdfs.fileType = DataStream
# Change the write format of the file, default: Writable
a3.sinks.k3.hdfs.writeFormat = Text
# Put the local timestamp in the event headers (needed for the %y%m%d escapes)
a3.sinks.k3.hdfs.useLocalTimeStamp = true

# Define the relationship among the three
a3.sources.s3.channels = c3
a3.sinks.k3.channel = c3
4.2 Start the Flume agent to begin monitoring and collection
bin/flume-ng agent --conf conf/ --name a3 --conf-file conf/spooling-conf.properties -Dflume.root.logger=DEBUG,console
4.3 Viewing the collection results
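For example, drop a file into the spooling directory and check both the rename and the HDFS output (the file name is illustrative):
cp /tmp/test.txt /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/logs/
ls /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/logs/        # test.txt becomes test.txt.delete once collected
bin/hdfs dfs -ls /user/yanglin/flume/spooling-logs     # output lands in a date-named subdirectory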