Use of Flume

1. Installation and configuration of Flume

1.1 Configure JAVA_HOME by editing the /opt/cdh/flume-1.5.0-cdh5.3.6/conf/flume-env.sh file
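For reference, a minimal flume-env.sh entry might look like the sketch below; the JDK path is only an example and should be replaced with your actual installation path.

# flume-env.sh: tell Flume which JDK to run with
# (example path; substitute your own JAVA_HOME)
export JAVA_HOME=/usr/java/jdk1.7.0_67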

    

1.2 Configuring HDFS Integration

1.2.1 Add the following HDFS jar packages to the /opt/cdh/flume-1.5.0-cdh5.3.6/lib directory (copy commands are sketched after the list)

commons-configuration-1.6.jar

hadoop-common-2.5.0-cdh5.3.6.jar

hadoop-hdfs-2.5.0-cdh5.3.6.jar

hadoop-auth-2.5.0-cdh5.3.6.jar
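Assuming a local Hadoop installation under /opt/cdh/hadoop-2.5.0-cdh5.3.6 (both that path and the share/ subdirectories are assumptions; adjust them to your layout), the copy might look roughly like this:

# copy the HDFS client dependencies into Flume's lib directory
cd /opt/cdh/hadoop-2.5.0-cdh5.3.6
cp share/hadoop/common/lib/commons-configuration-1.6.jar /opt/cdh/flume-1.5.0-cdh5.3.6/lib/
cp share/hadoop/common/hadoop-common-2.5.0-cdh5.3.6.jar /opt/cdh/flume-1.5.0-cdh5.3.6/lib/
cp share/hadoop/hdfs/hadoop-hdfs-2.5.0-cdh5.3.6.jar /opt/cdh/flume-1.5.0-cdh5.3.6/lib/
cp share/hadoop/common/lib/hadoop-auth-2.5.0-cdh5.3.6.jar /opt/cdh/flume-1.5.0-cdh5.3.6/lib/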

1.3 View the Flume version: bin/flume-ng version

2. Write the first agent example: the source uses a netcat source, the channel a memory channel, and the sink a logger sink

2.1 Write /opt/cdh/flume-1.5.0-cdh5.3.6/conf/a1-conf.properties

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case the agent is called 'a1'

# Define the three elements of the agent: source, channel, sink
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Define the source
a1.sources.s1.type = netcat
a1.sources.s1.bind = life-hadoop.life.com
a1.sources.s1.port = 44444

# Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Define the sink
a1.sinks.k1.type = logger

# Define the relationship between the three
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1

2.2 Installing Telnet

sudo rpm -ivh xinetd-2.3.14-38.el6.x86_64.rpm telnet-0.17-47.el6_3.1.x86_64.rpm telnet-server-0.17-47.el6_3.1.x86_64.rpm

sudo /etc/rc.d/init.d/xinetd restart

    

2.3 Start the agent

bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/a1-conf.properties -Dflume.root.logger=DEBUG,console

    

2.4 Connect with telnet and test

telnet life-hadoop.life.com 44444
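A minimal test interaction might look like the following sketch (typed input shown; the exact console output of the logger sink is omitted since it varies):

telnet life-hadoop.life.com 44444
# once connected, type a line and press Enter, for example:
#   hello flume
# each line becomes one Flume event; the netcat source replies "OK" and the
# logger sink prints the event in the agent console started in step 2.3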

    

    

3. Write a second agent to collect Hive logs to HDFS in real time

3.1 Write /opt/cdh/flume-1.5.0-cdh5.3.6/conf/hive-tail-conf.properties

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case the agent is called 'a2'

# Collect Hive logs into the HDFS file system in real time
# Define the three elements of the agent: source, channel, sink
a2.sources = s2
a2.channels = c2
a2.sinks = k2

# Define the source
a2.sources.s2.type = exec
a2.sources.s2.command = tail -f /opt/cdh/hive-0.13.1-cdh5.3.6/logs/hive.log

# Define the channel
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Define the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://life-hadoop.life.com:8020/user/yanglin/flume/hive-tail
# Number of events flushed to HDFS per batch, default: 100
a2.sinks.k2.hdfs.batchSize = 10
# Change the file type, default: SequenceFile
a2.sinks.k2.hdfs.fileType = DataStream
# Change the write format of the file, default: Writable
a2.sinks.k2.hdfs.writeFormat = Text

# Define the relationship between the three
a2.sources.s2.channels = c2
a2.sinks.k2.channel = c2

3.2 Start the Flume agent to begin collecting

bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/hive-tail-conf.properties -Dflume.root.logger=DEBUG,console

3.3 Start the Hive client and watch for changes under the /user/yanglin/flume/hive-tail directory of the HDFS file system
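One way to check the result (a sketch; it assumes a Hadoop client is installed at the path below):

# list the files Flume has written; run from the Hadoop installation directory
cd /opt/cdh/hadoop-2.5.0-cdh5.3.6
bin/hdfs dfs -ls /user/yanglin/flume/hive-tail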

    

3.4 For a Hadoop cluster configured with HA, we also need to copy core-site.xml and hdfs-site.xml into the conf directory of the Flume installation directory
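A sketch of that copy step, assuming the cluster's client configuration lives under /etc/hadoop/conf (adjust the source path to wherever your HA configuration actually resides):

# make the HA nameservice configuration visible to Flume's HDFS sink
cp /etc/hadoop/conf/core-site.xml /opt/cdh/flume-1.5.0-cdh5.3.6/conf/
cp /etc/hadoop/conf/hdfs-site.xml /opt/cdh/flume-1.5.0-cdh5.3.6/conf/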

3.5 Time-based escape sequences can be used in a2.sinks.k2.hdfs.path if you need to automatically create different directories in HDFS based on time

hdfs://life-hadoop.life.com:8020/user/yanglin/flume/hive-tail-time/%y%m%d

Also, you must specify: a2.sinks.k2.hdfs.useLocalTimeStamp = true

4. Third agent example: use a spooling directory source to monitor files in a directory in real time and collect the files that match the criteria into the HDFS file system

4.1 Write /opt/cdh/flume-1.5.0-cdh5.3.6/conf/spooling-conf.properties

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case the agent is called 'a3'

# Watch the specified directory for file changes in real time and collect the
# files that match the criteria into the HDFS file system
# Define the three elements of the agent: source, channel, sink
a3.sources = s3
a3.channels = c3
a3.sinks = k3

# Define the source
a3.sources.s3.type = spooldir
a3.sources.s3.spoolDir = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/logs
# Suffix appended to a file once collection is complete, default: .COMPLETED
a3.sources.s3.fileSuffix = .delete
# Files in the directory matching this pattern are not collected; by default everything is collected
a3.sources.s3.ignorePattern = ^(.)*\\.log$

# Define the channel
a3.channels.c3.type = file
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100
a3.channels.c3.checkpointDir = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/checkpoint
a3.channels.c3.dataDirs = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/data

# Define the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://life-hadoop.life.com:8020/user/yanglin/flume/spooling-logs/%y%m%d
# Number of events flushed to HDFS per batch, default: 100
a3.sinks.k3.hdfs.batchSize = 10
# Change the file type, default: SequenceFile
a3.sinks.k3.hdfs.fileType = DataStream
# Change the write format of the file, default: Writable
a3.sinks.k3.hdfs.writeFormat = Text
# Use the local timestamp when resolving the time escapes in hdfs.path
a3.sinks.k3.hdfs.useLocalTimeStamp = true

# Define the relationship between the three
a3.sources.s3.channels = c3
a3.sinks.k3.channel = c3

4.2 Start the Flume agent to monitor and collect

bin/flume-ng agent --conf conf/ --name a3 --conf-file conf/spooling-conf.properties -Dflume.root.logger=DEBUG,console

4.3 View the collection results
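For example (a sketch; the paths follow the configuration above and assume a Hadoop client under /opt/cdh/hadoop-2.5.0-cdh5.3.6):

# files that have been collected are renamed with the .delete suffix
ls /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/logs

# the collected contents land under a date-stamped HDFS directory
cd /opt/cdh/hadoop-2.5.0-cdh5.3.6
bin/hdfs dfs -ls /user/yanglin/flume/spooling-logs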
