1. Installation and configuration of Flume
1.1 Configure JAVA_HOME by modifying the /opt/cdh/flume-1.5.0-cdh5.3.6/conf/flume-env.sh file
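A minimal sketch of the edit, assuming the JDK is installed under /usr/java/jdk1.7.0_67 (the path is illustrative; point it at your own JDK location):
# flume-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67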
1.2 Configuring HDFS Integration
1.2.1 Add the following HDFS jar packages to the /opt/cdh/flume-1.5.0-cdh5.3.6/lib directory (one way to copy them is sketched after the list)
commons-configuration-1.6.jar
hadoop-common-2.5.0-cdh5.3.6.jar
hadoop-hdfs-2.5.0-cdh5.3.6.jar
hadoop-auth-2.5.0-cdh5.3.6.jar
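A minimal sketch, assuming the jars can be taken from a local CDH Hadoop installation under /opt/cdh/hadoop-2.5.0-cdh5.3.6 (the source paths are illustrative and may differ in your layout):
cp /opt/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/commons-configuration-1.6.jar /opt/cdh/flume-1.5.0-cdh5.3.6/lib/
cp /opt/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.6.jar /opt/cdh/flume-1.5.0-cdh5.3.6/lib/
cp /opt/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/hdfs/hadoop-hdfs-2.5.0-cdh5.3.6.jar /opt/cdh/flume-1.5.0-cdh5.3.6/lib/
cp /opt/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/hadoop-auth-2.5.0-cdh5.3.6.jar /opt/cdh/flume-1.5.0-cdh5.3.6/lib/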
1.3 View the Flume version: bin/flume-ng version
2. Write the first agent: a netcat source, a memory channel, and a logger sink
2.1 Writing /opt/cdh/flume-1.5.0-cdh5.3.6/conf/a1-conf.properties
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

# Define the agent's three components: source, channel, sink
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Define the source
a1.sources.s1.type = netcat
a1.sources.s1.bind = life-hadoop.life.com
a1.sources.s1.port = 44444

# Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Define the sink
a1.sinks.k1.type = logger

# Define the relationship among the three
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
2.2 Installing Telnet
sudo rpm -ivh xinetd-2.3.14-38.el6.x86_64.rpm telnet-0.17-47.el6_3.1.x86_64.rpm telnet-server-0.17-47.el6_3.1.x86_64.rpm
sudo /etc/rc.d/init.d/xinetd restart
2.3 Start the agent
bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/a1-conf.properties -Dflume.root.logger=DEBUG,console
2.4 Connect to the agent with telnet and test
telnet life-hadoop.life.com 44444
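Any line typed in the telnet session should show up as an event in the agent's console log. Assuming you typed 'hello', the logger sink prints roughly the following (output abridged; the exact format varies by version):
Event: { headers:{} body: 68 65 6C 6C 6F    hello }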
3. Write a second agent to collect Hive logs into HDFS in real time
3.1 Writing /opt/cdh/flume-1.5.0-cdh5.3.6/conf/hive-tail-conf.properties
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a2'
# Collect Hive logs into the HDFS file system in real time

# Define the agent's three components: source, channel, sink
a2.sources = s2
a2.channels = c2
a2.sinks = k2

# Define the source
a2.sources.s2.type = exec
a2.sources.s2.command = tail -f /opt/cdh/hive-0.13.1-cdh5.3.6/logs/hive.log

# Define the channel
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Define the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://life-hadoop.life.com:8020/user/yanglin/flume/hive-tail
# Number of events flushed to HDFS per batch, default: 100
a2.sinks.k2.hdfs.batchSize = 10
# Change the file type, default: SequenceFile
a2.sinks.k2.hdfs.fileType = DataStream
# Change the write format of the file, default: Writable
a2.sinks.k2.hdfs.writeFormat = Text

# Define the relationship among the three
a2.sources.s2.channels = c2
a2.sinks.k2.channel = c2
3.2 Start the Flume agent to begin collection
bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/hive-tail-conf.properties -Dflume.root.logger=DEBUG,console
3.3 Start the Hive client and watch for changes under the /user/yanglin/flume/hive-tail directory in HDFS
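For example, you can list the sink's output directory from the Hadoop installation (the working directory is an assumption; any HDFS client on the path works):
bin/hdfs dfs -ls /user/yanglin/flume/hive-tail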
3.4 For a Hadoop cluster configured with HA, we need to copy core-site.xml and hdfs-site.xml into the conf directory of the Flume installation
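A minimal sketch, assuming the Hadoop configuration lives under /opt/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop (an illustrative path):
cp /opt/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop/core-site.xml /opt/cdh/flume-1.5.0-cdh5.3.6/conf/
cp /opt/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop/hdfs-site.xml /opt/cdh/flume-1.5.0-cdh5.3.6/conf/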
3.5 If you need different directories created in HDFS automatically based on time, use date escape sequences such as %y%m%d in a2.sinks.k2.hdfs.path
hdfs://life-hadoop.life.com:8020/user/yanglin/flume/hive-tail-time/%y%m%d
You must also set: a2.sinks.k2.hdfs.useLocalTimeStamp = true
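With this set, events land in date-named subdirectories, which you can confirm with (assuming an HDFS client on the path):
bin/hdfs dfs -ls /user/yanglin/flume/hive-tail-time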
4. Write a third agent: use a spooling directory source to monitor the files in a directory in real time and collect qualifying files into HDFS
4.1 Writing /opt/cdh/flume-1.5.0-cdh5.3.6/conf/spooling-conf.properties
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a3'
# Watch the specified directory for file changes in real time and
# collect qualifying files into the HDFS file system

# Define the agent's three components: source, channel, sink
a3.sources = s3
a3.channels = c3
a3.sinks = k3

# Define the source
a3.sources.s3.type = spooldir
a3.sources.s3.spoolDir = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/logs
# Suffix appended to a file once collection completes, default: .COMPLETED
a3.sources.s3.fileSuffix = .delete
# Files in the directory matching this pattern are not collected; by default everything is collected
a3.sources.s3.ignorePattern = ^(.)*\\.log$

# Define the channel
a3.channels.c3.type = file
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100
a3.channels.c3.checkpointDir = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/checkpoint
a3.channels.c3.dataDirs = /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/data

# Define the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://life-hadoop.life.com:8020/user/yanglin/flume/spooling-logs/%y%m%d
# Number of events flushed to HDFS per batch, default: 100
a3.sinks.k3.hdfs.batchSize = 10
# Change the file type, default: SequenceFile
a3.sinks.k3.hdfs.fileType = DataStream
# Change the write format of the file, default: Writable
a3.sinks.k3.hdfs.writeFormat = Text
# Put the local timestamp in the event headers (needed for the %y%m%d escapes)
a3.sinks.k3.hdfs.useLocalTimeStamp = true

# Define the relationship among the three
a3.sources.s3.channels = c3
a3.sinks.k3.channel = c3
4.2 Start the Flume agent to begin monitoring and collection
bin/flume-ng agent --conf conf/ --name a3 --conf-file conf/spooling-conf.properties -Dflume.root.logger=DEBUG,console
4.3 Viewing the collection results
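For example, drop a file into the spooling directory and check both the rename and the HDFS output (the file name is illustrative):
cp /tmp/test.txt /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/logs/
ls /opt/cdh/flume-1.5.0-cdh5.3.6/spooling/logs/        # test.txt becomes test.txt.delete once collected
bin/hdfs dfs -ls /user/yanglin/flume/spooling-logs     # output lands in a date-named subdirectory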