Flume Environment Deployment and Configuration: Detailed Guide and Cases (Linux)


1. What is Flume?


Flume is a real-time log collection system developed by Cloudera that is widely recognized and used in industry. Its initial releases are now collectively called Flume OG (Original Generation) and belonged to Cloudera. As Flume's features grew, the Flume OG code base became bloated, its core components were poorly designed, and its core configuration was non-standard; in the last OG release, Flume 0.94.0, unstable log transmission was especially serious. To solve these problems, on October 22, 2011 Cloudera completed Flume-728 and made a landmark change to Flume: the core components, core configuration, and code architecture were refactored, and the refactored version is collectively called Flume NG (Next Generation). Another reason for the change was to bring Flume into Apache; Cloudera Flume was renamed Apache Flume.





Characteristics of Flume:


Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transmitting massive volumes of logs. It lets you customize the data senders in a logging system to collect data, and it also provides the ability to do simple processing on that data and write it to various data recipients (such as text files, HDFS, HBase, and so on).


The Flume data flow is carried end to end by events. An event is Flume's basic unit of data: it carries the log data (as a byte array) together with header information. Events are built from data produced outside the agent; when a source captures that data it formats it into an event and pushes the event (singly or in batches) into a channel. You can think of the channel as a buffer that holds the event until a sink has finished processing it. The sink is responsible for persisting the log or pushing the event on to another source.





Reliability of Flume


When a node fails, logs can still be transmitted to other nodes without loss. Flume provides three levels of reliability guarantee, from strongest to weakest: end-to-end (the agent that receives the data first writes the event to disk, deletes it once the transfer succeeds, and resends it if the transfer fails), store on failure (the strategy scribe also uses: when the data receiver crashes, data is written locally and sending resumes after recovery), and best effort (data is sent to the receiver without any acknowledgement).





The recoverability of Flume:


Recoverability also relies on the channel. FileChannel is recommended: events are persisted to the local file system, at the cost of some performance.
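A minimal sketch of a FileChannel definition (assuming an agent named a1 and writable local directories; the paths below are only examples) could look like this:

a1.channels = c1
# File channel: events are checkpointed and stored on local disk, so they survive an agent restart
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/hadoop/flume-1.5.0-bin/file-channel/checkpoint
a1.channels.c1.dataDirs = /home/hadoop/flume-1.5.0-bin/file-channel/data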





Some of the core concepts of flume:


Agent: the JVM process that runs Flume. Each machine runs one agent, but a single agent can contain multiple sources and sinks.


Client: produces the data; runs in a separate thread.


Source: collects data from the client and passes it to the channel.


Sink: collects data from the channel; runs in a separate thread.


Channel: connects sources and sinks; it works somewhat like a queue.


Event: the unit of data; it can be a log record, an Avro object, and so on.





Flume uses the agent as its smallest independent unit of operation. An agent is a JVM process. A single agent is made up of source, channel, and sink components, as shown in the figure below:

It is worth noting that Flume provides a large number of built-in source, channel, and sink types, and different types of sources, channels, and sinks can be combined freely. How they are combined is driven entirely by the user's configuration file, which is very flexible. For example, a channel can hold events in memory or persist them to the local hard drive, and a sink can write logs to HDFS or HBase, or even forward them to another source. Flume also lets users build multi-level flows, meaning multiple agents can cooperate, with support for fan-in, fan-out, contextual routing, and backup routes, which is very powerful. This is shown in the following illustration:

2. Where is the official website of Flume?
http://flume.apache.org/

3. Where to download?

http://www.apache.org/dyn/closer.cgi/flume/1.5.0/apache-flume-1.5.0-bin.tar.gz

4. How to install?
1) Download the Flume package and extract it into the /home/hadoop directory; with that you have already completed 50% of the work. It is that simple.

2) Modify the flume-env.sh configuration file, mainly to set the JAVA_HOME variable.

root@m1:/home/hadoop/flume-1.5.0-bin# cp conf/flume-env.sh.template conf/flume-env.sh
root@m1:/home/hadoop/flume-1.5.0-bin# vi conf/flume-env.sh
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
# during Flume startup.

# Enviroment variables can be set here.
JAVA_HOME=/usr/lib/jvm/java-7-oracle

# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
#JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"

# Note that the Flume conf directory is always included in the classpath.
#FLUME_CLASSPATH=""

3) Verify that the installation is successful

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng version
Flume 1.5.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 8633220df808c4cd0c13d1cf0320454a94f1ea97
Compiled by hshreedharan on Wed May 7 14:49:18 PDT 2014
From source with checksum a01fe726e4380ba0c9f7a7d222db961f

root@m1:/home/hadoop#

   

The output above shows that the installation was successful.


5. Flume cases
1) Case 1: Avro
The Avro source receives events over the Avro RPC mechanism; flume-ng avro-client can use it to send a given file to Flume.
a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/avro.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/avro.conf -n a1 -Dflume.root.logger=INFO,console

c) Create the file to send

root@m1:/home/hadoop# echo "hello world" > /home/hadoop/flume-1.5.0-bin/log.00

d) Use avro-client to send the file

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng avro-client -c . -H m1 -p 4141 -F /home/hadoop/flume-1.5.0-bin/log.00

e) In the m1 console you can see the following information; note the last line:

root@m1:/home/hadoop/flume-1.5.0-bin/conf# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/avro.conf -n a1 -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /home/hadoop/flume-1.5.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/home/hadoop/hadoop-2.2.0/bin/hadoop) for HDFS access
Info: Excluding /home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar from classpath
...
2014-08-10 10:43:25,112 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x92464c4f, /192.168.1.50:59850 :> /192.168.1.50:4141] UNBOUND
2014-08-10 10:43:25,112 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x92464c4f, /192.168.1.50:59850 :> /192.168.1.50:4141] CLOSED
2014-08-10 10:43:25,112 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.channelClosed(NettyServer.java:209)] Connection to /192.168.1.50:59850 disconnected.
2014-08-10 10:43:26,718 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64                hello world }

2) Case 2: Spool
The spooldir source monitors a configured directory for new files and reads the data in them. Two points to note:
1) Files copied into the spool directory must not be opened and edited afterwards.
2) The spool directory must not contain subdirectories.
a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/spool.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /home/hadoop/flume-1.5.0-bin/logs
a1.sources.r1.fileHeader = true

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/spool.conf -n a1 -Dflume.root.logger=INFO,console

c) Add a file to the /home/hadoop/flume-1.5.0-bin/logs directory

root@m1:/home/hadoop# echo "spool test1" > /home/hadoop/flume-1.5.0-bin/logs/spool_text.log

d) In the m1 console, you can see the following information:

14/08/10 11:37:13 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:13 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/hadoop/flume-1.5.0-bin/logs/spool_text.log to /home/hadoop/flume-1.5.0-bin/logs/spool_text.log.COMPLETED
14/08/10 11:37:14 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO sink.LoggerSink: Event: { headers:{file=/home/hadoop/flume-1.5.0-bin/logs/spool_text.log} body: 73 70 6F 6F 6C 20 74 65 73 74 31                spool test1 }
14/08/10 11:37:15 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:16 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:17 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.

3) Case 3: Exec
The exec source runs a given command and uses its output as the data source. If you use the tail command, you must write enough data into the file before you will see any output.
a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/exec_tail.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -f /home/hadoop/flume-1.5.0-bin/log_exec_tail

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/exec_tail.conf -n a1 -Dflume.root.logger=INFO,console

c) Generate enough content in the file

root@m1:/home/hadoop# for i in {1..100}; do echo "exec tail$i" >> /home/hadoop/flume-1.5.0-bin/log_exec_tail; echo $i; sleep 0.1; done

d) In the m1 console, you can see the following information:

2014-08-10 10:59:25,513 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 20 74 65 73 74       exec tail test }
2014-08-10 11:01:41,180 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 31                   exec tail1 }
2014-08-10 11:01:41,180 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 32                   exec tail2 }
2014-08-10 11:01:41,181 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 33                   exec tail3 }
...
2014-08-10 11:01:51,550 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 39 36                exec tail96 }
2014-08-10 11:01:51,551 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 39 39                exec tail99 }
2014-08-10 11:01:51,551 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 31 30 30             exec tail100 }

4) Case 4: Syslogtcp
The syslogtcp source listens on a TCP port and uses it as the data source.
a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/syslog_tcp.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/syslog_tcp.conf -n a1 -Dflume.root.logger=INFO,console

c) Test by generating a syslog message

root@m1:/home/hadoop# echo "hello idoall.org syslog" | nc localhost 5140

d) In the m1 console, you can see the following information:

14/08/10 11:41:45 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/home/hadoop/flume-1.5.0-bin/conf/syslog_tcp.conf
14/08/10 11:41:45 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
14/08/10 11:41:45 INFO conf.FlumeConfiguration: Processing:k1
14/08/10 11:41:45 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
14/08/10 11:41:45 INFO node.AbstractConfigurationProvider: Creating channels
14/08/10 11:41:45 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
14/08/10 11:41:45 INFO node.AbstractConfigurationProvider: Created channel c1
14/08/10 11:41:45 INFO source.DefaultSourceFactory: Creating instance of source r1, type syslogtcp
14/08/10 11:41:45 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
14/08/10 11:41:45 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1]
14/08/10 11:41:45 INFO node.Application: Starting Channel c1
14/08/10 11:41:45 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
14/08/10 11:41:45 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
14/08/10 11:41:45 INFO node.Application: Starting Sink k1
14/08/10 11:41:45 INFO node.Application: Starting Source r1
14/08/10 11:41:45 INFO source.SyslogTcpSource: Syslog TCP Source starting...
14/08/10 11:42:15 WARN source.SyslogUtils: Event created from Invalid Syslog data.
14/08/10 11:42:15 INFO sink.LoggerSink: Event: { headers:{severity=0, flume.syslog.status=Invalid, facility=0} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67   hello idoall.org }
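Note the "Event created from Invalid Syslog data" warning above: a plain string piped through nc carries no syslog priority or timestamp header, so the source keeps the raw text and marks the event with flume.syslog.status=Invalid. As a rough sketch (the hostname and tag below are only placeholders), a message with an RFC 3164 style header would be parsed without that warning:

root@m1:/home/hadoop# echo "<13>Aug 10 11:45:00 m1 test: hello idoall.org syslog" | nc localhost 5140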

5) Case 5: JSONHandler
a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/post_json.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/post_json.conf -n a1 -Dflume.root.logger=INFO,console

c) Generate a POST request in JSON format

root@m1:/home/hadoop# curl -X POST -d '[{"headers":{"a":"a1","b":"b1"},"body":"idoall.org_body"}]' http://localhost:8888

d) In the m1 console, you can see the following information:

14/08/10 11:49:59 INFO node.Application: Starting Channel c1
14/08/10 11:49:59 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
14/08/10 11:49:59 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
14/08/10 11:49:59 INFO node.Application: Starting Sink k1
14/08/10 11:49:59 INFO node.Application: Starting Source r1
14/08/10 11:49:59 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
14/08/10 11:49:59 INFO mortbay.log: jetty-6.1.26
14/08/10 11:50:00 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:8888
14/08/10 11:50:00 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/08/10 11:50:00 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/08/10 12:14:32 INFO sink.LoggerSink: Event: { headers:{b=b1, a=a1} body: 69 64 6F 61 6C 6C 2E 6F 72 67 5F 62 6F 64 79    idoall.org_body }
 

6) Case 6: Hadoop sink
For the Hadoop 2.2.0 installation and deployment, please refer to the article "ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1 Distributed Environment Deployment".
a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/hdfs_sink.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://m1:9000/user/flume/syslogtcp
a1.sinks.k1.hdfs.filePrefix = Syslog
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
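Note that the round, roundValue, and roundUnit settings only take effect when the HDFS path contains time escape sequences. A hedged variation of the sink above (all values illustrative) that buckets output by time and controls file rolling could look like this:

# Bucket files into a directory per day and per 10-minute block
a1.sinks.k1.hdfs.path = hdfs://m1:9000/user/flume/syslogtcp/%Y%m%d/%H%M
# Use the agent's local time instead of requiring a timestamp header on each event
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Write plain text instead of a SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream
# Roll files by time only (every 60 seconds), not by size or event count
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0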

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/hdfs_sink.conf -n a1 -Dflume.root.logger=INFO,console

c) Test by generating a syslog message

root@m1:/home/hadoop# echo "hello idoall flume -> hadoop testing one" | nc localhost 5140

d) In the m1 console, you can see the following information:

14/08/10 12:20:39 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
14/08/10 12:20:39 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
14/08/10 12:20:39 INFO node.Application: Starting Sink k1
14/08/10 12:20:39 INFO node.Application: Starting Source r1
14/08/10 12:20:39 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
14/08/10 12:20:39 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
14/08/10 12:20:39 INFO source.SyslogTcpSource: Syslog TCP Source starting...
14/08/10 12:21:46 WARN source.SyslogUtils: Event created from Invalid Syslog data.
14/08/10 12:21:49 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
14/08/10 12:21:49 INFO hdfs.BucketWriter: Creating hdfs://m1:9000/user/flume/syslogtcp//Syslog.1407644509504.tmp
14/08/10 12:22:20 INFO hdfs.BucketWriter: Closing hdfs://m1:9000/user/flume/syslogtcp//Syslog.1407644509504.tmp
14/08/10 12:22:20 INFO hdfs.BucketWriter: Close tries incremented
14/08/10 12:22:20 INFO hdfs.BucketWriter: Renaming hdfs://m1:9000/user/flume/syslogtcp/Syslog.1407644509504.tmp to hdfs://m1:9000/user/flume/syslogtcp/Syslog.1407644509504
14/08/10 12:22:20 INFO hdfs.HDFSEventSink: Writer callback called.

e) Open another window on m1 and check whether the file has been generated in Hadoop:

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hadoop fs -ls /user/flume/syslogtcp
Found 1 items
-rw-r--r--   3 root supergroup        2014-08-10 12:22 /user/flume/syslogtcp/Syslog.1407644509504
root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hadoop fs -cat /user/flume/syslogtcp/Syslog.1407644509504
SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable^;>Gv$hello idoall flume -> hadoop testing one

7) Case 7: File Roll Sink
a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/file_roll.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5555
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/hadoop/flume-1.5.0-bin/logs

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/file_roll.conf -n a1 -Dflume.root.logger=INFO,console

c) Test by generating logs

root@m1:/home/hadoop# echo "hello idoall.org syslog" | nc localhost 5555
root@m1:/home/hadoop# echo "hello idoall.org syslog 2" | nc localhost 5555

d) Check whether files are generated under /home/hadoop/flume-1.5.0-bin/logs; by default a new file is rolled every 30 seconds.

root@m1:/home/hadoop# ll /home/hadoop/flume-1.5.0-bin/logs
total 272
drwxr-xr-x 3 root root 4096 Aug 10 12:50 ./
drwxr-xr-x 9 root root 4096 Aug 10 10:59 ../
-rw-r--r-- 1 root root      Aug 10 12:49 1407646164782-1
-rw-r--r-- 1 root root    0 Aug 10 12:49 1407646164782-2
-rw-r--r-- 1 root root    0 Aug 10 12:50 1407646164782-3
root@m1:/home/hadoop# cat /home/hadoop/flume-1.5.0-bin/logs/1407646164782-1 /home/hadoop/flume-1.5.0-bin/logs/1407646164782-2
hello idoall.org syslog
hello idoall.org syslog 2
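The new file every 30 seconds comes from the file_roll sink's sink.rollInterval setting, which defaults to 30. If you want fewer, larger files you can raise it, for example (600 is just an illustrative value):

a1.sinks.k1.sink.rollInterval = 600

Setting it to 0 disables rolling entirely, so all events go to a single file.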


8) Case 8: Replicating Channel Selector
Flume supports fanning out the flow from one source to multiple channels. There are two fan-out modes: replicating and multiplexing. In the replicating case, the event is sent to all configured channels. In the multiplexing case, the event is sent only to a subset of the eligible channels. Fanning out the flow requires rules that specify the source and the fan-out channels.
This case requires two machines, m1 and m2.
a) Create the replicating_channel_selector configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector.conf

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = m1
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = m2
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

b) Create the replicating_channel_selector_avro configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector_avro.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

   

c) On m1, copy the two configuration files to m2

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector.conf
root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector_avro.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector_avro.conf

d) Open four windows and start two Flume agents on m1 and m2 at the same time:

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector_avro.conf -n a1 -Dflume.root.logger=INFO,console
root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector.conf -n a1 -Dflume.root.logger=INFO,console

e) Then, on either m1 or m2, generate a test syslog message:

root@m1:/home/hadoop# echo "hello idoall.org syslog" | nc localhost 5140

f) The following information can be seen in the sink windows of both m1 and m2, which shows that the event was replicated to both:

14/08/10 14:08:18 INFO ipc.NettyServer: Connection to /192.168.1.51:46844 disconnected.
14/08/10 14:08:52 INFO ipc.NettyServer: [id: 0x90f8fe1f, /192.168.1.50:35873 => /192.168.1.50:5555] OPEN
14/08/10 14:08:52 INFO ipc.NettyServer: [id: 0x90f8fe1f, /192.168.1.50:35873 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 14:08:52 INFO ipc.NettyServer: [id: 0x90f8fe1f, /192.168.1.50:35873 => /192.168.1.50:5555] CONNECTED: /192.168.1.50:35873
14/08/10 14:08:59 INFO ipc.NettyServer: [id: 0xd6318635, /192.168.1.51:46858 => /192.168.1.50:5555] OPEN
14/08/10 14:08:59 INFO ipc.NettyServer: [id: 0xd6318635, /192.168.1.51:46858 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 14:08:59 INFO ipc.NettyServer: [id: 0xd6318635, /192.168.1.51:46858 => /192.168.1.50:5555] CONNECTED: /192.168.1.51:46858
14/08/10 14:09:20 INFO sink.LoggerSink: Event: { headers:{severity=0, flume.syslog.status=Invalid, facility=0} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67   hello idoall.org }


9) Case 9: Multiplexing Channel Selector
a) Create the multiplexing_channel_selector configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector.conf

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
# The mapping allows each value's channels to overlap; the default can contain any number of channels
a1.sources.r1.selector.mapping.baidu = c1
a1.sources.r1.selector.mapping.ali = c2
a1.sources.r1.selector.default = c1

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = m1
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = m2
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

b) Create the multiplexing_channel_selector_avro configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector_avro.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

c) Copy the two configuration files to m2

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector.conf
root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector_avro.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector_avro.conf

d) Open four windows and start two Flume agents on m1 and m2 at the same time:

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector_avro.conf -n a1 -Dflume.root.logger=INFO,console
root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector.conf -n a1 -Dflume.root.logger=INFO,console

e) Then, on either m1 or m2, generate test data:

root@m1:/home/hadoop# curl -X POST -d '[{"headers":{"type":"baidu"},"body":"idoall_test1"}]' http://localhost:5140 && curl -X POST -d '[{"headers":{"type":"ali"},"body":"idoall_test2"}]' http://localhost:5140 && curl -X POST -d '[{"headers":{"type":"qq"},"body":"idoall_test3"}]' http://localhost:5140

f) In the m1 sink window, you can see the following information (the baidu and qq events both arrive here, because qq falls through to the default channel c1):

14/08/10 14:32:21 INFO node.Application: Starting Sink k1
14/08/10 14:32:21 INFO node.Application: Starting Source r1
14/08/10 14:32:21 INFO source.AvroSource: Starting Avro source r1: { bindAddress: 0.0.0.0, port: 5555 }...
14/08/10 14:32:21 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/08/10 14:32:21 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/08/10 14:32:21 INFO source.AvroSource: Avro source r1 started.
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0xcf00eea6, /192.168.1.50:35916 => /192.168.1.50:5555] OPEN
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0xcf00eea6, /192.168.1.50:35916 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0xcf00eea6, /192.168.1.50:35916 => /192.168.1.50:5555] CONNECTED: /192.168.1.50:35916
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x432f5468, /192.168.1.51:46945 => /192.168.1.50:5555] OPEN
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x432f5468, /192.168.1.51:46945 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x432f5468, /192.168.1.51:46945 => /192.168.1.50:5555] CONNECTED: /192.168.1.51:46945
14/08/10 14:34:11 INFO sink.LoggerSink: Event: { headers:{type=baidu} body: 69 64 6F 61 6C 6C 5F 74 65 73 74 31             idoall_test1 }
14/08/10 14:34:57 INFO sink.LoggerSink: Event: { headers:{type=qq} body: 69 64 6F 61 6C 6C 5F 74 65 73 74 33             idoall_test3 }

g) In the m2 sink window, you can see the following information:

14/08/10 14:32:27 INFO node.Application: Starting Sink k1
14/08/10 14:32:27 INFO node.Application: Starting Source r1
14/08/10 14:32:27 INFO source.AvroSource: Starting Avro source r1: { bindAddress: 0.0.0.0, port: 5555 }...
14/08/10 14:32:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/08/10 14:32:27 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/08/10 14:32:27 INFO source.AvroSource: Avro source r1 started.
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0x7c2f0aec, /192.168.1.50:38104 => /192.168.1.51:5555] OPEN
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0x7c2f0aec, /192.168.1.50:38104 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0x7c2f0aec, /192.168.1.50:38104 => /192.168.1.51:5555] CONNECTED: /192.168.1.50:38104
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x3d36f553, /192.168.1.51:48599 => /192.168.1.51:5555] OPEN
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x3d36f553, /192.168.1.51:48599 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x3d36f553, /192.168.1.51:48599 => /192.168.1.51:5555] CONNECTED: /192.168.1.51:48599
14/08/10 14:34:33 INFO sink.LoggerSink: Event: { headers:{type=ali} body: 69 64 6F 61 6C 6C 5F 74 65 73 74 32             idoall_test2 }

As you can see, the events were routed to different channels according to the value of the type header.

10) Case 10: Flume Sink Processors
With the failover sink processor, events are always sent to one of the sinks (the one with the highest priority); when that sink becomes unavailable, they are automatically sent to the next sink.

a) Create the flume_sink_processors configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors.conf

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# This is the key to configuring failover: you need a sink group
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
# The processing type is failover
a1.sinkgroups.g1.processor.type = failover
# Priority: the higher the number, the higher the priority; each sink must have a different priority
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Set to 10 seconds; of course you can make it faster or slower for your actual situation
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = m1
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = m2
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

b) Create the flume_sink_processors_avro configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors_avro.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

   

c) Copy the two configuration files to m2

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors.conf
root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors_avro.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors_avro.conf

d) Open four windows and start two Flume agents on m1 and m2 at the same time:

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors_avro.conf -n a1 -Dflume.root.logger=INFO,console
root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors.conf -n a1 -Dflume.root.logger=INFO,console

e) Then, on either m1 or m2, generate a test log message:

root@m1:/home/hadoop# echo "idoall.org test1 Failover" | nc localhost 5140

f) Because m2 has the higher priority, you can see the following information in the m2 sink window, while m1 shows nothing:

14/08/10 15:02:46 INFO ipc.NettyServer: Connection to /192.168.1.51:48692 disconnected.
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0x09a14036, /192.168.1.51:48704 => /192.168.1.51:5555] OPEN
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0x09a14036, /192.168.1.51:48704 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0x09a14036, /192.168.1.51:48704 => /192.168.1.51:5555] CONNECTED: /192.168.1.51:48704
14/08/10 15:03:26 INFO sink.LoggerSink: Event: { headers:{severity=0, flume.syslog.status=Invalid, facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31   idoall.org test1 }

g) Now stop the sink on the m2 machine (Ctrl+C) and send test data again:

root@m1:/home/hadoop# echo "idoall.org test2 Failover" | nc localhost 5140

h) In the m1 sink window, you can now read both of the test messages just sent:

14/08/10 15:02:46 INFO ipc.NettyServer: Connection to /192.168.1.51:47036 disconnected.
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0xbcf79851, /192.168.1.51:47048 => /192.168.1.50:5555] OPEN
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0xbcf79851, /192.168.1.51:47048 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0xbcf79851, /192.168.1.51:47048 => /192.168.1.50:5555] CONNECTED: /192.168.1.51:47048
14/08/10 15:07:56 INFO sink.LoggerSink: Event: { headers:{severity=0, flume.syslog.status=Invalid, facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31   idoall.org test1 }
14/08/10 15:07:56 INFO sink.LoggerSink: Event: { headers:{severity=0, flume.syslog.status=Invalid, facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32   idoall.org test2 }

i) Then restart the sink in the m2 sink window:

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors_avro.conf -n a1 -Dflume.root.logger=INFO,console

j) Send two more batches of test data:

root@m1:/home/hadoop# echo "idoall.org test3 Failover" | nc localhost 5140 && echo "idoall.org test4 Failover" | nc localhost 5140

k) In the m2 sink window, we can see the following information; because of the priorities, log messages fall on m2 again:

14/08/10 15:09:47 INFO node.Application: Starting Sink k1
14/08/10 15:09:47 INFO node.Application: Starting Source r1
14/08/10 15:09:47 INFO source.AvroSource: Starting Avro source r1: { bindAddress: 0.0.0.0, port: 5555 }...
14/08/10 15:09:47 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/08/10 15:09:47 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/08/10 15:09:47 INFO source.AvroSource: Avro source r1 started.
14/08/10 15:09:54 INFO ipc.NettyServer: [id: 0x96615732, /192.168.1.51:48741 => /192.168.1.51:5555] OPEN
14/08/10 15:09:54 INFO ipc.NettyServer: [id: 0x96615732, /192.168.1.51:48741 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 15:09:54 INFO ipc.NettyServer: [id: 0x96615732, /192.168.1.51:48741 => /192.168.1.51:5555] CONNECTED: /192.168.1.51:48741
14/08/10 15:09:57 INFO sink.LoggerSink: Event: { headers:{severity=0, flume.syslog.status=Invalid, facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32   idoall.org test2 }
14/08/10 15:10:43 INFO ipc.NettyServer: [id: 0x12621f9a, /192.168.1.50:38166 => /192.168.1.51:5555] OPEN
14/08/10 15:10:43 INFO ipc.NettyServer: [id: 0x12621f9a, /192.168.1.50:38166 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 15:10:43 INFO ipc.NettyServer: [id: 0x12621f9a, /192.168.1.50:38166 => /192.168.1.51:5555] CONNECTED: /192.168.1.50:38166
14/08/10 15:10:43 INFO sink.LoggerSink: Event: { headers:{severity=0, flume.syslog.status=Invalid, facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 33   idoall.org test3 }
14/08/10 15:10:43 INFO sink.LoggerSink: Event: { headers:{severity=0, flume.syslog.status=Invalid, facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 34   idoall.org test4 }

11) Case 11: Load Balancing Sink Processor
The load_balance processor differs from failover in that it supports two selection mechanisms: round_robin and random. In either case, if the selected sink is unavailable, it automatically tries to send to the next available sink.

a) Create the load_balancing_sink_processors configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors.conf

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1

# This is the key to configuring load balancing: you need a sink group
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = m1
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = m2
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
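The configuration above uses the round_robin strategy. To try the random strategy instead, only the selector line changes, for example:

a1.sinkgroups.g1.processor.selector = random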

b) Create the load_balancing_sink_processors_avro configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors_avro.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

c) Copy the two configuration files to m2

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors.conf
root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors_avro.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors_avro.conf

d) Open four windows and start two Flume agents on m1 and m2 at the same time:

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors_avro.conf -n a1 -Dflume.root.logger=INFO,console
root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors.conf -n a1 -Dflume.root.logger=INFO,console

e) Then, on either m1 or m2, generate test logs. Enter them one line at a time; if you send them too quickly, they tend to all land on one machine:

root@m1:/home/hadoop# echo "idoall.org test1" | nc localhost 5140
root@m1:/home/hadoop# echo "idoall.org test2" | nc localhost 5140
root@m1:/home/hadoop# echo "idoall.org test3" | nc localhost 5140
root@m1:/home/hadoop# echo "idoall.org test4" | nc localhost 5140

f) In the m1 sink window, you can see the following information:

14/08/10 15:35:29 INFO sink.LoggerSink: Event: { headers:{severity=0, flume.syslog.status=Invalid, facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32   idoall.org test2 }
14/08/10 15:35:33 INFO sink.LoggerSink: Event: { headers:{severity=0, flume.syslog.status=Invalid, facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 34   idoall.org test4 }

g) In the m2 sink window, you can see the following information:

14/08/10 15:35:27 INFO sink.LoggerSink: Event: { headers:{severity=0, flume.syslog.status=Invalid, facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31   idoall.org test1 }
14/08/10 15:35:29 INFO sink.LoggerSink: Event: { headers:{severity=0, flume.syslog.status=Invalid, facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 33   idoall.org test3 }

This shows that the round-robin mode is working.

12) Case 12: HBase sink

a) Before testing, please refer to "ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1 Distributed Environment Deployment" to start HBase.

b) Then copy the following jar files into Flume's lib directory:

cp /home/hadoop/hbase-0.96.2-hadoop2/lib/protobuf-java-2.5.0.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-client-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-common-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-protocol-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-server-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop2-compat-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop-compat-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/htrace-core-2.04.jar /home/hadoop/flume-1.5.0-bin/lib

c) Make sure the test_idoall_org table already exists in HBase. For the table format and fields, refer to the HBase section of "ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1 Distributed Environment Deployment".
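If you still need to create the table, a minimal sketch in the HBase shell (assuming a single column family called name, matching the columnFamily used below, plus the sample row seen in the scan later) would be:

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
hbase(main):001:0> create 'test_idoall_org', 'name'
hbase(main):002:0> put 'test_idoall_org', '10086', 'name:idoall', 'idoallvalue'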
d) Create the hbase_simple configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/hbase_simple.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = hbase
a1.sinks.k1.table = test_idoall_org
a1.sinks.k1.columnFamily = name
a1.sinks.k1.column = idoall
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

e) Start the Flume agent

/home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/hbase_simple.conf -n a1 -Dflume.root.logger=INFO,console

f) Test by generating a syslog message

root@m1:/home/hadoop# echo "hello idoall.org from flume" | nc localhost 5140

g) Log in to HBase and you will find that the new data has been inserted:

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
2014-08-10 16:09:48,984 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 16:03:18 PDT 2014

hbase(main):001:0> list
TABLE
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
hbase2hive_idoall
hive2hbase_idoall
test_idoall_org
3 row(s) in 2.6880 seconds

=> ["hbase2hive_idoall", "hive2hbase_idoall", "test_idoall_org"]

hbase(main):002:0> scan "test_idoall_org"
ROW                           COLUMN+CELL
 10086                        column=name:idoall, timestamp=1406424831473, value=idoallvalue
1 row(s) in 0.0550 seconds

hbase(main):003:0> scan "test_idoall_org"
ROW                           COLUMN+CELL
 10086                        column=name:idoall, timestamp=1406424831473, value=idoallvalue
 1407658495588-xbqcozrkk8-0   column=name:payload, timestamp=1407658498203, value=hello idoall.org from flume
2 row(s) in 0.0200 seconds

hbase(main):004:0> quit

After testing all these Flume examples, if you have worked through them, you will find that Flume is genuinely powerful: you can combine its components in many ways to accomplish whatever collection work you need. As the saying goes, mastery is up to the individual; how well you combine Flume with your own product and business determines how well you can apply it. Go ahead and practice.

This article is written as a set of notes, in the hope that it helps students who are just getting started.
