Common cluster configuration cases of Flume data acquisition


[TOC]

Non-clustered configuration

This case is not a cluster configuration and is relatively simple; you can refer directly to my earlier "Flume Notes". The basic structure is as follows:

Flume cluster with multiple agents and one source: structure description

The structure diagram is as follows:

The description is as follows:

That is, we can deploy our agents on different nodes; the figure above shows the case with two agents. Agent foo can be deployed on the node where the logs are produced, for example on the node running our web server such as Tomcat or Nginx. foo's source can be configured to watch the log file for new data, its channel can store events in memory or in files, and its sink, i.e. where the logs land, can be configured as avro, meaning the output goes to the next agent. Agent bar can be deployed on another node, although deploying it on the same node as foo is also fine, because Flume can run multiple instances on one node. bar's main job is to collect the log data arriving at its avro source from the different nodes. In practice, if our web environment is a cluster, there are multiple web server nodes producing logs, and we need to deploy an agent on each of them; bar then receives data from several upstream agents, which is exactly the situation in the later case. In this section, however, we only discuss multiple agents with a single source.

For agent bar's sink there are also several options; see the official documentation for details. Here the sink is HDFS. Note that agent foo has only one source here; in the later case, multiple sources will be configured, i.e. a single agent collects several different log files. The "multiple sources" discussed later refers to multiple different log files as inputs, i.e. multiple sources inside foo, such as data-access.log, data-ugctail.log, data-ugchead.log, and so on.
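To make the avro handoff concrete before the full configurations below, here is a minimal sketch of the link between the two agents. The agent names foo and bar and the placeholder hostname are only illustrative; the working configurations in the following sections use a1 on each node.

# foo, on the web server node: its sink sends events downstream over avro
foo.sinks.k1.type = avro
foo.sinks.k1.hostname = <node-running-bar>
foo.sinks.k1.port = 44444

# bar, on the collector node: its source listens for avro events on the same port
bar.sources.r1.type = avro
bar.sources.r1.bind = 0.0.0.0
bar.sources.r1.port = 44444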
Case environment description

As follows:

That is, there are two nodes:

    • uplooking01: the log file /home/uplooking/data/data-clean/data-access.log on this node is the user access log generated by the web server, and a new log file is produced every day. On this node we need to deploy a Flume agent whose source is this log file and whose sink is avro.
    • uplooking03: this node mainly collects the log output from the different Flume agents, such as the agent above, and writes it to HDFS.

Note: my environment has three nodes, uplooking01, uplooking02 and uplooking03, and a Hadoop cluster is configured on these three nodes.
Configuration
    • uplooking01
#############################################################
# Main purpose: watch the file for newly appended data and, once collected,
# output it to avro
# Note: running a Flume agent mainly means configuring a source, a channel and a sink
# Below, a1 is the agent's name; the source is r1, the channel is c1 and the sink is k1
#############################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source configuration: watch the file for new data (exec)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/uplooking/data/data-clean/data-access.log

# Sink configuration: use avro to pass the data downstream
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = uplooking03
a1.sinks.k1.port = 44444

# Channel configuration: use files as the temporary data buffer; this is the safer option
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/uplooking/data/flume/checkpoint
a1.channels.c1.dataDirs = /home/uplooking/data/flume/data

# Associate source r1 and sink k1 through channel c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
    • uplooking03
#############################################################
# Main purpose: listen on avro and, after receiving data, output it to HDFS
# Note: running a Flume agent mainly means configuring a source, a channel and a sink
# Below, a1 is the agent's name; the source is r1, the channel is c1 and the sink is k1
#############################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source configuration: listen on avro
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444

# Sink configuration: write the data to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /input/data-clean/access/%y/%m/%d
a1.sinks.k1.hdfs.filePrefix = flume
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.inUsePrefix = tmpFlume
a1.sinks.k1.hdfs.inUseSuffix = .tmp
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = second
# With the following two items configured, the data saved to HDFS is plain text;
# otherwise, viewing it with hdfs dfs -text shows compressed hexadecimal data
a1.sinks.k1.hdfs.serializer = TEXT
a1.sinks.k1.hdfs.fileType = DataStream

# Channel configuration: use memory as the temporary data buffer
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Associate source r1 and sink k1 through channel c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Test

First, make sure the log is being generated and written to /home/uplooking/data/data-clean/data-access.log.
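If you want to double-check that new records really are being appended before starting Flume, a quick manual check (my own suggestion, not part of the original steps) is simply to follow the file:

# watch the file for new lines; press Ctrl+C to stop
tail -F /home/uplooking/data/data-clean/data-access.log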

Start the Flume agent on uplooking03:

[uplooking@uplooking03 flume]$ flume-ng agent -n a1 -c conf --conf-file conf/flume-source-avro.conf -Dflume.root.logger=INFO,console

Start the Flume agent on uplooking01:

flume-ng agent -n a1 -c conf --conf-file conf/flume-sink-avro.conf -Dflume.root.logger=INFO,console
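For testing it is convenient to keep the agents attached to the console as above. For longer runs, one common approach (my own suggestion, not from the original write-up; the output file name is arbitrary) is to start the agent in the background with nohup:

nohup flume-ng agent -n a1 -c conf --conf-file conf/flume-sink-avro.conf > /tmp/flume-a1.out 2>&1 &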

After some time, you can see the log files written to HDFS:

$ hdfs dfs -ls /input/data-clean/access/18/04/07
18/04/07 08:52:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 26 items
-rw-r--r--   3 uplooking supergroup       1131 2018-04-07 08:50 /input/data-clean/access/18/04/07/flume.1523062248369.log
-rw-r--r--   3 uplooking supergroup       1183 2018-04-07 08:50 /input/data-clean/access/18/04/07/flume.1523062248370.log
-rw-r--r--   3 uplooking supergroup       1176 2018-04-07 08:50 /input/data-clean/access/18/04/07/flume.1523062248371.log
......

To view the data in a file:

$ hdfs dfs -text /input/data-clean/access/18/04/07/flume.1523062248369.log
18/04/07 08:55:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
100 0 220.194.55.244 null 40604 0 post /check/init http/1.1 null mozilla/5.0 (Windows NT 6.1; WOW64) applewebkit/537.36 (khtml, like Gecko) chrome/59.0.3071.115 safari/537.3 1523062236368
1002 221.8.9.6 80 886a1533-38ca-466c-86e1-0b84022f781b 20201 1 get /top http/1.0 null mozilla/5.0 (Windows NT 6.1; WOW64) applewebkit/537.36 (khtml, like Gecko) chrome/59.0.3071.115 safari/537.3 1523062236869
1002 61.172.249.96 99fb19c4-ec59-4abd-899c-4059dea39ead 0 0 post /updatebyid?id=21 http/1.1 408 null mozilla/5.0 (Windows NT 6.1; WOW64; trident/7.0; rv:11.0) like Gecko 1523062237370
1003 61.172.249.96 886a1533-38ca-466c-86e1-0b84022f781b 10022 1 get /tologin http/1.1 null /update/pass mozilla/5.0 (Windows; U; Windows NT 5.1) gecko/20070309 firefox/2.0.0.3 1523062237871
1003 125.39.129.67 6839fff8-7b3a-48f5-90cd-0f45c7be1aeb 10022 1 get /tologin http/1.0 408 null mozilla/5.0 (Windows; U; Windows NT 5.1) gecko/20070309 firefox/2.0.0.3 1523062238372
1000 61.172.249.96 89019ae0-6140-4e5a-9061-e3af74f3e4a8 10022 1 post /stat http/1.1 null /passpword/getbyid?id=11 mozilla/4.0 (compatible; MSIE 5.0; WindowsNT) 1523062238873

If the Flume agent on uplooking03 is not configured with hdfs.serializer=TEXT and hdfs.fileType=DataStream, the data shown above would instead appear as compressed hexadecimal data.

Flume cluster with multiple agents and multiple sources: structure description

As follows:

Case environment description

Our environment is as follows:

That is, in our environment there are three log sources: data-access.log, data-ugchead.log and data-ugctail.log. In the actual configuration below, however, we only use two agents for the log sources, uplooking01 and uplooking02; both of their sinks output to the source on uplooking03.
Configuration

uplooking01 and uplooking02 use the same configuration, as follows:

#############################################################
# Main purpose: watch the files for newly appended data and, once collected,
# output it to avro
# Note: running a Flume agent mainly means configuring a source, a channel and a sink
# Below, a1 is the agent's name; the sources are r1 r2 r3, the channel is c1 and the sink is k1
#############################################################
a1.sources = r1 r2 r3
a1.sinks = k1
a1.channels = c1

# Source r1 configuration: watch the file for new data (exec)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/uplooking/data/data-clean/data-access.log
a1.sources.r1.interceptors = i1 i2
## static: adds a key-value pair to the event header; two interceptors, i1 and i2, are configured
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = type
a1.sources.r1.interceptors.i1.value = access
## timestamp: with this configured here, the Flume agent that collects the logs centrally
## does not need a1.sinks.k1.hdfs.useLocalTimeStamp = true and can still resolve %Y/%m/%d.
## This reduces the load on the central collecting agent, because the time information
## is already available from the source.
a1.sources.r1.interceptors.i2.type = timestamp

# Source r2 configuration: watch the file for new data (exec)
a1.sources.r2.type = exec
a1.sources.r2.command = tail -F /home/uplooking/data/data-clean/data-ugchead.log
a1.sources.r2.interceptors = i1 i2
a1.sources.r2.interceptors.i1.type = static
a1.sources.r2.interceptors.i1.key = type
a1.sources.r2.interceptors.i1.value = ugchead
a1.sources.r2.interceptors.i2.type = timestamp

# Source r3 configuration: watch the file for new data (exec)
a1.sources.r3.type = exec
a1.sources.r3.command = tail -F /home/uplooking/data/data-clean/data-ugctail.log
a1.sources.r3.interceptors = i1 i2
a1.sources.r3.interceptors.i1.type = static
a1.sources.r3.interceptors.i1.key = type
a1.sources.r3.interceptors.i1.value = ugctail
a1.sources.r3.interceptors.i2.type = timestamp

# Sink configuration: use avro to pass the data downstream
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = uplooking03
a1.sinks.k1.port = 44444

# Channel configuration: use files as the temporary data buffer; this is the safer option
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/uplooking/data/flume/checkpoint
a1.channels.c1.dataDirs = /home/uplooking/data/flume/data

# Associate sources r1 r2 r3 and sink k1 through channel c1
a1.sources.r1.channels = c1
a1.sources.r2.channels = c1
a1.sources.r3.channels = c1
a1.sinks.k1.channel = c1
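If another log file ever needs to be collected on these nodes, the same source-plus-interceptors pattern simply repeats, and the static type value then also selects the target directory on the collector via %{type}. The source name r4 and the file data-ugcbody.log below are purely hypothetical, just to illustrate the pattern:

# hypothetical fourth source; the a1.sources line is extended accordingly
a1.sources = r1 r2 r3 r4
a1.sources.r4.type = exec
a1.sources.r4.command = tail -F /home/uplooking/data/data-clean/data-ugcbody.log
a1.sources.r4.interceptors = i1 i2
a1.sources.r4.interceptors.i1.type = static
a1.sources.r4.interceptors.i1.key = type
a1.sources.r4.interceptors.i1.value = ugcbody
a1.sources.r4.interceptors.i2.type = timestamp
a1.sources.r4.channels = c1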

uplooking03 is configured as follows:

#############################################################
# Main purpose: listen on avro and, after receiving data, output it to HDFS
# Note: running a Flume agent mainly means configuring a source, a channel and a sink
# Below, a1 is the agent's name; the source is r1, the channel is c1 and the sink is k1
#############################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source configuration: listen on avro
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444

# Sink configuration: write the data to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /input/data-clean/%{type}/%Y/%m/%d
a1.sinks.k1.hdfs.filePrefix = %{type}
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.inUseSuffix = .tmp
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollSize = 10485760
# The rolling policy configured above only takes effect if the following item is also set
a1.sinks.k1.hdfs.minBlockReplicas = 1
# With the following two items configured, the data saved to HDFS is plain text;
# otherwise, viewing it with hdfs dfs -text shows compressed hexadecimal data
a1.sinks.k1.hdfs.serializer = TEXT
a1.sinks.k1.hdfs.fileType = DataStream

# Channel configuration: use memory as the temporary data buffer
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Associate source r1 and sink k1 through channel c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Test

First, make sure that the logs on uplooking01 and uplooking02 are being generated properly.

Start the agent on uplooking03:

[uplooking@uplooking03 flume]$ flume-ng agent -n a1 -c conf --conf-file conf/flume-source-avro.conf -Dflume.root.logger=INFO,console
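Before starting the upstream agents, you can optionally confirm that the avro source on uplooking03 is actually listening on port 44444 (an extra check of mine, not part of the original steps):

ss -lnt | grep 44444
# or, on systems without ss:
netstat -lnt | grep 44444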

Start the agents on uplooking01 and uplooking02 respectively:

flume-ng agent -n a1 -c conf --conf-file conf/flume-sink-avro.conf -Dflume.root.logger=INFO,console

After some time, you can view the corresponding log files in HDFS:

$ hdfs dfs -ls /input/data-clean
18/04/08 01:34:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x   - uplooking supergroup          0 2018-04-07 22:00 /input/data-clean/access
drwxr-xr-x   - uplooking supergroup          0 2018-04-07 22:00 /input/data-clean/ugchead
drwxr-xr-x   - uplooking supergroup          0 2018-04-07 22:00 /input/data-clean/ugctail

To view a log file under a log directory:

$ hdfs dfs -ls /input/data-clean/access/2018/04/08
18/04/08 01:35:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   3 uplooking supergroup    2447752 2018-04-08 01:02 /input/data-clean/access/2018/04/08/access.1523116801502.log
-rw-r--r--   3 uplooking supergroup       5804 2018-04-08 01:02 /input/data-clean/access/2018/04/08/access.1523120538070.log.tmp

You can see that the number of log files is fairly small. This is because the log files are rolled according to the policy configured earlier in the uplooking03 agent: a file is only rolled over once it reaches 10 MB.
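To confirm that the rolling policy behaves as described (files rolled at roughly 10 MB), you can check the file sizes in one of the date directories shown above, for example:

hdfs dfs -du -h /input/data-clean/access/2018/04/08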
