1) Introduction
Flume is a distributed, reliable, and highly available system for aggregating massive volumes of logs. It lets you customize the data senders in the system for data collection, provides simple in-flight processing of the data, and can write to a variety of (customizable) data receivers.
Design goals:
(1) Reliability
When a node fails, logs can be transferred to other nodes without being lost. Flume provides three levels of reliability guarantees, from strongest to weakest: end-to-end (after the agent receives the data, the event is first written to disk; when the data is transmitted successfully it is deleted, and if transmission fails it is resent), store on failure (when the data receiver crashes, the data is written to the local disk and sending resumes after recovery), and best effort (the data is not acknowledged after it is sent to the receiver).
(2) Scalability
Flume uses a three-tier architecture of agent, collector, and storage, and each tier can be scaled out horizontally. All agents and collectors are centrally managed by the master, which makes the system easy to monitor and maintain, and multiple masters are allowed (managed and load-balanced via ZooKeeper), which avoids a single point of failure.
(3) Manageability
All agents and collectors are centrally managed by the master, which makes the system easy to maintain. With multiple masters, Flume uses ZooKeeper and gossip to keep dynamic configuration data consistent. On the master you can view the status of each data source or data flow, and configure and dynamically reload each data source. Flume provides both web and shell-script command interfaces for managing data flows.
(4) Functional scalability
You can add your own agent, collector, or storage as needed. In addition, Flume ships with many components, including various agents (such as file and syslog), collectors, and storage backends (such as file, HDFS, and HBase).
2) Configuration
Hadoop and HBase were configured beforehand, so you need to start Hadoop and HBase before writing files into HDFS and HBase. For the configuration of hadoop-2.2.0 and hbase-0.96.0, see "Distributed configuration of Hadoop-2.2.0 in Ubuntu" and "Distributed environment installation of HBase-0.96.0 on CentOS".
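As a quick reference, a minimal startup sequence looks like the following, assuming the standard hadoop-2.2.0/hbase-0.96.0 script layout and that HADOOP_HOME and HBASE_HOME are already set:

$ $HADOOP_HOME/sbin/start-dfs.sh      # start the HDFS daemons
$ $HADOOP_HOME/sbin/start-yarn.sh     # start YARN
$ $HBASE_HOME/bin/start-hbase.sh      # start HBase (requires HDFS to be up)
$ jps                                 # sanity check: NameNode, DataNode, HMaster, etc. should be listed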
The test environment is two CentOS machines: the machine with host name master is responsible for collecting logs, and the machine with host name node is responsible for writing them out. This configuration demonstrates two write modes: writing to an ordinary local directory and writing to HDFS.
First download the flume-ng binary tarball from http://flume.apache.org/download.html and decompress it. Then edit the /etc/profile file and add the following lines:
export FLUME_HOME=/home/aaron/apache-flume-1.4.0-bin
export FLUME_CONF_DIR=$FLUME_HOME/conf
export PATH=$PATH:$FLUME_HOME/bin
Run $ source /etc/profile to make the changes take effect.
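To verify that the PATH change took effect, you can ask Flume for its version:

$ flume-ng version    # should report Flume 1.4.0 if the environment is set up correctly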
In the conf directory of the Flume installation on master, create a new flume-master.conf file with the following content:
agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = remoteSink

# For each one of the sources, the type is defined
agent.sources.seqGenSrc.type = exec
agent.sources.seqGenSrc.command = tail -F /home/aaron/test

# The channel can be defined as follows.
agent.sources.seqGenSrc.channels = memoryChannel

# Each sink's type must be defined
# (loggerSink is configured here but not activated above; only remoteSink is used)
agent.sinks.loggerSink.type = logger
# Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory
# Other config values specific to each type of channel (sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100
agent.channels.memoryChannel.keep-alive = 100

# Forward events to the node machine over Avro
agent.sinks.remoteSink.type = avro
agent.sinks.remoteSink.hostname = node
agent.sinks.remoteSink.port = 23004
agent.sinks.remoteSink.channel = memoryChannel
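Because the exec source tails /home/aaron/test, make sure that file exists on master before starting the agent; otherwise tail -F will simply wait for it to appear:

$ touch /home/aaron/test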
On the node machine, add the same environment variables to /etc/profile. Then create a new flume-node.conf file in its conf directory with the following content:
agent.sources = seqGenSrc1
agent.channels = memoryChannel
agent.sinks = fileSink

# For each one of the sources, the type is defined
# Receive events from the master's avro sink
agent.sources.seqGenSrc1.type = avro
agent.sources.seqGenSrc1.bind = node
agent.sources.seqGenSrc1.port = 23004

# The channel can be defined as follows.
agent.sources.seqGenSrc1.channels = memoryChannel

# Each sink's type must be defined
# (loggerSink is configured here but not activated above; only fileSink is used)
agent.sinks.loggerSink.type = logger
# Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory
# Other config values specific to each type of channel (sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100
agent.channels.memoryChannel.keep-alive = 100

# Roll the received events into files under /home/aaron/
agent.sinks.fileSink.type = file_roll
agent.sinks.fileSink.channel = memoryChannel
agent.sinks.fileSink.sink.directory = /home/aaron/
agent.sinks.fileSink.sink.serializer.appendNewline = true
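Taken together, the two files describe the following pipeline (the hostnames, port, and paths are the ones used above):

master: exec source (tail -F /home/aaron/test) -> memoryChannel -> avro sink (node:23004)
node:   avro source (bind node:23004) -> memoryChannel -> file_roll sink (/home/aaron/)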
Run the following command on master:
$ bin/flume-ng agent --conf ./conf/ -f conf/flume-master.conf -Dflume.root.logger=DEBUG,console -n agent
Run the following command on node:
$ bin/flume-ng agent --conf ./conf/ -f conf/flume-node.conf -Dflume.root.logger=DEBUG,console -n agent
After both agents start, you will find that the two machines can communicate and that files on master are forwarded to node. If you modify the test file on master and append content to it, node receives the new lines as well.
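For example, appending a line on master should shortly show up in a rolled file on node (file names are generated by the sink, so look for the newest file):

$ echo "hello flume" >> /home/aaron/test    # on master
$ ls -lt /home/aaron/ | head                # on node: a newly written file should appear
$ cat /home/aaron/<newest-file>             # should contain "hello flume"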
If you want to write the content to Hadoop instead, change the sink section of the flume-node.conf file on node as follows:
agent.sinks = k2
agent.sinks.k2.type = hdfs
agent.sinks.k2.channel = memoryChannel
agent.sinks.k2.hdfs.path = hdfs://master:8089/hbase
agent.sinks.k2.hdfs.fileType = DataStream
agent.sinks.k2.hdfs.writeFormat = Text
Here, hdfs://master:8089/hbase is the HDFS destination path; the host and port must match the fs.defaultFS setting of your Hadoop cluster.
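The HDFS sink also exposes roll controls that are usually worth setting explicitly; the values below are illustrative, not part of the original setup:

agent.sinks.k2.hdfs.rollInterval = 30    # roll to a new file every 30 seconds
agent.sinks.k2.hdfs.rollSize = 1048576   # ...or once the file reaches 1 MB
agent.sinks.k2.hdfs.rollCount = 0        # 0 disables rolling by event count
agent.sinks.k2.hdfs.filePrefix = flume   # prefix for the generated file names

After restarting the node agent and appending to the test file on master, you can confirm the writes with:

$ hdfs dfs -ls /hbase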