Flume-ng Configuration

1) Introduction

Flume is a distributed, reliable, and highly available system for aggregating massive amounts of log data. It lets you customize the data senders in a system to collect data, performs simple processing on that data, and writes it to a variety of (customizable) data receivers.

Design goals:
(1) Reliability
When a node fails, logs can be transferred to other nodes without being lost. Flume provides three levels of reliability assurance, from strongest to weakest: end-to-end (the receiving agent first writes the event to disk, deletes it once transmission succeeds, and resends it if transmission fails), store on failure (when the data receiver crashes, data is written locally and sending resumes after recovery), and best effort (data is not acknowledged after being sent to the receiver). In Flume NG, the version configured below, this trade-off is largely determined by the channel type; see the sketch after this list.
(2) Scalability
Flume uses a three-tier architecture of agent, collector, and storage, and each tier can be scaled out horizontally. All agents and collectors are centrally managed by the master, which makes the system easy to monitor and maintain, and multiple masters are allowed (managed and load-balanced with ZooKeeper), which avoids a single point of failure.
(3) Manageability
All agents and collectors are centrally managed by the master, which makes the system easy to maintain. With multiple masters, Flume uses ZooKeeper and gossip to keep the dynamic configuration data consistent. On the master you can observe the execution of each data source or data stream, and configure and dynamically load each data source. Flume provides both a web interface and shell script commands for managing data streams.
(4) Functional extensibility
You can add your own agents, collectors, or storage backends as needed. Flume also ships with many components, including various agents (such as file and syslog), collectors, and storage backends (such as file, HDFS, and HBase).
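The memory channel used in the configurations later in this article is fast but loses buffered events if the agent crashes. A minimal sketch of the disk-backed alternative in Flume NG, the file channel (both directory paths here are made-up examples; any writable local paths work):

  agent.channels = fileChannel
  # The file channel persists events to disk, so they survive an agent restart
  agent.channels.fileChannel.type = file
  agent.channels.fileChannel.checkpointDir = /home/aaron/flume/checkpoint
  agent.channels.fileChannel.dataDirs = /home/aaron/flume/data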

2) Configuration

Hadoop and HBase were configured beforehand, so both must be started before Flume can write files into HDFS and HBase. For the configuration of hadoop-2.2.0 and hbase-0.96.0, see "Distributed configuration of Hadoop-2.2.0 in Ubuntu" and "CentOS distributed environment installation of HBase-0.96.0".
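Both can be started with the scripts that ship with them; a typical sequence, assuming HADOOP_HOME and HBASE_HOME point at the two installations:

  # Start HDFS and YARN, then HBase
  $ $HADOOP_HOME/sbin/start-dfs.sh
  $ $HADOOP_HOME/sbin/start-yarn.sh
  $ $HBASE_HOME/bin/start-hbase.sh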

The test environment consists of two CentOS machines. The machine with the host name master collects the logs, and the machine with the host name node writes them out. This article covers two write modes: writing to an ordinary local directory and writing to HDFS.

First download the flume-ng binary tarball from http://flume.apache.org/download.html and decompress it. Then edit the /etc/profile file and add the following lines:

  export FLUME_HOME=/home/aaron/apache-flume-1.4.0-bin
  export FLUME_CONF_DIR=$FLUME_HOME/conf
  export PATH=$PATH:$FLUME_HOME/bin

Run $ source /etc/profile to make the changes take effect.
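A quick way to confirm the updated PATH works is to ask Flume for its version:

  $ flume-ng version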

In the conf directory of the flume folder on the master, create a new flume-master.conf file with the following content:

  agent.sources = seqGenSrc
  agent.channels = memoryChannel
  agent.sinks = remoteSink
  # For each one of the sources, the type is defined
  agent.sources.seqGenSrc.type = exec
  agent.sources.seqGenSrc.command = tail -F /home/aaron/test
  # The channel can be defined as follows.
  agent.sources.seqGenSrc.channels = memoryChannel
  # Each sink's type must be defined
  agent.sinks.loggerSink.type = logger
  # Specify the channel the sink should use
  agent.sinks.loggerSink.channel = memoryChannel
  # Each channel's type is defined.
  agent.channels.memoryChannel.type = memory
  # Other config values specific to each type of channel (sink or source)
  # can be defined as well
  # In this case, it specifies the capacity of the memory channel
  agent.channels.memoryChannel.capacity = 100
  agent.channels.memoryChannel.keep-alive = 100
  agent.sinks.remoteSink.type = avro
  agent.sinks.remoteSink.hostname = node
  agent.sinks.remoteSink.port = 23004
  agent.sinks.remoteSink.channel = memoryChannel
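In this file, the exec source runs tail -F /home/aaron/test, turning every new line of its output into an event, and the avro sink forwards those events to port 23004 on the host named node. tail -F keeps retrying until the file exists, but it is convenient to create it up front:

  $ touch /home/aaron/test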

Add the same environment variables to the /etc/profile file on the node machine. Then create a new flume-node.conf file in conf with the following content:

  agent.sources = seqGenSrc1
  agent.channels = memoryChannel
  agent.sinks = fileSink
  # For each one of the sources, the type is defined
  agent.sources.seqGenSrc1.type = avro
  agent.sources.seqGenSrc1.bind = node
  agent.sources.seqGenSrc1.port = 23004
  # The channel can be defined as follows.
  agent.sources.seqGenSrc1.channels = memoryChannel
  # Each sink's type must be defined
  agent.sinks.loggerSink.type = logger
  # Specify the channel the sink should use
  agent.sinks.loggerSink.channel = memoryChannel
  # Each channel's type is defined.
  agent.channels.memoryChannel.type = memory
  # Other config values specific to each type of channel (sink or source)
  # can be defined as well
  # In this case, it specifies the capacity of the memory channel
  agent.channels.memoryChannel.capacity = 100
  agent.channels.memoryChannel.keep-alive = 100
  # The file_roll sink writes the received events into a local directory
  agent.sinks.fileSink.type = file_roll
  agent.sinks.fileSink.channel = memoryChannel
  agent.sinks.fileSink.sink.directory = /home/aaron/
  agent.sinks.fileSink.sink.serializer.appendNewline = true

Run the following command on the master node:

  $ bin/flume-ng agent --conf ./conf/ -f conf/flume-master.conf -Dflume.root.logger=DEBUG,console -n agent

Run the following command on node:

  $ bin/flume-ng agent --conf ./conf/ -f conf/flume-node.conf -Dflume.root.logger=DEBUG,console -n agent
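In both commands, --conf points at the configuration directory, -f names the properties file to load, -Dflume.root.logger=DEBUG,console raises the log level and sends it to the console, and -n selects which agent definition to run; its value must match the property prefix used in the file (agent. in the listings above).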

After startup, the two agents can communicate with each other, and the file on the master is forwarded to the node. If you append content to the test file on the master, the node receives the new messages as well.
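A quick end-to-end check, using the file path from the master configuration (output file names on the node are generated by the file_roll sink, so simply look for the newest files in the directory):

  $ echo "hello flume" >> /home/aaron/test    # on master
  $ ls -lt /home/aaron/ | head                # on node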

If you want to write the content to Hadoop instead, change the sink section of the flume-node.conf file on node as follows:

  agent.sinks = k2
  agent.sinks.k2.type = hdfs
  agent.sinks.k2.channel = memoryChannel
  agent.sinks.k2.hdfs.path = hdfs://master:8089/hbase
  agent.sinks.k2.hdfs.fileType = DataStream
  agent.sinks.k2.hdfs.writeFormat = Text

Here, hdfs://master:8089/hbase is an HDFS path on the Hadoop cluster; the host and port must match the NameNode address (fs.defaultFS) in your Hadoop configuration.
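To verify that events are arriving, list the target directory; by default the HDFS sink writes files whose names begin with the FlumeData prefix:

  $ hadoop fs -ls hdfs://master:8089/hbase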
