Distributed Log Collection System: Flume

Source: Internet
Author: User
Tags: syslog, hadoop, fs

Flume Knowledge Points:

An event is a single row (record) of data.
1. Flume is a distributed log collection system that transmits the collected data to its destination.
2. Flume's core concept is the agent. An agent is a Java process that runs on the log collection node.
3. An agent consists of three core components: source, channel, and sink.
3.1 The source component collects logs and can handle log data of many types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, legacy, and custom sources.
After collecting the data, the source stores it temporarily in the channel.
3.2 The channel component is used by the agent to temporarily store data; it can be backed by memory, JDBC, a file, or a custom channel.
Data in the channel is deleted only after the sink has delivered it successfully.
3.3 The sink component sends data on to its destination; supported destinations include HDFS, logger, avro, thrift, IPC, file, null, HBase, Solr, and custom sinks.
4. Throughout the data transfer process, the event is the unit that flows; the transaction guarantee is at the event level.
5. Flume supports chaining multiple agents (multi-level flows), as well as fan-in and fan-out.
Fan-in means a source can receive input from many senders.
Fan-out means the output can be delivered to multiple destinations (a simple fan-out configuration is sketched below).
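
As a rough sketch of fan-out (not part of the working example later in this article; the agent, component, and port names and the HDFS path below are hypothetical), one source can be wired to two channels through the default replicating selector, with each channel drained by its own sink:

agent2.sources=src1
agent2.channels=ch1 ch2
agent2.sinks=sinkA sinkB

# The replicating selector (the default) copies every event to each listed channel
agent2.sources.src1.type=netcat
agent2.sources.src1.bind=0.0.0.0
agent2.sources.src1.port=44444
agent2.sources.src1.channels=ch1 ch2
agent2.sources.src1.selector.type=replicating

agent2.channels.ch1.type=memory
agent2.channels.ch2.type=memory

# One copy of each event goes to the console, the other to HDFS
agent2.sinks.sinkA.type=logger
agent2.sinks.sinkA.channel=ch1
agent2.sinks.sinkB.type=hdfs
agent2.sinks.sinkB.hdfs.path=hdfs://hadoop0:9000/fanout-demo
agent2.sinks.sinkB.channel=ch2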

Flume Installation:

1. Unzip both packages (the bin and src tarballs) on the node:
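
A likely pair of commands, assuming the standard Flume 1.4.0 tarball names (adjust them to match the files you actually downloaded):

tar -zxvf apache-flume-1.4.0-bin.tar.gz
tar -zxvf apache-flume-1.4.0-src.tar.gz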

2. Copy the src contents into the bin directory:

cp -ri apache-flume-1.4.0-src/* apache-flume-1.4.0-bin/

3. The src directory is no longer needed and can be removed:

rm -rf apache-flume-1.4.0-src

4. Rename apache-flume-1.4.0-bin to flume:

mv apache-flume-1.4.0-bin flume

Note: This Flume installation assumes Hadoop is already installed, because Flume (the HDFS sink) uses the Hadoop jars.
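
The flume-ng launcher can pick the Hadoop jars up from the local Hadoop installation, so before going further it is worth checking that Hadoop is visible on this node, for example:

echo $HADOOP_HOME
hadoop version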

5. Write an example configuration file

agent1 is the name of the agent:

agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1

The spooling directory source watches the specified folder for new files; as soon as a new file appears, its contents are parsed and written to the channel. When the write is complete, the file is either marked as completed (renamed) or deleted.

Configure source1

agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/root/hmbbs
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader=false
agent1.sources.source1.interceptors=i1
# The timestamp interceptor adds a timestamp header to each event, which the date escapes in the HDFS sink's filePrefix rely on
agent1.sources.source1.interceptors.i1.type=timestamp

Configure sink1

agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://hadoop0:9000/hmbbs
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.writeFormat=Text
# Roll (close) the file after the specified number of seconds
agent1.sinks.sink1.hdfs.rollInterval=1
agent1.sinks.sink1.channel=channel1
# Prefix of the generated files
agent1.sinks.sink1.hdfs.filePrefix=%y-%m-%d
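
If rolling should be driven only by this one-second interval, the size- and count-based thresholds can be switched off as well. These two properties are standard HDFS sink settings but are not part of the original example:

# A value of 0 disables size-based and event-count-based rolling
agent1.sinks.sink1.hdfs.rollSize=0
agent1.sinks.sink1.hdfs.rollCount=0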

Configure channel1

agent1.channels.channel1.type=file
# Checkpoint (backup) directory
agent1.channels.channel1.checkpointDir=/root/hmbbs_tmp/123
agent1.channels.channel1.dataDirs=/root/hmbbs_tmp/

Save this file in Flume's conf folder and name it example.

6. Create the hmbbs folder under /root

[root@hadoop0 /]# cd /root
[root@hadoop0 ~]# ls
anaconda-ks.cfg  Documents  install.log  Music  Public  Videos
Desktop  Downloads  install.log.syslog  Pictures  Templates
[root@hadoop0 ~]# mkdir hmbbs

7. Create the corresponding folder in HDFS

hadoop fs -mkdir /hmbbs

8. Run Flume
Enter the flume directory and execute:

bin/flume-ng agent -n agent1 -c conf -f conf/example -Dflume.root.logger=DEBUG,console

9. Create a test file and copy it into the monitored folder

[root@hadoop0 ~]# vi hello
[root@hadoop0 ~]# cp hello hmbbs

You will see the file's contents transferred into HDFS.
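
To confirm, you can list the target directory in HDFS (the same path configured in hdfs.path above):

hadoop fs -ls /hmbbs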

10. Check the results

[root@hadoop0 ~]# cd hmbbs
[root@hadoop0 hmbbs]# ls
hello.COMPLETED

The .COMPLETED suffix indicates that the file has been processed and its contents handed off to the channel; the suffix is the result of the spooling directory source renaming the file.

[root@hadoop0 ~]# cd hmbbs_tmp
[root@hadoop0 hmbbs_tmp]# ls

hmbbs_tmp is the data directory used by the file channel (dataDirs).

[root@hadoop0 hmbbs_tmp]# cd 123
[root@hadoop0 123]# ls
checkpoint  checkpoint.meta  inflightputs  inflighttakes

The files here are the channel's checkpoint (backup) data; if the data in the dataDirs directory is lost, it can be recovered from here.

Real production deployments span multiple nodes and are more complex; refer to the official documentation:
http://flume.apache.org
