Distributed Log Collection System: Flume

Source: Internet
Author: User
Tags: syslog, hadoop, fs

Flume Knowledge Points:

An event is a single row (record) of data.
1. Flume is a distributed log collection system that transmits the collected data to its destination.
2. Flume's core concept is the agent. An agent is a Java process that runs on the log collection node.
3. An agent consists of three core components: source, channel, and sink.
3.1 The source component collects logs and can handle log data of many types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, legacy, and custom sources.
After collecting the data, the source stores it temporarily in the channel.
3.2 The channel component is used by the agent to temporarily store data; it can be backed by memory, JDBC, a file, or a custom channel.
Data in the channel is deleted only after the sink has delivered it successfully.
3.3 The sink component sends data on to its destination; supported destinations include HDFS, logger, avro, thrift, IPC, file, null, HBase, Solr, and custom sinks.
4. Throughout the data transfer process, the event is the unit that flows; the transaction guarantee is at the event level.
5. Flume supports chaining multiple agents (multi-level flows), as well as fan-in and fan-out.
Fan-in means a source can receive input from many senders.
Fan-out means the output can be delivered to multiple destinations (a simple fan-out configuration is sketched below).
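
As a rough sketch of fan-out (not part of the working example later in this article; the agent, component, and port names and the HDFS path below are hypothetical), one source can be wired to two channels through the default replicating selector, with each channel drained by its own sink:

agent2.sources=src1
agent2.channels=ch1 ch2
agent2.sinks=sinkA sinkB

# The replicating selector (the default) copies every event to each listed channel
agent2.sources.src1.type=netcat
agent2.sources.src1.bind=0.0.0.0
agent2.sources.src1.port=44444
agent2.sources.src1.channels=ch1 ch2
agent2.sources.src1.selector.type=replicating

agent2.channels.ch1.type=memory
agent2.channels.ch2.type=memory

# One copy of each event goes to the console, the other to HDFS
agent2.sinks.sinkA.type=logger
agent2.sinks.sinkA.channel=ch1
agent2.sinks.sinkB.type=hdfs
agent2.sinks.sinkB.hdfs.path=hdfs://hadoop0:9000/fanout-demo
agent2.sinks.sinkB.channel=ch2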

Flume Installation:

1. Unzip both packages (the bin and src tarballs) on the node:
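
A likely pair of commands, assuming the standard Flume 1.4.0 tarball names (adjust them to match the files you actually downloaded):

tar -zxvf apache-flume-1.4.0-bin.tar.gz
tar -zxvf apache-flume-1.4.0-src.tar.gz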

2. Copy the src contents into the bin directory:

cp -ri apache-flume-1.4.0-src/* apache-flume-1.4.0-bin/

3. The src directory is no longer needed and can be removed:

rm -rf apache-flume-1.4.0-src

4. Rename apache-flume-1.4.0-bin to flume:

mv apache-flume-1.4.0-bin flume

Note: This Flume installation assumes Hadoop is already installed, because Flume (the HDFS sink) uses the Hadoop jars.
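
The flume-ng launcher can pick the Hadoop jars up from the local Hadoop installation, so before going further it is worth checking that Hadoop is visible on this node, for example:

echo $HADOOP_HOME
hadoop version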

5. Write an example configuration file

agent1 is the name of the agent:

agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1

The spooling directory source watches the specified folder for new files; as soon as a new file appears, its contents are parsed and written to the channel. When the write is complete, the file is either marked as completed (renamed) or deleted.

Configure source1

agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/root/hmbbs
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader=false
agent1.sources.source1.interceptors=i1
# The timestamp interceptor adds a timestamp header to each event, which the date escapes in the HDFS sink's filePrefix rely on
agent1.sources.source1.interceptors.i1.type=timestamp

Configure sink1

agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://hadoop0:9000/hmbbs
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.writeFormat=Text
# Roll (close) the file after the specified number of seconds
agent1.sinks.sink1.hdfs.rollInterval=1
agent1.sinks.sink1.channel=channel1
# Prefix of the generated files
agent1.sinks.sink1.hdfs.filePrefix=%y-%m-%d
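
If rolling should be driven only by this one-second interval, the size- and count-based thresholds can be switched off as well. These two properties are standard HDFS sink settings but are not part of the original example:

# A value of 0 disables size-based and event-count-based rolling
agent1.sinks.sink1.hdfs.rollSize=0
agent1.sinks.sink1.hdfs.rollCount=0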

Configure channel1

agent1.channels.channel1.type=file
# Checkpoint (backup) directory
agent1.channels.channel1.checkpointDir=/root/hmbbs_tmp/123
agent1.channels.channel1.dataDirs=/root/hmbbs_tmp/

Save this file in Flume's conf folder and name it example.

6. Create the hmbbs folder under /root

[root@hadoop0 /]# cd /root
[root@hadoop0 ~]# ls
anaconda-ks.cfg  Documents  install.log  Music  Public  Videos
Desktop  Downloads  install.log.syslog  Pictures  Templates
[root@hadoop0 ~]# mkdir hmbbs

7. Create the corresponding folder in HDFS

hadoop fs -mkdir /hmbbs

8. Run Flume
Enter the flume directory and execute:

bin/flume-ng agent -n agent1 -c conf -f conf/example -Dflume.root.logger=DEBUG,console

9. Create a test file and copy it into the monitored folder

[root@hadoop0 ~]# vi hello
[root@hadoop0 ~]# cp hello hmbbs

You will see the file's contents transferred into HDFS.
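
To confirm, you can list the target directory in HDFS (the same path configured in hdfs.path above):

hadoop fs -ls /hmbbs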

10. Check the results

[root@hadoop0 ~]# cd hmbbs
[root@hadoop0 hmbbs]# ls
hello.COMPLETED

The .COMPLETED suffix indicates that the file has been processed and its contents handed off to the channel; the suffix is the result of the spooling directory source renaming the file.

[root@hadoop0 ~]# cd hmbbs_tmp
[root@hadoop0 hmbbs_tmp]# ls

hmbbs_tmp is the data directory used by the file channel (dataDirs).

[root@hadoop0 hmbbs_tmp]# cd 123
[root@hadoop0 123]# ls
checkpoint  checkpoint.meta  inflightputs  inflighttakes

The files here are the channel's checkpoint (backup) data; if the data in the dataDirs directory is lost, it can be recovered from here.

Real production deployments span multiple nodes and are more complex; refer to the official documentation:
http://flume.apache.org
