Flume Framework Foundation


* Flume Framework Foundation

Introduction to the framework:

* * Flume provides a distributed, reliable, and efficient service for collecting, aggregating, and moving large volumes of log data; Flume can only run in a UNIX-like environment.

* * Flume is based on a streaming architecture; it is fault-tolerant, flexible, and simple, and is mainly used for online real-time log analysis.

The big picture:

* * Flume and Kafka are used for real-time data collection, Spark and Storm for real-time data processing, and Impala for real-time queries.

Composition of the Flume framework:

As shown in the figure, the Flume architecture has only one type of role, the agent node, which consists of a Source, a Channel, and a Sink.

Briefly describe the features of each component:

Source: collects data. The source receives the incoming data stream and passes it on to the channel; it is somewhat similar to the channel concept in Java IO.

Channel: bridges the source and the sink, similar to a queue.

Sink: collects data from the channel and writes it to the destination (which can be the next agent's source, or HDFS, HBase, and so on).

Data Transmission Unit: Event

* * Event is the basic unit of Flume data transfer

* * Flume sends data from the source to the destination in the form of events.

* * An event consists of an optional header and a byte array containing the payload. The payload is opaque to Flume; the header holds a collection of key-value pairs in which each key is unique within the set. Headers can also be used for context routing (see the sketch below).
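For example, context routing on a header can be expressed directly in an agent's configuration through a multiplexing channel selector. This is only an illustrative sketch; the header name state and the channels c1 through c4 are made-up names, not part of the setup used later in this article:

    # route events to different channels based on the value of the "state" header
    a1.sources.r1.selector.type = multiplexing
    a1.sources.r1.selector.header = state
    a1.sources.r1.selector.mapping.CZ = c1
    a1.sources.r1.selector.mapping.US = c2 c3
    a1.sources.r1.selector.default = c4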

Flume Transfer process:

As shown in the figure, the source monitors a file; when the file receives new data, the source gets the data, encapsulates it in an event, and commits it to the channel. The channel acts as a FIFO queue, and the sink pulls the data from the channel and writes it to HDFS or HBase.

* Installing Flume

* * Download link: http://pan.baidu.com/s/1eSOOKam (password: ll6r)

* * Copy and unzip it; this is not repeated here.

* * Configuration file

In the conf directory, rename the flume-env.sh template file to flume-env.sh and configure JAVA_HOME in it; again, this is not covered in detail, but see the sketch below.
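A minimal sketch of that step, assuming the install path used later in this article and an example JDK path (adjust both to your environment):

    $ cd /opt/modules/cdh/apache-flume-1.5.0-cdh5.3.6-bin
    $ cp conf/flume-env.sh.template conf/flume-env.sh
    $ vi conf/flume-env.sh
    # inside flume-env.sh, point JAVA_HOME at your own JDK, for example:
    export JAVA_HOME=/opt/modules/jdk1.7.0_67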

* * Using the command

$ bin/flume-ng, run with no arguments, prints the usage help shown in the figure:

Explanation:

--conf: specifies the configuration directory

--name: specifies the agent name

--conf-file: specifies the concrete configuration file

* Case 1: Use Flume to listen on a port and write the data received on that port to the log output

STEP1: Modify the configuration file

$ cp -a conf/flume-conf.properties.template conf/flume-telnet.conf, then change it to the following:
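A minimal flume-telnet.conf along the lines of the official netcat example looks roughly like this; the agent name a1 and port 44444 match the test below, and the values should be adjusted to your environment:

    # name the components of agent a1
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # source: listen on a TCP port with the netcat source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # sink: write received events to the Flume log
    a1.sinks.k1.type = logger

    # channel: buffer events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    # bind the source and the sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1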

Note the last two lines: the source property is channels (with an s), while the sink property is channel (without).

Explanation:

r1: the name of the source (the monitored data source)

k1: the name of the sink

c1: the name of the channel

STEP2: Install the telnet command

Since the command is not available by default, install it with yum; note that this must be done as the root user.

# yum -y install telnet

STEP3: Run Flume to monitor the port

$ bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/flume-telnet.conf -Dflume.root.logger=INFO,console

These options specify the configuration directory, the agent name, the concrete configuration file, and the log output level and destination, respectively.

Run the command; the startup output is shown in the figure.

STEP4: Test

Open another CRT (terminal) session to the z01 machine.

Execute the following commands:

$ netstat -an | grep 44444, which checks whether port 44444 is being listened on by Flume.

$ telnet localhost 44444, which connects to port 44444 on the local machine so data can be sent (other tools such as netcat also work). Run the telnet command in the new window, and watch the original Flume window to see whether the data is picked up.

Send side:

Listening side:

If the data shows up on the listening side, the test is successful. To exit telnet, press Ctrl+] and then type quit.

* Case 2: Collect the log file of a system/framework into HDFS

STEP1: Modify the configuration file

For more parameter configuration, see the official documentation: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

$ cp -a conf/flume-telnet.conf conf/flume-apache-log.conf, then change it to the following:
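A rough sketch of flume-apache-log.conf, assuming an exec source that tails the httpd access log and an HDFS sink; the component names r2/k2/c2 are arbitrary, and the NameNode address hdfs://z01:8020 and the target path are example values to replace with your own:

    # name the components of agent a2 (matching the --name a2 used below)
    a2.sources = r2
    a2.sinks = k2
    a2.channels = c2

    # source: tail the httpd access log with an exec source
    a2.sources.r2.type = exec
    a2.sources.r2.command = tail -F /var/log/httpd/access_log
    a2.sources.r2.shell = /bin/bash -c

    # sink: write events to HDFS (address and path are examples)
    a2.sinks.k2.type = hdfs
    a2.sinks.k2.hdfs.path = hdfs://z01:8020/flume/weblog/%Y%m%d
    a2.sinks.k2.hdfs.filePrefix = access-
    a2.sinks.k2.hdfs.fileType = DataStream
    a2.sinks.k2.hdfs.useLocalTimeStamp = true

    # channel: buffer events in memory
    a2.channels.c2.type = memory
    a2.channels.c2.capacity = 1000
    a2.channels.c2.transactionCapacity = 100

    # bind the source and the sink to the channel
    a2.sources.r2.channels = c2
    a2.sinks.k2.channel = c2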


STEP2: Install httpd

# yum -y install httpd

(Note: httpd is the main program of the Apache HTTP server. It is designed as a stand-alone background process that creates a pool of child processes or threads to handle requests.)

STEP3: Start the httpd service

CentOS 7:

# systemctl start httpd.service

CentOS 6:

# service httpd start

STEP4: Modify the permissions of the httpd folder under the /var/log directory so the logs can be read

# chmod 755 /var/log/httpd/

# vi /var/www/html/index.html, and write some content into it.

STEP5: Execute the following command, then visit the web page in a browser and watch the resulting log

$ tail -f /var/log/httpd/access_log, which shows new entries after several visits:

(Open 192.168.122.200 in the browser, or whatever IP address matches your own configuration.)
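If you prefer the command line to a browser, a request made with curl (assuming it is installed, and using the example IP above) also produces access-log entries:

    $ curl http://192.168.122.200/index.html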

STEP6: Copy the Hadoop jars that Flume depends on into Flume's lib directory

cp /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/hadoop-auth-2.5.0-cdh5.3.6.jar /opt/modules/cdh/apache-flume-1.5.0-cdh5.3.6-bin/lib

cp /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/commons-configuration-1.6.jar /opt/modules/cdh/apache-flume-1.5.0-cdh5.3.6-bin/lib

cp /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/mapreduce1/lib/hadoop-hdfs-2.5.0-cdh5.3.6.jar /opt/modules/cdh/apache-flume-1.5.0-cdh5.3.6-bin/lib

cp /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.6.jar /opt/modules/cdh/apache-flume-1.5.0-cdh5.3.6-bin/lib

After the copy is complete, Flume's lib directory looks as follows:

STEP7: After starting the Hadoop-related services, execute the flume-ng command

$ bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/flume-apache-log.conf

(Tip: if you want the flume-ng command to run in the background instead of occupying the terminal, append the & symbol to the command, i.e.:

$ bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/flume-apache-log.conf &)
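If you also want the agent to keep running after the terminal is closed, a common variant is to combine nohup with output redirection (flume.log here is just an example file name):

    $ nohup bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/flume-apache-log.conf > flume.log 2>&1 &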

Check Flume's log to confirm that there are no ERROR or WARN messages, then refresh the index.html page a few times; you will see the log data being written into the HDFS cluster.

The above completes log collection with Flume; other collection scenarios are similar, and you can refer to the official documentation for the relevant parameter settings.

* Summary

Flume is a streaming log-collection framework; like a collector running in the background, it listens in real time on the files or directories you need to collect.

Personal Weibo: http://weibo.com/seal13

QQ big data technology exchange group (no ads): 476966007



