Flume Introduction and Installation


1 Build the Environment

The deployment node runs CentOS with the firewall and SELinux disabled. A shiyanlou user has been created, and an /app directory has been created under the system root for storing the runtime packages of Hadoop and other components. Because this directory is used to install components such as Hadoop, the shiyanlou user must be given rwx permissions on it (the usual practice is for the root user to create /app under the root directory and then change the directory owner to shiyanlou: chown -R shiyanlou:shiyanlou /app).
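A minimal sketch of that setup, run as root (the useradd step is an assumption; a lab image may already provide the user):

    • useradd shiyanlou
    • mkdir /app
    • chown -R shiyanlou:shiyanlou /app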

Hadoop Build Environment:

    • Virtual machine operating system: CentOS 6.6 64-bit, single core, 1 GB memory
    • JDK: 1.7.0_55 64-bit
    • Hadoop: 1.1.2

2 Flume Introduction

Flume is a log collection system provided by Cloudera. It is a distributed, reliable, and highly available system for collecting, aggregating, and transmitting large volumes of log data. Flume supports customizing the various data senders in a logging system to collect data, and it also provides the ability to do simple processing on the data and write it to a variety of (customizable) data receivers.

Flume features reliability, scalability, manageability, and extensibility:

    1. Reliability: Flume provides three levels of data reliability: end-to-end, store on failure, and best effort. End-to-end uses a disk-based log plus an acknowledgement from the receiving side to guarantee that data accepted by Flume eventually reaches its destination. Store on failure keeps data on the local disk when the destination is unavailable; unlike end-to-end, it may lose some data if the process itself fails. Best effort makes no QoS guarantees at all.
    2. Scalability: Flume's three components, the collector, master, and storage tier, are all scalable. Notably, event handling in Flume does not need to be stateful, so its scalability is easy to achieve.
    3. Manageability: Flume uses ZooKeeper and a gossip protocol to guarantee the consistency and high availability of configuration data. At the same time, multiple masters ensure that the master tier can manage a large number of nodes.
    4. Extensibility: Because Flume is based on Java, users can add all kinds of new functionality: by subclassing Source, users can implement their own data-ingestion methods; by implementing Sink subclasses, users can write data to specific targets; and through a SinkDecorator, users can preprocess data to a certain degree.

2.1 Flume Architecture

The most important abstraction in the Flume architecture is the data flow, which describes a path along which data is generated, transmitted, processed, and finally written to its target. An agent is used to collect data; it is where a data flow originates in Flume, and it transmits the resulting stream to a collector. Correspondingly, a collector is used to aggregate data, often producing a larger stream.

Flume can collect data from sources such as console, RPC (Thrift-RPC), text (file), tail (UNIX tail), syslog (the syslog system, with both TCP and UDP support), and exec (command execution). Likewise, the Flume data receiver can be console, text (file), dfs (HDFS file), RPC (Thrift-RPC), syslogTCP (TCP syslog system), and so on.

Data collection has two main modes of operation, as follows:

    1. Push sources: the external system proactively pushes data to Flume, as with RPC and syslog.
    2. Polling sources: Flume periodically fetches data from the external system, as with text and exec.

Note that in Flume, agent and collector are corresponding concepts, as are source and sink. Source and sink emphasize the characteristics of the sender and receiver (such as data format and encoding), while agent and collector emphasize their function.
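In the Flume NG release installed below (1.5.2), such source and sink choices are expressed as type properties in the agent's configuration file. A minimal, hypothetical sketch of the naming pattern (the agent and component names here are placeholders; complete, runnable examples follow in section 3.2):

    # hypothetical agent "agent1" wiring a syslog TCP source to a logger sink
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = snk1
    agent1.sources.src1.type = syslogtcp
    agent1.sources.src1.port = 5140
    agent1.sources.src1.channels = ch1
    agent1.channels.ch1.type = memory
    agent1.sinks.snk1.type = logger
    agent1.sinks.snk1.channel = ch1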

2.2 Flume Management Method

The Flume master is used to manage the configuration of data flows.

To ensure scalability, Flume adopts a multi-master design. To keep configuration data consistent, Flume introduces ZooKeeper to store it; ZooKeeper itself guarantees the consistency and high availability of the configuration data, and it can notify the Flume master nodes whenever that data changes. The Flume masters synchronize data among themselves using a gossip protocol.

3 Installing and Deploying Flume

3.1 Flume Deployment Process

3.1.1 Download Flume

You can go to the official Apache Flume download page at http://flume.apache.org/download.html, select a mirror such as http://mirrors.hust.edu.cn/apache/flume/, and download a stable release; here we download flume-1.5.2-bin.tar.gz.

Alternatively, you can find the installation package in the /home/shiyanlou/install-pack directory. Unpack it and move it to the /app directory:

    • cd /home/shiyanlou/install-pack
    • tar -xzf flume-1.5.2-bin.tar.gz
    • mv apache-flume-1.5.2-bin /app/flume-1.5.2

3.1.2 Setting /etc/profile Parameters

Edit the /etc/profile file, declaring Flume's home directory and adding its bin directory to PATH:

    • export FLUME_HOME=/app/flume-1.5.2
    • export FLUME_CONF_DIR=$FLUME_HOME/conf
    • export PATH=$PATH:$FLUME_HOME/bin

Source the /etc/profile configuration file and verify that it has taken effect:

    • source /etc/profile
    • echo $PATH

3.1.3 Setting up the flume-env.sh configuration file

Under $FLUME_HOME/conf, copy flume-env.sh.template to flume-env.sh, then modify the conf/flume-env.sh configuration file:

    • cd /app/flume-1.5.2/conf
    • cp flume-env.sh.template flume-env.sh
    • sudo vi flume-env.sh

Modify the configuration file contents as follows:

    • JAVA_HOME=/app/lib/jdk1.7.0_55
    • JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"
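With JAVA_HOME set, you can sanity-check the installation by asking the launcher for its version (a quick check, not part of the original steps; the exact output varies by build):

    • flume-ng version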

3.2 Verifying the Deployment

3.2.1 Verifying the Installation

1. In the $FLUME_HOME/conf directory, copy the flume-conf.properties.template file, rename the copy to flume-conf.properties, and modify it:

    • cd /app/flume-1.5.2/conf
    • cp flume-conf.properties.template flume-conf.properties
    • sudo vi flume-conf.properties

Modify the contents of the flume-conf.properties configuration file:

    # The configuration file needs to define the sources, the channels and the sinks.
    # Sources, channels and sinks are defined per agent, in this case called 'a1'
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # For each one of the sources, the type is defined
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # The channel can be defined as follows.
    a1.sources.r1.channels = c1

    # Each sink's type must be defined
    a1.sinks.k1.type = logger

    # Specify the channel the sink should use
    a1.sinks.k1.channel = c1

    # Each channel's type is defined.
    a1.channels.c1.type = memory

    # Other config values specific to each type of channel (sink or source)
    # can be defined as well
    # In this case, it specifies the capacity of the memory channel
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

2. Run the agent from the Flume installation directory /app/flume-1.5.2:

    • cd /app/flume-1.5.2
    • ./bin/flume-ng agent --conf ./conf/ --conf-file ./conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console

3. Open another terminal and enter the following commands:

    • telnet localhost 44444
    • Hello World

Note: if running telnet prompts "command not found" on CentOS 6.5, install it with sudo yum install telnet.
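If installing telnet is not an option, nc (netcat) can serve the same purpose, assuming a netcat package is installed (an alternative to the step above, not in the original):

    • nc localhost 44444
    • Hello World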

4. Back on the original terminal, you can see the messages received from telnet.

3.2.2 Testing Log Collection to HDFS

1. Copy the flume-conf.properties.template file in the $FLUME_HOME/conf directory, rename the copy to flume-conf2.properties, and modify it:

    • cd /app/flume-1.5.2/conf
    • cp flume-conf.properties.template flume-conf2.properties
    • sudo vi flume-conf2.properties

    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    a1.sources.r1.type = exec
    a1.sources.r1.channels = c1
    a1.sources.r1.command = tail -f /app/hadoop-1.1.2/logs/hadoop-shiyanlou-namenode-b393a04554e1.log

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = hdfs://hadoop:9000/class12/out_flume
    a1.sinks.k1.hdfs.filePrefix = events-
    a1.sinks.k1.hdfs.round = true
    a1.sinks.k1.hdfs.roundValue = 10
    a1.sinks.k1.hdfs.roundUnit = minute
    a1.sinks.k1.hdfs.rollSize = 4000000
    a1.sinks.k1.hdfs.rollCount = 0
    a1.sinks.k1.hdfs.writeFormat = Text
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.batchSize = 10

    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
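Before starting the agent, it can help to confirm that the tailed log file exists and that HDFS is reachable (a quick pre-check, not in the original steps; the NameNode log filename varies with the hostname):

    • ls /app/hadoop-1.1.2/logs/
    • hadoop fs -ls /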

2. Run the agent from the Flume installation directory /app/flume-1.5.2:

    • cd /app/flume-1.5.2
    • ./bin/flume-ng agent --conf ./conf/ --conf-file ./conf/flume-conf2.properties --name a1 -Dflume.root.logger=INFO,console

3. Flume continuously collects data from the NameNode log and writes it into HDFS.

4. View the files under /class12/out_flume in HDFS:

    • hadoop fs -ls /class12/out_flume
    • hadoop fs -cat /class12/out_flume/events-.1433921305493

