Big Data (8): Flume Deployment


If you are asked what big data systems use for distributed log collection, you can confidently answer: Flume! (Careful, interviewers like to ask this.)

Before we begin, recall how to copy a file from this server to a target server; you need the target server's IP and password:

Command: scp <filename> <target-ip>:<destination-path>
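For example, to push the Flume tarball to a node (the IP and destination path below are placeholders, not values from this tutorial):

      scp flume-1.5.2.tar.gz root@192.168.1.100:/usr/cstor/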

1. Overview

Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive amounts of log data, originally developed by Cloudera. Flume supports customizing the various data senders in a logging system to collect data, and it can perform simple processing on the data before writing it to various (customizable) data receivers.

Out of the box, Flume can collect data from sources such as the console, RPC (Thrift-RPC), text files, tail (UNIX tail), syslog (the syslog system, supporting both TCP and UDP modes), and exec (command execution).

Flume currently exists in two major versions: the 0.9.x line, collectively known as Flume OG, and the 1.x line, known as Flume NG. Because Flume NG went through a major refactoring, it differs substantially from Flume OG, and the two should be distinguished when used.

Flume OG uses a multi-master approach. To keep configuration data consistent, Flume introduces ZooKeeper to store it; ZooKeeper itself guarantees the consistency and high availability of the configuration data and notifies the Flume master nodes when the data changes. The Flume masters synchronize data among themselves using the gossip protocol.

The most obvious change in Flume NG is that the master and the ZooKeeper-based centralized configuration management were removed, turning Flume into a pure transport tool. Another major difference in Flume NG is that reading data in and writing data out are now handled by different worker threads (called runners). In Flume OG, a single read thread did both jobs (apart from failure retries), so if writing was slow (even without failing outright), it blocked Flume's ability to receive data. The asynchronous design lets the read thread run smoothly without having to worry about any downstream problems.

Flume's smallest independently running unit is the agent. An agent is a single JVM process, consisting of three components: source, channel, and sink.
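As a minimal sketch (the component names here are arbitrary), a Flume NG agent definition simply names the three components and wires them together:

agent.sources = s1
agent.channels = c1
agent.sinks = k1
# a source can feed one or more channels; a sink drains exactly one channel
agent.sources.s1.channels = c1
agent.sinks.k1.channel = c1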

2. Start the Flume cluster

1) First, start the Hadoop cluster (see the previous post for details).

2) Second, install and configure Flume (all of the remaining steps are performed on master), as follows:

Unzip the Flume installation package into the /usr/cstor directory and change the owner of the Flume directory to root:root:

      tar -zxvf flume-1.5.2.tar.gz -C /usr/cstor

      chown -R root:root /usr/cstor/flume

3) Go into the extracted directory, create a new test.conf file in the conf directory, and add the following configuration content:

# Define the names of the components in the agent
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1
# Configuration parameters for the source1 component
agent1.sources.source1.type=exec
# The file /home/source.log must be generated manually; see the follow-up instructions
agent1.sources.source1.command=tail -n +0 -f /home/source.log
# Configuration parameters for channel1
agent1.channels.channel1.type=memory
agent1.channels.channel1.capacity=1000
agent1.channels.channel1.transactionCapacity=100
# Configuration parameters for sink1
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://master:8020/flume/data
agent1.sinks.sink1.hdfs.fileType=DataStream
# Use the local timestamp
agent1.sinks.sink1.hdfs.useLocalTimeStamp=true
agent1.sinks.sink1.hdfs.writeFormat=Text
# File name prefix
agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d-%H-%M
# Roll a new file every 60 seconds
agent1.sinks.sink1.hdfs.rollInterval=60
# Number of HDFS block replicas
agent1.sinks.sink1.hdfs.minBlockReplicas=1
# Do not roll files based on file size
agent1.sinks.sink1.hdfs.rollSize=0
# Do not roll files based on the number of events
agent1.sinks.sink1.hdfs.rollCount=0
# Do not roll files based on idle time
agent1.sinks.sink1.hdfs.idleTimeout=0
# Bind the source and the sink to the channel
agent1.sources.source1.channels=channel1
agent1.sinks.sink1.channel=channel1
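Note the rolling strategy in this configuration: with rollSize and rollCount both set to 0 and idleTimeout disabled, files in HDFS are rolled purely by time, i.e. one new file every 60 seconds per rollInterval.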

4) Then, create the /flume/data directory on HDFS:

      cd /usr/cstor/hadoop/bin

      ./hdfs dfs -mkdir /flume

      ./hdfs dfs -mkdir /flume/data
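As an optional sanity check (not in the original steps), you can confirm the directory exists:

      ./hdfs dfs -ls /flume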

5) Finally, go into the bin directory of the Flume installation:

      cd /usr/cstor/flume/bin

6) Start Flume and begin collecting log information:

      ./flume-ng agent --conf conf --conf-file /usr/cstor/flume/conf/test.conf --name agent1 -Dflume.root.logger=DEBUG,console

      !!! Running this command sometimes fails with a permissions issue; if so, first run: chmod o+x flume-ng
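For reference: --conf points to Flume's configuration directory, --conf-file names the agent definition created above, --name must match the agent name used inside that file (agent1), and -Dflume.root.logger=DEBUG,console sends debug-level logging to the console.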

If it is running normally, the startup log ends by showing that the agent has started.

3. Collect logs

1) After a successful start, you need to manually generate the message source, i.e. the file /home/source.log referenced in the configuration; use a command such as the one below to write text to /home/source.log:

      
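A minimal way to do this (the text written here is arbitrary; any appended lines will be picked up by the tail-based exec source):

      echo "hello flume" >> /home/source.log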

2) You can then see the generated results:

      
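To inspect what was written, list the sink directory and view one of the generated files (the file names vary, since they are timestamped according to the filePrefix setting; the path placeholder below must be replaced with a real listed file):

      ./hdfs dfs -ls /flume/data

      ./hdfs dfs -cat /flume/data/<one-of-the-listed-files>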

Summary:

This covers only the basic configuration of Flume and a simple log write and read. To go deeper, try collecting more complex and larger logs.

      
