Hive Getting Started -- 4. Flume, a Data Collection Tool


Flume Introduction

Flume is a distributed, reliable service for collecting, aggregating, and moving large amounts of log data; here it is used to collect log files into HDFS.
Flume installation

1. Unzip the Flume installation package into the /itcast/ directory


tar -zxvf apache-flume-1.5.0-bin.tar.gz -C /itcast/

2. Modify the Flume configuration files

2.1 flume-env.sh

Rename the template file:

mv flume-env.sh.template flume-env.sh

Add JAVA_HOME to ensure that Flume uses the same JDK as HDFS (run echo $JAVA_HOME to see the Java home used by the current machine).
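
A minimal sketch of the line to add in flume-env.sh (the JDK path here is an assumption; substitute the output of echo $JAVA_HOME on your machine):

    # assumed JDK location; replace with your machine's actual Java home
    export JAVA_HOME=/usr/local/jdk1.7.0_79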

2.2 Write the agent configuration file a4.conf

Define the names of the agent, source, channel, and sink:


a4.sources = r1
a4.channels = c1
a4.sinks = k1

Define the source:


a4.sources.r1.type = spooldir        # the concrete class is loaded via reflection
a4.sources.r1.spoolDir = /root/logs  # the directory to monitor

Define the channel:


a4.channels.c1.type = memory                  # in-memory channel
a4.channels.c1.capacity = 10000               # maximum number of events the channel can hold
a4.channels.c1.transactionCapacity =          # transaction capacity (value missing; see the assembled file below)

Define an interceptor that adds a timestamp header to each event (the HDFS sink needs it to resolve %Y%m%d in the path):


a4.sources.r1.interceptors = i1
a4.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

Define the sink:


a4.sinks.k1.type = hdfs
a4.sinks.k1.hdfs.path = hdfs://ns1/flume/%Y%m%d  # generated dynamically from the event timestamp
a4.sinks.k1.hdfs.filePrefix = events-            # prefix for the generated log files
a4.sinks.k1.hdfs.fileType = DataStream           # receive as plain text

Do not roll files based on the number of events:


a4.sinks.k1.hdfs.rollCount = 0  # 0 = never roll based on the number of events

Roll a new file once the file on HDFS reaches 128 MB:


a4.sinks.k1.hdfs.rollSize = 134217728  # roll once the file reaches this size (134217728 bytes = 128 MB)

Roll a new file on HDFS every 60 seconds:


a4.sinks.k1.hdfs.rollInterval = 60  # interval in seconds before rolling to a new file

Wire the source and sink to the channel:


a4.sources.r1.channels = c1
a4.sinks.k1.channel = c1
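
For reference, here are the snippets above assembled into a single a4.conf. The transactionCapacity value is not given above, so 100 (a commonly used value) is assumed here:

    a4.sources = r1
    a4.channels = c1
    a4.sinks = k1

    a4.sources.r1.type = spooldir
    a4.sources.r1.spoolDir = /root/logs
    a4.sources.r1.interceptors = i1
    a4.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

    a4.channels.c1.type = memory
    a4.channels.c1.capacity = 10000
    # assumed value; not specified in the walkthrough above
    a4.channels.c1.transactionCapacity = 100

    a4.sinks.k1.type = hdfs
    a4.sinks.k1.hdfs.path = hdfs://ns1/flume/%Y%m%d
    a4.sinks.k1.hdfs.filePrefix = events-
    a4.sinks.k1.hdfs.fileType = DataStream
    a4.sinks.k1.hdfs.rollCount = 0
    a4.sinks.k1.hdfs.rollSize = 134217728
    a4.sinks.k1.hdfs.rollInterval = 60

    a4.sources.r1.channels = c1
    a4.sinks.k1.channel = c1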

3. Start Flume


Switch to the /itcast/apache-flume-1.5.0-bin/ directory first:

Enter the command:

bin/flume-ng agent -n a4 -c conf -f conf/a4.conf -Dflume.root.logger=INFO,console

Command explanation:

    agent: run a Flume agent
    -n a4: the agent name (must match the name used in the configuration file)
    -c conf: the directory holding Flume's own configuration
    -f conf/a4.conf: the agent configuration file to load
    -Dflume.root.logger=INFO,console: log at INFO level to the console

After startup you may run into the following errors; they are listed here for anyone who hits them:


Error 1:


Workaround: a jar is missing. Copy /itcast/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar to the /itcast/apache-flume-1.5.0-bin/lib/ folder.

Use the scp command, as sketched below.
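
A sketch of the copy, assuming Flume runs on 192.168.1.204 (the host used in the NameNode configuration step below); if Flume and Hadoop share a machine, a plain cp works the same way:

    scp /itcast/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar 192.168.1.204:/itcast/apache-flume-1.5.0-bin/lib/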


Error 2:


Workaround: another jar is missing. Copy the commons-configuration jar from /itcast/hadoop-2.6.0/share/hadoop/common/lib/ to the /itcast/apache-flume-1.5.0-bin/lib/ folder.


Error 3:

Workaround: copy /itcast/hadoop-2.6.0/share/hadoop/common/lib/hadoop-auth-2.6.0.jar to the Flume lib/ directory.


Error 4:

Workaround: create the logs directory under /root: mkdir /root/logs


Error 5:


Workaround: Flume does not know about the ns1 nameservice, so give it the HDFS configuration:

1) Copy core-site.xml and hdfs-site.xml to Flume's conf directory:

scp /itcast/hadoop-2.6.0/etc/hadoop/{core-site.xml,hdfs-site.xml} 192.168.1.204:/itcast/apache-flume-1.5.0-bin/conf

2) Modify the /etc/hosts file so the host can resolve the IP addresses of itcast01 and itcast02

Add mappings between the IP addresses and host names of itcast01 and itcast02, as sketched below.
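
A sketch of the entries (these IP addresses are assumptions; substitute the machines' actual addresses):

    192.168.1.201 itcast01
    192.168.1.202 itcast02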

3) Copy hadoop-hdfs-2.6.0.jar to Flume's lib directory, as sketched below.
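
A sketch of the copy (the jar's location under share/hadoop/hdfs/ follows the standard Hadoop 2.6.0 layout; adjust if yours differs):

    scp /itcast/hadoop-2.6.0/share/hadoop/hdfs/hadoop-hdfs-2.6.0.jar 192.168.1.204:/itcast/apache-flume-1.5.0-bin/lib/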


If log output appears and keeps scrolling continuously, everything is fine: Flume started successfully.

3.1 Write Test

Now, if you drop a file into the /root/logs directory, Flume writes its contents to HDFS.

Execute the command first:

    bin/flume-ng agent -n a4 -c conf -f conf/a4.conf -Dflume.root.logger=INFO,console

After starting Flume, place the log file Access_2013_05_30.log into the /root/logs folder, as sketched below.
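
A sketch of the step (the log file's original location is an assumption):

    cp /root/Access_2013_05_30.log /root/logs/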

Viewing HDFS through the web UI shows a new directory, /flume, containing a subdirectory named by date, e.g. 20160618.

The files Flume has written are under /flume/20160618.
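
The result can also be checked from the command line:

    hdfs dfs -ls /flume/20160618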


Question: why were 3 files generated when only 1 file was written? And the sizes of these 3 files add up to exactly the size of the log file Access_2013_05_30.log.


Cause: the sink is configured to roll a new file every 60 seconds, or whenever the buffered file reaches 134217728 bytes (128 MB).

Writing the whole log takes a few minutes, so the size condition is never reached; instead, whatever has been read during each 60-second interval is flushed to HDFS as its own file.

4. Another type of Flume configuration


Source: exec
Channel: memory
Sink: logger


Startup works the same as before; only the configuration file that is read in differs:

bin/flume-ng agent -n a2 -f /home/hadoop/a2.conf -c conf -Dflume.root.logger=INFO,console

a2.conf configuration file:

Define the names of the agent, source, channel, and sink:


a2.sources = r1
a2.channels = c1
a2.sinks = k1

Define the source:


a2.sources.r1.type = exec
a2.sources.r1.command = tail -f /home/hadoop/a.log

Define the channel:


a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

Define the sink:


a2.sinks.k1.type = logger

Wire the source and sink to the channel:


a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

When data is written to the monitored log file, Flume collects it and prints it on the console.
a) This behaves like the tail -f command:

    # echo 111111 >> log
    # echo 222222 >> log
    # echo 333333 >> log

Each echo appends a record to the log file; tail -f blocks and prints every newly appended record.

b) Run Flume with the a2.conf configuration file
Copy the a2.conf file to the flume/conf directory
Create the log file that the exec source tails (the path must match a2.sources.r1.command, i.e. /home/hadoop/a.log)

To run the command:

    bin/flume-ng agent -n a2 -f /itcast/apache-flume-1.5.0-bin/conf/a2.conf -c conf -Dflume.root.logger=INFO,console

