1. Overview
- The three functions of Flume:
collecting, aggregating, and moving data
2. Block diagram
3. Architectural Features
- Based on streaming data flows
Data flow: a job that acquires data continuously
Task flow: job1 -> job2 -> job3 & job4
- Intended for online analytic applications
- Flume only runs in a Linux environment
What if my log server runs Windows?
- Very simple to use:
write a configuration file (source, channel, sink), then run it
- Real-time architecture:
Flume + Kafka, Spark/Storm, Impala
- An agent has three parts:
- source: collects data and sends it to the channel
- channel: a pipe connecting the source and the sink
- sink: takes data from the channel and delivers it to the destination
4. Event
5. Source / Channel / Sink
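The source/channel/sink wiring described above can be sketched as a minimal agent definition. The agent name a1 and the component names s1/c1/k1 are illustrative, not from the original notes:

```properties
# name the components of agent a1
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# wire them together: a source writes to one or more channels,
# a sink reads from exactly one channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
```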
Two: Configuration
1. Download and unzip
The version downloaded here is Flume 1.5.0
2. Enable flume-env.sh
3. Modify flume-env.sh
4. Add HADOOP_HOME
Since it is not configured in flume-env.sh, instead place the HDFS configuration files under the conf directory.
5. Add the required JAR packages
6. Verification
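One simple verification, assuming the standard flume-ng launcher script, is to print the installed version:

```shell
# run from the Flume installation directory
bin/flume-ng version
```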
7. Usage
Three: Using Flume
1. Case 1
Source: hive.log   Channel: memory   Sink: logger
2. Configuration
cp flume-conf.properties.template hive-mem-log.properties
3. Configure hive-mem-log.properties
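A sketch of what hive-mem-log.properties might contain for this case, using an exec source that tails the Hive log, a memory channel, and a logger sink. The agent/component names and the log path are assumptions:

```properties
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# exec source: tail the Hive log (path is an example)
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/hive/logs/hive.log
a1.sources.s1.channels = c1

# memory channel: fast, but events are lost if the agent dies
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# logger sink: print events at INFO level
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```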
4. Running
The log level is specified on the command line.
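The run command (shown later in these notes for other cases) takes the agent name, the config directory, the properties file, and the log level via -D:

```shell
bin/flume-ng agent \
  -c conf/ \
  -n a1 \
  -f conf/hive-mem-log.properties \
  -Dflume.root.logger=INFO,console   # the log level is set here
```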
5. Points of Attention
This is real-time collection, so the console output changes as hive.log changes.
6. Case Two
Source: hive.log   Channel: file   Sink: logger
7. Configuration
cp hive-mem-log.properties hive-file-log.properties
8. Configure hive-file-log.properties
Create a new directory for the file channel
Configuration:
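The channel section changes from memory to file. A sketch, with example directories (these must match the newly created directory):

```properties
# file channel: durable, backed by disk; survives agent restarts
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/flume/filechannel/checkpoint
a1.channels.c1.dataDirs = /opt/flume/filechannel/data
```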
9. Running
10. Results
11. Case Three
Source: hive.log   Channel: memory   Sink: HDFS
12. Configuration
cp hive-mem-log.properties hive-mem-hdfs.properties
13. Configure hive-mem-hdfs.properties
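The sink section changes from logger to hdfs. A sketch, where the NameNode address and target path are examples:

```properties
# hdfs sink: write events to HDFS as plain text
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/hive-log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.channel = c1
```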
14. Running
Verify that the HDFS directory in the configuration file does not need to exist beforehand; it is generated automatically.
Four: Enterprise Thinking, Part One
15. Case Four
Because many small files are generated on HDFS, set a roll size for the files.
16. Configuration
cp hive-mem-hdfs.properties hive-mem-size.properties
17. Configure hive-mem-size.properties
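By default the HDFS sink rolls files by time and event count, which produces many small files. Setting those to 0 and rolling only by size avoids this. The 128 MB value is an example:

```properties
# roll only by size, to avoid many small files on HDFS
a1.sinks.k1.hdfs.rollInterval = 0        # disable time-based rolling
a1.sinks.k1.hdfs.rollCount = 0           # disable event-count rolling
a1.sinks.k1.hdfs.rollSize = 134217728    # roll at ~128 MB (example)
```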
18. Running
19. Results
20. Case Five
Partitioning by Time
21. Configuration
cp hive-mem-hdfs.properties hive-mem-part.properties
22. Configure hive-mem-part.properties
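Time-based partitioning is done with escape sequences in the HDFS path. A sketch, with an example path; useLocalTimeStamp lets the agent resolve %Y%m%d from its own clock instead of requiring a timestamp header on every event:

```properties
# partition output directories by date
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/part/%Y%m%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```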
23. Running
bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-part.properties -Dflume.root.logger=INFO,console
24. Running Results
25. Case Six
Custom file name prefix
26. Configure hive-mem-part.properties
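The file name prefix (and, optionally, suffix) of the files written to HDFS can be customized. The values here are examples:

```properties
# custom name for files written by the HDFS sink
a1.sinks.k1.hdfs.filePrefix = hive-log
a1.sinks.k1.hdfs.fileSuffix = .log
```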
27. Results
Five: Enterprise Thinking, Part Two
1. Case Seven
Source: monitors a folder
A file first exists as a .tmp file.
The next day a new .tmp file appears and the previous day's .tmp is renamed to end in .log; the source watching the folder immediately detects the new file and uploads it to HDFS.
2. Configuration
cp hive-mem-hdfs.properties dir-mem-hdfs.properties
3. Configure dir-mem-hdfs.properties
Use a regular expression to ignore the .tmp files that are still being written
Create a new folder to monitor
Configuration:
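The source section changes to a spooling directory source. A sketch, with an example folder; ignorePattern is the regular expression that skips the in-progress .tmp files:

```properties
# spooling directory source: picks up completed files from a folder
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /opt/flume/spool        # folder to monitor (example)
a1.sources.s1.ignorePattern = ^.*\.tmp$          # skip files still being written
a1.sources.s1.channels = c1
```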
4. Observation Results
5. Case Eight
Source: monitor files under a folder that are continuously appended to
The spooling directory source cannot monitor files that are still being appended,
so this requires the taildir source; its configuration is explained below.
Six: Real Enterprise Architecture
1. Flume multi-sink
Deliver the same data to different frameworks:
- source: a single copy of the collected data
- channel: two channels are used in this case
- sink: one sink per channel
2. Case Study
Source: hive.log   Channel: file   Sink: HDFS
3. Configuration
cp hive-mem-hdfs.properties sinks.properties
4. Configure sinks.properties
Create new directories for the file channels
Configuration:
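A sketch of the multi-sink layout: one source replicating into two channels, each drained by its own sink (replicating is Flume's default channel selector). Names, directories, and paths are examples:

```properties
a1.sources = s1
a1.channels = c1 c2
a1.sinks = k1 k2

# replicate every event into both channels
a1.sources.s1.selector.type = replicating
a1.sources.s1.channels = c1 c2

# two file channels, each with its own directories
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/flume/ch1/checkpoint
a1.channels.c1.dataDirs = /opt/flume/ch1/data
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /opt/flume/ch2/checkpoint
a1.channels.c2.dataDirs = /opt/flume/ch2/data

# each sink reads from exactly one channel
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/sink1
a1.sinks.k1.channel = c1
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = hdfs://namenode:8020/flume/sink2
a1.sinks.k2.channel = c2
```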
5. Effects
6. Flume collect architecture (multiple agents feeding one collector)
7. Case Study
Three machines are started: two run agents and one runs the collector.
192.168.134.241: collector
192.168.134.242: agent
192.168.134.243: agent
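Agents and collector are linked over Avro: each agent's sink points at the collector, and the collector's source listens for them. A sketch of the relevant sections; the port number is an assumption:

```properties
# avro-agent.properties (on 192.168.134.242 / 243):
# avro sink sends events to the collector
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.134.241
a1.sinks.k1.port = 50505
a1.sinks.k1.channel = c1

# avro-collect.properties (on 192.168.134.241):
# avro source accepts events from the agents
a1.sources.s1.type = avro
a1.sources.s1.bind = 192.168.134.241
a1.sources.s1.port = 50505
a1.sources.s1.channels = c1
```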
8. Run Status
Because there is no CDH cluster available, the output is not pasted here for now.
9. Running
Run the collector first:
bin/flume-ng agent -c conf/ -n a1 -f conf/avro-collect.properties -Dflume.root.logger=INFO,console
Then run the agents:
bin/flume-ng agent -c conf/ -n a1 -f conf/avro-agent.properties -Dflume.root.logger=INFO,console
Seven: Monitoring Appends to Files in a Folder
1. Install git
2. Create a new folder
3. Enter the directory in Git bash
4. Download the source code in this directory
5. Enter the Flume directory
6. List the branches of the source code
7. Switch Branches
8. Copy flume-taildir-source
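Steps 4 to 8 above might look like this in Git Bash; the branch name and the in-repo location of the taildir module are assumptions based on the upstream Apache Flume repository:

```shell
# download the source code into the current directory
git clone https://github.com/apache/flume.git
cd flume

# list all branches, then switch to the 1.7 line
git branch -a
git checkout flume-1.7

# copy the taildir source module out for porting to 1.5
cp -r flume-ng-sources/flume-taildir-source /path/to/workspace/
```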
Eight: Compiling
1. The pom file
2. Add a class from 1.7.0 into the 1.5.0 source:
PollableSourceConstants
3. Remove the @Override annotations that no longer apply
4. Compiling
Maven build via Run As
Goals: skip tests
5. Place the jar package in Flume's lib directory
6. Usage
Because this source comes from 1.7.0, it is not covered in the 1.5 documentation.
So: read the source code,
or see the 1.7.0 reference documentation for the taildir introduction and examples:
\flume\flume-ng-doc\sphinx\FlumeUserGuide
7. Configuration
Flume Collaboration Framework