Flume Collaboration Framework


1. Overview

Flume has three core functions: collecting, aggregating, and moving log data.

2. Block diagram

  


3. Architectural Features

- Based on streaming data flows: Flume handles a continuous stream of data rather than discrete batch jobs.
  Data flow: a job that keeps receiving data continuously.
  Task flow: JOB1 -> JOB2 -> JOB3 & JOB4.

- Designed for online analytic applications.

- Flume only runs in a Linux environment. What if the log server is Windows?

- Very simple to use: write a configuration file defining a source, a channel, and a sink, then run it (a minimal sketch follows this list).

- Fits real-time architectures such as Flume + Kafka + Spark/Storm + Impala.

- An agent has three parts:
  - source: collects data and sends it to the channel
  - channel: a pipe connecting the source and the sink
  - sink: takes the data collected in the channel and sends it on

4. Event

An event is the basic unit of data Flume transports: a byte-array body plus an optional map of string headers.

5. Source/Channel/Sink

Flume ships with many ready-made source, channel, and sink types; the full list is in the Flume user guide.

Two: Configuration

1. Download and unzip

The version used here is Flume 1.5.0.
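A typical fetch-and-unpack sequence; the archive name follows the standard 1.5.0 naming and the install path is an assumption:

    # unpack the 1.5.0 binary release; the target directory is illustrative
    tar -zxf apache-flume-1.5.0-bin.tar.gz -C /opt/modules/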

  

2. Enable flume-env.sh
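Flume ships the environment file as a template, so enabling it is a copy (run from the Flume install directory):

    cp conf/flume-env.sh.template conf/flume-env.sh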

  

3. Modify flume-env.sh
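The usual edit is pointing JAVA_HOME at the local JDK; the path below is an assumption:

    # in conf/flume-env.sh
    export JAVA_HOME=/opt/modules/jdk1.7.0_67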

  

4. Add HADOOP_HOME

Since flume-env.sh has no HADOOP_HOME setting, the approach chosen here is to place the HDFS configuration files under Flume's conf directory.
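Copying the two client-side HDFS files is enough for the HDFS sink to locate the cluster; the source paths assume a typical Hadoop layout:

    cp /opt/modules/hadoop/etc/hadoop/core-site.xml conf/
    cp /opt/modules/hadoop/etc/hadoop/hdfs-site.xml conf/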

  

5. Add the JAR packages
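The HDFS sink also needs the Hadoop client jars on Flume's classpath; a common approach is copying them into Flume's lib directory (the exact jar set depends on the Hadoop version, so the paths below are assumptions):

    # run from the Hadoop install directory; versions vary by cluster
    cp share/hadoop/common/hadoop-common-*.jar /opt/modules/flume/lib/
    cp share/hadoop/hdfs/hadoop-hdfs-*.jar /opt/modules/flume/lib/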

  

6. Verification
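Ask Flume for its version to confirm the installation:

    bin/flume-ng version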

  

7. Usage
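Every case below is launched with the same command pattern:

    bin/flume-ng agent -c <conf-dir> -n <agent-name> -f <properties-file>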

  

Three: Using Flume

  

1. Case One

Source: hive.log  Channel: memory  Sink: logger

2. Configuration

cp flume-conf.properties.template hive-mem-log.properties

3. Configure hive-mem-log.properties
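A sketch of the whole agent; the hive.log path and the agent name a1 are assumptions:

    # exec source tails hive.log into a memory channel; the logger sink prints events
    a1.sources = s1
    a1.channels = c1
    a1.sinks = k1

    a1.sources.s1.type = exec
    a1.sources.s1.command = tail -F /opt/cdh/hive/logs/hive.log
    a1.sources.s1.shell = /bin/sh -c

    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    a1.sinks.k1.type = logger

    a1.sources.s1.channels = c1
    a1.sinks.k1.channel = c1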

  

4. Running

Note the log level set on the command line.
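The launch command, matching the pattern shown elsewhere on this page:

    bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-log.properties -Dflume.root.logger=INFO,console

-Dflume.root.logger=INFO,console routes Flume's log to the console at INFO level, which is what makes the logger sink's output visible.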

 

5. Points to Note

This is real-time collection: the output on the console changes as hive.log changes.

  

6. Case Two

Source: hive.log  Channel: file  Sink: logger

7. Configuration

cp hive-mem-log.properties hive-file-log.properties

8. Configure hive-file-log.properties

First create the directories the file channel needs (see below).
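The checkpoint and data directories below are assumptions under the Flume install:

    mkdir -p /opt/modules/flume/datas/file-channel/checkpoint
    mkdir -p /opt/modules/flume/datas/file-channel/data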

  

Configuration
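Only the channel section changes from case one; checkpointDir and dataDirs are the file channel's two storage settings:

    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /opt/modules/flume/datas/file-channel/checkpoint
    a1.channels.c1.dataDirs = /opt/modules/flume/datas/file-channel/data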

  

9. Running

  

  

10. Results

  

11. Case Three

Source: hive.log  Channel: memory  Sink: hdfs

12. Configuration

cp hive-mem-log.properties hive-mem-hdfs.properties

13. Configure hive-mem-hdfs.properties
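Only the sink section changes from case one; the NameNode address and target directory are assumptions:

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://192.168.134.241:8020/flume/hive-logs
    # write plain text instead of the default SequenceFile
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.writeFormat = Text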

  

14. Running

  

Note that the target directory is not required to exist beforehand; the HDFS sink generates it automatically.

Four: Enterprise Thinking, Part One

15. Case Four

The HDFS sink produces many small files by default, so this case sets the size at which output files are rolled.

16. Configuration

cp hive-mem-hdfs.properties hive-mem-size.properties

17. Configure hive-mem-size.properties
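Rolling is governed by three sink properties; setting the two unused ones to 0 disables them so only size matters (the 128 MB figure is an assumption):

    # start a new file roughly every 128 MB
    a1.sinks.k1.hdfs.rollSize = 134217728
    # 0 disables rolling by event count and by time interval
    a1.sinks.k1.hdfs.rollCount = 0
    a1.sinks.k1.hdfs.rollInterval = 0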

  

18. Running

  

19. Results

  

20. Case Five

Partitioning by Time

21. Configuration

cp hive-mem-hdfs.properties hive-mem-part.properties

22. Configure hive-mem-part.properties
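Time partitioning uses escape sequences in hdfs.path; useLocalTimeStamp lets the sink stamp events itself instead of requiring a timestamp header (the directory layout is an assumption):

    # one directory per day, one per hour
    a1.sinks.k1.hdfs.path = hdfs://192.168.134.241:8020/flume/part-logs/%Y%m%d/%H
    a1.sinks.k1.hdfs.useLocalTimeStamp = true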

  

23. Running

bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-part.properties -Dflume.root.logger=INFO,console

24. Running Results

  

25. Case Six

Customizing the prefix of the generated file names.

26. Configure hive-mem-part.properties
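The file name prefix is set on the sink; the value below is an assumption:

    # output files will be named like hive-log.1436xxxxxxxxx
    a1.sinks.k1.hdfs.filePrefix = hive-log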

  

27. Operation effect

  

Five: Enterprise Thinking, Part Two

1. Case Seven

Source: a source that monitors a folder.

A log file exists first as a .tmp file. The next day a new .tmp file appears and the previous day's file is renamed to a .log ending; the source watching the folder immediately sees it as a new file and uploads it to HDFS.

2. Configuration

cp hive-mem-hdfs.properties dir-mem-hdfs.properties

3. Use a regular expression to ignore the in-progress .tmp files

  

4. Configure dir-mem-hdfs.properties

First create the folder to be monitored (see below).
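The monitored folder is an assumption:

    mkdir -p /opt/modules/flume/datas/spool-logs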

  

Configuration
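A sketch of the source section, using the spooling-directory source with ignorePattern excluding .tmp files (agent name and paths are assumptions):

    a1.sources.s1.type = spooldir
    a1.sources.s1.spoolDir = /opt/modules/flume/datas/spool-logs
    # files still being written end in .tmp and are skipped
    a1.sources.s1.ignorePattern = ^.*\.tmp$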

  

5. Observed Results

  

6. Case Eight

Source: files under the monitored folder are being appended to continuously.

The source above cannot monitor files that are still being appended to; it only picks up completed files. The configuration that can (the taildir source) is explained in section Seven below.


Six: Real Enterprise Architectures

1. Flume Multi-Sink

The same collected data is delivered to multiple frameworks:

Source: one source, whose data is duplicated.
Channel: two channels are used in this case, each carrying one copy of the data.
Sink: one sink per channel.

  

2. Case Study

Source: hive.log  Channel: file  Sink: hdfs

3. Configuration

cp hive-mem-hdfs.properties sinks.properties

4. Configure sinks.properties

First create new storage directories for the two file channels (see below).
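Both paths are assumptions:

    mkdir -p /opt/modules/flume/datas/multi/c1
    mkdir -p /opt/modules/flume/datas/multi/c2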

  

Configuration
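A sketch of the fan-out agent: the source replicates each event into two file channels, and each channel is drained by its own HDFS sink (all names and paths are assumptions):

    a1.sources = s1
    a1.channels = c1 c2
    a1.sinks = k1 k2

    a1.sources.s1.type = exec
    a1.sources.s1.command = tail -F /opt/cdh/hive/logs/hive.log
    a1.sources.s1.shell = /bin/sh -c
    # replicating is the default selector; shown explicitly for clarity
    a1.sources.s1.selector.type = replicating
    a1.sources.s1.channels = c1 c2

    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /opt/modules/flume/datas/multi/c1/checkpoint
    a1.channels.c1.dataDirs = /opt/modules/flume/datas/multi/c1/data
    a1.channels.c2.type = file
    a1.channels.c2.checkpointDir = /opt/modules/flume/datas/multi/c2/checkpoint
    a1.channels.c2.dataDirs = /opt/modules/flume/datas/multi/c2/data

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://192.168.134.241:8020/flume/multi/sink1
    a1.sinks.k1.channel = c1
    a1.sinks.k2.type = hdfs
    a1.sinks.k2.hdfs.path = hdfs://192.168.134.241:8020/flume/multi/sink2
    a1.sinks.k2.channel = c2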

  

5. Effects

  

6. Flume Collect (fan-in)

  

7. Case Study

Three machines are launched: two run agents and one runs the collector.

192.168.134.241: collector
192.168.134.242: agent
192.168.134.243: agent
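A sketch of both halves, assuming the agents ship events with an avro sink and the collector receives them with an avro source; the file names match the run commands below, while the port, log path, and HDFS path are assumptions:

    # avro-agent.properties (on .242 and .243): tail a local log, ship to the collector
    a1.sources = s1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.s1.type = exec
    a1.sources.s1.command = tail -F /var/log/app.log
    a1.channels.c1.type = memory
    a1.sinks.k1.type = avro
    a1.sinks.k1.hostname = 192.168.134.241
    a1.sinks.k1.port = 4545
    a1.sources.s1.channels = c1
    a1.sinks.k1.channel = c1

    # avro-collect.properties (on .241): receive avro events, write them to HDFS
    a1.sources = s1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.s1.type = avro
    a1.sources.s1.bind = 192.168.134.241
    a1.sources.s1.port = 4545
    a1.channels.c1.type = memory
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://192.168.134.241:8020/flume/collect
    a1.sources.s1.channels = c1
    a1.sinks.k1.channel = c1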

  

8. Status

Since no CDH cluster is available here, the screenshots are not pasted for now.

9. Running

Run the collector first:

bin/flume-ng agent -c conf/ -n a1 -f conf/avro-collect.properties -Dflume.root.logger=INFO,console

Then run the agents:

bin/flume-ng agent -c conf/ -n a1 -f conf/avro-agent.properties -Dflume.root.logger=INFO,console

  

  

Seven: Monitoring Appended Files in a Folder (taildir)

1. Install git

2. Create a new working directory

3. In Git Bash, change into that directory

4. Download the source code in this directory
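Assuming the Apache Flume mirror on GitHub:

    git clone https://github.com/apache/flume.git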

  

5. Enter the Flume directory

6. See which branches of the source code
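List the branches:

    git branch -a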

  

7. Switch Branches
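The taildir source lives on the 1.7 line; the exact branch or tag name is an assumption, so check it against the output of the previous step:

    git checkout flume-1.7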

  

8. Copy out the flume-taildir-source module

Eight: Compile

1. The pom file

Adjust the module's pom so it builds against Flume 1.5.0 instead of 1.7.0.

  

2. Add a class from 1.7.0 that 1.5.0 lacks:

PollableSourceConstants

3. Remove the @Override annotations that no longer apply

  

4. Compile

Run a Maven build (in the IDE: Run As > Maven build) with the tests skipped.
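The equivalent on the command line:

    mvn clean package -DskipTests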

5. Place the built jar in Flume's lib directory

6. Use

Because this source code comes from 1.7.0, the taildir source is not covered in the 1.5 documentation.

So either read the source code, or consult the 1.7.0 reference documentation for the taildir introduction and examples:

\flume\flume-ng-doc\sphinx\flumeuserguide

7. Configuration
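A sketch of a taildir source section, following the 1.7.0 user guide; the position file and file group paths are assumptions:

    a1.sources.s1.type = TAILDIR
    # where the source records how far it has read in each file
    a1.sources.s1.positionFile = /opt/modules/flume/datas/taildir_position.json
    a1.sources.s1.filegroups = f1
    # tail every .log file in the folder, including files still being appended to
    a1.sources.s1.filegroups.f1 = /opt/modules/flume/datas/tail-logs/.*\.log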

  
