1. Overview
- The three functions of Flume:
collecting, aggregating, and moving data
2. Block diagram
3. Architectural Features
- Based on streaming data flows
Data flow: a job that acquires data continuously
Task flow: job1 -> job2 -> job3 & job4
- Intended for online analytic applications
- Flume only runs in a Linux environment
What if my log server runs Windows?
- Very simple to use:
write a configuration file (source, channel, sink), then run it
- Real-time architecture:
Flume + Kafka, Spark/Storm, Impala
- An agent has three parts:
- source: collects data and sends it to the channel
- channel: a pipe connecting the source and the sink
- sink: takes data from the channel and delivers it to the destination
4. Event
5. Source / Channel / Sink
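The source/channel/sink wiring described above can be sketched as a minimal agent definition. The agent name a1 and the component names s1/c1/k1 are illustrative, not from the original notes:

```properties
# name the components of agent a1
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# wire them together: a source writes to one or more channels,
# a sink reads from exactly one channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
```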
Two: Configuration
1. Download and unzip
The version downloaded here is Flume 1.5.0
2. Enable flume-env.sh
3. Modify flume-env.sh
4. Add HADOOP_HOME
Since it is not configured in flume-env.sh, instead place the HDFS configuration files under the conf directory.
5. Add the required JAR packages
6. Verification
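One simple verification, assuming the standard flume-ng launcher script, is to print the installed version:

```shell
# run from the Flume installation directory
bin/flume-ng version
```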
7. Usage
Three: Using Flume
1. Case 1
Source: hive.log   Channel: memory   Sink: logger
2. Configuration
cp flume-conf.properties.template hive-mem-log.properties
3. Configure hive-mem-log.properties
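A sketch of what hive-mem-log.properties might contain for this case, using an exec source that tails the Hive log, a memory channel, and a logger sink. The agent/component names and the log path are assumptions:

```properties
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# exec source: tail the Hive log (path is an example)
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/hive/logs/hive.log
a1.sources.s1.channels = c1

# memory channel: fast, but events are lost if the agent dies
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# logger sink: print events at INFO level
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```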
4. Running
The log level is specified on the command line.
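The run command (shown later in these notes for other cases) takes the agent name, the config directory, the properties file, and the log level via -D:

```shell
bin/flume-ng agent \
  -c conf/ \
  -n a1 \
  -f conf/hive-mem-log.properties \
  -Dflume.root.logger=INFO,console   # the log level is set here
```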
5. Points of Attention
This is real-time collection, so the console output changes as hive.log changes.
6. Case Two
Source: hive.log   Channel: file   Sink: logger
7. Configuration
cp hive-mem-log.properties hive-file-log.properties
8. Configure hive-file-log.properties
Create a new directory for the file channel
Configuration:
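The channel section changes from memory to file. A sketch, with example directories (these must match the newly created directory):

```properties
# file channel: durable, backed by disk; survives agent restarts
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/flume/filechannel/checkpoint
a1.channels.c1.dataDirs = /opt/flume/filechannel/data
```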
9. Running
10. Results
11. Case Three
Source: hive.log   Channel: memory   Sink: HDFS
12. Configuration
cp hive-mem-log.properties hive-mem-hdfs.properties
13. Configure hive-mem-hdfs.properties
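The sink section changes from logger to hdfs. A sketch, where the NameNode address and target path are examples:

```properties
# hdfs sink: write events to HDFS as plain text
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/hive-log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.channel = c1
```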
14. Running
Verify that the HDFS directory in the configuration file does not need to exist beforehand; it is generated automatically.
Four: Enterprise Thinking, Part One
15. Case Four
Because many small files are generated on HDFS, set a roll size for the files.
16. Configuration
cp hive-mem-hdfs.properties hive-mem-size.properties
17. Configure hive-mem-size.properties
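By default the HDFS sink rolls files by time and event count, which produces many small files. Setting those to 0 and rolling only by size avoids this. The 128 MB value is an example:

```properties
# roll only by size, to avoid many small files on HDFS
a1.sinks.k1.hdfs.rollInterval = 0        # disable time-based rolling
a1.sinks.k1.hdfs.rollCount = 0           # disable event-count rolling
a1.sinks.k1.hdfs.rollSize = 134217728    # roll at ~128 MB (example)
```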
18. Running
19. Results
20. Case Five
Partitioning by Time
21. Configuration
cp hive-mem-hdfs.properties hive-mem-part.properties
22. Configure hive-mem-part.properties
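Time-based partitioning is done with escape sequences in the HDFS path. A sketch, with an example path; useLocalTimeStamp lets the agent resolve %Y%m%d from its own clock instead of requiring a timestamp header on every event:

```properties
# partition output directories by date
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/part/%Y%m%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```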
23. Running
bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-part.properties -Dflume.root.logger=INFO,console
24. Running Results
25. Case Six
Custom file name prefix
26. Configure hive-mem-part.properties
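The file name prefix (and, optionally, suffix) of the files written to HDFS can be customized. The values here are examples:

```properties
# custom name for files written by the HDFS sink
a1.sinks.k1.hdfs.filePrefix = hive-log
a1.sinks.k1.hdfs.fileSuffix = .log
```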
27. Results
Five: Enterprise Thinking, Part Two
1. Case Seven
Source: monitors a folder
A file first exists as a .tmp file.
The next day a new .tmp file appears and the previous day's .tmp is renamed to end in .log; the source watching the folder immediately detects the new file and uploads it to HDFS.
2. Configuration
cp hive-mem-hdfs.properties dir-mem-hdfs.properties
3. Configure dir-mem-hdfs.properties
Use a regular expression to ignore the .tmp files that are still being written
Create a new folder to monitor
Configuration:
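The source section changes to a spooling directory source. A sketch, with an example folder; ignorePattern is the regular expression that skips the in-progress .tmp files:

```properties
# spooling directory source: picks up completed files from a folder
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /opt/flume/spool        # folder to monitor (example)
a1.sources.s1.ignorePattern = ^.*\.tmp$          # skip files still being written
a1.sources.s1.channels = c1
```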
4. Observation Results
5. Case Eight
Source: monitor files under a folder that are continuously appended to
The spooling directory source cannot monitor files that are still being appended,
so this requires the taildir source; its configuration is explained below.
Six: Real Enterprise Architecture
1. Flume multi-sink
Deliver the same data to different frameworks:
- source: a single copy of the collected data
- channel: two channels are used in this case
- sink: one sink per channel
2. Case Study
Source: hive.log   Channel: file   Sink: HDFS
3. Configuration
cp hive-mem-hdfs.properties sinks.properties
4. Configure sinks.properties
Create new directories for the file channels
Configuration:
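A sketch of the multi-sink layout: one source replicating into two channels, each drained by its own sink (replicating is Flume's default channel selector). Names, directories, and paths are examples:

```properties
a1.sources = s1
a1.channels = c1 c2
a1.sinks = k1 k2

# replicate every event into both channels
a1.sources.s1.selector.type = replicating
a1.sources.s1.channels = c1 c2

# two file channels, each with its own directories
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/flume/ch1/checkpoint
a1.channels.c1.dataDirs = /opt/flume/ch1/data
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /opt/flume/ch2/checkpoint
a1.channels.c2.dataDirs = /opt/flume/ch2/data

# each sink reads from exactly one channel
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/sink1
a1.sinks.k1.channel = c1
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = hdfs://namenode:8020/flume/sink2
a1.sinks.k2.channel = c2
```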
5. Effects
6. Flume collect architecture (multiple agents feeding one collector)
7. Case Study
Three machines are started: two run agents and one runs the collector.
192.168.134.241: collector
192.168.134.242: agent
192.168.134.243: agent
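Agents and collector are linked over Avro: each agent's sink points at the collector, and the collector's source listens for them. A sketch of the relevant sections; the port number is an assumption:

```properties
# avro-agent.properties (on 192.168.134.242 / 243):
# avro sink sends events to the collector
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.134.241
a1.sinks.k1.port = 50505
a1.sinks.k1.channel = c1

# avro-collect.properties (on 192.168.134.241):
# avro source accepts events from the agents
a1.sources.s1.type = avro
a1.sources.s1.bind = 192.168.134.241
a1.sources.s1.port = 50505
a1.sources.s1.channels = c1
```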
8. Run Status
Because there is no CDH cluster available, the output is not pasted here for now.
9. Running
Run the collector first:
bin/flume-ng agent -c conf/ -n a1 -f conf/avro-collect.properties -Dflume.root.logger=INFO,console
Then run the agents:
bin/flume-ng agent -c conf/ -n a1 -f conf/avro-agent.properties -Dflume.root.logger=INFO,console
Seven: Monitoring Appends to Files in a Folder
1. Install git
2. Create a new folder
3. Enter the directory in Git bash
4. Download the source code in this directory
5. Enter the Flume directory
6. List the branches of the source code
7. Switch Branches
8. Copy flume-taildir-source
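Steps 4 to 8 above might look like this in Git Bash; the branch name and the in-repo location of the taildir module are assumptions based on the upstream Apache Flume repository:

```shell
# download the source code into the current directory
git clone https://github.com/apache/flume.git
cd flume

# list all branches, then switch to the 1.7 line
git branch -a
git checkout flume-1.7

# copy the taildir source module out for porting to 1.5
cp -r flume-ng-sources/flume-taildir-source /path/to/workspace/
```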
Eight: Compiling
1. The pom file
2. Add a class from 1.7.0 into the 1.5.0 source:
PollableSourceConstants
3. Remove the @Override annotations that no longer apply
4. Compiling
Maven build via Run As
Goals: skip tests
5. Place the jar package in Flume's lib directory
6. Usage
Because this source comes from 1.7.0, it is not covered in the 1.5 documentation.
So: read the source code,
or see the 1.7.0 reference documentation for the taildir introduction and examples:
\flume\flume-ng-doc\sphinx\FlumeUserGuide
7. Configuration
Flume Collaboration Framework