Flume Collaboration Framework

Last updated: 2017-08-23. Source: Internet

1. Overview

  -> The three core functions of Flume
    collecting, aggregating, and moving (log data)

2. Block diagram

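3. Architecture features
  -> on streaming data flows
    Flume is built around streaming data.
    Data flow: a job that collects data continuously
    Task flow: job1 -> job2 -> job3 & job4

  -> for online analytic applications

  -> Flume is normally run on Linux
    What if my log server runs Windows?

  -> Very simple to use
    Write a configuration file, then run an agent with that file
    source, channel, sink

  -> Real-time frameworks it is combined with
    flume + kafka, spark/storm, impala

  -> The three parts of an agent
    -> source: collects data and sends it to the channel

    -> channel: the pipe that connects the source to the sink
    -> sink: delivers data; it takes events from the channel and writes them to the destination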
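
The three parts map directly onto a properties file. The following is a minimal sketch, not the author's configuration: the agent name `a1`, the component names `r1`, `c1`, `k1`, the file name `skeleton.properties` and the port are all chosen here for illustration. It uses a netcat source so that it needs no log file.

```shell
# Write a minimal agent definition: one source, one channel, one sink.
# Agent name (a1), component names (r1, c1, k1) and the port are illustrative.
mkdir -p conf
cat > conf/skeleton.properties <<'EOF'
# Declare the components that belong to agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens on a TCP port and turns each received line into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffers events in memory between source and sink
a1.channels.c1.type = memory

# Sink: writes each event to the agent's log
a1.sinks.k1.type = logger

# Wiring: a source may feed several channels, a sink drains exactly one
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
echo "wrote conf/skeleton.properties"
```

Note the asymmetry in the last two lines: the source property is `channels` (plural), the sink property is `channel` (singular).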

4. Event

5. Source/Channel/Sink

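Part II: Configuration

1. Download and unpack

  The version downloaded here is Flume 1.5.0.

2. Enable flume-env.sh

3. Edit flume-env.sh

4. Add HADOOP_HOME

  HADOOP_HOME is not set in flume-env.sh here. Instead, the HDFS configuration files are copied into Flume's conf directory.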
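
Steps 2 to 4 could look roughly like the sketch below. It rests on several assumptions that are not in the original: the JDK path is hypothetical, `FLUME_HOME` defaults to the current directory, and the Hadoop client files are looked up under `HADOOP_CONF_DIR` (with a made-up default) and copied only if they exist.

```shell
# Assumed locations; adjust all three to your machine.
FLUME_HOME="${FLUME_HOME:-.}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/opt/hadoop/etc/hadoop}"
ENV_FILE="$FLUME_HOME/conf/flume-env.sh"
mkdir -p "$FLUME_HOME/conf"

# Step 2: enable flume-env.sh by copying the shipped template, if present
if [ -f "$ENV_FILE.template" ] && [ ! -f "$ENV_FILE" ]; then
  cp "$ENV_FILE.template" "$ENV_FILE"
fi

# Step 3: point Flume at a JDK (hypothetical path), only once
if ! grep -q '^export JAVA_HOME=' "$ENV_FILE" 2>/dev/null; then
  echo 'export JAVA_HOME=/opt/jdk1.7.0_67' >> "$ENV_FILE"
fi

# Step 4: instead of setting HADOOP_HOME, copy the HDFS client settings into conf/
for f in core-site.xml hdfs-site.xml; do
  if [ -f "$HADOOP_CONF_DIR/$f" ]; then
    cp "$HADOOP_CONF_DIR/$f" "$FLUME_HOME/conf/"
  fi
done
echo "prepared $ENV_FILE"
```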

5. Add the required JAR files

6. Verify

7. Usage

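Part III: Using Flume

1. Case 1

  source: hive.log    channel: mem    sink: logger

2. Configuration

  cp flume-conf.properties.template hive-mem-log.properties

3. Configure hive-mem-log.properties

4. Run

  The trailing part of the run command (-Dflume.root.logger=INFO,console in the commands later in this article) sets the log level.

5. Note

  Collection happens in real time, so the console output changes as hive.log changes.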
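
A configuration for Case 1 might look like the sketch below. The original shows its file only as a screenshot, so the component names (`s1`, `c1`, `k1`), the capacities and the location of hive.log are assumptions.

```shell
# Case 1 sketch: tail hive.log -> memory channel -> logger sink.
# The hive.log path and the component names are assumptions.
mkdir -p conf
cat > conf/hive-mem-log.properties <<'EOF'
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# exec source: run a command and turn every output line into an event
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/hive/logs/hive.log
a1.sources.s1.channels = c1

# memory channel: fast, but buffered events are lost if the agent stops
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# logger sink: logs each event at INFO level
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
EOF
echo "start with: bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-log.properties -Dflume.root.logger=INFO,console"
```

The value passed to `-n` has to match the prefix used in the file (`a1` here); otherwise the agent starts with no components.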

6. Case 2

  source: hive.log    channel: file    sink: logger

7. Configuration

  cp hive-mem-log.properties hive-file-log.properties

8. Configure hive-file-log.properties

  Create the directories for the file channel

  Configuration

9. Run

10. Result
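
For Case 2 only the channel section changes. A sketch, assuming the channel directories live under a `flume-channel` folder in the current directory (the original does not say where they were created) and the same hypothetical hive.log path as before:

```shell
# Case 2 sketch: tail hive.log -> file channel -> logger sink.
# CHANNEL_ROOT is an assumed location for the channel's on-disk state.
CHANNEL_ROOT="${CHANNEL_ROOT:-$PWD/flume-channel}"
mkdir -p conf "$CHANNEL_ROOT/checkpoint" "$CHANNEL_ROOT/data"

# Unquoted EOF so that $CHANNEL_ROOT is expanded into the file
cat > conf/hive-file-log.properties <<EOF
a1.sources = s1
a1.channels = c1
a1.sinks = k1

a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/hive/logs/hive.log
a1.sources.s1.channels = c1

# file channel: events are persisted on disk and survive an agent restart
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = $CHANNEL_ROOT/checkpoint
a1.channels.c1.dataDirs = $CHANNEL_ROOT/data

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
EOF
echo "wrote conf/hive-file-log.properties"
```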

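11. Case 3

  source: hive.log    channel: mem    sink: hdfs

12. Configuration

  cp hive-mem-log.properties hive-mem-hdfs.properties

13. Configure hive-mem-hdfs.properties

14. Run

  This run confirms that the target directory named in the configuration file does not have to exist beforehand; it is created automatically.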
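
A sketch of the Case 3 file follows. The NameNode address and the target directory are placeholders, and the choice of `DataStream` with `Text` (plain text output instead of the default SequenceFile) is an assumption about what the author used.

```shell
# Case 3 sketch: tail hive.log -> memory channel -> HDFS sink.
# The NameNode URL, target directory and hive.log path are placeholders.
mkdir -p conf
cat > conf/hive-mem-hdfs.properties <<'EOF'
a1.sources = s1
a1.channels = c1
a1.sinks = k1

a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/hive/logs/hive.log
a1.sources.s1.channels = c1

a1.channels.c1.type = memory

# HDFS sink: the target directory is created on first write
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode.example.com:8020/flume/hive-log
# Write plain text rather than the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.channel = c1
EOF
echo "wrote conf/hive-mem-hdfs.properties"
```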

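Part IV: Enterprise Considerations (1)

15. Case 4

  The previous setup produces many small files on HDFS, so the file size has to be configured.

16. Configuration

  cp hive-mem-hdfs.properties hive-mem-size.properties

17. Configure hive-mem-size.properties

18. Run

19. Result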
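
The small files come from the HDFS sink's default roll settings (every 30 seconds, 1024 bytes, or 10 events, whichever comes first). One way to roll by size only is sketched below; the 128 MiB threshold, the paths and the NameNode address are assumptions, not values from the original.

```shell
# Case 4 sketch: roll HDFS files by size only.
# Threshold, paths and NameNode URL are assumptions.
mkdir -p conf
cat > conf/hive-mem-size.properties <<'EOF'
a1.sources = s1
a1.channels = c1
a1.sinks = k1

a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/hive/logs/hive.log
a1.sources.s1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode.example.com:8020/flume/hive-size
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
# 0 disables rolling by elapsed time and by event count
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollCount = 0
# Roll when the file reaches 128 MiB (134217728 bytes)
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.channel = c1
EOF
echo "wrote conf/hive-mem-size.properties"
```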

20. Case 5

  Partition by time

21. Configuration

  cp hive-mem-hdfs.properties hive-mem-part.properties

22. Configure hive-mem-part.properties

23. Run

  bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-part.properties -Dflume.root.logger=INFO,console

24. Result
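
Time partitioning is done with escape sequences in `hdfs.path`. The sketch below assumes a day/hour layout and placeholder paths. Because an exec source does not put a timestamp header on its events, it also sets `hdfs.useLocalTimeStamp` so that the sink uses the agent's local time to fill in the escapes.

```shell
# Case 5 sketch: one HDFS directory per day and hour.
# Directory layout, paths and NameNode URL are assumptions.
mkdir -p conf
cat > conf/hive-mem-part.properties <<'EOF'
a1.sources = s1
a1.channels = c1
a1.sinks = k1

a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/hive/logs/hive.log
a1.sources.s1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
# %Y%m%d expands to e.g. 20170823, %H to the hour (00-23)
a1.sinks.k1.hdfs.path = hdfs://namenode.example.com:8020/flume/hive-part/%Y%m%d/%H
# Use the agent's clock instead of a timestamp header on the event
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.channel = c1
EOF
echo "wrote conf/hive-mem-part.properties"
```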

25. Case 6

  Custom file name prefix

26. Configure hive-mem-part.properties

27. Result
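
Files written by the HDFS sink are named with the prefix `FlumeData` by default. A sketch of overriding it with `hdfs.filePrefix` follows; the prefix value `hive-log` and all paths are made up for illustration, and the rest of the file repeats the time-partitioned layout assumed for Case 5.

```shell
# Case 6 sketch: custom prefix for the files written to HDFS.
# Prefix value, paths and NameNode URL are assumptions.
mkdir -p conf
cat > conf/hive-mem-part.properties <<'EOF'
a1.sources = s1
a1.channels = c1
a1.sinks = k1

a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/hive/logs/hive.log
a1.sources.s1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode.example.com:8020/flume/hive-part/%Y%m%d/%H
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# File names start with this prefix instead of the default FlumeData
a1.sinks.k1.hdfs.filePrefix = hive-log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.channel = c1
EOF
echo "wrote conf/hive-mem-part.properties"
```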

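Part V: Enterprise Considerations (2)

1. Case 7

  source: monitors a directory

  A file in the directory first exists with a .tmp suffix.

  On the next day a new .tmp file appears, and the previous day's .tmp file is renamed so that it ends in .log. The directory monitor sees the renamed file as a new file and uploads it to HDFS.

2. Configuration

  cp hive-mem-hdfs.properties dir-mem-hdfs.properties

3. Use a regular expression so that .tmp files are not uploaded

3. Configure dir-mem-hdfs.properties

  Create the directory

  Configuration

4. Check the result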
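
A sketch of Case 7 with the spooling directory source is shown below. The watched directory, the NameNode address and the component names are assumptions. The pattern is written as `[.]tmp` rather than `\.tmp` because the file is a Java properties file, where a single backslash would be consumed as an escape character.

```shell
# Case 7 sketch: spooling directory source that skips files still being written.
# SPOOL_DIR, the NameNode URL and component names are assumptions.
SPOOL_DIR="${SPOOL_DIR:-$PWD/spooling}"
mkdir -p conf "$SPOOL_DIR"

# Unquoted EOF so that $SPOOL_DIR is expanded; \$ keeps the regex anchor literal
cat > conf/dir-mem-hdfs.properties <<EOF
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# spooldir source: ingests each completed file dropped into the directory
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = $SPOOL_DIR
# Ignore files whose name ends in .tmp
a1.sources.s1.ignorePattern = ^.*[.]tmp\$
a1.sources.s1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode.example.com:8020/flume/spooling
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.channel = c1
EOF
echo "wrote conf/dir-mem-hdfs.properties"
```

Files placed in a spooling directory must not be modified after they become visible to the source, which is why the rename from .tmp to .log matters.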

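5. Case 2

  source: monitors files in a directory that are continuously being appended to

  This is different from watching a directory for newly appearing files.

  The configuration for this is explained later in this article.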

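Part VI: Architectures Used in Practice

1. Flume with multiple sinks

  The same data is delivered to several different frameworks.

  source: one copy of the data

  channel: the example uses two channels

  sink: several sinks, one for each channel

2. Case

  source: hive.log    channel: file    sink: hdfs

3. Configuration

  cp hive-mem-hdfs.properties sinks.properties

4. Configure sinks.properties

  Create the storage directories

  Configuration

5. Result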
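
The fan-out described above can be sketched as follows: one source replicates every event into two file channels, and each channel is drained by its own HDFS sink. Directory locations, HDFS paths and component names are assumptions. Each file channel needs its own checkpoint and data directories.

```shell
# Multi-sink sketch: one source -> two file channels -> two HDFS sinks.
# CHANNEL_ROOT, HDFS paths and component names are assumptions.
CHANNEL_ROOT="${CHANNEL_ROOT:-$PWD/flume-channel}"
mkdir -p conf "$CHANNEL_ROOT/c1/checkpoint" "$CHANNEL_ROOT/c1/data" \
              "$CHANNEL_ROOT/c2/checkpoint" "$CHANNEL_ROOT/c2/data"

cat > conf/sinks.properties <<EOF
a1.sources = s1
a1.channels = c1 c2
a1.sinks = k1 k2

a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/hive/logs/hive.log
# Every event is copied into both channels (replicating is the default selector)
a1.sources.s1.selector.type = replicating
a1.sources.s1.channels = c1 c2

a1.channels.c1.type = file
a1.channels.c1.checkpointDir = $CHANNEL_ROOT/c1/checkpoint
a1.channels.c1.dataDirs = $CHANNEL_ROOT/c1/data

a1.channels.c2.type = file
a1.channels.c2.checkpointDir = $CHANNEL_ROOT/c2/checkpoint
a1.channels.c2.dataDirs = $CHANNEL_ROOT/c2/data

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode.example.com:8020/flume/sink1
a1.sinks.k1.channel = c1

a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = hdfs://namenode.example.com:8020/flume/sink2
a1.sinks.k2.channel = c2
EOF
echo "wrote conf/sinks.properties"
```

In a real deployment the second sink would typically point at a different system (for example Kafka) rather than a second HDFS directory; two HDFS paths are used here only to keep the sketch close to the case as stated.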

6. Flume collect (collector tier)

7. Case

  Start three machines: two act as agents and one as the collector.

  192.168.134.241: collect

  192.168.134.242: agent
  192.168.134.243: agent

8. Status

  Because no CDH cluster has been set up, the output is not pasted here for now.

9. Run

  Run the collector:

    bin/flume-ng agent -c conf/ -n a1 -f conf/avro-collect.properties -Dflume.root.logger=INFO,console

  Run the agents:
    bin/flume-ng agent -c conf/ -n a1 -f conf/avro-agent.properties -Dflume.root.logger=INFO,console
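
The two files named in the commands are not shown in the original. A sketch of what they could contain follows: each agent forwards events through an avro sink to the collector, which receives them with an avro source. The port (4545), the hive.log path and the collector's logger sink are assumptions; the logger sink is used only so the sketch can be tried without HDFS, and an hdfs sink would normally take its place.

```shell
# Collector-tier sketch: agents (avro sink) -> collector (avro source).
# Port 4545, the hive.log path and the logger sink are assumptions.
mkdir -p conf

# Runs on 192.168.134.242 and 192.168.134.243
cat > conf/avro-agent.properties <<'EOF'
a1.sources = s1
a1.channels = c1
a1.sinks = k1

a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/hive/logs/hive.log
a1.sources.s1.channels = c1

a1.channels.c1.type = memory

# avro sink: sends events to the collector's avro source
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.134.241
a1.sinks.k1.port = 4545
a1.sinks.k1.channel = c1
EOF

# Runs on 192.168.134.241
cat > conf/avro-collect.properties <<'EOF'
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# avro source: listens for events sent by the agents
a1.sources.s1.type = avro
a1.sources.s1.bind = 192.168.134.241
a1.sources.s1.port = 4545
a1.sources.s1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
EOF
echo "wrote conf/avro-agent.properties and conf/avro-collect.properties"
```

Starting the collector before the agents avoids connection errors from the avro sinks while they wait for the listening port.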

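Part VII: Monitoring Files in a Directory That Are Being Appended To

1. Install git

2. Create a folder

3. Enter the folder in Git Bash

4. Download the source code into this folder

5. Enter the flume directory

6. List the branches of the source tree

7. Switch branches

8. Copy out flume-taildir-source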

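Part VIII: Building

1. The pom file

2. Add a class from 1.7.0 to 1.5.0

  PollableSourceConstants

3. Remove the @Override annotations

4. Build

  Run As -> Maven build
  Goals -> Skip Tests

5. Put the JAR file in the lib directory

6. Usage

  Because this is 1.7.0 source code, it is not covered in the 1.5 documentation.

  So you can read the source code,

    or read the description and example of the tail source (Taildir Source) in the 1.7.0 reference documentation:

      \flume\flume-ng-doc\sphinx\FlumeUserGuide

7. Configuration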
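
A sketch of a taildir configuration follows. The property names come from the Flume 1.7.0 user guide; the file name, all paths and the group names `f1` and `f2` are assumptions. The source is referenced by its full class name because the short alias `TAILDIR` is registered in 1.7.0 itself, and it is assumed here that a 1.5.0 installation with the backported JAR does not know that alias.

```shell
# Taildir sketch: follow appended files and remember read positions.
# File name, paths and group names are assumptions.
mkdir -p conf
cat > conf/tail-mem-log.properties <<'EOF'
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Full class name of the taildir source from the backported JAR
a1.sources.s1.type = org.apache.flume.source.taildir.TaildirSource
# JSON file in which the last read offset of every tailed file is recorded
a1.sources.s1.positionFile = /opt/flume/position/taildir_position.json
a1.sources.s1.filegroups = f1 f2
# f1: a single file
a1.sources.s1.filegroups.f1 = /opt/hive/logs/hive.log
# f2: every .log file in a directory (the regex applies to the file name only)
a1.sources.s1.filegroups.f2 = /opt/flume/spooling/.*[.]log
a1.sources.s1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
EOF
echo "wrote conf/tail-mem-log.properties"
```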
