Flume: A Few Common Questions


Files Flume writes to HDFS default to only about 1 KB each. How do you control their size and number?

Copy flume-conf.properties.template to a new file named hive-mem-size.properties:

  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # define the source
  a1.sources.s1.type = exec
  a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.s1.shell = /bin/sh -c
  # define the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000
  # define the sink
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = /flume/hdfs/
  a1.sinks.k1.hdfs.fileType = DataStream
  # roll by elapsed time; 0 disables
  a1.sinks.k1.hdfs.rollInterval = 0
  # roll by file size, in bytes; 10240 rolls at roughly 10 KB
  a1.sinks.k1.hdfs.rollSize = 10240
  # roll by event count; 0 disables
  a1.sinks.k1.hdfs.rollCount = 0
  # bind the source and sink to the channel
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1

Then, from the Flume directory, run:

  bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-size.properties -Dflume.root.logger=INFO,console
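With rollInterval and rollCount both set to 0, rollSize alone decides when a file closes. A back-of-envelope sketch of what that means per file; the ~100-byte average line length is an assumption for illustration, not something Flume reports:

```shell
# rollSize is measured in bytes; estimate lines per rolled file,
# assuming an average hive.log line of ~100 bytes (hypothetical figure)
roll_size=10240
avg_line_bytes=100
echo $((roll_size / avg_line_bytes))   # prints 102
```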

Flume is used to keep pushing the newest data or files to HDFS. How do you handle a partitioned table?

Copy flume-conf.properties.template to a new file named hive-mem-part.properties:

  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # define the source
  a1.sources.s1.type = exec
  a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.s1.shell = /bin/sh -c
  # define the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000
  # define the sink
  a1.sinks.k1.type = hdfs
  # when the path uses time escapes, set useLocalTimeStamp to true
  a1.sinks.k1.hdfs.useLocalTimeStamp = true
  a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H-%M/
  a1.sinks.k1.hdfs.fileType = DataStream
  # bind the source and sink to the channel
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1

Then, from the Flume directory, run:

  bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-part.properties -Dflume.root.logger=INFO,console

Note that this conflicts with the file-size approach above: once the path is partitioned by time, there is no guarantee that a file reaches a given size within its interval.
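The %y-%m-%d/%H-%M escapes are expanded from the event's timestamp header, which useLocalTimeStamp = true fills from the local clock. The codes match strftime, so date(1) can preview the directory layout the sink will create:

```shell
# Preview the bucketed HDFS directory that events written right now
# would land in (same escape codes as the hdfs.path setting above)
date +"/flume/events/%y-%m-%d/%H-%M/"
```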

Flume names uploaded files with a FlumeData prefix by default. How do you change the prefix?

Copy flume-conf.properties.template to a new file named hive-mem-pre.properties:

  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # define the source
  a1.sources.s1.type = exec
  a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.s1.shell = /bin/sh -c
  # define the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000
  # define the sink
  a1.sinks.k1.type = hdfs
  # when the path uses time escapes, set useLocalTimeStamp to true
  a1.sinks.k1.hdfs.useLocalTimeStamp = true
  a1.sinks.k1.hdfs.filePrefix = hive-log
  a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H-%M/
  a1.sinks.k1.hdfs.fileType = DataStream
  # bind the source and sink to the channel
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1

Then, from the Flume directory, run:

  bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-pre.properties -Dflume.root.logger=INFO,console
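With filePrefix = hive-log, the sink writes names of the form <filePrefix>.<numeric counter> instead of FlumeData.<counter>; the counter is typically seeded from the current epoch time in milliseconds. A rough sketch of the resulting name (the millisecond value is approximated from seconds here, purely for illustration):

```shell
# Sketch: <filePrefix>.<counter>, counter approximated as epoch millis
prefix="hive-log"
echo "${prefix}.$(date +%s)000"
```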

With multiple Flume agents in a production deployment, how do you relieve the disk I/O pressure?

 

Start a Hadoop cluster (the official diagram shows four machines; three are used here), and deploy and configure Flume on each:

  hadoop09-linux-01.ibeifeng.com 10.0.0.108 collector
  hadoop09-linux-02.ibeifeng.com 10.0.0.109 agent
  hadoop09-linux-03.ibeifeng.com 10.0.0.110 agent

Pick one agent machine, go into its Flume directory, and copy flume-conf.properties.template to avro-agent-hive-file-hdfs.properties:

  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # define the source
  a1.sources.s1.type = exec
  a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.s1.shell = /bin/sh -c
  # define the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000
  # define the sink
  # hostname is the receiver's IP or hostname
  a1.sinks.k1.type = avro
  a1.sinks.k1.hostname = hadoop09-linux-01.ibeifeng.com
  a1.sinks.k1.port = 50505
  # bind the source and sink to the channel
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1

Send the file to the other agent machine with scp:

  scp conf/avro-agent-hive-file-hdfs.properties hadoop09-linux-03.ibeifeng.com:/opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/conf/

On the collector machine, go into the Flume directory and copy flume-conf.properties.template to avro-collect-hive-file-hdfs.properties:

  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # define the source
  a1.sources.s1.type = avro
  a1.sources.s1.bind = hadoop09-linux-01.ibeifeng.com
  a1.sources.s1.port = 50505
  # define the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000
  # define the sink
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.filePrefix = avro
  a1.sinks.k1.hdfs.useLocalTimeStamp = true
  a1.sinks.k1.hdfs.path = /flume/hdfs
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.hdfs.rollInterval = 0
  a1.sinks.k1.hdfs.rollSize = 20480
  a1.sinks.k1.hdfs.rollCount = 0
  # bind the source and sink to the channel
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1

Start the rpcbind service, then start the collector, followed by an agent on each of the two agent machines:

  bin/flume-ng agent -c conf/ -n a1 -f conf/avro-collect-hive-file-hdfs.properties -Dflume.root.logger=INFO,console
  bin/flume-ng agent -c conf/ -n a1 -f conf/avro-agent-hive-file-hdfs.properties -Dflume.root.logger=INFO,console
  bin/flume-ng agent -c conf/ -n a1 -f conf/avro-agent-hive-file-hdfs.properties -Dflume.root.logger=INFO,console

Then test.
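Before starting the agent tier, it helps to confirm that the collector's avro source is actually listening on port 50505. A small sketch; the check_port helper is ad hoc (not part of Flume) and relies on bash's /dev/tcp feature:

```shell
# Ad-hoc reachability probe for the collector's avro port (bash /dev/tcp)
check_port() {
  # succeeds only if a TCP connection to host:port can be opened
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if check_port hadoop09-linux-01.ibeifeng.com 50505; then
  echo "collector reachable"
else
  echo "collector not reachable - start it first"
fi
```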

How do you handle Flume when the logs live on different operating systems?

Set up an NFS server, mount the directories from the other systems, and use them directly.
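A minimal sketch of the mount on the consuming side, written as an /etc/fstab entry; the server name and export path are assumptions, not part of the original setup:

```
# hypothetical NFS server and export; adjust to the real hosts
nfs-server:/export/logs  /mnt/logs  nfs  ro,soft  0  0
```

With the export mounted, the exec source on this host can tail the shared file, e.g. a1.sources.s1.command = tail -F /mnt/logs/hive.log.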

 
