Files that Flume writes to HDFS roll at about 1 KB by default. How do you control file size and file count?
Copy flume-conf.properties.template and rename it hive-mem-size.properties:

```properties
# hive-mem-size.properties
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# defined the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
a1.sources.s1.shell = /bin/sh -c

# defined the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

# defined the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/hdfs/
a1.sinks.k1.hdfs.fileType = DataStream
# roll by elapsed time; 0 disables time-based rolling
a1.sinks.k1.hdfs.rollInterval = 0
# roll by size; 10240 rolls files at roughly 10 KB
a1.sinks.k1.hdfs.rollSize = 10240
# roll by event count; 0 disables count-based rolling
a1.sinks.k1.hdfs.rollCount = 0

# bind the source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
```

Then run from the flume directory:

```shell
bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-size.properties -Dflume.root.logger=INFO,console
```
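The three roll settings combine as "roll when any *enabled* condition fires", and a value of 0 disables that condition. A minimal sketch of this decision logic (an illustration of the semantics, not Flume's actual code):

```python
def should_roll(elapsed_secs, bytes_written, event_count,
                roll_interval=0, roll_size=10240, roll_count=0):
    """Roll the current HDFS file when any *enabled* condition fires.

    A setting of 0 disables that condition, mirroring the HDFS sink's
    rollInterval / rollSize / rollCount semantics.
    """
    if roll_interval > 0 and elapsed_secs >= roll_interval:
        return True
    if roll_size > 0 and bytes_written >= roll_size:
        return True
    if roll_count > 0 and event_count >= roll_count:
        return True
    return False

# With the config above (rollSize = 10240, the others 0), only size matters:
print(should_roll(elapsed_secs=3600, bytes_written=512, event_count=99999))  # False
print(should_roll(elapsed_secs=1, bytes_written=10240, event_count=1))       # True
```

This is why setting only rollSize yields files of a predictable size regardless of how long they stay open or how many events they hold.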
Flume is used to stream the latest data or files into HDFS. How do you handle partitioned tables?
Copy flume-conf.properties.template and rename it hive-mem-part.properties:

```properties
# hive-mem-part.properties
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# defined the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
a1.sources.s1.shell = /bin/sh -c

# defined the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

# defined the sink
a1.sinks.k1.type = hdfs
# when using time escapes in the path, enable the local timestamp
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H-%M/
a1.sinks.k1.hdfs.fileType = DataStream

# bind the source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
```

Then run from the flume directory:

```shell
bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-part.properties -Dflume.root.logger=INFO,console
```

Note that this conflicts with the size-based rolling above: once output is partitioned by time, you cannot also guarantee that files reach a target size within each time window.
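The %y-%m-%d/%H-%M escapes in hdfs.path are expanded from each event's timestamp (here the local timestamp, since useLocalTimeStamp = true). For these particular codes the expansion matches strftime, so it can be sketched as:

```python
from datetime import datetime

def expand_path(pattern: str, ts: datetime) -> str:
    """Expand Flume-style time escapes in an HDFS path.

    Flume's %y/%m/%d/%H/%M escapes line up with strftime's for these
    codes, so strftime serves as a faithful illustration here.
    """
    return ts.strftime(pattern)

# An event stamped 2016-03-05 14:07 lands in this partition directory:
print(expand_path("/flume/events/%y-%m-%d/%H-%M/", datetime(2016, 3, 5, 14, 7)))
# /flume/events/16-03-05/14-07/
```

Each Hive partition can then be mapped onto one of these time-bucketed directories.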
Files uploaded by Flume are named with the FlumeData prefix by default. How do you change that prefix?
Copy flume-conf.properties.template and rename it hive-mem-pre.properties:

```properties
# hive-mem-pre.properties
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# defined the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
a1.sources.s1.shell = /bin/sh -c

# defined the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

# defined the sink
a1.sinks.k1.type = hdfs
# when using time escapes in the path, enable the local timestamp
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.filePrefix = hive-log
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H-%M/
a1.sinks.k1.hdfs.fileType = DataStream

# bind the source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
```

Then run from the flume directory:

```shell
bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-pre.properties -Dflume.root.logger=INFO,console
```
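With hdfs.filePrefix = hive-log, files that would have been named FlumeData.&lt;counter&gt; now start with hive-log. A rough sketch of the naming scheme (the counter is seeded from the current time in milliseconds, and open files carry a .tmp suffix; this is an illustration, not the sink's actual code):

```python
import time

def hdfs_file_name(prefix: str = "FlumeData", in_use: bool = False) -> str:
    """Illustrative sketch of the HDFS sink's file naming:
    <prefix>.<millisecond counter>, with a .tmp suffix while still open."""
    counter = int(time.time() * 1000)  # the sink seeds its counter similarly
    name = f"{prefix}.{counter}"
    return name + ".tmp" if in_use else name

print(hdfs_file_name())                          # e.g. FlumeData.1457186820000
print(hdfs_file_name("hive-log"))                # e.g. hive-log.1457186820123
print(hdfs_file_name("hive-log", in_use=True))   # e.g. hive-log.1457186820123.tmp
```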
With multiple Flume agents in production, how do you address the disk I/O problem?
Start a Hadoop cluster (the official diagram shows four machines; three are used here) and deploy and configure Flume on each:

```
hadoop09-linux-01.ibeifeng.com  10.0.0.108  collector
hadoop09-linux-02.ibeifeng.com  10.0.0.109  agent
hadoop09-linux-03.ibeifeng.com  10.0.0.110  agent
```

On one agent machine, go into the flume directory and copy flume-conf.properties.template to avro-agent-hive-file-hdfs.properties:

```properties
# avro-agent-hive-file-hdfs.properties
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# defined the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
a1.sources.s1.shell = /bin/sh -c

# defined the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

# defined the sink: forward events over Avro to the collector
a1.sinks.k1.type = avro
# IP or hostname of the receiving (collector) machine
a1.sinks.k1.hostname = hadoop09-linux-01.ibeifeng.com
a1.sinks.k1.port = 50505

# bind the source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
```

Ship the same file to the other agent with scp:

```shell
scp conf/avro-agent-hive-file-hdfs.properties hadoop09-linux-03.ibeifeng.com:/opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/conf/
```

On the collector machine, in the flume directory, copy flume-conf.properties.template to avro-collenct-hive-file-hdfs.properties:

```properties
# avro-collenct-hive-file-hdfs.properties
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# defined the source: listen for Avro events from the agents
a1.sources.s1.type = avro
a1.sources.s1.bind = hadoop09-linux-01.ibeifeng.com
a1.sources.s1.port = 50505

# defined the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

# defined the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.filePrefix = avro
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.path = /flume/hdfs
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 20480
a1.sinks.k1.hdfs.rollCount = 0

# bind the source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
```

Start the rpcbind service, then start the collector first and the two agents after it:

```shell
# on the collector
bin/flume-ng agent -c conf/ -n a1 -f conf/avro-collenct-hive-file-hdfs.properties -Dflume.root.logger=INFO,console
# on each of the two agents
bin/flume-ng agent -c conf/ -n a1 -f conf/avro-agent-hive-file-hdfs.properties -Dflume.root.logger=INFO,console
```

Finally, append to the tailed log and verify that events land in HDFS.
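The point of this topology is fan-in: many tailing agents forward events over Avro to one collector, so a single process does the batched HDFS writes instead of every machine hammering HDFS on its own. The flow can be illustrated with a toy many-producers / one-consumer sketch (plain Python queues and threads standing in for Avro RPC and the HDFS sink):

```python
import queue
import threading

channel = queue.Queue()  # stands in for the collector's memory channel
collected = []           # stands in for the single HDFS file being written

def agent(name: str, lines: list) -> None:
    """Each agent 'tails' its own log and forwards events to the collector."""
    for line in lines:
        channel.put(f"{name}: {line}")

def collector(expected: int) -> None:
    """One collector drains all agents and performs the only 'write'."""
    for _ in range(expected):
        collected.append(channel.get())

sink = threading.Thread(target=collector, args=(4,))
sink.start()
agents = [threading.Thread(target=agent, args=(f"agent-{i}", ["ev1", "ev2"]))
          for i in range(2)]
for t in agents:
    t.start()
for t in agents:
    t.join()
sink.join()
print(len(collected))  # 4: all events funneled through one writer
```

Two producers, one consumer: the disk I/O is concentrated on the collector, which is exactly what the avro source/sink pairing achieves in Flume.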
How do you collect logs with Flume when the sources run on different operating systems?
Set up an NFS server, mount the other system's directory onto the Flume host, and read it directly.
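A minimal sketch of that setup, assuming the other system exports its log directory over NFS and the Flume host mounts it (the hostnames and paths below are hypothetical); the exec source then tails the mounted path exactly as in the configs above:

```shell
# On the NFS server (the machine producing the logs), export the log dir.
# /etc/exports:
#   /var/log/app  10.0.0.0/24(ro,sync)
exportfs -ra   # reload the export table

# On the Flume host, mount the exported directory:
mount -t nfs nfs-server.example.com:/var/log/app /mnt/app-logs

# Then point the exec source at the mounted file, e.g.:
#   a1.sources.s1.command = tail -F /mnt/app-logs/app.log
```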
Flume: A Few Common Problems