Enterprise Log Storage, Part 1
Scenario: logs are written as 201611/20161112.log.tmp; the next day that file is renamed to 20161112.log and a new 20161113.log.tmp is started. Copy flume-conf.properties.template, rename it dir-mem-hdfs.properties, and configure it to watch a directory and upload every new file to HDFS, filtering out the in-progress .tmp files.

dir-mem-hdfs.properties:

a1.sources = s1
a1.channels = c1
a1.sinks = k1

# defined the source
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /opt/data/log_hive/20161109
a1.sources.s1.includePattern = ([^ ]*\.log$)  # only pick up completed .log files
a1.sources.s1.ignorePattern = ([^ ]*\.tmp$)   # skip files still being written

# defined the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

# defined the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.path = /flume/spdir
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 20480
a1.sinks.k1.hdfs.rollCount = 0

# The channel can be defined as follows.
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1

Run from the flume directory:

bin/flume-ng agent -c conf/ -n a1 -f conf/dir-mem-hdfs.properties -Dflume.root.logger=INFO,console

A memory channel is used here; a file channel is safer (see the sketch below).
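To make the safety note concrete: unlike a memory channel, a file channel persists events to local disk, so buffered events survive an agent crash or restart. A minimal sketch of the channel section rewritten to use a file channel; the checkpoint and data directory paths are illustrative assumptions (any writable local directories work):

# defined the channel (file-backed; events survive agent restarts)
a1.channels.c1.type = file
# directory holding the channel's checkpoint (illustrative path)
a1.channels.c1.checkpointDir = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/checkp
# directory (or comma-separated list) where event data is written (illustrative path)
a1.channels.c1.dataDirs = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/data

The trade-off is throughput: a disk-backed queue is slower than memory, which is why the example above keeps the memory channel for simple testing.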
Enterprise Log Storage, Part 2
Scenario: logs are written to 201611/20161112.log, and the next day data keeps being appended to 20161112.log. This needs the behavior of both the exec and spooldir sources, so we compile the taildir source from Flume 1.7 and integrate it into our existing Flume environment.

1. Download and install Git on Windows.
2. Create an empty folder somewhere (avoid non-ASCII characters in the path), e.g. GitHub.
3. Common Git commands:

$ pwd
$ ls
$ cd /C/Users/Administrator/Desktop/GitHub
$ git clone (https|git)://github.com/apache/flume.git
$ cd flume
$ git branch -r                 # list the remote branches
$ git branch                    # show the current branch
$ git checkout origin/flume-1.7 # switch to the flume-1.7 branch

Copy flume\flume-ng-sources\flume-taildir-source, import the flume-taildir-source project into Eclipse, and modify pom.xml:

<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<modelVersion>4.0.0</modelVersion>
<groupId>org.apache.flume.flume-ng-sources</groupId>
<artifactId>flume-taildir-source</artifactId>
<version>1.5.0-cdh5.3.6</version>
<name>Flume Taildir Source</name>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>2.3.2</version>
      <configuration>
        <source>1.7</source>
        <target>1.7</target>
      </configuration>
    </plugin>
  </plugins>
</build>

<dependencies>
  <dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-core</artifactId>
    <version>1.5.0-cdh5.3.6</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.10</version>
    <scope>test</scope>
  </dependency>
</dependencies>

4. Build the project with Maven, grab the resulting jar, and put it into the current Flume environment (lib directory); see the build sketch below.
5. Create the folders and files:

$ mkdir -p /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/position
$ mkdir -p /opt/data/tail/hadoop-dir/
$ echo "" > /opt/data/tail/hadoop.log

Copy flume-conf.properties.template and rename it tail-mem-hdfs.properties. The required parameters can be read from the source code:

a1.sources = s1
a1.channels = c1
a1.sinks = k1

# defined the source
a1.sources.s1.type = org.apache.flume.source.taildir.TaildirSource
a1.sources.s1.positionFile = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/position/taildir_position.json
a1.sources.s1.filegroups = f1 f2
a1.sources.s1.filegroups.f1 = /opt/data/tail/hadoop.log
a1.sources.s1.filegroups.f2 = /opt/data/tail/hadoop-dir/.*
a1.sources.s1.headers.f1.headerKey1 = value1
a1.sources.s1.headers.f2.headerKey1 = value2-1
a1.sources.s1.headers.f2.headerKey2 = value2-2
a1.sources.s1.fileHeader = true

# defined the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

# defined the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.path = /flume/spdir
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 20480
a1.sinks.k1.hdfs.rollCount = 0

# The channel can be defined as follows.
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1

Run from the flume directory:

bin/flume-ng agent -c conf/ -n a1 -f conf/tail-mem-hdfs.properties -Dflume.root.logger=INFO,console

Then test with new files or new data (see the sketch below).
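Step 4 is a standard Maven packaging run. A minimal sketch of the build and install commands, assuming Maven is installed and that the jar name follows the artifactId and version from the pom.xml above (the exact file name under target/ may differ):

$ cd flume-taildir-source
# package the module; skip tests so the build does not depend on a test setup
$ mvn clean package -DskipTests
# copy the resulting jar into the running Flume's lib directory
$ cp target/flume-taildir-source-1.5.0-cdh5.3.6.jar /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/lib/

For the final test, append a line to the tailed file and drop a new file into the watched directory; both should appear in HDFS under /flume/spdir shortly afterwards (the test strings and file name are illustrative):

# new data appended to an existing tailed file (filegroup f1)
$ echo "hello taildir" >> /opt/data/tail/hadoop.log
# a brand-new file in the watched directory (filegroup f2)
$ echo "new file content" > /opt/data/tail/hadoop-dir/test1.log

Because the taildir source records its read offsets in positionFile (taildir_position.json), restarting the agent resumes from the last recorded position instead of re-reading the files.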
Common Enterprise Architectures: Flume with Multiple Sinks
Goal: deliver the same data to different processing systems. Collection (source): one; data pipeline (channel): multiple; target (sink): multiple. If several sinks drain a single channel, each sink only receives part of the events (the sinks compete for them); the source, however, sends a full copy of the stream to each of its channels. Design: source reads hive.log, channels are file channels, sinks write to HDFS under different paths.

Copy flume-conf.properties.template and rename it hive-file-sinks.properties:

a1.sources = s1
a1.channels = c1 c2
a1.sinks = k1 k2

# defined the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
a1.sources.s1.shell = /bin/sh -c

# defined the channel 1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/checkp1
a1.channels.c1.dataDirs = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/data1

# defined the channel 2
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/checkp2
a1.channels.c2.dataDirs = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/data2

# defined the sink 1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/hdfs/sink1
a1.sinks.k1.hdfs.fileType = DataStream

# defined the sink 2
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = /flume/hdfs/sink2
a1.sinks.k2.hdfs.fileType = DataStream

# The channel can be defined as follows.
a1.sources.s1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

Run from the flume directory:

bin/flume-ng agent -c conf/ -n a1 -f conf/hive-file-sinks.properties -Dflume.root.logger=INFO,console

Then, in the hive directory, generate some log lines:

bin/hive -e "show databases"
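The fan-out above works because a source with several channels uses Flume's replicating channel selector by default, which writes a copy of every event to each listed channel. Spelling that out explicitly, and sketching the multiplexing alternative that routes events by a header value (the header name and mapping values here are illustrative assumptions):

# explicit form of the default: copy every event to both channels
a1.sources.s1.selector.type = replicating

# alternative: route events to different channels based on a header value
# a1.sources.s1.selector.type = multiplexing
# a1.sources.s1.selector.header = logType
# a1.sources.s1.selector.mapping.hive = c1
# a1.sources.s1.selector.mapping.other = c2
# a1.sources.s1.selector.default = c1

Replicating is the right choice here, since the point of this design is that every downstream system sees the complete hive.log stream.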
Flume: Tlog in the Enterprise