Flume: Enterprise Log Collection

Enterprise Log Storage, Case 1

Log layout: 201611/20161112.log.tmp. On the next day the file is renamed to 20161112.log and a new 20161113.log.tmp is opened. Copy flume-conf.properties.template to dir-mem-hdfs.properties. The goal is to watch a directory, upload any new file that appears to HDFS, and filter out the .tmp files.

dir-mem-hdfs.properties:

  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1

  # define the source
  a1.sources.s1.type = spooldir
  a1.sources.s1.spoolDir = /opt/data/log_hive/20161109
  a1.sources.s1.includePattern = ([^ ]*\.log$)  # only pick up files matching this pattern
  a1.sources.s1.ignorePattern = ([^ ]*\.tmp$)   # skip files matching this pattern

  # define the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000

  # define the sink
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.useLocalTimeStamp = true
  a1.sinks.k1.hdfs.path = /flume/spdir
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.hdfs.rollInterval = 0
  a1.sinks.k1.hdfs.rollSize = 20480
  a1.sinks.k1.hdfs.rollCount = 0

  # bind the source and sink to the channel
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1

Run from the flume directory:

  bin/flume-ng agent -c conf/ -n a1 -f conf/dir-mem-hdfs.properties -Dflume.root.logger=INFO,console

A memory channel is used here; a file channel is safer, as sketched below.
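The memory channel above drops any buffered events if the agent process dies. A minimal sketch of the file-channel alternative, following the same checkpointDir/dataDirs pattern used in the multi-sink example later in this article (the two directory paths here are assumptions, adjust to your install):

  # replace the memory channel definition in dir-mem-hdfs.properties with:
  a1.channels.c1.type = file
  # assumed paths; any local directories writable by the agent will do
  a1.channels.c1.checkpointDir = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/checkp-spdir
  a1.channels.c1.dataDirs = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/data-spdir

Events are persisted to dataDirs and replayed from the checkpoint after a crash, at the cost of some throughput.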

Enterprise Log Storage, Case 2

Log layout: 201611/20161112.log. On the next day, new records are still appended to 20161112.log. This scenario needs both exec-style tailing and spooldir-style directory monitoring, so how do we handle it? Compile the taildir source from Flume 1.7 and integrate it into our existing Flume environment.

1. On Windows, download and install Git.
2. Create an empty folder somewhere (avoid Chinese characters in the path), e.g. GitHub.
3. Check out the source with the usual git commands:

  $ pwd
  $ ls
  $ cd /C/Users/Administrator/Desktop/GitHub
  $ git clone (https|git)://github.com/apache/flume.git
  $ cd flume
  $ git branch -r                    # list the remote branches
  $ git branch                       # show the current branch
  $ git checkout origin/flume-1.7    # switch to the flume-1.7 branch

Copy flume\flume-ng-sources\flume-taildir-source, import the flume-taildir-source project into Eclipse, and edit pom.xml:

  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.flume.flume-ng-sources</groupId>
  <artifactId>flume-taildir-source</artifactId>
  <version>1.5.0-cdh5.3.6</version>
  <name>Flume Taildir Source</name>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>org.apache.flume</groupId>
      <artifactId>flume-ng-core</artifactId>
      <version>1.5.0-cdh5.3.6</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.10</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

4. Build the project with Maven and put the resulting jar into the lib directory of the current Flume installation (see the sketch after this section).
5. Create the position folder, the watched directory, and a test file:

  $ mkdir -p /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/position
  $ mkdir -p /opt/data/tail/hadoop-dir/
  $ echo "" > /opt/data/tail/hadoop.log

Copy flume-conf.properties.template to tail-mem-hdfs.properties. The required parameters can be read from the source code:

  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1

  # define the source
  a1.sources.s1.type = org.apache.flume.source.taildir.TaildirSource
  a1.sources.s1.positionFile = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/position/taildir_position.json
  a1.sources.s1.filegroups = f1 f2
  a1.sources.s1.filegroups.f1 = /opt/data/tail/hadoop.log
  a1.sources.s1.filegroups.f2 = /opt/data/tail/hadoop-dir/.*
  a1.sources.s1.headers.f1.headerKey1 = value1
  a1.sources.s1.headers.f2.headerKey1 = value2-1
  a1.sources.s1.headers.f2.headerKey2 = value2-2
  a1.sources.s1.fileHeader = true

  # define the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000

  # define the sink
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.useLocalTimeStamp = true
  a1.sinks.k1.hdfs.path = /flume/spdir
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.hdfs.rollInterval = 0
  a1.sinks.k1.hdfs.rollSize = 20480
  a1.sinks.k1.hdfs.rollCount = 0

  # bind the source and sink to the channel
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1

Run from the flume directory:

  bin/flume-ng agent -c conf/ -n a1 -f conf/tail-mem-hdfs.properties -Dflume.root.logger=INFO,console

Append to the test file or drop new files into the watched directory to verify that events reach HDFS.
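Step 4 above can also be done from the command line instead of Eclipse. A minimal sketch, assuming Maven is on the PATH and Flume is installed at the path from step 5 (the jar name follows Maven's artifactId-version convention):

  $ cd flume-taildir-source
  $ mvn clean package -DskipTests     # build the jar without running the unit tests
  $ cp target/flume-taildir-source-1.5.0-cdh5.3.6.jar \
       /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/lib/

After the copy, the class org.apache.flume.source.taildir.TaildirSource referenced in the config above is on the agent's classpath.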

A Common Enterprise Architecture: Flume with Multiple Sinks

The same data is delivered to several processing frameworks:

  source:  one copy of the data
  channel: one pipeline per destination
  sink:    one per destination

If several sinks pull from a single channel, each sink only gets a subset of the events; when each sink has its own channel, the source writes a full copy of every event to each channel. Design: the source tails hive.log, the channels are file channels, and the sinks write to different HDFS paths.

Copy flume-conf.properties.template to hive-file-sinks.properties.

hive-file-sinks.properties:

  a1.sources = s1
  a1.channels = c1 c2
  a1.sinks = k1 k2

  # define the source
  a1.sources.s1.type = exec
  a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.s1.shell = /bin/sh -c

  # define channel 1
  a1.channels.c1.type = file
  a1.channels.c1.checkpointDir = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/checkp1
  a1.channels.c1.dataDirs = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/data1

  # define channel 2
  a1.channels.c2.type = file
  a1.channels.c2.checkpointDir = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/checkp2
  a1.channels.c2.dataDirs = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/data2

  # define sink 1
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = /flume/hdfs/sink1
  a1.sinks.k1.hdfs.fileType = DataStream

  # define sink 2
  a1.sinks.k2.type = hdfs
  a1.sinks.k2.hdfs.path = /flume/hdfs/sink2
  a1.sinks.k2.hdfs.fileType = DataStream

  # bind the source and sinks to the channels
  a1.sources.s1.channels = c1 c2
  a1.sinks.k1.channel = c1
  a1.sinks.k2.channel = c2

Run from the flume directory:

  bin/flume-ng agent -c conf/ -n a1 -f conf/hive-file-sinks.properties -Dflume.root.logger=INFO,console

From the hive directory, run:

  bin/hive -e "show databases"
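To confirm the fan-out, trigger a few log lines with the hive command above, then list both sink directories; a sketch assuming the HDFS client is run from the Hadoop install directory:

  $ bin/hdfs dfs -ls /flume/hdfs/sink1
  $ bin/hdfs dfs -ls /flume/hdfs/sink2
  # both paths should hold files containing the same events, because
  # the source replicates every event to both c1 and c2

If one sink lags or fails, its backlog accumulates in its own file channel without affecting the other sink.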
