Collecting logs through Flume-ng


I recently received a log-collection requirement. After some testing and tweaking, the desired functionality basically works, so I am writing it down here.

First, the requirements: collect logs every hour, produce separate LZO-compressed files per category, and place each batch of output in a directory named after the previous hour. Flume was the obvious choice for the collection, with an interceptor doing the filtering. The built-in RegexFilteringInterceptor could handle it, but our logs have too many columns, and writing the regular expression would be painful. So I simply wrote my own interceptor, which is also easier to extend later.
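A quick sketch of what "the previous hour" looks like as directory and file-name strings, using the same two date patterns the interceptor below uses (the class name PrevHourDemo is mine, for illustration only):

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class PrevHourDemo {
    public static void main(String[] args) {
        // Subtract one hour from the current time
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.HOUR_OF_DAY, -1);
        // Directory form, e.g. 2014/07/01/09
        String preHour = new SimpleDateFormat("yyyy/MM/dd/HH").format(cal.getTime());
        // File-name form, e.g. 2014070109
        String finalName = new SimpleDateFormat("yyyyMMddHH").format(cal.getTime());
        System.out.println(preHour);
        System.out.println(finalName);
    }
}
```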

First, download Flume (the old Flume OG depended on ZooKeeper; flume-ng no longer does). I use Flume 1.5.0.

After downloading, copy the hadoop-lzo jar into $FLUME_HOME/lib, or add it to FLUME_CLASSPATH in the flume-env.sh script.

If you run into OOM problems when using the memory channel, you also need to give Flume more heap at startup by raising the JVM options in flume-env.sh. For example:

JAVA_OPTS="-Xms1500m -Xmx2000m -Dcom.sun.management.jmxremote"

Please note: if you want flume-env.sh to be picked up this way, the path given to the --conf option of the flume-ng agent command must be written out in full. Many examples online just write flume-ng agent --conf conf, which does not work; --conf needs the full path. For example, my configuration lives in /home/max/flume/conf/, so I write --conf /home/max/flume/conf/.

Writing the interceptor

import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class PlatformInterceptor implements Interceptor {

    // NOTE: the expected column count was lost from the original text;
    // set this to the number of tab-separated columns in your log format
    private static final int EXPECTED_COLUMNS = 0; // placeholder

    private final String header;
    private String preHour;
    private String finalName;

    private PlatformInterceptor(String header, String preHour, String finalName) {
        this.header = header;
        this.preHour = preHour;
        this.finalName = finalName;
    }

    @Override
    public void initialize() {
    }

    @Override
    public void close() {
    }

    private String[] split(String s) {
        // Be sure to pass this -1: otherwise, if the line ends with \t and
        // empty fields, Java drops them and the split comes back the wrong length
        return s.split("\t", -1);
    }

    @Override
    public Event intercept(Event event) {
        String line = new String(event.getBody());
        Map<String, String> headers = event.getHeaders();
        String[] arrLine = split(line);
        if (arrLine.length != EXPECTED_COLUMNS) {
            return null;
        } else {
            headers.put("platform", arrLine[3]); // fourth column: industry/service category
            headers.put("prehour", preHour);
            headers.put("finalname", finalName);
            return event;
        }
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        // Build the string forms of the previous hour
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.HOUR_OF_DAY, -1);
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy/MM/dd/HH");
        preHour = formatter.format(cal.getTime());
        formatter = new SimpleDateFormat("yyyyMMddHH");
        finalName = formatter.format(cal.getTime());
        for (Iterator<Event> iterator = events.iterator(); iterator.hasNext();) {
            Event next = intercept(iterator.next());
            if (next == null) {
                iterator.remove();
            }
        }
        return events;
    }

    public static class Builder implements Interceptor.Builder {
        private String header = "platform";
        private String preHour = "";
        private String finalName = "";

        @Override
        public void configure(Context context) {
            header = context.getString("platform", "default_platform");
        }

        @Override
        public Interceptor build() {
            return new PlatformInterceptor(header, preHour, finalName);
        }
    }
}
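To see why the -1 limit in split() matters, here is a minimal standalone check (the class name SplitDemo is mine, for illustration only):

```java
public class SplitDemo {
    public static void main(String[] args) {
        // A line whose last columns are empty, ending in tabs
        String line = "a\tb\t\t";
        // Without a limit, Java drops trailing empty strings
        String[] withoutLimit = line.split("\t");
        // With -1, every field is kept, so the length matches the real column count
        String[] withLimit = line.split("\t", -1);
        System.out.println(withoutLimit.length + " " + withLimit.length); // prints "2 4"
    }
}
```

With the default limit the length check in intercept() would reject valid lines whose trailing fields happen to be empty.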

  

Then write the configuration file that the Flume agent runs with:

#agent1
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1

#source1
agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/hadoop/flume/flumelog   # directory the logs are collected from
agent1.sources.source1.channels=channel1
#agent1.sources.source1.deletePolicy=immediate   # delete .COMPLETED files immediately
agent1.sources.source1.fileHeader=false
agent1.sources.source1.interceptors=platform
agent1.sources.source1.interceptors.platform.type=PlatformInterceptor$Builder

# The regex_extractor alternative I abandoned (one capture group and one
# serializer per column is a clumsy way to do it):
#agent1.sources.source1.type=regex_extractor
#agent1.sources.source1.interceptors.a1.type=regex_extractor
#agent1.sources.source1.interceptors.a1.regex=(\\S+)\t(\\S+)   # one group per column, however many columns there are
#agent1.sources.source1.interceptors.a1.serializers=s1 s2
#agent1.sources.source1.interceptors.a1.serializers.s1.name=key1
#agent1.sources.source1.interceptors.a1.serializers.s2.name=key2

#sink1
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.channel=channel1
agent1.sinks.sink1.hdfs.path=/user/max/flume-input/%{prehour}   # write into the previous hour's directory
agent1.sinks.sink1.hdfs.fileType=CompressedStream
agent1.sinks.sink1.hdfs.rollInterval=0
# 131072000 = 128M, 67108864 = 64M
# 393216000 = 128M*3; the LZO compression ratio is roughly 1:3
# 1572864000 = 128M*3*4
agent1.sinks.sink1.hdfs.rollSize=1572864000
agent1.sinks.sink1.hdfs.rollCount=0
agent1.sinks.sink1.hdfs.idleTimeout=1800   # if no event arrives for this long, close the file, i.e. rename the .tmp file to its final name
agent1.sinks.sink1.hdfs.codeC=lzopCodec
agent1.sinks.sink1.hdfs.filePrefix=boo_%{platform}_%{finalname}_40.seq
agent1.sinks.sink1.hdfs.useLocalTimeStamp=true
agent1.sinks.sink1.hdfs.threadsPoolSize=30

#channel1
agent1.channels.channel1.type=memory
agent1.channels.channel1.capacity=1000000
agent1.channels.channel1.transactionCapacity=1000000

  

Write the startup script:

flume-ng agent --conf /hadoop/flume/flume/conf -n agent1 -C /home/max/platforminterceptor-0.0.1-SNAPSHOT.jar -f /hadoop/flume/project/new -Dflume.root.logger=INFO,console

Reference:

http://flume.apache.org/FlumeUserGuide.html

If there is any mistake here, please point it out; corrections are appreciated.
