Collecting logs through Flume-ng


I recently received a log-collection requirement. After some testing and tweaking, the desired functionality basically works, so I am writing it down here.

First, the requirements: collect logs every hour, produce separate LZO-compressed files per category, and place each batch of output in a directory named after the previous hour. Flume was the obvious choice for the collection, with an interceptor doing the filtering. The built-in RegexFilteringInterceptor could handle it, but our logs have too many columns, and writing the regular expression would be painful. So I simply wrote my own interceptor, which is also easier to extend later.
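A quick sketch of what "the previous hour" looks like as directory and file-name strings, using the same two date patterns the interceptor below uses (the class name PrevHourDemo is mine, for illustration only):

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class PrevHourDemo {
    public static void main(String[] args) {
        // Subtract one hour from the current time
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.HOUR_OF_DAY, -1);
        // Directory form, e.g. 2014/07/01/09
        String preHour = new SimpleDateFormat("yyyy/MM/dd/HH").format(cal.getTime());
        // File-name form, e.g. 2014070109
        String finalName = new SimpleDateFormat("yyyyMMddHH").format(cal.getTime());
        System.out.println(preHour);
        System.out.println(finalName);
    }
}
```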

First, download Flume (the old Flume OG depended on ZooKeeper; flume-ng no longer does). I use Flume 1.5.0.

After downloading, copy the hadoop-lzo jar into $FLUME_HOME/lib, or add it to FLUME_CLASSPATH in the flume-env.sh script.

If you run into OOM problems when using the memory channel, you also need to give Flume more heap at startup by raising the JVM options in flume-env.sh. For example:

JAVA_OPTS="-Xms1500m -Xmx2000m -Dcom.sun.management.jmxremote"

Please note: if you want flume-env.sh to be picked up this way, the path given to the --conf option of the flume-ng agent command must be written out in full. Many examples online just write flume-ng agent --conf conf, which does not work; --conf needs the full path. For example, my configuration lives in /home/max/flume/conf/, so I write --conf /home/max/flume/conf/.

Writing the interceptor

import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class PlatformInterceptor implements Interceptor {

    // NOTE: the expected column count was lost from the original text;
    // set this to the number of tab-separated columns in your log format
    private static final int EXPECTED_COLUMNS = 0; // placeholder

    private final String header;
    private String preHour;
    private String finalName;

    private PlatformInterceptor(String header, String preHour, String finalName) {
        this.header = header;
        this.preHour = preHour;
        this.finalName = finalName;
    }

    @Override
    public void initialize() {
    }

    @Override
    public void close() {
    }

    private String[] split(String s) {
        // Be sure to pass this -1: otherwise, if the line ends with \t and
        // empty fields, Java drops them and the split comes back the wrong length
        return s.split("\t", -1);
    }

    @Override
    public Event intercept(Event event) {
        String line = new String(event.getBody());
        Map<String, String> headers = event.getHeaders();
        String[] arrLine = split(line);
        if (arrLine.length != EXPECTED_COLUMNS) {
            return null;
        } else {
            headers.put("platform", arrLine[3]); // fourth column: industry/service category
            headers.put("prehour", preHour);
            headers.put("finalname", finalName);
            return event;
        }
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        // Build the string forms of the previous hour
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.HOUR_OF_DAY, -1);
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy/MM/dd/HH");
        preHour = formatter.format(cal.getTime());
        formatter = new SimpleDateFormat("yyyyMMddHH");
        finalName = formatter.format(cal.getTime());
        for (Iterator<Event> iterator = events.iterator(); iterator.hasNext();) {
            Event next = intercept(iterator.next());
            if (next == null) {
                iterator.remove();
            }
        }
        return events;
    }

    public static class Builder implements Interceptor.Builder {
        private String header = "platform";
        private String preHour = "";
        private String finalName = "";

        @Override
        public void configure(Context context) {
            header = context.getString("platform", "default_platform");
        }

        @Override
        public Interceptor build() {
            return new PlatformInterceptor(header, preHour, finalName);
        }
    }
}
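To see why the -1 limit in split() matters, here is a minimal standalone check (the class name SplitDemo is mine, for illustration only):

```java
public class SplitDemo {
    public static void main(String[] args) {
        // A line whose last columns are empty, ending in tabs
        String line = "a\tb\t\t";
        // Without a limit, Java drops trailing empty strings
        String[] withoutLimit = line.split("\t");
        // With -1, every field is kept, so the length matches the real column count
        String[] withLimit = line.split("\t", -1);
        System.out.println(withoutLimit.length + " " + withLimit.length); // prints "2 4"
    }
}
```

With the default limit the length check in intercept() would reject valid lines whose trailing fields happen to be empty.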

  

Then write the configuration file that the Flume agent runs with:

#agent1
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1

#source1
agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/hadoop/flume/flumelog   # directory the logs are collected from
agent1.sources.source1.channels=channel1
#agent1.sources.source1.deletePolicy=immediate   # delete .COMPLETED files immediately
agent1.sources.source1.fileHeader=false
agent1.sources.source1.interceptors=platform
agent1.sources.source1.interceptors.platform.type=PlatformInterceptor$Builder

# The regex_extractor alternative I abandoned (one capture group and one
# serializer per column is a clumsy way to do it):
#agent1.sources.source1.type=regex_extractor
#agent1.sources.source1.interceptors.a1.type=regex_extractor
#agent1.sources.source1.interceptors.a1.regex=(\\S+)\t(\\S+)   # one group per column, however many columns there are
#agent1.sources.source1.interceptors.a1.serializers=s1 s2
#agent1.sources.source1.interceptors.a1.serializers.s1.name=key1
#agent1.sources.source1.interceptors.a1.serializers.s2.name=key2

#sink1
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.channel=channel1
agent1.sinks.sink1.hdfs.path=/user/max/flume-input/%{prehour}   # write into the previous hour's directory
agent1.sinks.sink1.hdfs.fileType=CompressedStream
agent1.sinks.sink1.hdfs.rollInterval=0
# 131072000 = 128M, 67108864 = 64M
# 393216000 = 128M*3; the LZO compression ratio is roughly 1:3
# 1572864000 = 128M*3*4
agent1.sinks.sink1.hdfs.rollSize=1572864000
agent1.sinks.sink1.hdfs.rollCount=0
agent1.sinks.sink1.hdfs.idleTimeout=1800   # if no event arrives for this long, close the file, i.e. rename the .tmp file to its final name
agent1.sinks.sink1.hdfs.codeC=lzopCodec
agent1.sinks.sink1.hdfs.filePrefix=boo_%{platform}_%{finalname}_40.seq
agent1.sinks.sink1.hdfs.useLocalTimeStamp=true
agent1.sinks.sink1.hdfs.threadsPoolSize=30

#channel1
agent1.channels.channel1.type=memory
agent1.channels.channel1.capacity=1000000
agent1.channels.channel1.transactionCapacity=1000000

  

Write the startup script:

flume-ng agent --conf /hadoop/flume/flume/conf -n agent1 -C /home/max/platforminterceptor-0.0.1-SNAPSHOT.jar -f /hadoop/flume/project/new -Dflume.root.logger=INFO,console

Reference:

http://flume.apache.org/FlumeUserGuide.html

If there is any mistake here, please point it out; corrections are appreciated.
