Writing to HDFS according to the log time in Flume


Flume's HDFS writes happen in the HDFSEventSink.process method, where path construction is handled by BucketPath.

For an analysis of its source code, see: http://caiguangguang.blog.51cto.com/1652935/1619539

This can be implemented with %{} variable substitution: extract the time field from the event (the local time in the Nginx log) and reference it in hdfs.path.
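As a quick illustration of the mechanism (a minimal sketch; the header values are made up, and the five-argument escapeString overload is assumed from Flume 1.x, where the exact signature varies by version):

import java.util.HashMap;
import java.util.Map;

import org.apache.flume.formatter.output.BucketPath;

public class BucketPathDemo {
    public static void main(String[] args) {
        // Headers as they would appear on a Flume event
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("eventdate", "20150409");
        headers.put("eventhour", "12");

        // BucketPath replaces each %{...} token with the matching header value;
        // rounding is off since no time-based escapes are used here
        String path = "hdfs://xxx:8020/data/flume/mobile-ubt-all/%{eventdate}/%{eventhour}";
        System.out.println(BucketPath.escapeString(path, headers, false, 0, 0));
        // prints hdfs://xxx:8020/data/flume/mobile-ubt-all/20150409/12
    }
}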

The specific implementation is as follows:

1. In the KafkaSource process method, add:

dt = KafkaSourceUtil.getDateMessage(new String(kafkaMessage));
hour = KafkaSourceUtil.getHourMessage(new String(kafkaMessage));
headers.put("eventdate", dt);
headers.put("eventhour", hour);
log.debug("Source get one Event header info");

This adds two headers recording the day and the hour of the log line, respectively.
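For context, here is a sketch of how those headers travel with the event to the sink. The wrapper class and method are illustrative only; just the header keys and the KafkaSourceUtil calls come from the code above:

import java.util.HashMap;
import java.util.Map;

import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;

public class KafkaSourceHeaderSketch {
    // Build a Flume event whose headers carry the log's day and hour
    public static Event toEvent(byte[] kafkaMessage) {
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("eventdate", KafkaSourceUtil.getDateMessage(new String(kafkaMessage)));
        headers.put("eventhour", KafkaSourceUtil.getHourMessage(new String(kafkaMessage)));
        // HDFSEventSink later substitutes these values for %{eventdate}/%{eventhour}
        return EventBuilder.withBody(kafkaMessage, headers);
    }
}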

2. Methods in KafkaSourceUtil

Because our message body is JSON, we use Java's json-lib package. For example, the method that extracts the day:

// Requires: net.sf.json.JSONObject (json-lib), java.text.SimpleDateFormat,
// java.util.Date, java.util.Locale
public static String getDateMessage(String message) {
    String dt = null;
    JSONObject json = JSONObject.fromObject(message);
    // The raw Nginx line is in the "message" field; its fields are tab-separated
    String[] splitMessage = json.getString("message").split("\t");
    String logTime = splitMessage[3].trim();
    log.debug("in getDateMessage logTime is: " + logTime);
    // Nginx local time, e.g. [09/Apr/2015:12:00:00 +0800]
    String format = "[dd/MMM/yyyy:HH:mm:ss Z]";
    SimpleDateFormat rawDateFormat = new SimpleDateFormat(format, Locale.ENGLISH);
    SimpleDateFormat dateFormat = new SimpleDateFormat("yyyyMMdd");
    Date date = null;
    try {
        date = rawDateFormat.parse(logTime);
        dt = dateFormat.format(date);
        log.debug("in getDateMessage dt is: " + dt);
    } catch (Exception ex) {
        dt = "empty";
    }
    return dt;
}
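The original post only shows the date extractor; a companion getHourMessage might look like the sketch below. The "HH" output pattern is an assumption, chosen to match the /12 hour bucket visible in the final log:

public static String getHourMessage(String message) {
    String hour = null;
    JSONObject json = JSONObject.fromObject(message);
    String[] splitMessage = json.getString("message").split("\t");
    String logTime = splitMessage[3].trim();
    SimpleDateFormat rawDateFormat =
            new SimpleDateFormat("[dd/MMM/yyyy:HH:mm:ss Z]", Locale.ENGLISH);
    // "HH" (00-23) is assumed so the header matches the hour directory in HDFS
    SimpleDateFormat hourFormat = new SimpleDateFormat("HH");
    try {
        Date date = rawDateFormat.parse(logTime);
        hour = hourFormat.format(date);
    } catch (Exception ex) {
        hour = "empty";
    }
    return hour;
}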

3. Reference the headers in hdfs.path:

agent-server4.sinks.hdfs-sink2.type = hdfs
agent-server4.sinks.hdfs-sink2.hdfs.path = hdfs://xxx:8020/data/flume/mobile-ubt-all/%{eventdate}/%{eventhour}
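With eventdate=20150409 and eventhour=12 set on an event, the sink resolves this path to hdfs://xxx:8020/data/flume/mobile-ubt-all/20150409/12/, which is exactly the directory layout in the log below.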

Final log:

flume-server4.log.3:09 Apr 2015 15:18:49,966 DEBUG [hdfs-hdfs-sink1-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$2.call:276) - Rolling file (hdfs://xxx:8020/data/flume/mobile-ubt-all/20150409/12/192.168.101.52-04-01-.1428563869866.tmp): Roll scheduled after 60 sec elapsed.
flume-server4.log.3:09 Apr 2015 15:18:49,969 INFO [hdfs-hdfs-sink1-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:363) - Closing hdfs://xxx:8020/data/flume/mobile-ubt-all/20150409/12/192.168.101.52-04-01-.1428563869866.tmp
flume-server4.log.3:09 Apr 2015 15:18:49,990 INFO [hdfs-hdfs-sink1-call-runner-2] (org.apache.flume.sink.hdfs.BucketWriter$8.call:629) - Renaming hdfs://xxx:8020/data/flume/mobile-ubt-all/20150409/12/192.168.101.52-04-01-.1428563869866.tmp to hdfs://192.168.101.6:8020/data/flume/mobile-ubt-all/20150409/12/192.168.101.52-04-01-.1428563869866

This article is from caiguangguang's blog; please keep the source when reposting: http://caiguangguang.blog.51cto.com/1652935/1635772
