Flume's HDFS write happens in the HDFSEventSink.process method, where the output path is built by BucketPath.
An analysis of its source code is given here: http://caiguangguang.blog.51cto.com/1652935/1619539
Writing by log time can therefore be implemented with %{} variable substitution: we only need to extract the time field from the event (the local time recorded in the Nginx log) and pass it to hdfs.path through event headers.
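To illustrate the mechanism, here is a rough sketch (not Flume's actual code, which lives in org.apache.flume.formatter.output.BucketPath) of how %{key} placeholders in hdfs.path are resolved against the event's headers:

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of %{key} header substitution; the real logic is in BucketPath.
public class HeaderPathSketch {

    private static final Pattern HEADER_VAR = Pattern.compile("%\\{([^}]+)\\}");

    // Replace every %{key} in the path with the matching header value.
    public static String resolve(String path, Map<String, String> headers) {
        Matcher m = HEADER_VAR.matcher(path);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = headers.get(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("eventdate", "20150409");
        headers.put("eventhour", "12");
        // Prints hdfs://xxx:8020/data/flume/mobile-ubt-all/20150409/12
        System.out.println(resolve(
                "hdfs://xxx:8020/data/flume/mobile-ubt-all/%{eventdate}/%{eventhour}", headers));
    }
}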
The specific implementation is as follows:
1. In the KafkaSource process method, add:
dt = KafkaSourceUtil.getDateMessage(new String(kafkaMessage));
hour = KafkaSourceUtil.getHourMessage(new String(kafkaMessage));
headers.put("eventdate", dt);
headers.put("eventhour", hour);
log.debug("Source get one event header info");
These two headers record the day and the hour of the log line, respectively.
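For context, a minimal sketch of how such headers end up on the Flume event inside a custom KafkaSource.process method; this fragment assumes kafkaMessage holds the raw message bytes and that the source extends AbstractSource (the article does not show the full source class):

// Sketch only: relevant imports would be java.util.HashMap, java.util.Map,
// org.apache.flume.Event and org.apache.flume.event.EventBuilder.
Map<String, String> headers = new HashMap<String, String>();
String dt = KafkaSourceUtil.getDateMessage(new String(kafkaMessage));
String hour = KafkaSourceUtil.getHourMessage(new String(kafkaMessage));
headers.put("eventdate", dt);
headers.put("eventhour", hour);
// Attach the headers to the event body and hand it to the channel processor
Event event = EventBuilder.withBody(kafkaMessage, headers);
getChannelProcessor().processEvent(event);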
2. Methods in KafkaSourceUtil
Because the message body is JSON, we use Java's json-lib package. For example, the method that extracts the date:
public static String getDateMessage(String message) {
    String dt = null;
    JSONObject json = JSONObject.fromObject(message);
    String[] splitMessage = json.getString("message").split("\t");
    String logTime = splitMessage[3].trim();
    log.debug("in getDateMessage logTime is: " + logTime);
    String format = "[dd/MMM/yyyy:HH:mm:ss Z]";
    SimpleDateFormat rawDateFormat = null;
    Date date = null;
    SimpleDateFormat dateFormat1 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
    SimpleDateFormat dateFormat2 = new SimpleDateFormat("yyyyMMdd");
    rawDateFormat = new SimpleDateFormat(format, Locale.ENGLISH);
    try {
        date = rawDateFormat.parse(logTime);
        dt = dateFormat2.format(date);
        log.debug("in getDateMessage dt is: " + dt);
    } catch (Exception ex) {
        dt = "empty";
    }
    return dt;
}
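The article also calls KafkaSourceUtil.getHourMessage but does not show it; a minimal sketch following the same pattern as getDateMessage above (the two-digit "HH" output and the exception fallback are assumptions):

public static String getHourMessage(String message) {
    String hour = null;
    JSONObject json = JSONObject.fromObject(message);
    String[] splitMessage = json.getString("message").split("\t");
    String logTime = splitMessage[3].trim();
    // Same raw Nginx time format as in getDateMessage; the hour is rendered
    // in the JVM's default time zone.
    SimpleDateFormat rawDateFormat = new SimpleDateFormat("[dd/MMM/yyyy:HH:mm:ss Z]", Locale.ENGLISH);
    SimpleDateFormat hourFormat = new SimpleDateFormat("HH");
    try {
        Date date = rawDateFormat.parse(logTime);
        hour = hourFormat.format(date);
    } catch (Exception ex) {
        hour = "empty";
    }
    return hour;
}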
3. Set hdfs.path to reference the headers:
agent-server4.sinks.hdfs-sink2.type = hdfs
agent-server4.sinks.hdfs-sink2.hdfs.path = hdfs://xxx:8020/data/flume/mobile-ubt-all/%{eventdate}/%{eventhour}
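For context, a fuller sink definition might look like the sketch below; only the type and hdfs.path lines come from the article, while the channel name, fileType and rollInterval are assumptions (rollInterval = 60 matches the "Roll scheduled after 60 sec" message in the log that follows):

# Sketch; only type and hdfs.path are from the article, the rest is assumed
agent-server4.sinks.hdfs-sink2.type = hdfs
agent-server4.sinks.hdfs-sink2.channel = hdfs-channel2
agent-server4.sinks.hdfs-sink2.hdfs.path = hdfs://xxx:8020/data/flume/mobile-ubt-all/%{eventdate}/%{eventhour}
# Write plain text files instead of SequenceFiles
agent-server4.sinks.hdfs-sink2.hdfs.fileType = DataStream
# Roll a new file every 60 seconds
agent-server4.sinks.hdfs-sink2.hdfs.rollInterval = 60

Because the path uses only %{...} header substitutions and no time-based escape sequences such as %Y or %H, the sink needs neither a timestamp header nor hdfs.useLocalTimeStamp.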
Final log:
flume-server4.log.3:09 Apr 2015 15:18:49,966 DEBUG [hdfs-hdfs-sink1-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$2.call:276) - Rolling file (hdfs://xxx:8020/data/flume/mobile-ubt-all/20150409/12/192.168.101.52-04-01-.1428563869866.tmp): Roll scheduled after 60 sec elapsed.
flume-server4.log.3:09 Apr 2015 15:18:49,969 INFO [hdfs-hdfs-sink1-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:363) - Closing hdfs://xxx:8020/data/flume/mobile-ubt-all/20150409/12/192.168.101.52-04-01-.1428563869866.tmp
flume-server4.log.3:09 Apr 2015 15:18:49,990 INFO [hdfs-hdfs-sink1-call-runner-2] (org.apache.flume.sink.hdfs.BucketWriter$8.call:629) - Renaming hdfs://xxx:8020/data/flume/mobile-ubt-all/20150409/12/192.168.101.52-04-01-.1428563869866.tmp to hdfs://192.168.101.6:8020/data/flume/mobile-ubt-all/20150409/12/192.168.101.52-04-01-.1428563869866
This article comes from the "Food and Light Blog"; please keep this source when reposting: http://caiguangguang.blog.51cto.com/1652935/1635772
Flume: writing to HDFS according to log time