Here is an example of customizing the event headers so that events are written to different HDFS directories depending on the business (the log type).
Configuration:
Source
```properties
interceptors = i1
interceptors.i1.type = regex_extractor
interceptors.i1.regex = /apps/logs/(.*?)/
interceptors.i1.serializers = s1
interceptors.i1.serializers.s1.name = logtypename
```
Sink
```properties
hdfs.path = hdfs://xxxxxx/%{logtypename}/%Y%m%d/%H
hdfs.round = true
hdfs.roundValue = 30
hdfs.roundUnit = minute
hdfs.filePrefix = xxxxx1-
```
A regex_extractor interceptor is defined on the source. Flume builds the interceptor object with the org.apache.flume.interceptor.RegexExtractorInterceptor class, which extracts a substring from the event body with a regular expression; the serializers then store that substring as a header value (here under the name logtypename). The sink can read this header and act on its value.
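The following minimal Java sketch mimics what the regex_extractor interceptor does with the configuration above, using only java.util.regex. It is not Flume's RegexExtractorInterceptor: the class RegexExtractorSketch and the sample body /apps/logs/nginx/access.log are hypothetical, and it assumes the log path appears in the event body.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Flume's RegexExtractorInterceptor: apply the regex
// from the source configuration to an event body and store capture group 1
// under the header name given by serializers.s1.name.
class RegexExtractorSketch {
    static final Pattern REGEX = Pattern.compile("/apps/logs/(.*?)/");

    static Map<String, String> extract(String body, String headerName) {
        Map<String, String> headers = new HashMap<>();
        Matcher m = REGEX.matcher(body);
        if (m.find()) {                      // first match anywhere in the body
            headers.put(headerName, m.group(1));
        }
        return headers;                      // empty map if the regex did not match
    }

    public static void main(String[] args) {
        // hypothetical event body containing a log file path
        Map<String, String> h = extract("/apps/logs/nginx/access.log", "logtypename");
        System.out.println(h); // {logtypename=nginx}
    }
}
```

Because (.*?) is lazy, the group stops at the first "/" after /apps/logs/, so only the directory name becomes the header value that %{logtypename} later picks up.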
For example, the process method of HDFSEventSink, the sink that writes to HDFS, does the following:
```java
// reconstruct the path name by substituting place holders
String realPath = BucketPath.escapeString(filePath, event.getHeaders(),
    timeZone, needRounding, roundUnit, roundValue, useLocalTime);
String realName = BucketPath.escapeString(fileName, event.getHeaders(),
    timeZone, needRounding, roundUnit, roundValue, useLocalTime);
```
The parameters involved:
useLocalTime is the hdfs.useLocalTimeStamp setting; the default is false.
filePath is the hdfs.path setting; it is required and cannot be empty.
fileName is the hdfs.filePrefix setting; the default is FlumeData.
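A small sketch of how these three settings resolve to their defaults. A plain Map stands in for Flume's Context, and the class SinkSettingsSketch is hypothetical; since the text says hdfs.path cannot be empty, the sketch fails fast when it is missing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: resolve the sink settings described above, with a
// plain Map standing in for Flume's Context.
class SinkSettingsSketch {
    final String filePath;      // hdfs.path: required
    final String fileName;      // hdfs.filePrefix: defaults to "FlumeData"
    final boolean useLocalTime; // hdfs.useLocalTimeStamp: defaults to false

    SinkSettingsSketch(Map<String, String> ctx) {
        // throws NullPointerException when hdfs.path is absent
        this.filePath = Objects.requireNonNull(ctx.get("hdfs.path"), "hdfs.path is required");
        this.fileName = ctx.getOrDefault("hdfs.filePrefix", "FlumeData");
        this.useLocalTime = Boolean.parseBoolean(ctx.getOrDefault("hdfs.useLocalTimeStamp", "false"));
    }

    public static void main(String[] args) {
        Map<String, String> ctx = new HashMap<>();
        ctx.put("hdfs.path", "hdfs://xxxxxx/%{logtypename}/%Y%m%d/%H");
        SinkSettingsSketch s = new SinkSettingsSketch(ctx);
        System.out.println(s.fileName + " " + s.useLocalTime); // FlumeData false
    }
}
```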
The rounding settings are read as follows:
```java
needRounding = context.getBoolean("hdfs.round", false); // hdfs.round, default false
if (needRounding) {
  String unit = context.getString("hdfs.roundUnit", "second"); // hdfs.roundUnit, default second
  if (unit.equalsIgnoreCase("hour")) {
    this.roundUnit = Calendar.HOUR_OF_DAY;
  } else if (unit.equalsIgnoreCase("minute")) {
    this.roundUnit = Calendar.MINUTE;
  } else if (unit.equalsIgnoreCase("second")) {
    this.roundUnit = Calendar.SECOND;
  } else {
    LOG.warn("Rounding unit is not valid, please set one of " +
        "minute, hour, or second. Rounding will be disabled");
    needRounding = false;
  }
  this.roundValue = context.getInteger("hdfs.roundValue", 1); // hdfs.roundValue, default 1
  // check that roundValue lies in a valid range for the chosen unit
  if (roundUnit == Calendar.SECOND || roundUnit == Calendar.MINUTE) {
    Preconditions.checkArgument(roundValue > 0 && roundValue <= 60,
        "Round value" + "must be > 0 and <= 60");
  } else if (roundUnit == Calendar.HOUR_OF_DAY) {
    Preconditions.checkArgument(roundValue > 0 && roundValue <= 24,
        "Round value" + "must be > 0 and <= 24");
  }
}
```
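The same logic can be sketched without Flume on the classpath. In this hypothetical RoundingConfigSketch a Map stands in for Context, the method returns the Calendar field instead of setting fields, and an invalid unit simply returns -1 (rounding disabled) rather than continuing to read hdfs.roundValue as the code above does. Map.of assumes Java 9 or later.

```java
import java.util.Calendar;
import java.util.Map;

// Hypothetical sketch of the rounding configuration logic above, with a Map
// standing in for Flume's Context. Returns the Calendar field to round on,
// or -1 when rounding is disabled.
class RoundingConfigSketch {
    static int parseRoundUnit(Map<String, String> ctx) {
        // hdfs.round defaults to false
        if (!Boolean.parseBoolean(ctx.getOrDefault("hdfs.round", "false"))) {
            return -1;
        }
        String unit = ctx.getOrDefault("hdfs.roundUnit", "second");
        int roundUnit;
        if (unit.equalsIgnoreCase("hour")) {
            roundUnit = Calendar.HOUR_OF_DAY;
        } else if (unit.equalsIgnoreCase("minute")) {
            roundUnit = Calendar.MINUTE;
        } else if (unit.equalsIgnoreCase("second")) {
            roundUnit = Calendar.SECOND;
        } else {
            return -1; // invalid unit: rounding is disabled
        }
        // hdfs.roundValue defaults to 1 and must fit the unit (<= 24 hours, <= 60 otherwise)
        int roundValue = Integer.parseInt(ctx.getOrDefault("hdfs.roundValue", "1"));
        int max = (roundUnit == Calendar.HOUR_OF_DAY) ? 24 : 60;
        if (roundValue <= 0 || roundValue > max) {
            throw new IllegalArgumentException("Round value must be > 0 and <= " + max);
        }
        return roundUnit;
    }

    public static void main(String[] args) {
        Map<String, String> ctx = Map.of("hdfs.round", "true",
                "hdfs.roundUnit", "minute", "hdfs.roundValue", "30");
        System.out.println(parseRoundUnit(ctx) == Calendar.MINUTE); // true
    }
}
```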
The concrete HDFS path is built mainly by the escapeString method of the org.apache.flume.formatter.output.BucketPath class.
Analysis of the BucketPath methods:
1. escapeString replaces the %{yyy} and %x placeholders in a string. A placeholder must take the form %x or %{yyy}, where yyy may consist of word characters, "." and "-". For %{yyy} it substitutes the value of header yyy; for %x it calls replaceShorthand.
```java
final public static String TAG_REGEX = "\\%(\\w|\\%)|\\%\\{([\\w\\.-]+)\\}"; // regular expression
final public static Pattern tagPattern = Pattern.compile(TAG_REGEX);
....
public static String escapeString(String in, Map<String, String> headers,
    TimeZone timeZone, boolean needRounding, int unit, int roundDown,
    boolean useLocalTimeStamp) {

  long ts = clock.currentTimeMillis(); // current timestamp

  // match the input string (for example the hdfs.path setting) against the pattern
  Matcher matcher = tagPattern.matcher(in);
  StringBuffer sb = new StringBuffer();
  while (matcher.find()) { // loop over every substring that matches the pattern
    String replacement = "";
    if (matcher.group(2) != null) { // matched a %{...} placeholder
      replacement = headers.get(matcher.group(2)); // look up the corresponding header value
      if (replacement == null) {
        replacement = "";
      }
    } else { // matched a %x placeholder
      Preconditions.checkState(matcher.group(1) != null
          && matcher.group(1).length() == 1,
          "Expected to match single character tag in string " + in);
      char c = matcher.group(1).charAt(0);
      // call replaceShorthand on the escape character
      replacement = replaceShorthand(c, headers, timeZone,
          needRounding, unit, roundDown, useLocalTimeStamp, ts);
    }

    // escape backslashes and dollar signs so appendReplacement treats them literally
    replacement = replacement.replaceAll("\\\\", "\\\\\\\\");
    replacement = replacement.replaceAll("\\$", "\\\\\\$");

    matcher.appendReplacement(sb, replacement);
  }
  matcher.appendTail(sb);
  return sb.toString(); // return the substituted string
}
```
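A simplified, self-contained version of the escapeString loop, using the same TAG_REGEX, shows how the example hdfs.path expands. Assumptions: only %Y, %m, %d, %H, %M and %% are supported, the timestamp is passed in directly, there is no rounding, and the class EscapeSketch is hypothetical. Matcher.quoteReplacement is used in place of the two replaceAll calls; both make backslashes and dollar signs in the replacement literal.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified sketch of BucketPath.escapeString.
class EscapeSketch {
    static final Pattern TAG = Pattern.compile("\\%(\\w|\\%)|\\%\\{([\\w\\.-]+)\\}");

    static String escape(String in, Map<String, String> headers, long ts, TimeZone tz) {
        Matcher m = TAG.matcher(in);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(2) != null) {                    // %{name}: header lookup, "" if absent
                replacement = headers.getOrDefault(m.group(2), "");
            } else {                                     // %x: date shorthand
                replacement = shorthand(m.group(1).charAt(0), ts, tz);
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Small subset of the escape characters handled by replaceShorthand.
    static String shorthand(char c, long ts, TimeZone tz) {
        String fmt;
        switch (c) {
            case '%': return "%";
            case 'Y': fmt = "yyyy"; break;
            case 'm': fmt = "MM"; break;
            case 'd': fmt = "dd"; break;
            case 'H': fmt = "HH"; break;
            case 'M': fmt = "mm"; break;
            default: return "";                          // unknown escape becomes empty
        }
        SimpleDateFormat f = new SimpleDateFormat(fmt);
        f.setTimeZone(tz);
        return f.format(new Date(ts));
    }

    public static void main(String[] args) {
        long ts = 1425998832000L; // 2015-03-10 14:47:12 UTC
        String path = escape("hdfs://xxxxxx/%{logtypename}/%Y%m%d/%H",
                Map.of("logtypename", "nginx"), ts, TimeZone.getTimeZone("UTC"));
        System.out.println(path); // hdfs://xxxxxx/nginx/20150310/14
    }
}
```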
2. replaceShorthand returns the date string for a %x escape, based on the value of the timestamp header (or the local time), the rounding settings and the escape character used in the path. For example, %Y%m%d produces a date in yyyyMMdd form (20150310). When rounding is enabled, it calls roundDown.
```java
protected static String replaceShorthand(char c, Map<String, String> headers,
    TimeZone timeZone, boolean needRounding, int unit, int roundDown,
    boolean useLocalTimestamp, long ts) {

  String timestampHeader = null;
  try {
    if (!useLocalTimestamp) { // hdfs.useLocalTimeStamp is false (the default)
      timestampHeader = headers.get("timestamp"); // read the timestamp header
      // fail if the timestamp header is missing
      Preconditions.checkNotNull(timestampHeader, "Expected timestamp in " +
          "the Flume event headers, but it was null");
      ts = Long.valueOf(timestampHeader);
    } else {
      timestampHeader = String.valueOf(ts);
    }
  } ...

  if (needRounding) { // hdfs.round is true (default false)
    ts = roundDown(roundDown, unit, ts); // round ts down to get a new ts
  }

  // It's a date
  String formatString = "";
  switch (c) { // map the escape character to a date pattern; e.g. %Y%m%d ends up as yyyyMMdd
    case '%':
      return "%";
    case 'a':
      formatString = "EEE";
      break;
    ...
    case 'z':
      formatString = "ZZZ";
      break;
    default:
      // LOG.warn("Unrecognized escape in event format string: %" + c);
      return "";
  }

  SimpleDateFormat format = new SimpleDateFormat(formatString); // build a formatter from the pattern
  if (timeZone != null) {
    format.setTimeZone(timeZone);
  }

  Date date = new Date(ts); // Date object for ts
  return format.format(date); // format the Date as a string
}
```
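The part of replaceShorthand that decides which time gets formatted can be isolated as below. This is a hypothetical sketch (the class TimestampSourceSketch and the localTs parameter are illustrative), not Flume code; a plain null check replaces Preconditions.checkNotNull.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of how replaceShorthand chooses the timestamp it formats.
class TimestampSourceSketch {
    static long resolveTimestamp(Map<String, String> headers,
                                 boolean useLocalTimestamp, long localTs) {
        if (useLocalTimestamp) {
            return localTs; // hdfs.useLocalTimeStamp = true: use the sink's own clock
        }
        // default: the event must carry a "timestamp" header in epoch milliseconds
        String header = headers.get("timestamp");
        if (header == null) {
            throw new NullPointerException(
                    "Expected timestamp in the Flume event headers, but it was null");
        }
        return Long.parseLong(header); // NumberFormatException if not numeric
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("timestamp", "1425998832000");
        System.out.println(resolveTimestamp(headers, false, 0L)); // 1425998832000
        System.out.println(resolveTimestamp(headers, true, 42L)); // 42
    }
}
```

In practice the timestamp header is usually added by Flume's timestamp interceptor on the source; without it, and with hdfs.useLocalTimeStamp left at false, any time escape in hdfs.path would fail for that event.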
3. roundDown rounds the timestamp down:
```java
private static long roundDown(int roundDown, int unit, long ts) {
  long timestamp = ts;
  if (roundDown <= 0) {
    roundDown = 1;
  }
  switch (unit) {
    case Calendar.SECOND:
      // hdfs.roundUnit is second: call TimestampRoundDownUtil.roundDownTimeStampSeconds
      timestamp = TimestampRoundDownUtil.roundDownTimeStampSeconds(
          ts, roundDown);
      break;
    ....
    default:
      timestamp = ts;
      break;
  }
  return timestamp;
}
```
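To see the rounding effect on the example configuration (hdfs.roundUnit = minute, hdfs.roundValue = 30), here is a Calendar-based sketch of rounding a timestamp down to a multiple of N units. It follows the idea of TimestampRoundDownUtil but is not its code; the class RoundDownSketch and its signature are hypothetical.

```java
import java.util.Calendar;
import java.util.TimeZone;

// Hypothetical sketch: round ts down to a multiple of roundDown units and
// zero every field below the chosen unit.
class RoundDownSketch {
    static long roundDown(long ts, int unit, int roundDown, TimeZone tz) {
        if (roundDown <= 0) {
            roundDown = 1; // same guard as roundDown above
        }
        Calendar cal = Calendar.getInstance(tz);
        cal.setTimeInMillis(ts);
        switch (unit) {
            case Calendar.HOUR_OF_DAY:
                cal.set(Calendar.MINUTE, 0);
                // fall through: also clear seconds and milliseconds
            case Calendar.MINUTE:
                cal.set(Calendar.SECOND, 0);
                // fall through: also clear milliseconds
            case Calendar.SECOND:
                cal.set(Calendar.MILLISECOND, 0);
                break;
            default:
                return ts; // unsupported unit: leave ts unchanged
        }
        int value = cal.get(unit);
        cal.set(unit, (value / roundDown) * roundDown);
        return cal.getTimeInMillis();
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        long ts = 1425998832000L; // 2015-03-10 14:47:12 UTC
        System.out.println(roundDown(ts, Calendar.MINUTE, 30, utc)); // 1425997800000 (14:30:00)
    }
}
```

With hdfs.path ending in %Y%m%d/%H, a 30-minute round-down never changes the hour, so the directory name stays the same; the rounding would only become visible if the path also contained %M.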
This article is from the "Food and Light Blog" blog; please keep this source notice: http://caiguangguang.blog.51cto.com/1652935/1619539
HDFSEventSink directory setup: source code analysis