HDFSEventSink directory setup: source code analysis

Source: Internet
Author: User

Here is an example of using a custom header in the configuration, so that events are written to different top-level directories depending on the business log type.
Configuration:
Source

    interceptors = i1
    interceptors.i1.type = regex_extractor
    interceptors.i1.regex = /apps/logs/(.*?)/
    interceptors.i1.serializers = s1
    interceptors.i1.serializers.s1.name = logtypename

Sink

    hdfs.path = hdfs://xxxxxx/%{logtypename}/%Y%m%d/%H
    hdfs.round = true
    hdfs.roundValue = 30
    hdfs.roundUnit = minute
    hdfs.filePrefix = xxxxx1-

An interceptor of type regex_extractor is defined on the source. Flume builds it from the org.apache.flume.interceptor.RegexExtractorInterceptor class. The interceptor extracts a string using the regular expression, and the serializer stores that string as the value of a header (here, logtypename). The sink can then read this header and use its value for further processing.
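To make the extraction step concrete, here is a minimal sketch using only java.util.regex. It is not Flume's RegexExtractorInterceptor; the class name HeaderExtractSketch and the method extractHeaders are hypothetical, and it assumes the event body contains the log path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for what a regex_extractor interceptor does.
// This is NOT Flume's RegexExtractorInterceptor, only an illustration.
class HeaderExtractSketch {
    // Same regular expression as interceptors.i1.regex in the configuration.
    static final Pattern LOG_PATH = Pattern.compile("/apps/logs/(.*?)/");

    // Returns a header map holding "logtypename" when the body matches.
    static Map<String, String> extractHeaders(String body) {
        Map<String, String> headers = new HashMap<>();
        Matcher m = LOG_PATH.matcher(body);
        if (m.find()) {
            // Capture group 1 becomes the header value.
            headers.put("logtypename", m.group(1));
        }
        return headers;
    }

    public static void main(String[] args) {
        // prints {logtypename=nginx}
        System.out.println(extractHeaders("/apps/logs/nginx/access.log some line"));
    }
}
```

A body that does not match the regular expression produces no header, which is why the sink side has to cope with a missing value.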

For example, the process method of HDFSEventSink, the sink that writes to HDFS, contains:

    // reconstruct the path name by substituting place holders
    String realPath = BucketPath.escapeString(filePath, event.getHeaders(),
        timeZone, needRounding, roundUnit, roundValue, useLocalTime);
    String realName = BucketPath.escapeString(fileName, event.getHeaders(),
        timeZone, needRounding, roundUnit, roundValue, useLocalTime);

The parameters are:
useLocalTime is the hdfs.useLocalTimeStamp setting; the default is false
filePath is the hdfs.path setting and cannot be empty
fileName is the hdfs.filePrefix setting and defaults to FlumeData
The rounding (round-down) settings are read as follows:

    needRounding = context.getBoolean("hdfs.round", false); // hdfs.round, defaults to false
    if (needRounding) {
      String unit = context.getString("hdfs.roundUnit", "second"); // hdfs.roundUnit, defaults to second
      if (unit.equalsIgnoreCase("hour")) {
        this.roundUnit = Calendar.HOUR_OF_DAY;
      } else if (unit.equalsIgnoreCase("minute")) {
        this.roundUnit = Calendar.MINUTE;
      } else if (unit.equalsIgnoreCase("second")) {
        this.roundUnit = Calendar.SECOND;
      } else {
        LOG.warn("Rounding unit is not valid, please set one of " +
            "minute, hour, or second. Rounding will be disabled");
        needRounding = false;
      }
      this.roundValue = context.getInteger("hdfs.roundValue", 1); // hdfs.roundValue, defaults to 1
      if (roundUnit == Calendar.SECOND || roundUnit == Calendar.MINUTE) {
        // check that roundValue is within a valid range
        Preconditions.checkArgument(roundValue > 0 && roundValue <= 60,
            "Round value" +
            "must be > 0 and <= 60");
      } else if (roundUnit == Calendar.HOUR_OF_DAY) {
        Preconditions.checkArgument(roundValue > 0 && roundValue <= 24,
            "Round value" +
            "must be > 0 and <= 24");
      }
    }

The actual HDFS path is built mainly by the escapeString method of the org.apache.flume.formatter.output.BucketPath class.
Analysis of the BucketPath class methods:
1. escapeString replaces the %{yyy} and %x placeholders in the configured string. A placeholder must have the form %x or %{yyy}, where yyy may contain word characters, "." or "-". For the %x form it calls replaceShorthand.

    final public static String TAG_REGEX = "\\%(\\w|\\%)|\\%\\{([\\w\\.-]+)\\}"; // regular expression
    final public static Pattern tagPattern = Pattern.compile(TAG_REGEX);
    ....
    public static String escapeString(String in, Map<String, String> headers,
        TimeZone timeZone, boolean needRounding, int unit, int roundDown,
        boolean useLocalTimeStamp) {
      long ts = clock.currentTimeMillis(); // get the current timestamp
      Matcher matcher = tagPattern.matcher(in); // match the input string; "in" can be, for example, the hdfs.path setting
      StringBuffer sb = new StringBuffer();
      while (matcher.find()) { // enter the loop for each substring that matches the regular expression
        String replacement = "";
        if (matcher.group(2) != null) { // matches the %{...} form
          replacement = headers.get(matcher.group(2)); // get the corresponding header value
          if (replacement == null) {
            replacement = "";
          }
        } else { // matches the %x form
          Preconditions.checkState(matcher.group(1) != null
              && matcher.group(1).length() == 1,
              "Expected to match single character tag in string " + in);
          char c = matcher.group(1).charAt(0);
          replacement = replaceShorthand(c, headers, timeZone,
              needRounding, unit, roundDown, useLocalTimeStamp, ts); // call replaceShorthand on the character
        }
        replacement = replacement.replaceAll("\\\\", "\\\\\\\\");
        replacement = replacement.replaceAll("\\$", "\\\\\\$");
        matcher.appendReplacement(sb, replacement);
      }
      matcher.appendTail(sb);
      return sb.toString(); // return the resulting string
    }
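The %{...} branch can be tried in isolation. The sketch below is a simplified illustration, not the BucketPath code: it handles only header placeholders, leaves %x escapes untouched, and uses Matcher.quoteReplacement in place of the two replaceAll calls for escaping. The class name HeaderTagSketch and the method expand are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified illustration of the %{header} substitution only.
// Hypothetical class; %x date escapes are deliberately left untouched.
class HeaderTagSketch {
    // Only the %{name} half of TAG_REGEX: word characters, '.' or '-'.
    static final Pattern HEADER_TAG = Pattern.compile("%\\{([\\w.-]+)\\}");

    static String expand(String in, Map<String, String> headers) {
        Matcher matcher = HEADER_TAG.matcher(in);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String replacement = headers.get(matcher.group(1));
            if (replacement == null) {
                replacement = ""; // a missing header becomes an empty string
            }
            // quoteReplacement escapes '\' and '$' so they are taken literally.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("logtypename", "nginx");
        // prints hdfs://xxxxxx/nginx/%Y%m%d/%H
        System.out.println(expand("hdfs://xxxxxx/%{logtypename}/%Y%m%d/%H", headers));
    }
}
```

Note that a missing header yields an empty path component (hdfs://xxxxxx//...), so events that the interceptor did not tag still get written, just not under a named directory.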

2. The replaceShorthand method returns the date string for a %x escape, based on the value of the timestamp header, the rounding settings and the path settings. For example, %Y%m%d produces a date of the form yyyyMMdd (20150310). It calls the roundDown method.

    protected static String replaceShorthand(char c, Map<String, String> headers,
        TimeZone timeZone, boolean needRounding, int unit, int roundDown,
        boolean useLocalTimestamp, long ts) {
      String timestampHeader = null;
      try {
        if (!useLocalTimestamp) { // hdfs.useLocalTimeStamp is false (the default)
          timestampHeader = headers.get("timestamp"); // get the value of the timestamp header
          Preconditions.checkNotNull(timestampHeader, "Expected timestamp in " +
              "the Flume event headers, but it was null"); // check that the timestamp header is not null
          ts = Long.valueOf(timestampHeader);
        } else {
          timestampHeader = String.valueOf(ts);
        }
      }
      ...
      if (needRounding) { // hdfs.round is set to true (the default is false)
        ts = roundDown(roundDown, unit, ts); // round down, producing a new ts
      }
      // It's a date
      String formatString = "";
      switch (c) { // map the escape character to a date format; e.g. %Y%m%d ends up as yyyyMMdd
      case '%':
        return "%";
      case 'a':
        formatString = "EEE";
        break;
      // (cases for the other escape characters are not shown here)
      case 'z':
        formatString = "ZZZ";
        break;
      default:
        // LOG.warn("Unrecognized escape in event format string: %" + c);
        return "";
      }
      SimpleDateFormat format = new SimpleDateFormat(formatString); // build a SimpleDateFormat from the format string
      if (timeZone != null) {
        format.setTimeZone(timeZone);
      }
      Date date = new Date(ts); // Date object built from ts
      return format.format(date); // format the Date object as a time string
    }
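As a rough illustration of the escape-to-format mapping, the sketch below handles only the four escapes used in the example hdfs.path (%Y, %m, %d, %H) plus the literal %%. It is a hypothetical stand-in (class ShorthandSketch), not the BucketPath method, and it takes the timestamp directly instead of reading it from the event headers.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical, reduced version of the escape-to-date mapping.
class ShorthandSketch {
    // Formats ts (milliseconds since the epoch) for one escape character.
    static String format(char c, long ts, TimeZone tz) {
        String pattern;
        switch (c) {
            case '%': return "%";          // "%%" is a literal percent sign
            case 'Y': pattern = "yyyy"; break;
            case 'm': pattern = "MM"; break;
            case 'd': pattern = "dd"; break;
            case 'H': pattern = "HH"; break;
            default: return "";            // unrecognized escape
        }
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ROOT);
        f.setTimeZone(tz);
        return f.format(new Date(ts));
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        Calendar cal = new GregorianCalendar(utc);
        cal.clear();
        cal.set(2015, Calendar.MARCH, 10, 8, 30, 0);
        long ts = cal.getTimeInMillis();
        // prints 20150310/08
        System.out.println(format('Y', ts, utc) + format('m', ts, utc)
            + format('d', ts, utc) + "/" + format('H', ts, utc));
    }
}
```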

3. roundDown rounds the timestamp down

    private static long roundDown(int roundDown, int unit, long ts) {
      long timestamp = ts;
      if (roundDown <= 0) {
        roundDown = 1;
      }
      switch (unit) {
        case Calendar.SECOND:
          timestamp = TimestampRoundDownUtil.roundDownTimeStampSeconds(
              ts, roundDown); // if hdfs.roundUnit is second, call TimestampRoundDownUtil.roundDownTimeStampSeconds
          break;
        ....
        default:
          timestamp = ts;
          break;
      }
      return timestamp;
    }
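With hdfs.roundValue = 30 and hdfs.roundUnit = minute, as in the example configuration, every event in a given half hour lands in the same bucket. The sketch below shows one way such a minute round-down can be computed with java.util.Calendar. It is an assumption about the general technique, not a copy of TimestampRoundDownUtil; the class RoundDownSketch and the method roundDownMinutes are hypothetical names.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical sketch of rounding a timestamp down to a multiple of N minutes.
class RoundDownSketch {
    static long roundDownMinutes(long ts, int roundValue, TimeZone tz) {
        if (roundValue <= 0) {
            roundValue = 1; // same guard as in roundDown above
        }
        Calendar cal = new GregorianCalendar(tz);
        cal.setTimeInMillis(ts);
        int minute = cal.get(Calendar.MINUTE);
        // Integer division drops the remainder, e.g. 47 / 30 * 30 = 30.
        cal.set(Calendar.MINUTE, (minute / roundValue) * roundValue);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTimeInMillis();
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        Calendar cal = new GregorianCalendar(utc);
        cal.clear();
        cal.set(2015, Calendar.MARCH, 10, 8, 47, 13);
        long rounded = roundDownMinutes(cal.getTimeInMillis(), 30, utc);
        cal.setTimeInMillis(rounded);
        // prints 8:30
        System.out.println(cal.get(Calendar.HOUR_OF_DAY) + ":" + cal.get(Calendar.MINUTE));
    }
}
```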

This article is from the "Food and Light Blog" blog, please make sure to keep this source http://caiguangguang.blog.51cto.com/1652935/1619539
