All the HBase sink code lives under the org.apache.flume.sink.hbase package. Each sink ships its own implementation, all of which extend AbstractSink and implement Configurable.
First, the configure(Context) method. This method initializes the parameters of HBaseSink, mainly the following:
tableName: the name of the HBase table to write to; must not be empty;
columnFamily: the column family of that table; this sink currently supports only a single column family; must not be empty;
batchSize: the maximum number of events processed per transaction; the default is 100;
eventSerializerType: the serializer used to turn an Event into Put objects for HBase. The default is org.apache.flume.sink.hbase.SimpleHbaseEventSerializer; the only other built-in serializer suitable for HBaseSink is RegexHbaseEventSerializer, so anything else has to be custom;
serializerContext: the configuration handed to the serializer, i.e. the items under the "serializer." prefix in the configuration file;
kerberosKeytab and kerberosPrincipal: used for access control; both default to empty, i.e. unset.
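Pulling these parameters together, a minimal agent configuration might look like the following sketch (the agent name a1, sink name k1, channel name c1, and the table and column names are all hypothetical, not from the source):

```properties
# hypothetical agent a1 writing to HBase via the hbase sink type
a1.sinks.k1.type = hbase
a1.sinks.k1.table = test_table
a1.sinks.k1.columnFamily = cf
a1.sinks.k1.batchSize = 100
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
# items under "serializer." become the serializerContext
a1.sinks.k1.serializer.regex = (.*)
a1.sinks.k1.serializer.colNames = payload
a1.sinks.k1.channel = c1
```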
configure() then instantiates and configures the serializer named by eventSerializerType. The two built-in serializers differ in purpose: one can write only a single column, the other can write multiple columns:
Class<? extends HbaseEventSerializer> clazz =
    (Class<? extends HbaseEventSerializer>)
    Class.forName(eventSerializerType);
serializer = clazz.newInstance();
serializer.configure(serializerContext); // the serializer is configured first; the default is SimpleHbaseEventSerializer
1. SimpleHbaseEventSerializer.configure(Context context): this serializer can only write data to a single column
public void configure(Context context) {
  rowPrefix = context.getString("rowPrefix", "default"); // the fixed prefix of the rowkey, "default" by default
  incrementRow = context.getString("incrementRow", "incRow").getBytes(Charsets.UTF_8); // the row key of the counter
  String suffix = context.getString("suffix", "uuid"); // the rowkey suffix type (uuid/random/timestamp/nano can be specified), "uuid" by default
  String payloadColumn = context.getString("payloadColumn"); // the column to write the event body to
  String incColumn = context.getString("incrementColumn"); // the counter column
  if (payloadColumn != null && !payloadColumn.isEmpty()) {
    // determine the rowkey type from the suffix
    if (suffix.equals("timestamp")) {
      keyType = KeyType.TS;
    } else if (suffix.equals("random")) {
      keyType = KeyType.RANDOM;
    } else if (suffix.equals("nano")) {
      keyType = KeyType.TSNANO;
    } else {
      keyType = KeyType.UUID;
    }
    plCol = payloadColumn.getBytes(Charsets.UTF_8); // the column name
  }
  if (incColumn != null && !incColumn.isEmpty()) { // a counter column exists
    incCol = incColumn.getBytes(Charsets.UTF_8);
  }
}
2. RegexHbaseEventSerializer.configure(Context context): this serializer can write multiple columns driven by a regular expression
public void configure(Context context) {
  String regex = context.getString(REGEX_CONFIG, REGEX_DEFAULT); // the regex from the configuration file, "(.*)" by default
  regexIgnoreCase = context.getBoolean(IGNORE_CASE_CONFIG,
      INGORE_CASE_DEFAULT); // whether to ignore case
  inputPattern = Pattern.compile(regex, Pattern.DOTALL
      + (regexIgnoreCase ? Pattern.CASE_INSENSITIVE : 0)); // compile the regex into a Pattern with the given flags
  String colNameStr = context.getString(COL_NAME_CONFIG, COLUMN_NAME_DEFAULT); // the column names from the configuration file
  String[] columnNames = colNameStr.split(","); // split into the array of column names
  for (String s : columnNames) {
    colNames.add(s.getBytes(Charsets.UTF_8));
  }
}
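The core idea above, that each capture group of the pattern supplies the value of one column, can be sketched in plain Java without any Flume or HBase dependencies. The class and method names here are mine, not Flume's; this is an illustration of the technique, not the serializer itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch of the idea behind RegexHbaseEventSerializer:
// capture group N of the pattern becomes the value of column N.
public class RegexColumnsDemo {
    // Split an event body into column values using capture groups.
    // Returns an empty list when the pattern does not match, or when the
    // group count differs from the configured number of columns.
    public static List<String> extractColumns(Pattern p, String body, int colCount) {
        List<String> values = new ArrayList<String>();
        Matcher m = p.matcher(body);
        if (!m.matches() || m.groupCount() != colCount) {
            return values;
        }
        for (int g = 1; g <= m.groupCount(); g++) {
            values.add(m.group(g));
        }
        return values;
    }

    public static void main(String[] args) {
        // A pattern with three groups, e.g. for "level ip message" lines.
        Pattern p = Pattern.compile("(\\S+) (\\S+) (.*)", Pattern.DOTALL);
        System.out.println(extractColumns(p, "INFO 10.0.0.1 started", 3));
    }
}
```

Note that, as in the serializer, a body that fails to match simply yields no columns rather than throwing.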
Second, the start() method. It first constructs an HTable object and calls table.setAutoFlush(false) to enable the client-side write buffer (2MB by default), and then performs some checks.
Third, the process() method, which takes events from the channel, runs them through the serializer, and writes the result to HBase.
public Status process() throws EventDeliveryException {
  Status status = Status.READY;
  Channel channel = getChannel();
  Transaction txn = channel.getTransaction();
  List<Row> actions = new LinkedList<Row>();
  List<Increment> incs = new LinkedList<Increment>();
  txn.begin();
  for (long i = 0; i < batchSize; i++) {
    Event event = channel.take();
    if (event == null) {
      status = Status.BACKOFF;
      counterGroup.incrementAndGet("channel.underflow");
      break;
    } else {
      serializer.initialize(event, columnFamily);
      actions.addAll(serializer.getActions());
      incs.addAll(serializer.getIncrements());
    }
  }
  putEventsAndCommit(actions, incs, txn);
  return status;
}
1. actions and incs hold the data to be written to HBase: actions carries the event data, incs the counter updates.
2. serializer.initialize(event, columnFamily): both serializers initialize for the same purpose:
public void initialize(Event event, byte[] columnFamily) {
  this.payload = event.getBody(); // the data to process
  this.cf = columnFamily; // the column family to write to
}
3. serializer.getActions()
SimpleHbaseEventSerializer.getActions() builds the rowkey according to the type set in configure(Context): one of four kinds, a millisecond timestamp, a random number, a nanosecond timestamp, or a 128-bit UUID. It then constructs a Put object, calls add(columnFamily, column, data) on it, and returns it as List<Row> actions.
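The four rowkey suffix types can be sketched in plain Java as follows. This is only an illustration of the four kinds of suffix appended after the fixed rowPrefix; the class and method names are mine and this is not Flume's actual rowkey code:

```java
import java.util.Random;
import java.util.UUID;

// Sketch of the four rowkey suffix types SimpleHbaseEventSerializer supports,
// appended after the fixed rowPrefix from the configuration.
public class RowKeyDemo {
    private static final Random RANDOM = new Random();

    public static String rowKey(String rowPrefix, String suffixType) {
        String suffix;
        if (suffixType.equals("timestamp")) {
            suffix = String.valueOf(System.currentTimeMillis()); // millisecond timestamp
        } else if (suffixType.equals("nano")) {
            suffix = String.valueOf(System.nanoTime());          // nanosecond timestamp
        } else if (suffixType.equals("random")) {
            suffix = String.valueOf(RANDOM.nextInt());           // random number
        } else {
            suffix = UUID.randomUUID().toString();               // 128-bit UUID (the default)
        }
        return rowPrefix + suffix;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("default", "uuid"));
    }
}
```

Only the UUID suffix is globally unique by construction; timestamp-based suffixes can collide under high write rates, which is one reason a configurable prefix exists.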
RegexHbaseEventSerializer.getActions() first performs some checks: did the regex match, and does the number of matched groups equal the number of configured columns? It then builds the rowkey, which here is a string of three parts: [time in millis]-[random key]-[nonce]. The rest maps the captured groups to the columns in order to build the Puts, and returns List<Row> actions.
4. serializer.getIncrements()
SimpleHbaseEventSerializer.getIncrements(): if incrementColumn is set in the configuration file, the corresponding counter Increment is added; otherwise an empty List<Increment> is returned.
RegexHbaseEventSerializer.getIncrements() always returns an empty List<Increment>, i.e. no counter is maintained.
5. The putEventsAndCommit(actions, incs, txn) method. It first submits the List<Put> via table.batch(actions); then applies each counter Increment via table.increment(i); txn.commit() commits the transaction; if an exception occurs, txn.rollback() rolls it back; finally txn.close() closes the transaction.
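The commit/rollback/close shape of putEventsAndCommit is a standard transaction pattern and can be sketched without the Flume and HBase classes. Txn here is a hypothetical stand-in I define for Flume's Transaction interface, and writeAndCommit stands in for the write plus commit logic:

```java
// Generic shape of putEventsAndCommit's transaction handling.
public class TxnPatternDemo {
    // Hypothetical stand-in for Flume's Transaction interface.
    public interface Txn {
        void commit();
        void rollback();
        void close();
    }

    // Run a write action: commit on success, roll back on failure,
    // and always close the transaction.
    public static boolean writeAndCommit(Txn txn, Runnable write) {
        try {
            write.run();     // e.g. table.batch(actions) and table.increment(i)
            txn.commit();
            return true;
        } catch (RuntimeException e) {
            txn.rollback();
            return false;
        } finally {
            txn.close();
        }
    }
}
```

The finally block guarantees close() runs on both the commit and rollback paths, mirroring the behavior described above.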
Fourth, the stop() method: table.close(); table = null;
Two questions still puzzle me:
1. When developing HBase programs we always have to set "hbase.zookeeper.quorum" to the ZooKeeper address, yet reading HBaseSink I found no place where it is set. Is it unnecessary on nodes inside the HBase cluster, and only required on nodes outside it?
2. I also found that running Flume on a node with ZooKeeper installed fails with errors; after removing ZooKeeper it runs normally, and it runs normally on nodes that never had ZooKeeper. Why is that?
If you know the answers, please share. All in all, HBaseSink is fairly simple.