Flume custom HBase sink class
References (original author): http://ydt619.blog.51cto.com/316163/1230586
https://blogs.apache.org/flume/entry/streaming_data_into_apache_hbase
Sample Flume 1.5 configuration file:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/scut/Downloads/testFlume

# Describe the sink
a1.sinks.k1.type = org.apache.flume.sink.hbase.AsyncHBaseSink
# HBase table name
a1.sinks.k1.table = Router
# HBase column family
a1.sinks.k1.columnFamily = log
# HBase column names
a1.sinks.k1.serializer.payloadColumn = serviceTime,browerOS,clientTime,screenHeight,screenWidth,url,userAgent,mobileDevice,gwId,mac
# Serializer class that processes the events
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.BaimiAsyncHbaseEventSerializer

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
The key properties are a1.sinks.k1.serializer.payloadColumn, which lists all the column names, and a1.sinks.k1.serializer, which sets the Flume serializer class. In the BaimiAsyncHbaseEventSerializer class, the value of payloadColumn is read and split on commas to obtain all the column names. The BaimiAsyncHbaseEventSerializer class:
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org.apache.flume.sink.hbase;

import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.FlumeException;
import org.apache.flume.conf.ComponentConfiguration;
import org.apache.flume.sink.hbase.SimpleHbaseEventSerializer.KeyType;
import org.hbase.async.AtomicIncrementRequest;
import org.hbase.async.PutRequest;

import com.google.common.base.Charsets;

public class BaimiAsyncHbaseEventSerializer implements AsyncHbaseEventSerializer {
  private byte[] table;
  private byte[] cf;
  private byte[][] payload;
  private byte[][] payloadColumn;
  // Events are expected to carry one value per column, separated by the literal "^A".
  private final String payloadColumnSplit = "\\^A";
  private byte[] incrementColumn;
  private String rowSuffix;
  private String rowSuffixCol;
  private byte[] incrementRow;
  private KeyType keyType;

  @Override
  public void initialize(byte[] table, byte[] cf) {
    this.table = table;
    this.cf = cf;
  }

  @Override
  public List<PutRequest> getActions() {
    List<PutRequest> actions = new ArrayList<PutRequest>();
    if (payloadColumn != null) {
      byte[] rowKey;
      try {
        switch (keyType) {
          case TS:
            rowKey = SimpleRowKeyGenerator.getTimestampKey(rowSuffix);
            break;
          case TSNANO:
            rowKey = SimpleRowKeyGenerator.getNanoTimestampKey(rowSuffix);
            break;
          case RANDOM:
            rowKey = SimpleRowKeyGenerator.getRandomKey(rowSuffix);
            break;
          default:
            rowKey = SimpleRowKeyGenerator.getUUIDKey(rowSuffix);
            break;
        }
        // Submit one PutRequest per column; all puts share the same rowkey.
        for (int i = 0; i < this.payload.length; i++) {
          PutRequest putRequest = new PutRequest(table, rowKey, cf,
              payloadColumn[i], payload[i]);
          actions.add(putRequest);
        }
      } catch (Exception e) {
        throw new FlumeException("Could not get row key!", e);
      }
    }
    return actions;
  }

  public List<AtomicIncrementRequest> getIncrements() {
    List<AtomicIncrementRequest> actions = new ArrayList<AtomicIncrementRequest>();
    if (incrementColumn != null) {
      AtomicIncrementRequest inc = new AtomicIncrementRequest(table,
          incrementRow, cf, incrementColumn);
      actions.add(inc);
    }
    return actions;
  }

  @Override
  public void cleanUp() {
    // TODO Auto-generated method stub
  }

  @Override
  public void configure(Context context) {
    String pCol = context.getString("payloadColumn", "pCol");
    String iCol = context.getString("incrementColumn", "iCol");
    rowSuffixCol = context.getString("rowPrefixCol", "mac");
    String suffix = context.getString("suffix", "uuid");
    if (pCol != null && !pCol.isEmpty()) {
      if (suffix.equals("timestamp")) {
        keyType = KeyType.TS;
      } else if (suffix.equals("random")) {
        keyType = KeyType.RANDOM;
      } else if (suffix.equals("nano")) {
        keyType = KeyType.TSNANO;
      } else {
        keyType = KeyType.UUID;
      }
      // Read the column names from the configuration file: strip spaces and split on commas.
      String[] pCols = pCol.replace(" ", "").split(",");
      payloadColumn = new byte[pCols.length][];
      for (int i = 0; i < pCols.length; i++) {
        // Convert the column name to lower case.
        payloadColumn[i] = pCols[i].toLowerCase().getBytes(Charsets.UTF_8);
      }
    }
    if (iCol != null && !iCol.isEmpty()) {
      incrementColumn = iCol.getBytes(Charsets.UTF_8);
    }
    incrementRow = context.getString("incrementRow", "incRow").getBytes(Charsets.UTF_8);
  }

  @Override
  public void setEvent(Event event) {
    String strBody = new String(event.getBody());
    String[] subBody = strBody.split(this.payloadColumnSplit);
    if (subBody.length == this.payloadColumn.length) {
      this.payload = new byte[subBody.length][];
      for (int i = 0; i < subBody.length; i++) {
        this.payload[i] = subBody[i].getBytes(Charsets.UTF_8);
        if (new String(this.payloadColumn[i]).equals(this.rowSuffixCol)) {
          // The rowkey suffix is the value of one column (the mac address by default).
          this.rowSuffix = subBody[i];
        }
      }
    }
  }

  @Override
  public void configure(ComponentConfiguration conf) {
    // TODO Auto-generated method stub
  }
}
Focus on the setEvent, configure, and getActions functions. The configure function reads the Flume configuration, including the column names and the rowkey suffix column. The setEvent function takes the body of the Flume event and stores it in the payload array. The getActions function creates one PutRequest per column and writes the rowkey, column family, column, and value into it.

Compiling and deploying the custom BaimiAsyncHbaseEventSerializer class: add the class to the Flume source code, compile it to produce a new flume-ng-hbase-sink-*.jar, and use it to replace Flume's original flume-ng-hbase-sink-*.jar. Download the Flume 1.5 source code, unpack it, and copy the BaimiAsyncHbaseEventSerializer class above into flume-1.5.0-src/flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/. Then go to flume-1.5.0-src/flume-ng-sinks/flume-ng-hbase-sink/ and run the Maven build command [mvn install -Dmaven.test.skip=true]. After compilation, Maven generates flume-ng-hbase-sink-1.5.0.jar in the flume-1.5.0-src/flume-ng-sinks/flume-ng-hbase-sink/target directory. Replace the corresponding jar under $FLUME_HOME/lib with this one, then run the Flume command [flume-ng agent -c . -f conf/spoolDir.conf -n a1 -Dflume.root.logger=INFO,console].
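To make the data contract between the configuration and the serializer concrete, here is a minimal, hypothetical driver that exercises the class outside of a running agent. The table name (Router), column family (log), and column list come from the sample configuration above; the class name BaimiSerializerSketch, the sample event body values, and the choice of a timestamp rowkey are invented for illustration, and the sketch assumes the flume-ng-core, flume-ng-hbase-sink, asynchbase, and Guava jars are on the classpath.

package org.apache.flume.sink.hbase;

import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.hbase.async.PutRequest;

import com.google.common.base.Charsets;

// Hypothetical driver class; not part of Flume or the original post.
public class BaimiSerializerSketch {
  public static void main(String[] args) {
    BaimiAsyncHbaseEventSerializer serializer = new BaimiAsyncHbaseEventSerializer();
    // Table name and column family taken from the sample configuration file.
    serializer.initialize("Router".getBytes(Charsets.UTF_8), "log".getBytes(Charsets.UTF_8));

    // Mirror the serializer settings of the sample configuration file.
    Context context = new Context();
    context.put("payloadColumn", "serviceTime,browerOS,clientTime,screenHeight,screenWidth,"
        + "url,userAgent,mobileDevice,gwId,mac");
    context.put("suffix", "timestamp");   // use a timestamp-based rowkey (KeyType.TS)
    context.put("rowPrefixCol", "mac");   // the column whose value becomes the rowkey suffix
    serializer.configure(context);

    // An event body as the serializer expects it: one (invented) value per configured
    // column, in the same order, separated by the literal "^A" delimiter.
    String body = "2014-05-01 10:00:00^AWindows^A1398902400^A768^A1366^A"
        + "http://example.com^AMozilla/5.0^Afalse^Agw-01^A00:11:22:33:44:55";
    Event event = EventBuilder.withBody(body, Charsets.UTF_8);
    serializer.setEvent(event);

    // One PutRequest per column, all sharing the same generated rowkey.
    List<PutRequest> actions = serializer.getActions();
    System.out.println("PutRequests generated: " + actions.size()); // expect 10
  }
}

Note that setEvent only fills the payload array when the number of fields in the event body equals the number of configured columns, so each input line written to the spooling directory must carry exactly one value per column.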