Using Flume to sink data to HBase

Source: Internet
Author: User

===========> Create the HBase table and column family first <================
Case 1: Each line of source data becomes one HBase row (works with hbase-1.12)
================================================================================
# Description: in this case Flume watches the directory /home/hadoop/flume_hbase and writes what it captures to HBase. The table and column family must already exist in HBase.

Data directory:
vi /home/hadoop/flume_hbase/word.txt
1001 Pan Nan
2200 Lili NV

create 'tb_words', 'cf_wd'

vi flume-hbase.conf
# Name the components of this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
# A spooling-directory source watches a folder: nothing has to be run against the file; any file placed in the folder is picked up.
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flume_hbase

# Describe the sink
a1.sinks.k1.type = asynchbase
a1.sinks.k1.table = tb_words
a1.sinks.k1.columnFamily = cf_wd
# So far I have only gotten this to work with a single column name; multiple column names failed. For multiple columns, consider the regular-expression approach in Case 2.
a1.sinks.k1.serializer.payloadColumn = wd
a1.sinks.k1.serializer.incrementColumn = Last
a1.sinks.k1.serializer.rowPrefix = qm
a1.sinks.k1.serializer.suffix = timestamp
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
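
Component names in a Flume properties file have to match exactly, including case, so a channel declared as c1 but referenced as C1 will not bind. The sketch below is a hypothetical helper, not part of Flume: it parses a properties file in a simplified way and lists any channel that a source or sink references without the agent declaring it. The function names parse_properties and undeclared_channels are assumptions made for illustration.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def undeclared_channels(props, agent):
    """Return channels referenced by a source or sink but not declared by the agent."""
    declared = set(props.get(f"{agent}.channels", "").split())
    used = set()
    for key, value in props.items():
        parts = key.split(".")
        if len(parts) == 4 and parts[0] == agent:
            if parts[1] == "sources" and parts[3] == "channels":
                used.update(value.split())  # a source may list several channels
            elif parts[1] == "sinks" and parts[3] == "channel":
                used.add(value)             # a sink reads from exactly one channel
    return sorted(used - declared)


good = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""
# Same config, but the sink refers to the channel with the wrong case.
bad = good.replace("a1.sinks.k1.channel = c1", "a1.sinks.k1.channel = C1")

print(undeclared_channels(parse_properties(good), "a1"))
print(undeclared_channels(parse_properties(bad), "a1"))
```

The first call prints an empty list; the second reports C1, the kind of mismatch that is easy to miss when a config is copied from a web page.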


Case 2: Use a regular expression to split each line into multiple column values
Description: uses apache-flume-1.7.0-bin.tar.gz and hbase-1.12+
================================================================================
create 'tb_words2', 'words'

Data directory:
vi /home/hadoop/flume_hbase/data.txt
1001,panzong,nan
2200,lili,nv

Flume configuration file:
vi flume_2_hbase.conf
# Name the components of this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# cn.qm.flume.source.MySource can be replaced with spooldir
a1.sources.r1.type = cn.qm.flume.source.MySource
a1.sources.r1.spoolDir = /home/hadoop/flume_hbase

# Describe the sink
#a1.sinks.k1.type = org.apache.flume.sink.hbase.HBaseSink
a1.sinks.k1.type = hbase
a1.sinks.k1.table = tb_words2
a1.sinks.k1.columnFamily = words
# enableWal is read by the sink itself, not by the serializer
a1.sinks.k1.enableWal = true
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
# Reading the RegexHbaseEventSerializer source is a quick way to understand the rowKeyIndex and colNames properties
a1.sinks.k1.serializer.regex = ^([0-9]+),([a-z]+),([a-z]+)$
# Use a specific column as the row key instead of a randomly generated key; here the first column is the HBase row key
a1.sinks.k1.serializer.rowKeyIndex = 0
# ROW_KEY is the column name reserved by the serializer for the row key
a1.sinks.k1.serializer.colNames = ROW_KEY,name,sex
a1.sinks.k1.zookeeperQuorum = hdp-qm-05:2181,hdp-qm-06:2181,hdp-qm-07:2181

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1


# To use the second column as the HBase row key instead:
#a1.sinks.k1.serializer.rowKeyIndex = 1
#a1.sinks.k1.serializer.regex = ^([0-9]+),([a-z]+),([a-z]+)$
#a1.sinks.k1.serializer.colNames = id,ROW_KEY,sex
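
To see what the regex, rowKeyIndex and column-name settings do together, here is a simplified Python model of the mapping. It is only a sketch of the behaviour described above, not the actual RegexHbaseEventSerializer code, and the function name map_line is an assumption. Each capture group is paired with the column name at the same position, and the group at rowKeyIndex becomes the row key instead of a column.

```python
import re

ROW_KEY_NAME = "ROW_KEY"  # name expected at position rowKeyIndex in the column list


def map_line(line, regex, col_names, row_key_index):
    """Return (row_key, {column: value}) for one line, or None if it does not match."""
    match = re.fullmatch(regex, line)
    if match is None or len(match.groups()) != len(col_names):
        return None  # a non-matching line yields no row in this model
    if col_names[row_key_index].upper() != ROW_KEY_NAME:
        raise ValueError("the column at rowKeyIndex must be named ROW_KEY")
    row_key = match.group(row_key_index + 1)  # regex groups are 1-based
    columns = {
        name: match.group(i + 1)
        for i, name in enumerate(col_names)
        if i != row_key_index
    }
    return row_key, columns


regex = r"^([0-9]+),([a-z]+),([a-z]+)$"

# First column as row key (rowKeyIndex = 0)
print(map_line("1001,panzong,nan", regex, ["ROW_KEY", "name", "sex"], 0))

# Second column as row key (rowKeyIndex = 1)
print(map_line("2200,lili,nv", regex, ["id", "ROW_KEY", "sex"], 1))
```

With rowKeyIndex = 0 the row key is 1001 and the columns are name and sex; with rowKeyIndex = 1 the row key is lili and the columns are id and sex. A line with spaces after the commas returns None, because the pattern allows no whitespace.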
