Real-time data transfer from an RDBMS to Hadoop using Kafka

Now let's dive into the details of this solution and I'll show you how you can import data into Hadoop in just a few steps.

1. Extract data from RDBMS

All relational databases keep a log file that records the latest transaction information. The first step in our streaming solution is to obtain these transaction records and enable Hadoop to parse their format. (The original author does not explain how to parse these transaction logs, as that may involve proprietary business details; a minimal hand-off sketch is shown below.)
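Purely as an illustration, and not the original author's pipeline: once the extracted transactions are available as delimited text, they can be pushed into Kafka with the console producer that ships with HDP. The export path below is hypothetical, and the topic is the one created in step 2:

# hypothetical export file; SalesDBTransactions is created in step 2
$ tail -F /tmp/salesdb/transactions.csv | \
    /usr/hdp/2.4.0.0-169/kafka/bin/kafka-console-producer.sh \
    --broker-list sandbox.hortonworks.com:6667 --topic SalesDBTransactions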

2. Start the Kafka producer

A process that sends messages to a Kafka topic is called a producer, and a topic groups messages of the same kind within Kafka. The transaction records from the RDBMS are published to a Kafka topic. In our example, we have a sales-team database whose transaction information is published to a Kafka topic; the following commands create that topic:
$ cd /usr/hdp/2.4.0.0-169/kafka
$ bin/kafka-topics.sh --create --zookeeper www.iteblog.com:2181 --replication-factor 1 --partitions 1 --topic SalesDBTransactions
Created topic "SalesDBTransactions".
$ bin/kafka-topics.sh --list --zookeeper www.iteblog.com:2181
SalesDBTransactions
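Optionally, you can double-check the topic's partition and replication settings with the same script (the output depends on your cluster, so it is omitted here):

$ bin/kafka-topics.sh --describe --zookeeper www.iteblog.com:2181 --topic SalesDBTransactions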

3. Set up Hive

We create a table in Hive to receive the transaction records from the sales-team database. In this example we create a table named customers:
[iteblog@sandbox ~]$ beeline -u jdbc:hive2:// -n hive -p hive
0: jdbc:hive2://> use Raj;
CREATE TABLE customers (id string, name string, email string, street_address string, company string)
PARTITIONED BY (time string)
CLUSTERED BY (id) INTO 5 BUCKETS STORED AS ORC
LOCATION '/user/iteblog/salescust'
TBLPROPERTIES ('transactional' = 'true');
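As a quick sanity check (not part of the original steps), you can confirm from the shell that the table was registered as a bucketed, transactional ORC table:

$ beeline -u jdbc:hive2:// -n hive -p hive -e "DESCRIBE FORMATTED Raj.customers;"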

In order to enable transactions in Hive, we also need to set the following property in Hive's configuration:

hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
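Note that hive.txn.manager alone is usually not sufficient: the Hive documentation lists several companion settings for ACID tables. If your hive-site.xml does not already contain them, something along these lines is typically needed (these values are the commonly documented minimums, not taken from the original article):

# typical companion settings for Hive ACID tables (not from the original article)
hive.support.concurrency = true
hive.enforce.bucketing = true
hive.exec.dynamic.partition.mode = nonstrict
hive.compactor.initiator.on = true
hive.compactor.worker.threads = 1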

4. Start a Flume agent to write the Kafka data to Hive

Next we create a Flume agent that sends the data from the Kafka topic to the corresponding Hive table. Before starting the agent, set up its working directories and environment variables as follows:

$ pwd
/home/iteblog/streamingdemo
$ mkdir -p flume/checkpoint
$ mkdir -p flume/data
$ chmod -R 777 flume
$ export HIVE_HOME=/usr/hdp/current/hive-server2
$ export HCAT_HOME=/usr/hdp/current/hive-webhcat

$ pwd
/home/iteblog/streamingdemo/flume
$ mkdir logs

Then create a log4j properties file:

[iteblog@sandbox conf]$ vi log4j.properties

flume.root.logger=INFO,LOGFILE
flume.log.dir=/home/iteblog/streamingdemo/flume/logs
flume.log.file=flume.log

Finally, our Flume agent is configured as follows:

$ vi flumetohive.conf
flumeagent1.sources = source_from_kafka
flumeagent1.channels = mem_channel
flumeagent1.sinks = hive_sink
# Define/configure the Kafka source
flumeagent1.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
flumeagent1.sources.source_from_kafka.zookeeperConnect = sandbox.hortonworks.com:2181
flumeagent1.sources.source_from_kafka.topic = SalesDBTransactions
flumeagent1.sources.source_from_kafka.groupId = flume
flumeagent1.sources.source_from_kafka.channels = mem_channel
flumeagent1.sources.source_from_kafka.interceptors = i1
flumeagent1.sources.source_from_kafka.interceptors.i1.type = timestamp
flumeagent1.sources.source_from_kafka.consumer.timeout.ms = 1000

# Hive sink
flumeagent1.sinks.hive_sink.type = hive
flumeagent1.sinks.hive_sink.hive.metastore = thrift://sandbox.hortonworks.com:9083
flumeagent1.sinks.hive_sink.hive.database = Raj
flumeagent1.sinks.hive_sink.hive.table = customers
flumeagent1.sinks.hive_sink.hive.txnsPerBatchAsk = 2
flumeagent1.sinks.hive_sink.hive.partition = %y-%m-%d-%H-%M
flumeagent1.sinks.hive_sink.batchSize = 10
flumeagent1.sinks.hive_sink.serializer = DELIMITED
flumeagent1.sinks.hive_sink.serializer.delimiter = ,
flumeagent1.sinks.hive_sink.serializer.fieldnames = id,name,email,street_address,company
# Use a channel which buffers events in memory
flumeagent1.channels.mem_channel.type = memory
flumeagent1.channels.mem_channel.capacity = 10000
flumeagent1.channels.mem_channel.transactionCapacity = 100
# Bind the source and sink to the channel
flumeagent1.sources.source_from_kafka.channels = mem_channel
flumeagent1.sinks.hive_sink.channel = mem_channel

5. Start the Flume agent

Use the following command to start the Flume agent:

$ /usr/hdp/apache-flume-1.6.0/bin/flume-ng agent -n flumeagent1 -f ~/streamingdemo/flume/conf/flumetohive.conf
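The agent writes to the log file configured in log4j.properties above, so an easy optional check that the Kafka source and Hive sink came up cleanly is to watch that log while the agent starts:

$ tail -f /home/iteblog/streamingdemo/flume/logs/flume.log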

6. Start the Kafka stream

To simulate the stream for this example, the following records stand in for the transaction data that the database would generate in a real system:
$ cd /usr/hdp/2.4.0.0-169/kafka
$ bin/kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic SalesDBTransactions
1, "Nero Morris", "porttitor.interdum@Sedcongue.edu", "P.O. Box 871, 5313 quis Ave", "Sodales Company"
2, "Cody Bond", "ante.lectus.convallis@antebibendumullamcorper.ca", "232-513 molestie Road", "Aenean eget Incorporated"
3, "Holmes Cannon", "a@metusAliquam.edu", "P.O. Box 726, 7682 bibendum Rd.", "Velit Cras LLP"
4, "Alexander Lewis", "risus@urna.edu", "Ap #375-9675 lacus Av.", "Ut Aliquam Iaculis Inc."
5, "Gavin Ortiz", "sit.amet@aliquameu.net", "Ap #453-1440 Urna St.", "Libero Nec Ltd"
6, "Ralph Fleming", "sociis.natoque.penatibus@quismassaMauris.edu", "363-6976 lacus St.", "Quisque fringilla PC"
7, "Merrill Norton", "at.sem@elementum.net", "P.O. Box 452, 6951 egestas St.", "Nec metus Institute"
8, "Nathaniel Carrillo", "eget@massa.co.uk", "Ap #438-604 tellus St.", "Blandit Viverra Corporation"
9, "Warren Valenzuela", "tempus.scelerisque.lorem@ornare.co.uk", "Ap #590-320 Nulla Av.", "Ligula aliquam erat Incorporated"
10, "Donovan Hill", "facilisi@augue.org", "979-6729 Donec Road", "Turpis in Condimentum Associates"
11, "Kamal Matthews", "augue.ut@necleoMorbi.org", "Ap #530-8214 convallis St.", "Tristique senectus Et Foundation"

7. Receive the data in Hive

After completing all of the steps above, when you send data to Kafka you will see it arrive in the Hive table within a few seconds.
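For example, a quick query against the table created in step 3 should return the freshly streamed rows shortly after you type them into the producer (the exact contents depend on what you sent):

$ beeline -u jdbc:hive2:// -n hive -p hive -e "SELECT id, name, company FROM Raj.customers LIMIT 5;"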
