Real-time data transfer from an RDBMS to Hadoop using Kafka

Now let's dive into the details of this solution and I'll show you how you can import data into Hadoop in just a few steps.

1. Extract data from RDBMS

All relational databases keep a log file that records the latest transaction information. The first step in our streaming solution is to obtain these transaction records and enable Hadoop to parse their format. (The original author does not explain how to parse these transaction logs, as that may involve proprietary business details; a minimal hand-off sketch is shown below.)
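Purely as an illustration, and not the original author's pipeline: once the extracted transactions are available as delimited text, they can be pushed into Kafka with the console producer that ships with HDP. The export path below is hypothetical, and the topic is the one created in step 2:

# hypothetical export file; SalesDBTransactions is created in step 2
$ tail -F /tmp/salesdb/transactions.csv | \
    /usr/hdp/2.4.0.0-169/kafka/bin/kafka-console-producer.sh \
    --broker-list sandbox.hortonworks.com:6667 --topic SalesDBTransactions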

2. Start the Kafka producer

A process that sends messages to a Kafka topic is called a producer, and a topic groups messages of the same kind within Kafka. The transaction records from the RDBMS are published to a Kafka topic. In our example, we have a sales-team database whose transaction information is published to a Kafka topic; the following commands create that topic:
$ cd /usr/hdp/2.4.0.0-169/kafka
$ bin/kafka-topics.sh --create --zookeeper www.iteblog.com:2181 --replication-factor 1 --partitions 1 --topic SalesDBTransactions
Created topic "SalesDBTransactions".
$ bin/kafka-topics.sh --list --zookeeper www.iteblog.com:2181
SalesDBTransactions
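Optionally, you can double-check the topic's partition and replication settings with the same script (the output depends on your cluster, so it is omitted here):

$ bin/kafka-topics.sh --describe --zookeeper www.iteblog.com:2181 --topic SalesDBTransactions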

3. Set up Hive

We create a table in Hive to receive the transaction records from the sales-team database. In this example we create a table named customers:
[iteblog@sandbox ~]$ beeline -u jdbc:hive2:// -n hive -p hive
0: jdbc:hive2://> use Raj;
CREATE TABLE customers (id string, name string, email string, street_address string, company string)
PARTITIONED BY (time string)
CLUSTERED BY (id) INTO 5 BUCKETS STORED AS ORC
LOCATION '/user/iteblog/salescust'
TBLPROPERTIES ('transactional' = 'true');
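As a quick sanity check (not part of the original steps), you can confirm from the shell that the table was registered as a bucketed, transactional ORC table:

$ beeline -u jdbc:hive2:// -n hive -p hive -e "DESCRIBE FORMATTED Raj.customers;"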

In order to enable transactions in Hive, we also need to set the following property in Hive's configuration:

hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
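Note that hive.txn.manager alone is usually not sufficient: the Hive documentation lists several companion settings for ACID tables. If your hive-site.xml does not already contain them, something along these lines is typically needed (these values are the commonly documented minimums, not taken from the original article):

# typical companion settings for Hive ACID tables (not from the original article)
hive.support.concurrency = true
hive.enforce.bucketing = true
hive.exec.dynamic.partition.mode = nonstrict
hive.compactor.initiator.on = true
hive.compactor.worker.threads = 1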

4. Start a Flume agent to write the Kafka data to Hive

Next we create a Flume agent that sends the data from the Kafka topic to the corresponding Hive table. Before starting the agent, set up its working directories and environment variables as follows:

$ pwd
/home/iteblog/streamingdemo
$ mkdir -p flume/checkpoint
$ mkdir -p flume/data
$ chmod -R 777 flume
$ export HIVE_HOME=/usr/hdp/current/hive-server2
$ export HCAT_HOME=/usr/hdp/current/hive-webhcat

$ pwd
/home/iteblog/streamingdemo/flume
$ mkdir logs

Then create a log4j properties file:

[iteblog@sandbox conf]$ vi log4j.properties

flume.root.logger=INFO,LOGFILE
flume.log.dir=/home/iteblog/streamingdemo/flume/logs
flume.log.file=flume.log

Finally, our Flume agent is configured as follows:

$ vi flumetohive.conf
flumeagent1.sources = source_from_kafka
flumeagent1.channels = mem_channel
flumeagent1.sinks = hive_sink
# Define/configure the Kafka source
flumeagent1.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
flumeagent1.sources.source_from_kafka.zookeeperConnect = sandbox.hortonworks.com:2181
flumeagent1.sources.source_from_kafka.topic = SalesDBTransactions
flumeagent1.sources.source_from_kafka.groupId = flume
flumeagent1.sources.source_from_kafka.channels = mem_channel
flumeagent1.sources.source_from_kafka.interceptors = i1
flumeagent1.sources.source_from_kafka.interceptors.i1.type = timestamp
flumeagent1.sources.source_from_kafka.consumer.timeout.ms = 1000

# Hive sink
flumeagent1.sinks.hive_sink.type = hive
flumeagent1.sinks.hive_sink.hive.metastore = thrift://sandbox.hortonworks.com:9083
flumeagent1.sinks.hive_sink.hive.database = Raj
flumeagent1.sinks.hive_sink.hive.table = customers
flumeagent1.sinks.hive_sink.hive.txnsPerBatchAsk = 2
flumeagent1.sinks.hive_sink.hive.partition = %y-%m-%d-%H-%M
flumeagent1.sinks.hive_sink.batchSize = 10
flumeagent1.sinks.hive_sink.serializer = DELIMITED
flumeagent1.sinks.hive_sink.serializer.delimiter = ,
flumeagent1.sinks.hive_sink.serializer.fieldnames = id,name,email,street_address,company
# Use a channel which buffers events in memory
flumeagent1.channels.mem_channel.type = memory
flumeagent1.channels.mem_channel.capacity = 10000
flumeagent1.channels.mem_channel.transactionCapacity = 100
# Bind the source and sink to the channel
flumeagent1.sources.source_from_kafka.channels = mem_channel
flumeagent1.sinks.hive_sink.channel = mem_channel

5. Start the Flume agent

Use the following command to start the Flume agent:

$ /usr/hdp/apache-flume-1.6.0/bin/flume-ng agent -n flumeagent1 -f ~/streamingdemo/flume/conf/flumetohive.conf
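The agent writes to the log file configured in log4j.properties above, so an easy optional check that the Kafka source and Hive sink came up cleanly is to watch that log while the agent starts:

$ tail -f /home/iteblog/streamingdemo/flume/logs/flume.log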

6. Start the Kafka stream

To simulate the stream for this example, the following records stand in for the transaction data that the database would generate in a real system:
$ cd /usr/hdp/2.4.0.0-169/kafka
$ bin/kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic SalesDBTransactions
1, "Nero Morris", "porttitor.interdum@Sedcongue.edu", "P.O. Box 871, 5313 quis Ave", "Sodales Company"
2, "Cody Bond", "ante.lectus.convallis@antebibendumullamcorper.ca", "232-513 molestie Road", "Aenean eget Incorporated"
3, "Holmes Cannon", "a@metusAliquam.edu", "P.O. Box 726, 7682 bibendum Rd.", "Velit Cras LLP"
4, "Alexander Lewis", "risus@urna.edu", "Ap #375-9675 lacus Av.", "Ut Aliquam Iaculis Inc."
5, "Gavin Ortiz", "sit.amet@aliquameu.net", "Ap #453-1440 Urna St.", "Libero Nec Ltd"
6, "Ralph Fleming", "sociis.natoque.penatibus@quismassaMauris.edu", "363-6976 lacus St.", "Quisque fringilla PC"
7, "Merrill Norton", "at.sem@elementum.net", "P.O. Box 452, 6951 egestas St.", "Nec metus Institute"
8, "Nathaniel Carrillo", "eget@massa.co.uk", "Ap #438-604 tellus St.", "Blandit Viverra Corporation"
9, "Warren Valenzuela", "tempus.scelerisque.lorem@ornare.co.uk", "Ap #590-320 Nulla Av.", "Ligula aliquam erat Incorporated"
10, "Donovan Hill", "facilisi@augue.org", "979-6729 Donec Road", "Turpis in Condimentum Associates"
11, "Kamal Matthews", "augue.ut@necleoMorbi.org", "Ap #530-8214 convallis St.", "Tristique senectus Et Foundation"

7. Receive the data in Hive

After completing all of the steps above, when you send data to Kafka you will see it arrive in the Hive table within a few seconds.
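For example, a quick query against the table created in step 3 should return the freshly streamed rows shortly after you type them into the producer (the exact contents depend on what you sent):

$ beeline -u jdbc:hive2:// -n hive -p hive -e "SELECT id, name, company FROM Raj.customers LIMIT 5;"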
