There are many databases running online, and a data warehouse is needed in the background for analyzing user behavior. MySQL and the Hadoop platform are currently the popular choices for this. The question, then, is how to synchronize the online MySQL data to Hadoop in real time for analysis. This article describes how to do that with tungsten-replicator.
Tungsten-replicator depends on Ruby and RubyGems, so they need to be installed first:
yum install ruby
yum install rubygems
gem install json
Because of the GFW, the json gem may need to be downloaded manually and then installed locally with gem:
yum install ruby-devel    # needed to build the json gem's native extension
gem install --local json-xxx.gem
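To confirm the prerequisites are in place, you can check the installed versions and make sure the json gem is visible (the exact output will vary by environment):
ruby -v
gem -v
gem list json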
Install MySQL at 192.168.12.223:3306 and set up the database account permissions.
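A minimal sketch of that setup, for illustration only: the account name and password below match the --replication-user/--replication-password and load-reduce-check credentials used later in this article, and heterogeneous replication generally requires row-based binary logging in my-3306.cnf; adjust the privileges and settings to your own environment.
# create the replication/analysis account (run against 192.168.12.223:3306)
mysql -h 192.168.12.223 -P 3306 -u root -p -e "CREATE USER 'stats'@'%' IDENTIFIED BY 'stats_dh5'; GRANT ALL PRIVILEGES ON *.* TO 'stats'@'%'; FLUSH PRIVILEGES;"
# settings usually expected in my-3306.cnf for MySQL-to-Hadoop replication
# [mysqld]
# server-id     = 1
# log-bin       = mysql-bin
# binlog-format = ROW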
Install Hadoop 2.4; the HDFS address is 192.168.12.221:9000.
First, on the MySQL machine, go into the tungsten-replicator directory and run the following to install and start Tungsten. You can then use commands such as trepctl and thl to check the status of the service:
./tools/tpm install mysql1 \
  --master=192.168.12.223 \
  --install-directory=/user/app/tungsten/mysql1 \
  --datasource-mysql-conf=/user/data/mysql_data/my-3306.cnf \
  --replication-user=stats \
  --replication-password=stats_dh5 \
  --enable-heterogenous-master=true \
  --net-ssh-option=port=20460 \
  --property=replicator.filter.pkey.addColumnsToDeletes=true \
  --property=replicator.filter.pkey.addPkeyToInserts=true
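For example, assuming the standard layout that tpm creates under --install-directory, the master service can be checked like this:
# replicator state, applied sequence numbers, latency, etc.
/user/app/tungsten/mysql1/tungsten/tungsten-replicator/bin/trepctl status
# inspect the transaction history log extracted from the MySQL binlog
/user/app/tungsten/mysql1/tungsten/tungsten-replicator/bin/thl list -service mysql1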
Then, on the Hadoop machine, go into the tungsten-replicator directory and run the following to install and start Tungsten. Again, commands such as trepctl and thl can be used to check the status of the service:
./tools/tpm install hadoop1 --batch-enabled=true
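Similarly, on the Hadoop side you can confirm that the hadoop1 service is ONLINE and applying events (the path again assumes the standard tpm layout):
/user/app/tungsten/hadoop1/tungsten/tungsten-replicator/bin/trepctl -service hadoop1 status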
On the Hadoop file system, you can check whether the corresponding directories and CSV files for the MySQL database are being generated, as shown below:
└── user
    └── tungsten
        └── staging
            └── hadoop1
                └── db1
                    ├── x1
                    │   ├── x1-14.csv
                    │   └── x1-3.csv
                    └── x2
                        ├── x2-115.csv
                        ├── x2-15.csv
                        ├── x2-16.csv
                        ├── x2-17.csv
                        └── x2-18.csv
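The same staging files can also be listed directly with the HDFS client, using the path shown in the tree above:
hdfs dfs -ls -R /user/tungsten/staging/hadoop1/db1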
Finally, the staging data needs to be merged into Hive, the Hive table structure built, and the data made queryable from Hive. This is done with the load-reduce-check script in the continuent-tools-hadoop tool. Before using it, configure the Hive environment variables, start the Hive service on port 10000, and copy the following jar packages into Bristlecone's lib-ext directory:
cp -v /user/app/hive/apache-hive-0.13.1-bin/lib/hive-jdbc-0.13.1.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hive/apache-hive-0.13.1-bin/lib/hive-exec-0.13.1.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hive/apache-hive-0.13.1-bin/lib/hive-service-0.13.1.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hive/apache-hive-0.13.1-bin/lib/httpclient-4.2.5.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hive/apache-hive-0.13.1-bin/lib/commons-httpclient-3.0.1.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hive/apache-hive-0.13.1-bin/lib/httpcore-4.2.5.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hadoop/hadoop-2.4.0-onenode/share/hadoop/common/hadoop-common-2.4.0.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hadoop/hadoop-2.4.0-onenode/share/hadoop/common/lib/slf4j-* /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
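A minimal sketch of the Hive setup mentioned above, assuming Hive 0.13.1 lives under the same path used in the copy commands; how you actually start the Thrift service (HiveServer1 vs. HiveServer2) depends on your installation:
export HIVE_HOME=/user/app/hive/apache-hive-0.13.1-bin
export PATH=$HIVE_HOME/bin:$PATH
# start the Hive Thrift service on port 10000 (10000 is also the default port)
nohup hive --service hiveserver -p 10000 > hiveserver.log 2>&1 &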
Then execute one of the following commands, depending on the situation.
The first time, or later when tables have been added or the table structure has changed:
./bin/load-reduce-check -v -U jdbc:mysql:thin://192.168.12.223:3306/ -u stats -p stats_dh5 --schema db1 --service=hadoop1 -r /user/app/tungsten/hadoop1 --no-compare
If the table structure has not changed and you only need to reload the data, execute:
./bin/load-reduce-check -v -U jdbc:mysql:thin://192.168.12.223:3306/ -u stats -p stats_dh5 --schema db1 --service=hadoop1 -r /user/app/tungsten/hadoop1 --no-base-ddl --no-staging-ddl --no-meta
If you only want to compare the data (although the comparison seems to be very slow):
./bin/load-reduce-check -v -U jdbc:mysql:thin://192.168.12.223:3306/ -u stats -p stats_dh5 --schema db1 --service=hadoop1 -r /user/app/tungsten/hadoop1 --no-base-ddl --no-staging-ddl --no-meta --no-materialize
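Once load-reduce-check has completed, the data can be queried from Hive to verify that it is visible. This assumes the Hive schema mirrors the MySQL schema name db1; the table name x1 is taken from the staging listing above:
hive -e "USE db1; SHOW TABLES; SELECT COUNT(*) FROM x1;"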
Reference: Section 3.4, "Deploying MySQL to Hadoop Replication", in Tungsten-replicator-3.0.pdf
Https://github.com/continuent/continuent-tools-hadoop