Sync MySQL data to Hadoop using tungsten

Source: Internet
Author: User

    • Background

There are many databases running on the line, and a data warehouse for analyzing user behavior is needed in the background. The MySQL and Hadoop platforms are now popular.

The question now is how to synchronize the online MySQL data in real time to Hadoop for analysis. This article is the use of tungsten-replicator to achieve.

    • Environment

Because Tungsten-replicator relies on Ruby and gem. Need to install

Yum Install Rubyyum install Rubygemsgem install JSON

Where the JSON module may need to be manually downloaded to local for gfw reasons, and then locally installed using the Gem
Yum Install Ruby-devel
Gem Install--local Json-xxx.gem

Install MySQL, address is 192.168.12.223:3306, database configuration good permissions

Install Hadoop 2.4, the address of HDFs is 192.168.12.221:9000

    • Configuration

First on the MySQL machine, into the Tungsten-replicator directory to execute, and start tungsten, you can use Trepctl THL and other commands to view the status of the service

./TOOLS/TPM Install MYSQL1--master=192.168.12.223--INSTALL-DIRECTORY=/USER/APP/TUNGSTEN/MYSQL1-- DATASOURCE-MYSQL-CONF=/USER/DATA/MYSQL_DATA/MY-3306.CNF--replication-user=stats--replication-password=stats_ Dh5--enable-heterogenous-master=true --net-ssh-option=port=20460  --property= Replicator.filter.pkey.addcolumnstodeletes=true --property=replicator.filter.pkey.addpkeytoinserts= true

To a Hadoop machine, go into the Tungsten-replicator directory and start tungsten, and you can use commands such as Trepctl THL to view the status of the service

./TOOLS/TPM Install Hadoop1--batch-enabled=true

On the Hadoop file system, you can see if the corresponding directory is generating a MySQL library. As shown below:

└──user .......     └──tungsten        └──staging            └──hadoop1                └──db1                    ├──x1                    │├──x1-14. csv                    │└──x1-3
   
    .csv                    └──x2                        ├──x2-115
    . csv                        ├──x2-15
    . csv                        ├──x2-16
    . csv                        ├──x2-17
    . csv                        └──x2-18.csv
   

Finally, you need to merge the staging data into hive, build the hive table structure, and let the data be queried by hive, using the Load-reduce-check script inside the Continuent-tools-hadoop tool. Before using, you need to configure the environment variables for hive and start Hiveservice on port 10000. Copy the following jar package to Bristlecone's Lib-ext directory

Cp-v/user/app/hive/apache-hive-0.13.1-bin/lib/hive-jdbc-0.13.1.jar/user/app/tungsten/hadoop1/tungsten/ bristlecone/lib-ext/CP-v/user/app/hive/apache-hive-0.13.1-bin/lib/hive-exec-0.13.1.jar/user/app/tungsten/hadoop1/tungsten/ bristlecone/lib-ext/CP-v/user/app/hive/apache-hive-0.13.1-bin/lib/hive-service-0.13.1.jar/user/app/tungsten/hadoop1/tungsten/ bristlecone/lib-ext/CP-v/user/app/hive/apache-hive-0.13.1-bin/lib/httpclient-4.2.5.jar/user/app/tungsten/hadoop1/tungsten/ bristlecone/lib-ext/CP-v/user/app/hive/apache-hive-0.13.1-bin/lib/commons-httpclient-3.0.1.jar/user/app/tungsten/hadoop1/tungsten/ bristlecone/lib-ext/CP-v/user/app/hive/apache-hive-0.13.1-bin/lib/httpcore-4.2.5.jar/user/app/tungsten/hadoop1/tungsten/bristlecone/ lib-ext/CP-v/user/app/hadoop/hadoop-2.4.0-onenode/share/hadoop/common/hadoop-common-2.4.0.jar/user/app/tungsten/hadoop1/ tungsten/bristlecone/lib-ext/CP-v/user/app/hadoop/hadoop-2.4.0-onenode/share/hadoop/common/lib/slf4j-*/user/app/tungsten/hadoop1/tungsten/ bristlecone/lib-ext/

Then execute the following command:

The first time, or later, the table has been added, or the table structure has changed./bin/load-reduce-check-v-U Jdbc:mysql:thin://192.168.12.223:3306/-u stats-p stats_dh5--schema db1--service=hadoop1-r/user/app/tungsten/hadoop1--no-compare

If the table structure does not change, only need to reload the data, you can execute the following command./bin/load-reduce-check-v-U Jdbc:mysql:thin://192.168.12.223:3306/-u stats-p stats_dh5--schema db1--service=hadoop1-r/user/app/tungsten/hadoop1--no-base-dd L--NO-STAGING-DDL--no-meta

just want to compare data, but seemingly compare very card./bin/load-reduce-check-v-U Jdbc:mysql:thin://192.168.12.223:3306/-u stats-p stats_dh5--schema db1--service=hadoop1-r/user/app/tungsten/hadoop1--no-base-dd L--NO-STAGING-DDL--no-meta--no-materialize

    • Reference

Tungsten-replicator-3.0.pdf in 3.4. Deploying MySQL to Hadoop Replication

Https://github.com/continuent/continuent-tools-hadoop

Sync MySQL data to Hadoop using tungsten

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.