There are many databases running online, and a data warehouse is needed in the background for analyzing user behavior. MySQL and the Hadoop platform are currently the popular choices for this. The question, then, is how to synchronize the online MySQL data to Hadoop in real time for analysis. This article describes how to do that with tungsten-replicator.
Tungsten-replicator depends on Ruby and RubyGems, so they need to be installed first:
yum install ruby
yum install rubygems
gem install json
Because of the GFW, the json gem may need to be downloaded manually and then installed locally with gem:
yum install ruby-devel    # needed to build the json gem's native extension
gem install --local json-xxx.gem
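To confirm the prerequisites are in place, you can check the installed versions and make sure the json gem is visible (the exact output will vary by environment):
ruby -v
gem -v
gem list json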
Install MySQL at 192.168.12.223:3306 and set up the database account permissions.
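A minimal sketch of that setup, for illustration only: the account name and password below match the --replication-user/--replication-password and load-reduce-check credentials used later in this article, and heterogeneous replication generally requires row-based binary logging in my-3306.cnf; adjust the privileges and settings to your own environment.
# create the replication/analysis account (run against 192.168.12.223:3306)
mysql -h 192.168.12.223 -P 3306 -u root -p -e "CREATE USER 'stats'@'%' IDENTIFIED BY 'stats_dh5'; GRANT ALL PRIVILEGES ON *.* TO 'stats'@'%'; FLUSH PRIVILEGES;"
# settings usually expected in my-3306.cnf for MySQL-to-Hadoop replication
# [mysqld]
# server-id     = 1
# log-bin       = mysql-bin
# binlog-format = ROW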
Install Hadoop 2.4; the HDFS address is 192.168.12.221:9000.
First, on the MySQL machine, go into the tungsten-replicator directory and run the following to install and start Tungsten. You can then use commands such as trepctl and thl to check the status of the service:
./tools/tpm install mysql1 \
  --master=192.168.12.223 \
  --install-directory=/user/app/tungsten/mysql1 \
  --datasource-mysql-conf=/user/data/mysql_data/my-3306.cnf \
  --replication-user=stats \
  --replication-password=stats_dh5 \
  --enable-heterogenous-master=true \
  --net-ssh-option=port=20460 \
  --property=replicator.filter.pkey.addColumnsToDeletes=true \
  --property=replicator.filter.pkey.addPkeyToInserts=true
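For example, assuming the standard layout that tpm creates under --install-directory, the master service can be checked like this:
# replicator state, applied sequence numbers, latency, etc.
/user/app/tungsten/mysql1/tungsten/tungsten-replicator/bin/trepctl status
# inspect the transaction history log extracted from the MySQL binlog
/user/app/tungsten/mysql1/tungsten/tungsten-replicator/bin/thl list -service mysql1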
Then, on the Hadoop machine, go into the tungsten-replicator directory and run the following to install and start Tungsten. Again, commands such as trepctl and thl can be used to check the status of the service:
./tools/tpm install hadoop1 --batch-enabled=true
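Similarly, on the Hadoop side you can confirm that the hadoop1 service is ONLINE and applying events (the path again assumes the standard tpm layout):
/user/app/tungsten/hadoop1/tungsten/tungsten-replicator/bin/trepctl -service hadoop1 status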
On the Hadoop file system, you can check whether the corresponding directories and CSV files for the MySQL database are being generated, as shown below:
└── user
    └── tungsten
        └── staging
            └── hadoop1
                └── db1
                    ├── x1
                    │   ├── x1-14.csv
                    │   └── x1-3.csv
                    └── x2
                        ├── x2-115.csv
                        ├── x2-15.csv
                        ├── x2-16.csv
                        ├── x2-17.csv
                        └── x2-18.csv
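The same staging files can also be listed directly with the HDFS client, using the path shown in the tree above:
hdfs dfs -ls -R /user/tungsten/staging/hadoop1/db1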
Finally, the staging data needs to be merged into Hive, the Hive table structure built, and the data made queryable from Hive. This is done with the load-reduce-check script in the continuent-tools-hadoop tool. Before using it, configure the Hive environment variables, start the Hive service on port 10000, and copy the following jar packages into Bristlecone's lib-ext directory:
cp -v /user/app/hive/apache-hive-0.13.1-bin/lib/hive-jdbc-0.13.1.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hive/apache-hive-0.13.1-bin/lib/hive-exec-0.13.1.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hive/apache-hive-0.13.1-bin/lib/hive-service-0.13.1.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hive/apache-hive-0.13.1-bin/lib/httpclient-4.2.5.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hive/apache-hive-0.13.1-bin/lib/commons-httpclient-3.0.1.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hive/apache-hive-0.13.1-bin/lib/httpcore-4.2.5.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hadoop/hadoop-2.4.0-onenode/share/hadoop/common/hadoop-common-2.4.0.jar /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
cp -v /user/app/hadoop/hadoop-2.4.0-onenode/share/hadoop/common/lib/slf4j-* /user/app/tungsten/hadoop1/tungsten/bristlecone/lib-ext/
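A minimal sketch of the Hive setup mentioned above, assuming Hive 0.13.1 lives under the same path used in the copy commands; how you actually start the Thrift service (HiveServer1 vs. HiveServer2) depends on your installation:
export HIVE_HOME=/user/app/hive/apache-hive-0.13.1-bin
export PATH=$HIVE_HOME/bin:$PATH
# start the Hive Thrift service on port 10000 (10000 is also the default port)
nohup hive --service hiveserver -p 10000 > hiveserver.log 2>&1 &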
Then execute one of the following commands, depending on the situation.
The first time, or later when tables have been added or the table structure has changed:
./bin/load-reduce-check -v -U jdbc:mysql:thin://192.168.12.223:3306/ -u stats -p stats_dh5 --schema db1 --service=hadoop1 -r /user/app/tungsten/hadoop1 --no-compare
If the table structure has not changed and you only need to reload the data, execute:
./bin/load-reduce-check -v -U jdbc:mysql:thin://192.168.12.223:3306/ -u stats -p stats_dh5 --schema db1 --service=hadoop1 -r /user/app/tungsten/hadoop1 --no-base-ddl --no-staging-ddl --no-meta
If you only want to compare the data (although the comparison seems to be very slow):
./bin/load-reduce-check -v -U jdbc:mysql:thin://192.168.12.223:3306/ -u stats -p stats_dh5 --schema db1 --service=hadoop1 -r /user/app/tungsten/hadoop1 --no-base-ddl --no-staging-ddl --no-meta --no-materialize
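Once load-reduce-check has completed, the data can be queried from Hive to verify that it is visible. This assumes the Hive schema mirrors the MySQL schema name db1; the table name x1 is taken from the staging listing above:
hive -e "USE db1; SHOW TABLES; SELECT COUNT(*) FROM x1;"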
Reference: Section 3.4, "Deploying MySQL to Hadoop Replication", in Tungsten-replicator-3.0.pdf
Https://github.com/continuent/continuent-tools-hadoop