MySQL replication copies data from one MySQL server (the master) to one or more other MySQL servers (the slaves). Now imagine that the slave is no longer limited to a MySQL server but can be any other database server or platform, and that replication events must be applied in real time. Can this be done?
The MySQL Applier for Hadoop (Hadoop Applier), recently released by the MySQL team, aims to solve this problem.
Purpose
For example, the slave in a replication setup could be a data warehouse system such as Apache Hive, which uses the Hadoop Distributed File System (HDFS) as its data store. If you have a Hive metastore associated with HDFS, Hadoop Applier can populate Hive tables in real time. Data is exported from MySQL to HDFS as text files and from there populated into Hive.
The procedure is simple: run a HiveQL 'CREATE TABLE' statement in Hive to define a table structure similar to the one in MySQL, then run Hadoop Applier to start replicating data in real time.
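As a rough illustration of the table-definition step, the Python sketch below builds a HiveQL CREATE TABLE statement from a MySQL-style column list. The type mapping, the helper name hive_create_table and the example table are hypothetical and are not part of Hadoop Applier. The sketch only assumes that the Hive table should be a delimited text table whose delimiter matches the one the applier writes.

```python
# Hypothetical, simplified MySQL-to-Hive type mapping (an assumption for illustration).
MYSQL_TO_HIVE = {
    "INT": "INT",
    "BIGINT": "BIGINT",
    "VARCHAR": "STRING",
    "TEXT": "STRING",
    "DOUBLE": "DOUBLE",
    "DATETIME": "TIMESTAMP",
}


def hive_create_table(table, columns, delimiter=","):
    """Build a HiveQL CREATE TABLE statement for a delimited text table."""
    cols = []
    for name, mysql_type in columns:
        # Drop any length suffix: VARCHAR(64) -> VARCHAR
        base = mysql_type.split("(")[0].strip().upper()
        # Unknown types fall back to STRING in this sketch.
        cols.append(f"{name} {MYSQL_TO_HIVE.get(base, 'STRING')}")
    return (
        f"CREATE TABLE {table} ({', '.join(cols)}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        "STORED AS TEXTFILE"
    )


statement = hive_create_table("employees", [("id", "INT"), ("name", "VARCHAR(64)")])
print(statement)
```

The generated statement would then be run in Hive before the applier is started.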
Advantages
Before Hadoop Applier, no tool was available for real-time transfer. The previous solution was to export data to HDFS with Apache Sqoop. Although Sqoop can transfer data in batches, the import has to be repeated regularly to keep the data up to date, and other queries slow down while a large amount of data is being transferred. In addition, even if only one change has been made, Sqoop may take a long time to load it.
Hadoop Applier, by contrast, reads the binary log and inserts data by applying only the events that occur on the MySQL server. It needs no batch transfers, so it is faster and does not slow down other queries.
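The event-driven idea can be sketched in Python as follows. This is only a simulation: the events are plain dictionaries rather than real binary log events, the function name apply_inserts is hypothetical, and the sketch assumes that only insert events are applied and that all other event types are skipped.

```python
def apply_inserts(events, sink):
    """Append the rows of every insert event to `sink`; skip all other events.

    Returns the number of rows applied.
    """
    applied = 0
    for event in events:
        if event["type"] != "insert":
            continue  # assumption: only inserts are replicated
        for row in event["rows"]:
            sink.append(row)
            applied += 1
    return applied


# Simulated stream of replication events (not a real binary log format).
events = [
    {"type": "insert", "rows": [(1, "alice"), (2, "bob")]},
    {"type": "update", "rows": [(1, "alicia")]},
    {"type": "insert", "rows": [(3, "carol")]},
]
sink = []
count = apply_inserts(events, sink)
print(count, sink)
```

Because each event is handled as it arrives, there is no periodic bulk import, which is the contrast with the batch approach described above.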
Implementation
Applier uses the API provided by libhdfs, a C library for manipulating files in HDFS. The real-time import process is as follows:
Each database is mapped to a separate directory, and its tables are mapped to subdirectories under the Hive data warehouse directory. Data inserted into each table is written to a text file (such as datafile1.txt). Fields are separated by commas (,) or by another delimiter, which can be configured on the command line.
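The directory mapping and the delimited row format can be sketched in Python like this. The warehouse path /user/hive/warehouse, the database and table names, and both helper functions are assumptions for illustration; the exact directory naming used by Hive or by the applier may differ.

```python
import posixpath


def table_data_path(warehouse_dir, database, table, filename="datafile1.txt"):
    """Return the data file path: <warehouse>/<database>/<table>/<filename>."""
    return posixpath.join(warehouse_dir, database, table, filename)


def format_row(values, delimiter=","):
    """Join the column values of one row with a configurable delimiter."""
    return delimiter.join(str(v) for v in values)


# Hypothetical warehouse directory, database and table.
path = table_data_path("/user/hive/warehouse", "sales", "orders")
line = format_row((42, "widget", 9.99))
tabbed = format_row((42, "widget", 9.99), delimiter="\t")
print(path)
print(line)
```

Whatever delimiter is chosen has to match the field terminator declared for the Hive table, otherwise Hive will not split the columns correctly.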
Details: MySQL Applier for Hadoop
Download: Mysql-hadoop-applier-0.1.0-alpha.tar.gz (alpha version, not suitable for production environments)