Real-time data synchronization between MySQL databases and HDFS

Batch processing delivered via Map/Reduce remains central to Apache Hadoop. However, as the pressure to gain competitive advantage from "speed-of-thought" analytics grows, Hadoop and its Distributed File System are evolving significantly: technologies that allow real-time queries, such as Apache Drill, Cloudera Impala, and the Stinger Initiative, are emerging, supported by the next-generation resource manager, Apache YARN.

To support these increasingly demanding real-time workloads, we are releasing a new component, the MySQL Applier for Hadoop. It replicates committed transactions from MySQL to Hadoop/Hive/HDFS, complementing the existing batch-oriented connectivity provided by Apache Sqoop.

The Applier performs replication by connecting to the MySQL master: as transactions are committed to the binary log, it reads the binary log events and writes them to HDFS.

The component uses the API provided by libhdfs, a C library for manipulating files in HDFS, to perform the writes. The library comes precompiled with Hadoop distributions.
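To show what such a write looks like, here is a minimal sketch of appending one decoded row to a file in HDFS via libhdfs. The NameNode address ("default", taken from the client's Hadoop configuration), the target path, and the sample row are illustrative assumptions, not the Applier's actual code.

/* row_append.c -- minimal libhdfs append sketch.
 * Assumes HDFS append support is enabled and the target file already
 * exists (open with O_WRONLY alone to create it).
 * Build roughly as:
 *   gcc row_append.c -I$HADOOP_HOME/include -L$HADOOP_HOME/lib/native -lhdfs
 */
#include <fcntl.h>   /* O_WRONLY, O_APPEND */
#include <stdio.h>
#include <string.h>
#include "hdfs.h"    /* libhdfs header, shipped with Hadoop */

int main(void)
{
    /* Connect using the default filesystem from core-site.xml. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        fprintf(stderr, "failed to connect to HDFS\n");
        return 1;
    }

    /* Open the table's datafile for appending (path is illustrative). */
    const char *path = "/user/hive/warehouse/mydb/mytable/datafile1.txt";
    hdfsFile file = hdfsOpenFile(fs, path, O_WRONLY | O_APPEND, 0, 0, 0);
    if (!file) {
        fprintf(stderr, "failed to open %s\n", path);
        hdfsDisconnect(fs);
        return 1;
    }

    /* One comma-separated row, as decoded from a binlog insert event. */
    const char *row = "42,alice,2013-04-23\n";
    if (hdfsWrite(fs, file, row, (tSize)strlen(row)) == -1)
        fprintf(stderr, "write failed\n");

    hdfsFlush(fs, file);
    hdfsCloseFile(fs, file);
    hdfsDisconnect(fs);
    return 0;
}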

The Applier connects to the MySQL master to read the binary log, and then:

• Extracts the row insert events that occur on the master

• Decodes each event, extracts the data inserted into each field of the row, and uses content handlers to get the data into the required format

• Appends it to a text file in HDFS (this loop is sketched below)
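To make that flow concrete, here is a hypothetical sketch of the loop in C. The binlog_* names below are illustrative stand-ins for a binary log client library, not a real API; only the structure of the loop reflects the steps above.

/* Hypothetical sketch of the Applier's main loop. The binlog_* functions
 * and types are illustrative stand-ins, NOT a real library interface. */
#include <stdio.h>

typedef struct binlog_conn binlog_conn;   /* connection to the master  */
typedef struct binlog_event {             /* one decoded binlog event  */
    int   is_row_insert;                  /* row insert (WRITE) event? */
    char  db[64], table[64];              /* schema / table names      */
    int   nfields;
    char *fields[32];                     /* decoded column values     */
} binlog_event;

binlog_conn *binlog_connect(const char *uri);              /* hypothetical */
int          binlog_next_event(binlog_conn *c, binlog_event *ev);
void         hdfs_append_row(const char *db, const char *table,
                             const char *row);             /* wraps libhdfs */

int main(void)
{
    /* Connect to the master as a replication client (URI is illustrative). */
    binlog_conn *conn = binlog_connect("mysql://repl:secret@master:3306");
    binlog_event ev;

    while (binlog_next_event(conn, &ev) == 0) {
        if (!ev.is_row_insert)
            continue;                      /* only inserts are applied */

        /* Content handler: render the row as one comma-separated line. */
        char row[1024];
        int  off = 0;
        for (int i = 0; i < ev.nfields; i++)
            off += snprintf(row + off, sizeof(row) - off, "%s%s",
                            ev.fields[i], i + 1 < ev.nfields ? "," : "\n");

        /* Append to the table's datafile under the warehouse directory. */
        hdfs_append_row(ev.db, ev.table, row);
    }
    return 0;
}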

Each database is mapped to a separate directory, and its tables are mapped to subdirectories under the Hive data warehouse directory. The data of each table is written to a text file in Hive/HDFS (datafile1.txt in the example). Fields can be separated by commas or another delimiter, configurable via command-line options.
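As an illustration of that layout, a path-construction helper might look as follows. The warehouse root /user/hive/warehouse and the file name are assumed Hive defaults, not settings mandated by the Applier.

#include <stdio.h>

/* Build the HDFS path for a table's datafile under the layout above:
 * <warehouse root>/<database>/<table>/<datafile>. */
static void table_datafile_path(char *buf, size_t len,
                                const char *db, const char *table)
{
    snprintf(buf, len, "/user/hive/warehouse/%s/%s/datafile1.txt",
             db, table);
}

int main(void)
{
    char path[256];
    table_datafile_path(path, sizeof(path), "sales", "orders");
    puts(path);   /* prints /user/hive/warehouse/sales/orders/datafile1.txt */
    return 0;
}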

Original article: Real-time data synchronization between MySQL database and HDFS. Thanks to the original author for sharing.
