MapReduce remains the central mechanism for delivering batch processing to Apache Hadoop. However, Hadoop and HDFS have evolved significantly as the pressure to gain competitive advantage from "speed-of-thought" analytics has grown: new technologies such as Apache Drill, Cloudera Impala, and the Stinger Initiative now enable real-time queries, supported by the next-generation resource manager, Apache YARN.
To support these increasingly demanding real-time workloads, a new component, the MySQL Applier for Hadoop, is being released. It replicates changed transactions from MySQL to Hadoop/Hive/HDFS, complementing the existing batch-oriented connectivity provided by Apache Sqoop.
The MySQL Applier for Hadoop replicates by connecting to the MySQL master: as transactions are committed to the binary log, it reads the binary log events and writes them to HDFS.
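As an illustration, here is a minimal sketch of how such a connection can be opened with the MySQL Binlog API (the mysql-replication-listener library the Applier builds on). The connection URI is a placeholder, and the class names follow that library's published examples; treat them as assumptions:

    #include <binlog_api.h>   // MySQL Binlog API (mysql-replication-listener)

    using mysql::Binary_log;
    using mysql::system::create_transport;

    int main() {
        // Register with the master as a replication client; the URI
        // (user, password, host, port) is a placeholder.
        Binary_log binlog(create_transport("mysql://reader:secret@127.0.0.1:3306"));
        if (binlog.connect() != 0)   // 0 = success in the library's examples
            return 1;                // could not open the binary log stream
        // ...binary log events can now be consumed one by one (see below).
        return 0;
    }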
The component uses the API provided by libhdfs, a C library for manipulating files in HDFS, which ships precompiled with the Hadoop distribution.
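For illustration, a minimal sketch of appending one rendered row to a file in HDFS through the libhdfs API; the NameNode address and file path are placeholders, and append support must be enabled on the cluster:

    #include <fcntl.h>    // O_WRONLY, O_APPEND
    #include <cstring>
    #include "hdfs.h"     // libhdfs C API, shipped with Hadoop

    // Append one already-rendered row to the given HDFS file.
    int append_row(const char *path, const char *row) {
        // "default" resolves the NameNode from the Hadoop configuration.
        hdfsFS fs = hdfsConnect("default", 0);
        if (!fs) return -1;

        hdfsFile file = hdfsOpenFile(fs, path, O_WRONLY | O_APPEND, 0, 0, 0);
        if (!file) { hdfsDisconnect(fs); return -1; }

        tSize written = hdfsWrite(fs, file, (void *)row, strlen(row));
        hdfsFlush(fs, file);      // make the row visible to readers
        hdfsCloseFile(fs, file);
        hdfsDisconnect(fs);
        return written < 0 ? -1 : 0;
    }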
The Applier connects to the MySQL master to read the binary log, and then (a sketch of this loop follows the list):
- fetches the row insert events that occur on the master;
- decodes each event, extracts the data inserted into each field of the row, and uses the appropriate content handlers to render it in the required format;
- appends the result to a text file in HDFS.
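Putting the three steps together, a simplified sketch of the event loop could look like the following. It is based on the MySQL Binlog API; decode_row() and append_to_hdfs() are hypothetical stand-ins for the Applier's real content handlers and HDFS writer, and the target path is a placeholder:

    #include <string>
    #include <binlog_api.h>   // MySQL Binlog API (mysql-replication-listener)

    // Hypothetical stand-ins for the Applier's internals.
    std::string decode_row(mysql::Binary_log_event *event);   // field extraction
    void append_to_hdfs(const std::string &path, const std::string &row);

    void run(mysql::Binary_log &binlog) {
        mysql::Binary_log_event *event;
        while (binlog.wait_for_next_event(&event) == 0) {     // 0 = success
            // Step 1: only row-insert events from the master are of interest.
            if (event->get_event_type() == mysql::WRITE_ROWS_EVENT) {
                // Step 2: decode the event and render the inserted fields
                // in the configured format (comma-separated by default).
                std::string row = decode_row(event);
                // Step 3: append the rendered row to the table's data file.
                append_to_hdfs("/user/hive/warehouse/db/table/datafile1.txt", row);
            }
            delete event;
        }
    }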
Each database is mapped to a separate directory, and its tables are mapped to subdirectories beneath it, stored under the Hive data warehouse directory. The data for each table is written to a single text file (for example, datafile1.txt) in Hive/HDFS. Fields can be separated by commas or another delimiter, configurable via command-line options.
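To make the mapping concrete, a sketch of how the target path and a delimited row might be derived; the warehouse root, file name, and helper names are assumptions for illustration:

    #include <string>
    #include <vector>

    // Map a database/table pair to the file the Applier appends to,
    // following the directory layout described above (assumed layout).
    std::string target_path(const std::string &db, const std::string &table) {
        return "/user/hive/warehouse/" + db + "/" + table + "/datafile1.txt";
    }

    // Join decoded field values with the configured delimiter
    // (comma by default, overridable on the command line).
    std::string render_row(const std::vector<std::string> &fields, char delim = ',') {
        std::string row;
        for (size_t i = 0; i < fields.size(); ++i) {
            if (i) row += delim;
            row += fields[i];
        }
        row += '\n';   // one row per line in the text file
        return row;
    }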
Original article: "Synchronizing real-time data between MySQL and HDFS". Thanks to the original author for sharing.