MapReduce remains the central mechanism for delivering batch processing to Apache Hadoop. However, Hadoop and HDFS have evolved significantly as the pressure to gain competitive advantage from "speed-of-thought" analytics has grown: new technologies such as Apache Drill, Cloudera Impala, and the Stinger Initiative now enable real-time queries, supported by the next-generation resource manager, Apache YARN.
To support these increasingly demanding real-time workloads, a new component, the MySQL Applier for Hadoop, is being released. It replicates changed transactions from MySQL to Hadoop/Hive/HDFS, complementing the existing batch-oriented connectivity provided by Apache Sqoop.
The MySQL Applier for Hadoop replicates by connecting to the MySQL master: as transactions are committed to the binary log, it reads the binary log events and writes them to HDFS.
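As an illustration, here is a minimal sketch of how such a connection can be opened with the MySQL Binlog API (the mysql-replication-listener library the Applier builds on). The connection URI is a placeholder, and the class names follow that library's published examples; treat them as assumptions:

    #include <binlog_api.h>   // MySQL Binlog API (mysql-replication-listener)

    using mysql::Binary_log;
    using mysql::system::create_transport;

    int main() {
        // Register with the master as a replication client; the URI
        // (user, password, host, port) is a placeholder.
        Binary_log binlog(create_transport("mysql://reader:secret@127.0.0.1:3306"));
        if (binlog.connect() != 0)   // 0 = success in the library's examples
            return 1;                // could not open the binary log stream
        // ...binary log events can now be consumed one by one (see below).
        return 0;
    }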
The component uses the API provided by libhdfs, a C library for manipulating files in HDFS, which ships precompiled with the Hadoop distribution.
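For illustration, a minimal sketch of appending one rendered row to a file in HDFS through the libhdfs API; the NameNode address and file path are placeholders, and append support must be enabled on the cluster:

    #include <fcntl.h>    // O_WRONLY, O_APPEND
    #include <cstring>
    #include "hdfs.h"     // libhdfs C API, shipped with Hadoop

    // Append one already-rendered row to the given HDFS file.
    int append_row(const char *path, const char *row) {
        // "default" resolves the NameNode from the Hadoop configuration.
        hdfsFS fs = hdfsConnect("default", 0);
        if (!fs) return -1;

        hdfsFile file = hdfsOpenFile(fs, path, O_WRONLY | O_APPEND, 0, 0, 0);
        if (!file) { hdfsDisconnect(fs); return -1; }

        tSize written = hdfsWrite(fs, file, (void *)row, strlen(row));
        hdfsFlush(fs, file);      // make the row visible to readers
        hdfsCloseFile(fs, file);
        hdfsDisconnect(fs);
        return written < 0 ? -1 : 0;
    }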
The Applier connects to the MySQL master to read the binary log, and then (a sketch of this loop follows the list):
- fetches the row insert events that occur on the master;
- decodes each event, extracts the data inserted into each field of the row, and uses the appropriate content handlers to render it in the required format;
- appends the result to a text file in HDFS.
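Putting the three steps together, a simplified sketch of the event loop could look like the following. It is based on the MySQL Binlog API; decode_row() and append_to_hdfs() are hypothetical stand-ins for the Applier's real content handlers and HDFS writer, and the target path is a placeholder:

    #include <string>
    #include <binlog_api.h>   // MySQL Binlog API (mysql-replication-listener)

    // Hypothetical stand-ins for the Applier's internals.
    std::string decode_row(mysql::Binary_log_event *event);   // field extraction
    void append_to_hdfs(const std::string &path, const std::string &row);

    void run(mysql::Binary_log &binlog) {
        mysql::Binary_log_event *event;
        while (binlog.wait_for_next_event(&event) == 0) {     // 0 = success
            // Step 1: only row-insert events from the master are of interest.
            if (event->get_event_type() == mysql::WRITE_ROWS_EVENT) {
                // Step 2: decode the event and render the inserted fields
                // in the configured format (comma-separated by default).
                std::string row = decode_row(event);
                // Step 3: append the rendered row to the table's data file.
                append_to_hdfs("/user/hive/warehouse/db/table/datafile1.txt", row);
            }
            delete event;
        }
    }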
Each database is mapped to a separate directory, and its tables are mapped to subdirectories beneath it, stored under the Hive data warehouse directory. The data for each table is written to a single text file (for example, datafile1.txt) in Hive/HDFS. Fields can be separated by commas or another delimiter, configurable via command-line options.
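To make the mapping concrete, a sketch of how the target path and a delimited row might be derived; the warehouse root, file name, and helper names are assumptions for illustration:

    #include <string>
    #include <vector>

    // Map a database/table pair to the file the Applier appends to,
    // following the directory layout described above (assumed layout).
    std::string target_path(const std::string &db, const std::string &table) {
        return "/user/hive/warehouse/" + db + "/" + table + "/datafile1.txt";
    }

    // Join decoded field values with the configured delimiter
    // (comma by default, overridable on the command line).
    std::string render_row(const std::vector<std::string> &fields, char delim = ',') {
        std::string row;
        for (size_t i = 0; i < fields.size(); ++i) {
            if (i) row += delim;
            row += fields[i];
        }
        row += '\n';   // one row per line in the text file
        return row;
    }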
Original article: "Synchronizing real-time data between MySQL and HDFS". Thanks to the original author for sharing.