Batch processing delivered through MapReduce remains central to Apache Hadoop. However, as the pressure to gain a competitive edge from "speed-of-thought" analytics grows, Hadoop and its Distributed File System have undergone significant development. New technologies enable real-time queries, such as Apache Drill, Cloudera Impala, and the Stinger Initiative, supported by the next-generation resource manager, Apache YARN.
To support these increasingly demanding real-time operations, a new component, the MySQL Applier for Hadoop, is being released. It replicates committed transactions from MySQL to Hadoop/Hive/HDFS as they happen. The Applier complements the existing batch-based connectivity provided by Apache Sqoop.
The Applier implements replication by connecting to the MySQL master; as soon as transactions are committed to the binary log, it reads the corresponding binary log events and writes them into a file in HDFS.
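Reading row data from the binary log presupposes that the master has binary logging enabled in row format. A minimal my.cnf sketch (the server-id and log name here are illustrative):

```
[mysqld]
server-id     = 1
log-bin       = mysql-bin
binlog_format = ROW   # row events carry the inserted field values
```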
The component uses the API provided by libhdfs, a C library for manipulating files in HDFS. The library comes precompiled with Hadoop distributions.
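For illustration, here is a minimal sketch of appending one row to a datafile through libhdfs. The path, row contents, and connection parameters are hypothetical, not taken from the Applier's code, and building it requires linking against Hadoop's libhdfs and a JVM:

```c
#include <fcntl.h>   /* O_WRONLY, O_APPEND */
#include <stdio.h>
#include <string.h>
#include "hdfs.h"    /* libhdfs header shipped with Hadoop */

int main(void) {
    /* Connect to the namenode configured as fs.defaultFS; a host and
       port could also be given explicitly. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == NULL) {
        fprintf(stderr, "cannot connect to HDFS\n");
        return 1;
    }

    /* Illustrative target path following the database/table layout
       described below; appending requires an HDFS version with
       append support enabled. */
    const char *path = "/user/hive/warehouse/mydb/mytable/datafile1.txt";
    hdfsFile file = hdfsOpenFile(fs, path, O_WRONLY | O_APPEND, 0, 0, 0);
    if (file == NULL) {
        fprintf(stderr, "cannot open %s\n", path);
        hdfsDisconnect(fs);
        return 1;
    }

    /* One decoded row, already rendered as comma-separated text. */
    const char *row = "42,alice,2013-04-23\n";
    if (hdfsWrite(fs, file, (void *)row, strlen(row)) < 0) {
        fprintf(stderr, "write failed\n");
    }
    hdfsFlush(fs, file);

    hdfsCloseFile(fs, file);
    hdfsDisconnect(fs);
    return 0;
}
```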
The Applier connects to the MySQL master to read the binary log, and then:
Fetches the row insert events that occur on the master;
Decodes these events, extracts the data inserted into each field of the row, and uses content handlers to render it in the required format;
Appends it to a text file in HDFS (a conceptual sketch of this loop follows the list).
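The loop below is a self-contained simulation of these three steps. Every type and helper in it is a hypothetical stand-in: the real Applier decodes actual binlog events and writes through libhdfs rather than printing:

```c
#include <stdio.h>

/* Hypothetical stand-in for a decoded row-insert event. */
typedef struct {
    const char *db;        /* schema the insert happened in        */
    const char *table;     /* table the insert happened in         */
    const char *fields[3]; /* decoded field values of the new row  */
} row_insert_event;

/* Stand-in for fetching the next insert event from the binary log;
   here it just replays two canned events. */
static const row_insert_event *next_insert_event(void) {
    static const row_insert_event events[] = {
        { "shop", "orders", { "1", "alice", "9.99" } },
        { "shop", "orders", { "2", "bob",   "4.50" } },
    };
    static size_t i = 0;
    size_t n = sizeof events / sizeof events[0];
    return i < n ? &events[i++] : NULL;
}

/* Stand-in content handler: render a row as one comma-separated line. */
static void format_csv(const row_insert_event *ev, char *buf, size_t n) {
    snprintf(buf, n, "%s,%s,%s\n",
             ev->fields[0], ev->fields[1], ev->fields[2]);
}

int main(void) {
    char line[256];
    const row_insert_event *ev;
    while ((ev = next_insert_event()) != NULL) {
        format_csv(ev, line, sizeof line);
        /* The real Applier appends `line` to the table's datafile in
           HDFS via libhdfs; here we just show the target and payload. */
        printf("append to /user/hive/warehouse/%s/%s/datafile1.txt: %s",
               ev->db, ev->table, line);
    }
    return 0;
}
```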
Each database is mapped to a separate directory, and its tables are mapped to subdirectories under the Hive data warehouse directory. The data of each table is written to a datafile (for example, datafile1.txt) in Hive/HDFS. Data can be written comma-separated or in other formats, configurable via command-line arguments.
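For example, inserts into a hypothetical table t1 in database db1 would land in a layout along these lines (the warehouse root depends on the Hive configuration; all names here are illustrative):

```
/user/hive/warehouse/     <- Hive data warehouse directory
  db1/                    <- one directory per database
    t1/                   <- one subdirectory per table
      datafile1.txt       <- rows appended as delimited text
```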
You can learn more about the component's design from this blog post.
Installation, configuration, and implementation are discussed in detail in this blog post, and the integration with Hive is also documented.
You can also see the Applier in action in this video tutorial.
Original article: Applier, a tool for synchronizing data from MySQL databases to the Hadoop Distributed File System in real time. Thanks to the original author for sharing this article.