Apache Hive is currently one of the preferred free products for large data warehouses. People who use Apache Hive do not expect it to shine on small data volumes. For example, if data in MySQL is moved to Hive/HBase, a SQL statement that runs quickly in MySQL can be expected to take more than 10 times as long in Hive. However, if you have large volumes of data in MySQL, you can import them into Hive. Hundreds of millions of records combined with complex SQL query conditions are a headache for MySQL but relatively easy for Hive. What is missing is a bridge between the two.
Cloudera (cloudera.com), a well-known cloud computing company, is also a strong supporter of Hadoop. Its tool Sqoop, as the name suggests, stands for SQL-to-Hadoop. Sqoop abstracts the various database types through its ManagerFactory abstract class, so data in databases such as HSQLDB, MySQL, Oracle, and PostgreSQL can be written to Hive.
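The idea of a factory that hides database differences behind one interface can be sketched as follows. This is a conceptual illustration in Python, not Sqoop's actual Java code: the class names, the `MANAGERS` registry, and the `manager_for` function are hypothetical, and the assumption is simply that the manager is chosen from the prefix of the JDBC connect string, with a generic fallback.

```python
from dataclasses import dataclass


@dataclass
class ConnManager:
    """Generic fallback: would use only standard JDBC-style calls."""
    connect_string: str

    def describe(self) -> str:
        return f"generic manager for {self.connect_string}"


class MySQLManager(ConnManager):
    """Hypothetical MySQL-specific code path."""

    def describe(self) -> str:
        return f"MySQL manager for {self.connect_string}"


class PostgresManager(ConnManager):
    """Hypothetical PostgreSQL-specific code path."""

    def describe(self) -> str:
        return f"PostgreSQL manager for {self.connect_string}"


# Hypothetical registry: connect-string prefix -> manager class.
MANAGERS = {
    "jdbc:mysql:": MySQLManager,
    "jdbc:postgresql:": PostgresManager,
}


def manager_for(connect_string: str) -> ConnManager:
    """Pick a database-specific manager, or fall back to the generic one."""
    for prefix, manager_class in MANAGERS.items():
        if connect_string.startswith(prefix):
            return manager_class(connect_string)
    return ConnManager(connect_string)


if __name__ == "__main__":
    print(manager_for("jdbc:mysql://localhost/shop").describe())
    print(manager_for("jdbc:db2://localhost/shop").describe())
```

The caller only ever sees the common `ConnManager` interface, which is what lets one import command work across several database vendors.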
You can import or export all data with a single command, and you can filter by table and by data. Fast development and simple configuration are the hallmarks of this tool. With the same machine configuration, number of machines, data volume, and data content, execution efficiency still differs between environments. Migrating from an RDBMS to Hadoop improved performance, and that is where Sqoop shows its value.
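A single-command import with table and row filtering might look like the sketch below. It only assembles the argument list for a `sqoop import` call and does not run it. The connect string, user name, table name, and filter are made-up values, and `build_sqoop_import` is a hypothetical helper. The flags themselves (`--connect`, `--username`, `-P`, `--table`, `--where`, `--columns`, `--hive-import`, `--hive-table`, `--num-mappers`) are taken from the Sqoop 1 command line; check them against the user guide for your version.

```python
import shlex
from typing import List, Optional


def build_sqoop_import(
    connect: str,
    username: str,
    table: str,
    hive_table: Optional[str] = None,
    where: Optional[str] = None,
    columns: Optional[List[str]] = None,
    num_mappers: int = 1,
) -> List[str]:
    """Assemble the arguments for one `sqoop import` invocation."""
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "-P",  # prompt for the password instead of putting it on the command line
        "--table", table,
    ]
    if columns:
        args += ["--columns", ",".join(columns)]  # column filter
    if where:
        args += ["--where", where]  # row filter
    if hive_table is not None:
        args += ["--hive-import", "--hive-table", hive_table]  # load into Hive
    args += ["--num-mappers", str(num_mappers)]
    return args


if __name__ == "__main__":
    # Example values are assumptions for illustration only.
    command = build_sqoop_import(
        connect="jdbc:mysql://localhost/shop",
        username="etl",
        table="orders",
        hive_table="orders",
        where="created_at >= '2011-01-01'",
        num_mappers=4,
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(arg) for arg in command))
```

Building the command as a list rather than a string keeps the `--where` clause intact as one argument, so it can be passed to `subprocess.run` without extra quoting.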
Main Sqoop functions mentioned at a development conference:
JDBC-based implementation
  - Works with your popular database vendors
Auto-generation of tedious user-side code
  - Write MapReduce applications to work with your data, faster
Integration with Hive
  - Allows you to stay in a SQL-based environment
Extensible backend
  - Database-specific code paths for better performance
Detailed operation manual:
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html (official)
Original article: MySQL migration tool to Hive/HBase. Thanks to the original author for sharing.