Introduction to the Data Migration Tool Sqoop

Source: Internet
Author: User
Tags: sqoop

Note: The following information is based on material from the teacher Dylan.

One. What is Sqoop?

Sqoop is an open source tool whose name is short for "SQL to Hadoop". It is used primarily to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). Its development has produced two major versions, Sqoop1 and Sqoop2.

Two. Why choose Sqoop?

1. Efficient and controllable use of resources: you can specify the degree of task parallelism and a timeout period;
2. Data type mapping and conversion is done automatically, and users can also customize it;
3. Support for a variety of mainstream databases: MySQL, Oracle, SQL Server, DB2, and so on.
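
As an illustration of points 1 and 2, the following is a minimal sketch of how a Sqoop1 import command might be assembled. It only builds the argument list and does not run Sqoop. The helper name `build_sqoop_import`, the JDBC URL, the user name, the table, and the target directory are hypothetical placeholders; `--num-mappers` sets the degree of parallelism and `--map-column-java` overrides the default type mapping for the named columns.

```python
import shlex


def build_sqoop_import(jdbc_url, username, table, target_dir,
                       num_mappers=4, java_type_overrides=None):
    """Return the argument list for a Sqoop1 `sqoop import` run (hypothetical helper)."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                                # prompt for the password on the console
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),   # degree of task parallelism
    ]
    if java_type_overrides:
        # Custom type mapping, e.g. {"id": "Long"} becomes "id=Long"
        mapping = ",".join(
            f"{column}={java_type}"
            for column, java_type in java_type_overrides.items()
        )
        args += ["--map-column-java", mapping]
    return args


if __name__ == "__main__":
    # All values below are placeholders, not taken from a real deployment.
    argv = build_sqoop_import(
        "jdbc:mysql://db.example.com/shop", "etl", "orders", "/data/orders",
        num_mappers=8, java_type_overrides={"id": "Long"},
    )
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the command as a list rather than a single string avoids quoting mistakes if the list is later passed to `subprocess.run`.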

Three. Differences between Sqoop1 and Sqoop2

1. They are two different versions and are completely incompatible with each other;
2. The version numbers are divided differently:
   Apache: 1.4.x, 1.99.x
   CDH: sqoop-1.4.3-cdh4, sqoop2-1.99.2-cdh4.5.0
3. Improvements of Sqoop2 over Sqoop1:
   (1) It introduces a Sqoop server that centrally manages connectors and so on; (2) it offers multiple access methods: CLI, Web UI, REST API; (3) it introduces a role-based security mechanism.
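
The version numbering in item 2 can be turned into a small check. The sketch below assumes, as the list states, that 1.4.x releases are Sqoop1 and 1.99.x releases are Sqoop2; the function name `sqoop_generation` is hypothetical, and any other version number is reported as unknown rather than guessed.

```python
import re


def sqoop_generation(version):
    """Classify a version or package string as 'Sqoop1', 'Sqoop2', or 'unknown'."""
    # Find the first "major.minor" pair, e.g. "1.99" in "sqoop2-1.99.2-cdh4.5.0"
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"no version number found in {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) == (1, 4):
        return "Sqoop1"
    if (major, minor) == (1, 99):
        return "Sqoop2"
    return "unknown"


if __name__ == "__main__":
    for name in ("sqoop-1.4.3-cdh4", "sqoop2-1.99.2-cdh4.5.0"):
        print(name, sqoop_generation(name))
```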

4. Sqoop2 and Sqoop1 architecture comparison



5. Advantages and disadvantages of Sqoop1 and Sqoop2

Sqoop1 advantages: simple architecture and deployment.
Sqoop2 advantages: a variety of interaction methods (command line, Web UI, REST API); centralized connector management, with all connections installed on the Sqoop server; an improved permission management mechanism; normalized connectors that are only responsible for reading and writing data.

Sqoop1 disadvantages: the command-line mode is error-prone; the format is tightly coupled; not all data types are supported; the security mechanism is incomplete (for example, passwords are exposed); installation requires root privileges; and connectors must conform to the JDBC model.
Sqoop2 disadvantages: the architecture is slightly more complex, and configuration and deployment are more cumbersome.


The installation and use of Sqoop2 will be covered in a later post.

