Note: The following information is based on material from teacher Dylan.
I. What is Sqoop?
Sqoop (short for "SQL to Hadoop") is an open-source tool used mainly to transfer data between Hadoop (for example Hive) and traditional relational databases (MySQL, PostgreSQL, ...). It has evolved into two major versions, Sqoop1 and Sqoop2.
II. Why choose Sqoop?
1. Efficient, controllable use of resources: you can specify the degree of task parallelism and a timeout period.
2. Data type mapping and conversion: this happens automatically, and users can also customize it.
3. Support for a variety of mainstream databases: MySQL, Oracle, SQL Server, DB2, and so on.
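The first two points can be made concrete with a small example. The Python sketch below only assembles a Sqoop1 `import` command line and does not run anything. It relies on these Sqoop 1.4.x options:

- `--num-mappers`: the degree of parallelism.
- `--split-by`: the column used to divide the work.
- `--map-column-java`: overrides the automatic type mapping.
- `--password-file`: available in later 1.4.x releases.

The JDBC URL, table name, paths, and the helper names `build_sqoop_import` and `split_ranges` are hypothetical. `split_ranges` is a simplified illustration of how an integer split-by column can be divided into one range per mapper. It is not Sqoop's actual splitting code.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, username, password_file,
                       num_mappers=4, split_by=None, java_types=None):
    """Assemble (but do not run) the argv for a Sqoop1 `import` job."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        # Read the password from a file instead of typing it on the command line.
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        # Degree of parallelism: number of map tasks reading the table.
        "--num-mappers", str(num_mappers),
    ]
    if split_by:
        # Column whose value range is divided among the mappers.
        args += ["--split-by", split_by]
    if java_types:
        # Override the automatic SQL-to-Java type mapping, e.g. price=String.
        mapping = ",".join(f"{col}={typ}" for col, typ in java_types.items())
        args += ["--map-column-java", mapping]
    return args


def split_ranges(column, lo, hi, num_mappers):
    """Simplified illustration: divide [lo, hi] into one WHERE clause per mapper."""
    bounds = [lo + (hi - lo) * i // num_mappers for i in range(num_mappers + 1)]
    clauses = []
    for i in range(num_mappers):
        # The last range includes the upper bound; earlier ranges exclude it.
        upper_op = "<=" if i == num_mappers - 1 else "<"
        clauses.append(f"{column} >= {bounds[i]} AND {column} {upper_op} {bounds[i + 1]}")
    return clauses


if __name__ == "__main__":
    argv = build_sqoop_import(
        "jdbc:mysql://db.example.com/shop",  # hypothetical database
        "orders", "/user/demo/orders", "demo", "/user/demo/.db.password",
        num_mappers=4, split_by="id", java_types={"price": "String"},
    )
    print(shlex.join(argv))
    for clause in split_ranges("id", 1, 100, 4):
        print(clause)
```

Suppose `id` ranges from 1 to 100 and four mappers are used. Each map task would then import only the rows matching one of the four printed ranges, which is what "specifying the degree of task parallelism" amounts to in practice. Passing `--password-file` instead of `--password` also keeps the password off the command line, which addresses the password-exposure problem listed among Sqoop1's drawbacks.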
III. Differences between Sqoop1 and Sqoop2
1. The two versions are completely incompatible with each other.
2. Version numbering differs:
Apache: 1.4.x (Sqoop1) and 1.99.x (Sqoop2).
CDH: for example sqoop-1.4.3-cdh4 (Sqoop1) and sqoop2-1.99.2-cdh4.5.0 (Sqoop2).
3. Improvements of Sqoop2 over Sqoop1:
(1) A Sqoop server is introduced, which centrally manages connectors and related components.
(2) Multiple access methods: CLI, web UI, and REST API.
(3) A role-based security mechanism is introduced.
4. Architecture comparison of Sqoop1 and Sqoop2
5. Advantages and disadvantages of Sqoop1 and Sqoop2
Sqoop1 advantages: simple architecture and easy deployment.
Sqoop2 advantages:
- It offers several ways to interact: command line, web UI, and REST API.
- Connectors are managed centrally, and all connections are installed on the Sqoop server.
- The permission management mechanism is improved.
- Connectors are standardized and are responsible only for reading and writing data.
Sqoop1 disadvantages:
- The command-line mode is error-prone.
- Formats are tightly coupled.
- Not all data types are supported.
- The security mechanism is incomplete; for example, passwords can be exposed.
- Installation requires root privileges.
- Connectors must conform to the JDBC model.
Sqoop2 disadvantages: the architecture is somewhat more complex, and configuration and deployment are more cumbersome.
The installation and use of Sqoop2 will be covered in a later post.