Install and use Sqoop
1. What is Sqoop?
Sqoop (SQL-to-Hadoop) is a convenient tool for migrating data between traditional databases and Hadoop. It makes full use of the parallelism of MapReduce to accelerate batch data transfers, and it has evolved into two major versions, Sqoop1 and Sqoop2.
Sqoop acts as a bridge between relational databases and the Hadoop ecosystem. It supports moving data between relational databases and Hive, HDFS, and HBase, and it can perform both full-table imports and incremental imports.
So why Sqoop?
It offers efficient and controllable resource usage, with configurable task parallelism and timeouts. Data type mapping and conversion are performed automatically and can also be customized. It works with multiple mainstream databases, such as MySQL, Oracle, SQL Server, and DB2.
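For example, a basic full-table import from MySQL into HDFS looks like the following (a minimal sketch; the host dbhost, database test, table orders, user dbuser, and target directory are placeholders for your own environment):
sqoop import \
  --connect jdbc:mysql://dbhost:3306/test \
  --username dbuser -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --num-mappers 4
Adding --incremental append together with --check-column and --last-value turns the same command into an incremental import that only pulls rows newer than the last imported value.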
2. Similarities and Differences between Sqoop1 and Sqoop2
Sqoop1 and Sqoop2 are two completely incompatible versions. The version split is as follows: in the Apache releases, 1.4.x is Sqoop1 and 1.99.x is Sqoop2; in CDH, Sqoop-1.4.3-cdh4 is Sqoop1 and Sqoop2-1.99.2-cdh4.5.0 is Sqoop2. Compared with Sqoop1, Sqoop2 introduces a Sqoop server that centrally manages connectors, adds more access methods (CLI, web UI, and REST API), and introduces a role-based security mechanism.
3. Architecture diagram of Sqoop1 and Sqoop2
Sqoop1 architecture (diagram)
Sqoop2 architecture (diagram)
4. Advantages and Disadvantages of Sqoop1 and Sqoop2
| Comparison | Sqoop1 | Sqoop2 |
| --- | --- | --- |
| Architecture | Uses only a Sqoop client | Introduces a Sqoop server that centrally manages connectors, and adds a REST API, a web UI, and a role-based security mechanism |
| Deployment | Simple to deploy; installation requires root permission; connectors must comply with the JDBC model | The architecture is somewhat complicated, and configuration and deployment are more complex |
| Use | The command-line approach is error-prone and its format is tightly coupled; not all data types are supported; the security mechanism is incomplete (for example, passwords can be leaked) | Multiple interaction methods (command line, web UI, REST API); connectors are centrally managed and all of them are installed on the Sqoop server; the permission management mechanism is improved; connectors are standardized and are responsible only for reading and writing data |
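To illustrate the password-leakage point in the table: in Sqoop1 a common workaround is to avoid putting --password on the command line and let the tool prompt for the password with -P instead (a sketch; the connection string and user are placeholders):
sqoop list-databases \
  --connect jdbc:mysql://dbhost:3306/ \
  --username dbuser -P
This keeps the password out of the shell history and the process list, although the command-line format itself remains as tightly coupled as described above.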
5. Install and deploy Sqoop1
5.0 installation environment
Hadoop: hadoop-2.3.0-cdh5.1.2
Sqoop: sqoop-1.4.4-cdh5.1.2
5.1 download and decompress the installation package
tar -zxvf sqoop-1.4.4-cdh5.1.2.tar.gz
ln -s sqoop-1.4.4-cdh5.1.2 sqoop
5.2 configure environment variables and configuration files
cd sqoop/conf/
cat sqoop-env-template.sh > sqoop-env.sh
vi sqoop-env.sh
Add the following to sqoop-env.sh:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*
# Set Hadoop-specific environment variables here.
# Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop
# Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop
# Set the path to where bin/hbase is available
export HBASE_HOME=/home/hadoop/hbase
# Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive
# Set the path for where zookeeper config dir is
export ZOOCFGDIR=/home/hadoop/zookeeper
In this configuration file, only HADOOP_COMMON_HOME must be set. The HBase, Hive, and ZooKeeper entries can be omitted if you do not use those components.
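To make the sqoop command usable from any directory and to verify the installation, you can also extend PATH and run a version check (a sketch assuming the symlink from step 5.1 lives under /home/hadoop; the MySQL JDBC driver step is an assumed extra, and the jar name is a placeholder that must match the driver you actually downloaded):
export SQOOP_HOME=/home/hadoop/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
# copy the JDBC driver for your database into Sqoop's lib directory, e.g. for MySQL:
cp mysql-connector-java-5.1.31-bin.jar $SQOOP_HOME/lib/
sqoop version
If sqoop version prints the Sqoop and Hadoop versions without errors, the basic installation is working.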