Install and use Sqoop

Source: Internet
Author: User
Tags hadoop ecosystem sqoop

1. What is Sqoop?

Sqoop (SQL-to-Hadoop) is a convenient tool for migrating data between traditional relational databases and Hadoop. It makes full use of the parallelism of MapReduce to accelerate batch data transfers. The project has evolved into two major lines: Sqoop1 and Sqoop2.

Sqoop acts as a bridge between relational databases and the Hadoop ecosystem. It supports data transfer between relational databases and Hive, HDFS, and HBase, and offers both full-table and incremental imports.
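As a sketch of an incremental import (the host name, database, table, credentials, and column below are illustrative placeholders, and the command needs a running Hadoop cluster plus a reachable MySQL server):

```shell
# Incremental append import: on each run, only rows whose id is greater
# than the recorded --last-value are pulled into HDFS.
# Connection string, credentials, and table name are placeholders.
sqoop import \
  --connect jdbc:mysql://dbserver:3306/testdb \
  --username sqoop_user -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental append \
  --check-column id \
  --last-value 0
```

On completion Sqoop prints the new high-water mark for `--last-value`, which you would pass to the next run (or let a saved job track automatically).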

So why Sqoop?

It offers efficient, controllable resource usage (task concurrency and timeouts), automatic data-type mapping and conversion (which can also be customized), and support for most mainstream databases, such as MySQL, Oracle, SQL Server, and DB2.
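As a concrete sketch (the connection details and table name are made up for illustration, and this requires a live Hadoop cluster and MySQL server), a full-table import from MySQL into HDFS is a single command:

```shell
# Full-table import: Sqoop generates a MapReduce job that reads the
# table in parallel (4 mappers here) and writes the rows to HDFS.
# Host, database, user, and table are placeholder values.
sqoop import \
  --connect jdbc:mysql://dbserver:3306/testdb \
  --username sqoop_user -P \
  --table employees \
  --target-dir /user/hadoop/employees \
  --num-mappers 4
```

The `--num-mappers` setting is where the "controllable concurrency" mentioned above shows up: each mapper reads a disjoint slice of the table, split by its primary key.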

2. Similarities and Differences between Sqoop1 and Sqoop2

Sqoop1 and Sqoop2 are two completely incompatible version lines. Apache versions: 1.4.x (Sqoop1) and 1.99.x (Sqoop2). CDH versions: sqoop-1.4.3-cdh4 (Sqoop1) and sqoop2-1.99.2-cdh4.5.0 (Sqoop2). Compared with Sqoop1, Sqoop2 introduces a Sqoop server with centralized connector management, additional access methods (CLI, Web UI, and REST API), and a role-based security mechanism.

3. Architecture diagram of Sqoop1 and Sqoop2

[Figure: Sqoop1 architecture]

[Figure: Sqoop2 architecture]


4. Advantages and Disadvantages of Sqoop1 and Sqoop2

Comparison of Sqoop1 and Sqoop2:

Architecture
  Sqoop1: uses only a Sqoop client; there is no server component.
  Sqoop2: introduces a Sqoop server with centralized connector management, a REST API, a Web UI, and a permission/security mechanism.

Deployment
  Sqoop1: deployment is simple, although installation requires root permission and connectors must comply with the JDBC model.
  Sqoop2: the architecture is somewhat complicated, and configuration and deployment are more involved.

Use
  Sqoop1: the command-line format is error-prone and tightly coupled; not all data types are supported; the security mechanism is incomplete (for example, passwords can leak).
  Sqoop2: multiple interaction methods (command line, Web UI, REST API); connectors are centrally managed and standardized, with all of them installed on the Sqoop server; the permission-management mechanism is improved; connectors are responsible only for reading and writing data.

5. Install and deploy Sqoop1

5.0 installation environment

Hadoop: hadoop-2.3.0-cdh5.1.2

Sqoop: sqoop-1.4.4-cdh5.1.2

5.1 download and decompress the installation package

tar -zxvf sqoop-1.4.4-cdh5.1.2.tar.gz

ln -s sqoop-1.4.4-cdh5.1.2 sqoop

5.2 configure environment variables and configuration files

cd sqoop/conf/

cat sqoop-env-template.sh > sqoop-env.sh

vi sqoop-env.sh

Add the following to sqoop-env.sh:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*

# Set Hadoop-specific environment variables here.

# Set path to where bin/hadoop is available
# Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop

# Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop

# Set the path to where bin/hbase is available
export HBASE_HOME=/home/hadoop/hbase

# Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive

# Set the path for where zookeeper config dir is
export ZOOCFGDIR=/home/hadoop/zookeeper

In this configuration file, only HADOOP_COMMON_HOME is mandatory. HBASE_HOME and HIVE_HOME are needed only if you actually use HBase and Hive, respectively.
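After editing sqoop-env.sh, a quick smoke test (the paths below assume the symlink created in 5.1, and the host and credentials are placeholders) is to put Sqoop's bin directory on the PATH and ask it for its version and for the list of databases on the source server:

```shell
# Assumes Sqoop was unpacked under the current user's home directory
# and symlinked to ~/sqoop as in section 5.1. For MySQL, the JDBC
# driver jar (mysql-connector-java) must also be copied into sqoop/lib.
export SQOOP_HOME=~/sqoop
export PATH=$SQOOP_HOME/bin:$PATH

# Print the Sqoop version to confirm the installation is on the PATH
sqoop version

# Verify connectivity to the source database (placeholder host/user)
sqoop list-databases \
  --connect jdbc:mysql://dbserver:3306/ \
  --username sqoop_user -P
```

If `sqoop version` runs but `list-databases` fails with a ClassNotFoundException, the missing piece is usually the JDBC driver jar in sqoop/lib.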

