How to import MySQL data into the Sqoop installation of Hadoop

Source: Internet
Author: User
Tags zookeeper sqoop accumulo

Sqoop is an open source tool that is used primarily in Hadoop (Hive) and traditional databases (MySQL, PostgreSQL ...) Data can be transferred from one relational database (such as MySQL, Oracle, Postgres, etc.) to the HDFs in Hadoop, or the data in HDFs can be directed into a relational database. The Sqoop project began in 2009 as a third-party module for Hadoop, and later, to enable users to quickly deploy, and to enable developers to iteratively develop more quickly, Sqoop became an Apache project independently.

In summary, Sqoop is a conversion tool for data conversion between a relational database and HDFs.

  Note: SQOOP1 is completely incompatible with SQOOP2, 1.4.6 and previous versions are SQOOP1, followed by sqoop2

The contrast between SQOOP1 and Sqoop2 sqoop2 than Sqoop1 's improvement
    1. Introduction of Sqoop server, centralized management connector, etc.
    2. Multiple Access modes: Cli,web ui,rest API
    3. Introduction of role-based security mechanisms
Advantages and disadvantages of SQOOP1 and SQOOP2
    1. The SQOOP1 architecture, which uses only one Sqoop client, SQOOP2 architecture, introduces Sqoop server centralized management connector, as well as rest Api,web,ui, and introduces a privilege security mechanism.
    2. SQOOP1 Advantages Architecture Simple Deployment
    3. Sqoop1 the shortcomings of the command line way error-prone, format tightly coupled, unable to support all data types, security mechanism is not perfect, such as password burst,
    4. installation requires root privileges, connector must conform to the JDBC model
    5. SQOOP2 the advantages of a variety of interactive methods, command line, Web Ui,rest api,conncetor Centralized management, all the links installed on the Sqoop server, improve the rights management mechanism, connector normalization, only responsible for data read and write.
    6. The drawbacks of SQOOP2, a slightly more complex architecture, and a more cumbersome configuration deployment.
SQOOP1 Frame composition

SQOOP2 Frame composition

Because the SQOOP2 is not very perfect, the official recommendation of the production environment is not recommended to use, in this sqoop1.4.6 to introduce

Installation Environment:

CENOS7 system
Sqoop version: 1.4.6
hadoop:2.7.3
mysql:5.7.15
jdk:1.8

Download and unzip sqoop1.4.6

Install it on a single node.
Click Sqoop to download the Sqoop installation file sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz. Upload the file to the server's/usr/local folder.
Execute the following command below

# Enter the user directory of the current user 2. cd/usr/local# Unzip the installation file # Modify the file name # Modify the folder owner, If your current login username is not Hadoop, modify it to your own user name (Hadoop creates user groups and usernames for Hadoop, Sqoop will be used in combination with Hadoop)
Modifying a configuration file sqoop-env.sh
# Copy the sqoop-env-template.sh and name it sqoop-env.sh # Edit sqoop-env.sh

Modify the following information of the sqoop-env.sh, if any other can be added

Export hadoop_common_home=/usr/local/hadoop273 export hadoop_mapred_home=/usr/local/hadoop273

Export Hbase_home=/usr/local/hbase
Export Hive_home=/usr/local/hive
Export Zoocfgdir= #如果读者配置了ZooKeeper, you also need to configure the path of zookeeper in this

Configuring Environment variables

Open the environment variable profile for the current user:

VI ~/.bash_profile

Type the following information in the first line of the configuration file:

Export sqoop_home=/usr/local/sqoop146export PATH= $PATH: $SBT _home/bin: $SQOOP _home/binexport CLASSPATH = $CLASSPATH: $SQOOP _home/lib

Save the file and exit the Vim editor.
Then, execute the following command to make the configuration file effective immediately:

SOURCE ~/.bash_profile
Copy the MySQL driver package to $sqoop_home/lib

The following is to copy the MySQL driver and hadoop-connector jar package to the $sqoop_home/lib directory, there is no first download driver package.

CP./mysql-connector-java-5.1.40/mysql-connector-java-5.1.40-bin.jar/usr/local/sqoop146/libcp $HADOOP _home /share/hadoop/common/hadoop-common-2.7.3.jar $SQOOP _home/lib
Modify $sqoop_home/bin/configure-sqoop

Comment out Hcatalog,accumulo check (unless you are ready to use components on Hadoop such as Hcatalog,accumulo)

##Moved to is a runtime check in Sqoop.#if[!-d "${hcat_home}"]; Then#echo "Warning: $HCAT _home does notexist! Hcatalog jobs would fail. "#Echo ' please set $HCAT _home to the root ofyour hcatalog installation. '#fi #if[!-d "${accumulo_home}"]; Then#echo "Warning: $ACCUMULO _home does notexist! Accumulo imports would fail. "#Echo ' please set $ACCUMULO _home to the Rootof your Accumulo installation. '#fi #Add Hcatalog to dependency list#if[-E "${hcat_home}/bin/hcat"]; Then#Tmp_sqoop_classpath=${sqoop_classpath}: ' ${hcat_home}/bin/hcat-classpath '#if [-Z "${hive_conf_dir}"]; Then#Tmp_sqoop_classpath=${tmp_sqoop_classpath}:${hive_conf_dir}#fi#Sqoop_classpath=${tmp_sqoop_classpath}#fi #Add Accumulo to dependency list#if[-E "$ACCUMULO _home/bin/accumulo"]; Then#For Jn in ' $ACCUMULO _home/bin/accumuloclasspath | grep file:.*accumulo.*jar |cut-d ': '-f2 ';#sqoop_classpath= $SQOOP _classpath: $jn# Done#For Jn in ' $ACCUMULO _home/bin/accumuloclasspath | grep file:.*zookeeper.*jar |cut-d ': '-f2 ';#sqoop_classpath= $SQOOP _classpath: $jn# Done#fi
Testing the connection to MySQL

First make sure that the MySQL service is started, and if it does not start, execute the following command to start:

Service MySQL Start

You can then test whether the connection between Sqoop and MySQL was successful:

Sqoop list-databases--connect jdbc:mysql://127.0.0.1:3306/--username Root-password Root

Root for the database username and password, the MySQL database list displayed on the screen indicates a successful connection.

How to import MySQL data into the Sqoop installation of Hadoop

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.