Using Sqoop2 to import and export data between MySQL and Hadoop

Source: Internet
Author: User
Tags sqoop

Recently, while debugging the logic behind user likes, I needed to run a joint query that combined part of the nginx access.log records with MySQL records. The nginx logs were already stored in Hadoop, but the MySQL data was not, so I had to import some MySQL tables into HDFS. Although Sqoop is well known, I had never used it in a production environment, so this was a good opportunity for some hands-on practice.

Let's give it a try.

(1) Installation environment

Operating System: Linux (CentOS 6.5)

JDK version: 1.7.0_45

Hadoop version: hadoop-2.2.0

Sqoop2 version: sqoop-1.99.4-bin-hadoop200

Hadoop installation directory: /home/hadoop/hadoop-2.2.0

Sqoop2 directory: /home/hadoop/sqoop-1.99.4-bin-hadoop200

Both Hadoop and Sqoop2 run under the same user, hadoop, whose home directory is /home/hadoop.


(2) Modify the Sqoop2 configuration files

1. First modify the configuration file /home/hadoop/sqoop-1.99.4-bin-hadoop200/server/conf/sqoop.properties to point it at the Hadoop configuration directory.

Change:

# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/etc/hadoop/conf/

to:

# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/hadoop/hadoop-2.2.0/etc/hadoop/
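If you prefer to script the change, the edit can be sketched with sed. The snippet below demonstrates it on a local sample copy of the file (on the real machine the target would be server/conf/sqoop.properties under the Sqoop2 directory):

```shell
# Create a sample copy carrying the default property value.
printf '%s\n' \
  '# Hadoop configuration directory' \
  'org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/etc/hadoop/conf/' \
  > sqoop.properties.sample

# Rewrite the property value to point at the real Hadoop config directory.
sed -i 's|^\(org\.apache\.sqoop\.submission\.engine\.mapreduce\.configuration\.directory=\).*|\1/home/hadoop/hadoop-2.2.0/etc/hadoop/|' sqoop.properties.sample

cat sqoop.properties.sample
```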

2. Next modify the configuration file /home/hadoop/sqoop-1.99.4-bin-hadoop200/server/conf/catalina.properties.

Here, all the *.jar packages under /home/hadoop/hadoop-2.2.0/share/hadoop are added to Sqoop2's class path.

Change:

common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/lib/hadoop/*.jar,/usr/lib/hadoop/lib/*.jar,/usr/lib/hadoop-hdfs/*.jar,/usr/lib/hadoop-hdfs/lib/*.jar,/usr/lib/hadoop-mapreduce/*.jar,/usr/lib/hadoop-mapreduce/lib/*.jar,/usr/lib/hadoop-yarn/*.jar,/usr/lib/hadoop-yarn/lib/*.jar

to:

common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/common/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/tools/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/httpfs/tomcat/lib/*.jar
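Rather than typing that long line by hand, the jar directory list can be assembled in a loop. This is only a convenience sketch; the directory names mirror the common.loader value above, and the Catalina placeholders are kept literal because Tomcat expands them itself:

```shell
# Build the common.loader value for catalina.properties programmatically.
# HADOOP_HOME matches the install path used in this article.
HADOOP_HOME=/home/hadoop/hadoop-2.2.0

# Start with Catalina's own default entries (left literal on purpose).
LOADER='${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar'

# Append each Hadoop jar directory.
for d in common common/lib hdfs hdfs/lib mapreduce mapreduce/lib \
         tools tools/lib yarn yarn/lib httpfs/tomcat/lib; do
  LOADER="$LOADER,$HADOOP_HOME/share/hadoop/$d/*.jar"
done

echo "common.loader=$LOADER"
```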

(3) Modify environment variables

Because Sqoop2 and Hadoop both run as the hadoop user, whose home directory is /home/hadoop, you can directly modify /home/hadoop/.bash_profile and append the following lines at the end of the file:

export SQOOP_HOME=/home/hadoop/sqoop-1.99.4-bin-hadoop200
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_HOME=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs

After making the above changes, run the following command as the hadoop user to make the configuration take effect:

source /home/hadoop/.bash_profile
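The append itself can be scripted with a heredoc. The sketch below demonstrates it on a sample file (on the real machine the target would be /home/hadoop/.bash_profile) and then checks that the derived variables expand as expected:

```shell
# Append the Sqoop2 environment variables; demonstrated on a sample file.
cat >> bash_profile.sample <<'EOF'
export SQOOP_HOME=/home/hadoop/sqoop-1.99.4-bin-hadoop200
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_HOME=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs
EOF

# Reload and confirm that CATALINA_HOME and LOGDIR resolve via SQOOP_HOME.
source ./bash_profile.sample
echo "CATALINA_HOME=$CATALINA_HOME"
echo "LOGDIR=$LOGDIR"
```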

(4) Test Sqoop2

Now you can check whether Sqoop2 works. Enter the directory /home/hadoop/sqoop-1.99.4-bin-hadoop200/bin and run the following commands:

Start the Sqoop2 server: ./sqoop2-server start

Enter the client shell: ./sqoop2-shell

Connect the client to the server: set server --host 127.0.0.1 --port 12000 --webapp sqoop

For creating links and jobs, refer to the official documentation:

http://sqoop.apache.org/docs/1.99.5/CommandLineClient.html


For reasons I don't understand, Sqoop2 apparently offers no way to specify a field delimiter when importing data from MySQL into HDFS! That was a deal-breaker, so I had to find another way out. I could have dug into the Sqoop2 source code to see whether the separator can be set somewhere, but the quickest option was to try another version, for example 1.4.5. Sqoop could not be so famous otherwise.

The trial of Sqoop 1.4.5 was not all smooth sailing either!
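For comparison, this is roughly what a delimiter-controlled import looks like in Sqoop 1.4.x, where --fields-terminated-by is the option Sqoop2 seemed to lack. The host, database, credentials, table, and target directory below are placeholders, not values from this article; the snippet only assembles and prints the command rather than executing it:

```shell
# A representative Sqoop 1.4.x import command with an explicit field
# delimiter. All connection details here are placeholders.
SQOOP_CMD="sqoop import \
  --connect jdbc:mysql://127.0.0.1:3306/testdb \
  --username dbuser -P \
  --table user_likes \
  --target-dir /user/hadoop/user_likes \
  --fields-terminated-by '\t' \
  -m 1"

echo "$SQOOP_CMD"
```

The -P flag makes Sqoop prompt for the password interactively instead of putting it on the command line.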
