Recently, while troubleshooting the logic behind user likes (thumbs-up), I needed to run joint queries that combine the nginx access.log records with data in MySQL. The nginx logs were already stored in Hadoop, but the MySQL data had not been imported, so some MySQL tables had to be imported into HDFS. Although Sqoop has long been well known, I had never used it in a production environment, so this was a good opportunity to try it out.
(1) Installation environment
Operating System: Linux (centos6.5)
JDK version: 1.7.0_45
Hadoop version: hadoop2.2.0
Sqoop2: sqoop-1.99.4-bin-hadoop200
Hadoop installation directory:/home/hadoop/hadoop-2.2.0
Sqoop2 Directory:/home/hadoop/sqoop-1.99.4-bin-hadoop200
Hadoop and Sqoop2 are installed under the same user, hadoop, whose home directory is /home/hadoop.
(2) Modify the Sqoop2 configuration files
1. First, modify the configuration file /home/hadoop/sqoop-1.99.4-bin-hadoop200/server/conf/sqoop.properties so that it points to the Hadoop configuration directory.
Change the following:
# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/etc/hadoop/conf/
To:
# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/hadoop/hadoop-2.2.0/etc/hadoop/
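To confirm the new value is in place, a quick check (a minimal sketch, using the paths above) is:
grep 'configuration.directory' /home/hadoop/sqoop-1.99.4-bin-hadoop200/server/conf/sqoop.properties
# the output should include the property pointing at /home/hadoop/hadoop-2.2.0/etc/hadoop/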
2. Next, modify the configuration file /home/hadoop/sqoop-1.99.4-bin-hadoop200/server/conf/catalina.properties.
Here, all the *.jar packages under /home/hadoop/hadoop-2.2.0/share/hadoop are added to Sqoop2's classpath.
Change the following:
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/lib/hadoop/*.jar,/usr/lib/hadoop/lib/*.jar,/usr/lib/hadoop-hdfs/*.jar,/usr/lib/hadoop-hdfs/lib/*.jar,/usr/lib/hadoop-mapreduce/*.jar,/usr/lib/hadoop-mapreduce/lib/*.jar,/usr/lib/hadoop-yarn/*.jar,/usr/lib/hadoop-yarn/lib/*.jar
To:
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/common/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/tools/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/*.jar,/home/hadoop/hadoop-2.2.0/share/hadoop/httpfs/tomcat/lib/*.jar
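Before restarting anything, it is worth checking that the directories referenced in common.loader actually contain jar files. A minimal shell sketch (it only covers the main directories and assumes the Hadoop 2.2.0 layout above):
for d in common common/lib hdfs hdfs/lib mapreduce mapreduce/lib yarn yarn/lib; do
  # warn about any directory from common.loader that has no jars in it
  ls /home/hadoop/hadoop-2.2.0/share/hadoop/$d/*.jar >/dev/null 2>&1 || echo "no jars under $d"
done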
(3) Modify Environment Variables
Because Sqoop2 and Hadoop both run under the hadoop user, whose home directory is /home/hadoop, you can simply edit /home/hadoop/.bash_profile and append the following at the end of the file:
export SQOOP_HOME=/home/hadoop/sqoop-1.99.4-bin-hadoop200
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_HOME=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs
After making the changes above, run the following command as the hadoop user so that the configuration takes effect:
source /home/hadoop/.bash_profile
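To confirm the variables are set, a quick check (a minimal sketch) is:
echo $SQOOP_HOME      # should print /home/hadoop/sqoop-1.99.4-bin-hadoop200
which sqoop2-shell    # should resolve to /home/hadoop/sqoop-1.99.4-bin-hadoop200/bin/sqoop2-shell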
(4) Test Sqoop2
Now you can check whether Sqoop2 actually works. Go to the directory /home/hadoop/sqoop-1.99.4-bin-hadoop200/bin and run the following commands:
Start the Sqoop2 server: ./sqoop2-server start
Enter the client shell: ./sqoop2-shell
Connect the client to the server: set server --host 127.0.0.1 --port 12000 --webapp sqoop
For creating links and jobs, refer to:
http://sqoop.apache.org/docs/1.99.5/CommandLineClient.html
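Based on that documentation, a complete MySQL-to-HDFS job in the 1.99.4 shell looks roughly like the following (only a sketch; the connector and link ids here are assumptions, so check the output of show connector and show link on your own installation):
List the available connectors and their ids: show connector
Create a link for the generic JDBC connector (the MySQL side): create link -c 1
Create a link for the HDFS connector: create link -c 2
Create a job from the JDBC link to the HDFS link: create job -f 1 -t 2
Run the job: start job -j 1
Check its progress: status job -j 1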
For some reason, I simply could not find an option in Sqoop2 to specify the field delimiter when importing data from MySQL into HDFS! That was a real disappointment, so I had to look for another way out. I could have dug into the Sqoop2 source code to see whether the separator can be set somewhere, but the quickest route was to try another version, such as 1.4.5; otherwise Sqoop would hardly deserve its reputation.
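For comparison, Sqoop 1.4.x exposes the delimiter directly on the import command. A minimal sketch (the connection string, credentials, table name, and target directory are made-up placeholders):
# import one MySQL table into HDFS, separating fields with a tab
sqoop import \
  --connect jdbc:mysql://127.0.0.1:3306/testdb \
  --username dbuser \
  --password dbpass \
  --table some_table \
  --target-dir /user/hadoop/some_table \
  --fields-terminated-by '\t' \
  -m 1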
The trial of Sqoop 1.4.5 was not all smooth sailing either!