Using Sqoop2 to Import Data Between Oracle and HDFS


The previous article covered the installation of Sqoop2. This article describes how to use Sqoop2 to import data from Oracle into HDFS, and from HDFS back into Oracle.

Using Sqoop2 involves the following steps:

1. Connect to the server
2. Search connectors
3. Create links
4. Create jobs
5. Execute jobs
6. View job run information


Before using Sqoop2, you need to make the following change to the Hadoop configuration and start the JobHistory service; otherwise a job will only run once, and the second run will fail with an error.

1) Configure mapred-site.xml (doing this on the NameNode node is sufficient):
<property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.101.11:10020</value>
</property>
2) Start the JobHistory service:
sbin/mr-jobhistory-daemon.sh start historyserver
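
A quick way to confirm the service actually came up (a minimal check run on the node where it was started; the port is the one configured above):

jps | grep JobHistoryServer                # the history server process should be listed
netstat -tlnp 2>/dev/null | grep 10020     # and listening on the RPC port from mapred-site.xml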


1. Connect SQOOP2 Server

Sqoop2 uses a client/server (C/S) model. The server was installed and started previously; to use and operate Sqoop2 you connect to that server through the client. The client can be started from any machine as long as the server is running normally, including a machine that only has the Sqoop2 package extracted, with no Hadoop and no Sqoop server installed.

Start the client: bin/sqoop2-shell

[hadoop@hadoop01 sqoop-1.99.7-bin-hadoop200]$ bin/sqoop2-shell
Setting conf dir: /home/hadoop/sqoop-1.99.7-bin-hadoop200/bin/../conf
Sqoop home directory: /home/hadoop/sqoop-1.99.7-bin-hadoop200
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000> \h
For information about Sqoop, visit: http://sqoop.apache.org/
Available commands:
  exit    (:x  ) Exit the shell
  history (:H  ) Display, manage and recall edit-line history
  help    (\h  ) Display this help message
  set     (\st ) Configure various client options and settings
  show    (\sh ) Display various objects and configuration options
  create  (\cr ) Create new object in Sqoop repository
  delete  (\d  ) Delete existing object in Sqoop repository
  update  (\up ) Update objects in Sqoop repository
  clone   (\cl ) Create new object based on existing one
  start   (\sta) Start job
  stop    (\stp) Stop job
  status  (\stu) Display status of a job
  enable  (\en ) Enable object in Sqoop repository
  disable (\di ) Disable object in Sqoop repository
  grant   (\g  ) Grant access to roles and assign privileges
  revoke  (\r  ) Revoke access from roles and remove privileges
For help on a specific command type: help command
sqoop:000>

At this point only the client has started; it is not yet connected to the server. If you try to create a link or job now, the client connects to localhost by default. So first point the client at the server:

sqoop:000> set server --host 192.168.101.12 --port 12000 --webapp sqoop
Server is set successfully
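
To double-check what the client is pointed at, the show command from the help listing can display the current server settings (a small sketch; the output wording varies slightly between 1.99.x releases):

sqoop:000> show server --all     # prints the server host, port and webapp currently configured on the client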

2. Search Connector

A connector is the template used to create links: to create a link for HDFS you use hdfs-connector, and to create a link for Oracle you use oracle-jdbc-connector or generic-jdbc-connector.

If the server connection set up above is working, the show connector command produces output like this:

sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+


Otherwise, the connection is refused:

sqoop:000> show connector
Exception has occurred during command
processing: java.net.ConnectException Message: Connection refused
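
If this happens, make sure the Sqoop2 server process is actually running on the host the client was pointed at (a quick sketch using the same start script as in the installation article):

# on the Sqoop2 server host (192.168.101.12)
jps | grep -i sqoop             # the Jetty-based Sqoop server should appear here
./bin/sqoop.sh server start     # start it if it is missing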


3. Create link

As the name suggests, a link is a connection to a particular source or destination data store. The link types you can create correspond to the connectors listed by show connector in the previous step.

1) Create the HDFS link

sqoop:000> create link --connector hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs-link                  ## name of the link; choose freely

HDFS cluster

URI: hdfs://192.168.101.11:9000  ## HDFS address; see core-site.xml in the Hadoop conf directory
Conf directory: /home/hadoop/hadoop-2.6.5/etc/hadoop    ## Hadoop configuration file directory
Additional configs::
There are currently 0 values in the map:
entry#                           ## just press Enter
New link was successfully created with validation status OK and name hdfs-link
sqoop:000> show link
+-----------+----------------+---------+
|   Name    | Connector Name | Enabled |
+-----------+----------------+---------+
| hdfs-link | hdfs-connector | true    |
+-----------+----------------+---------+
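
If a value was mistyped, the link does not have to be recreated; the update command from the help listing walks through the same prompts again (a sketch; depending on the 1.99.x release the flag is -name or an id-based variant):

sqoop:000> update link -name hdfs-link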




2) Create Oracle Link

sqoop:000> create link --connector generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: oracle-link                ## name of the link; choose freely

Database connection

Driver class: oracle.jdbc.driver.OracleDriver                  ## Oracle JDBC driver
Connection String: jdbc:oracle:thin:@192.168.101.9:1521:ORCL   ## Oracle connection string
Username: scott                  ## Oracle user name
Password: ****                   ## password
Fetch Size:                      ## press Enter here and for the following prompts to accept the defaults
Connection Properties:
There are currently 0 values in the map:
entry#

SQL Dialect

Identifier enclose:              ## SQL identifier delimiter; enter a single space here to avoid errors
New link was successfully created with validation status OK and name oracle-link
sqoop:000> show link
+-------------+------------------------+---------+
|    Name     |     Connector Name     | Enabled |
+-------------+------------------------+---------+
| oracle-link | generic-jdbc-connector | true    |
| hdfs-link   | hdfs-connector         | true    |
+-------------+------------------------+---------+
 


4. Create a job that imports data from Oracle to HDFS

sqoop:000> create job -f oracle-link -t hdfs-link
Creating job for links with from name oracle-link and to name hdfs-link
Please fill following values to create new job object
Name: oracle2hdfs                ## job name

Database source

Schema name: scott               ## Oracle schema (user name)
Table name: emp                  ## table to import
SQL statement:                   ## SQL query; by default the whole table is imported
Column names:
There are currently 0 values in the list:
element#
Partition column: empno          ## specify one column, typically a primary key or a time column, used to split the work among mapper tasks
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0                        ## pick the HDFS output format; text is fine
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0                        ## default: no compression
Custom codec:
Output directory: /data          ## target HDFS directory; for a full import it must be empty
Append mode:                     ## append mode; leave empty for a full import

Throttling resources

Extractors:
Loaders:

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name oracle2hdfs
sqoop:000>

5. Create a job that imports data from HDFS to Oracle

sqoop:000> create job -f hdfs-link -t oracle-link
Creating job for links with from name hdfs-link and to name oracle-link
Please fill following values to create new job object
Name: hdfs2oracle

Input configuration

Input directory: /data           ## HDFS source directory
Override null value:
Null value:

Incremental import

Incremental type:
  0 : NONE
  1 : NEW_FILES
Choose: 0                        ## 0 (the default) is fine; used together with the next parameter to configure incremental imports
Last imported date:

Database target

Schema name: scott               ## Oracle user
Table name: EMP2                 ## Oracle table name; the table must be created in advance
Column names:
There are currently 0 values in the list:
element#
Staging table:
Clear stage table:

Throttling resources

Extractors:
Loaders:

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name hdfs2oracle
sqoop:000> show job
+----+-------------+--------------------------------------+--------------------------------------+---------+
| Id |    Name     |            From Connector            |             To Connector             | Enabled |
+----+-------------+--------------------------------------+--------------------------------------+---------+
| 8  | oracle2hdfs | oracle-link (generic-jdbc-connector) | hdfs-link (hdfs-connector)           | true    |
| 9  | hdfs2oracle | hdfs-link (hdfs-connector)           | oracle-link (generic-jdbc-connector) | true    |
+----+-------------+--------------------------------------+--------------------------------------+---------+
sqoop:000>

6. Run the jobs

sqoop:000> start job -name oracle2hdfs
Submission details
Job Name: oracle2hdfs
Server URL: http://192.168.101.12:12000/sqoop/
Created by: hadoop
Creation date: 2017-10-17 17:24:23 CST
Lastly updated by: hadoop
External ID: job_1508227894698_0001
        http://hadoop01:8088/proxy/application_1508227894698_0001/
2017-10-17 17:24:23 CST: BOOTING  - Progress is not available
sqoop:000>


sqoop:000> start job -name hdfs2oracle
Submission details
Job Name: hdfs2oracle
Server URL: http://192.168.101.12:12000/sqoop/
Created by: hadoop
Creation date: 2017-10-17 17:25:38 CST
Lastly updated by: hadoop
External ID: job_1508227894698_0002
        http://hadoop01:8088/proxy/application_1508227894698_0002/
2017-10-17 17:25:38 CST: BOOTING  - Progress is not available



7. View job run information

Once submitted, a Sqoop2 job runs as a MapReduce job on the Hadoop cluster, so it can be monitored through the Hadoop web UI at http://namenode:8088/cluster; the application ID shown there corresponds to the external ID above.
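
Job status can also be checked, and a running job stopped, from the Sqoop shell itself, using the status and stop commands shown in the help listing earlier (a minimal sketch):

sqoop:000> status job -name oracle2hdfs     # show the status of the submitted job
sqoop:000> stop job -name oracle2hdfs       # stop it if it is still running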



8. Verify the results

HDFS

[hadoop@hadoop02 bin]$ ./hadoop fs -ls /data/
Found 13 items
-rw-r--r--   2 hadoop supergroup        172 2017-10-17 17:40 /data/129d2e07-1a0f-44ec-bbcd-299985b6972d.txt
-rw-r--r--   2 hadoop supergroup        119 2017-10-17 17:40 /data/1fffd203-a20f-4925-a328-8fe05994b80a.txt
-rw-r--r--   2 hadoop supergroup        119 2017-10-17 17:36 /data/2b8c5cd1-9ec2-4ae7-8683-a4587def6b74.txt
-rw-r--r--   2 hadoop supergroup          0 2017-10-17 17:40 /data/31b026a3-697c-4b11-a49d-4c02adfaf847.txt
-rw-r--r--   2 hadoop supergroup          0 2017-10-17 17:35 /data/3f6b78a3-a4ee-43ea-b4f3-fb46ae1a4ce0.txt
-rw-r--r--   2 hadoop supergroup          0 2017-10-17 17:37 /data/7ed1cc68-a1c7-4c98-ac25-c2d9d2ce1a54.txt
-rw-r--r--   2 hadoop supergroup          0 2017-10-17 17:33 /data/8ca051dd-6d80-4663-b062-931683d01026.txt
-rw-r--r--   2 hadoop supergroup          0 2017-10-17 17:40 /data/95cedc62-7355-4408-8361-cbf875645e94.txt
-rw-r--r--   2 hadoop supergroup         54 2017-10-17 17:37 /data/9e59a5b9-90d3-4b04-aa59-d20302e4ee2b.txt
-rw-r--r--   2 hadoop supergroup          0 2017-10-17 17:34 /data/b3229660-0409-4d33-9b2f-df9cdb2549ec.txt
-rw-r--r--   2 hadoop supergroup        166 2017-10-17 17:40 /data/b59fa170-7c5e-4b1e-8ca3-083bb09a9544.txt
-rw-r--r--   2 hadoop supergroup            2017-10-17 17:40 /data/cf92a71a-8528-4842-8a88-c4b4ea4b96e2.txt
-rw-r--r--   2 hadoop supergroup        114 2017-10-17 17:36 /data/dda07b7d-01a8-420d-964b-d2a5bee35168.txt
[hadoop@hadoop02 bin]$ ./hadoop fs -cat /data/dda07b7d-01a8-420d-964b-d2a5bee35168.txt
7782,'CLARK','MANAGER',7839,'1981-06-09',2450.00,null,10
7788,'SCOTT','ANALYST',7566,'1987-04-19',3000.00,null,20
[hadoop@hadoop02 bin]$


Oracle
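
On the Oracle side, a quick row count on the target table confirms that rows arrived (a sketch, assuming sqlplus is available on a client machine, that the ORCL service name matches the SID from the JDBC URL, and substituting the real password for <password>):

sqlplus -S scott/<password>@192.168.101.9:1521/ORCL <<'EOF'
select count(*) from EMP2;
EOF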




Postscript:

I ran into quite a few errors along the way; they are recorded below.



--Error 1
[hadoop@hadoop01 sqoop-1.99.7-bin-hadoop200]$ ./bin/sqoop.sh server start
Setting conf dir: ./bin/../conf
Sqoop home directory: /home/hadoop/sqoop-1.99.7-bin-hadoop200
Starting the Sqoop2 server...
1 [main] INFO org.apache.sqoop.core.SqoopServer - Initializing Sqoop server.
[main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at org.apache.sqoop.security.authentication.SimpleAuthenticationHandler.secureLogin(SimpleAuthenticationHandler.java:36)
        at org.apache.sqoop.security.AuthenticationManager.initialize(AuthenticationManager.java:98)
        at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:57)
        at org.apache.sqoop.server.SqoopJettyServer.<init>(SqoopJettyServer.java:67)
        at org.apache.sqoop.server.SqoopJettyServer.main(SqoopJettyServer.java:177)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 5 more
Reason: Sqoop cannot find the Hadoop jars, because the four environment variables HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME and HADOOP_YARN_HOME were not set when Hadoop was installed.
Workaround:
Modify bin/sqoop.sh.
Comment out the following four lines:
HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-${HADOOP_HOME}/share/hadoop/common}
HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-${HADOOP_HOME}/share/hadoop/hdfs}
HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-${HADOOP_HOME}/share/hadoop/mapreduce}
HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-${HADOOP_HOME}/share/hadoop/yarn}


Add the following four lines instead (or change ${HADOOP_HOME} to an absolute path):
HADOOP_COMMON_HOME=${HADOOP_HOME}/share/hadoop/common
HADOOP_HDFS_HOME=${HADOOP_HOME}/share/hadoop/hdfs
HADOOP_MAPRED_HOME=${HADOOP_HOME}/share/hadoop/mapreduce
HADOOP_YARN_HOME=${HADOOP_HOME}/share/hadoop/yarn
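
After editing, it is worth confirming that the paths actually resolve to the Hadoop jars before restarting the server (a quick check assuming HADOOP_HOME is set in the environment):

ls ${HADOOP_HOME}/share/hadoop/common | grep hadoop-common   # the jar that provides org.apache.hadoop.conf.Configuration
./bin/sqoop.sh server start                                  # then restart the Sqoop2 server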




--Error 2
GENERIC_JDBC_CONNECTOR_0025: No primary key - please specify partition column.
Reason: the Partition column parameter was not specified when creating a job from a relational database to HDFS.
Workaround: specify a column name, typically a primary key or a time column, so that the work can be split across mapper tasks.
PS: Sqoop uses this column to divide the rows among mappers, so if the column's values are heavily skewed, some mappers end up overloaded while others have almost nothing to do. Try to pick a column whose values are evenly distributed as the partition column.
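
One way to sanity-check a candidate column before picking it (a sketch in sqlplus, with the same connection assumptions as in the verification step; Sqoop splits the range between the column's minimum and maximum across mappers):

sqlplus -S scott/<password>@192.168.101.9:1521/ORCL <<'EOF'
select min(empno), max(empno), count(*), count(distinct empno) from emp;
EOF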

--Error 3
GENERIC_HDFS_CONNECTOR_0007: Invalid input/output directory - Unexpected exception
GENERIC_HDFS_CONNECTOR_0007: Invalid input/output directory - Output directory is not empty
Reason: when importing from a relational database into HDFS, the specified HDFS output directory must be empty.
Workaround: create a new HDFS directory, or empty the old one.
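
For example (assuming the /data directory used above; the first command removes any existing output, so use it with care):

hadoop fs -rm -r /data        # delete the old output directory and its contents
hadoop fs -mkdir /data        # recreate it empty before re-running the job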




--Error 4
Exception: org.apache.sqoop.common.SqoopException Message: DRIVER_0002: Given job is already running - job with name hdfs2orcl_3
Reason:
Hadoop's JobHistory service is not running.
Workaround:
Configure mapred-site.xml (on the NameNode node):
<property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.101.11:10020</value>
</property>
Start the JobHistory service:
sbin/mr-jobhistory-daemon.sh start historyserver


--Error 5 (when creating a generic-jdbc-connector link)
Exception: org.apache.sqoop.common.SqoopException Message: GENERIC_JDBC_CONNECTOR_0016: Can't fetch schema -
Reason: Identifier enclose specifies the delimiter placed around identifiers in the generated SQL, and it defaults to a double quote. With Oracle, entering the schema and table name in lowercase (scott and emp) produces SQL such as select * from "scott"."emp", which fails because Oracle stores identifiers in uppercase.
Workaround: enter a single space at the Identifier enclose prompt (or type the schema and table names in uppercase); see the snippet below.
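
In the link-creation dialog this looks as follows (type one space, then Enter, at the prompt):

SQL Dialect

Identifier enclose:              ## enter a single space here instead of accepting the default double quote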










