Sqoop Operations: A Small ETL Case

Source: Internet
Author: User
Tags: dname, sqoop

ETL (Extraction-Transformation-Loading) is short for data extraction, transformation, and loading.
In this case, data is extracted from Oracle, analyzed and transformed with Hive, and the result is loaded back into Oracle.

 

This case is purely a demo-level example.

I. Requirements
Analyze the data in the EMP and DEPT tables and store the results in a result table.

EMP and DEPT are built-in Oracle sample tables (in the SCOTT schema). Their structures are as follows:

EMP table

EMPNO    NUMBER(4)
ENAME    VARCHAR2(10)
JOB      VARCHAR2(9)
MGR      NUMBER(4)
HIREDATE DATE
SAL      NUMBER(7,2)
COMM     NUMBER(7,2)
DEPTNO   NUMBER(2)

DEPT table

DEPTNO NUMBER(2)
DNAME  VARCHAR2(14)
LOC    VARCHAR2(13)

Result table

EMPNO Employee ID
ENAME Employee name
COMM  Commission
DNAME Department name

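For reference, the desired result set is equivalent to running the following join directly in Oracle (a sketch; it assumes the standard SCOTT.EMP and SCOTT.DEPT sample tables shown above):

```sql
-- Oracle-side equivalent of the target result table
SELECT e.empno, e.ename, e.comm, d.dname
FROM   emp e
JOIN   dept d ON e.deptno = d.deptno;
```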
II. Data Preparation

Create the Hive tables

create table emp_etl(
  empno int, ename string, job string, mgr int,
  hiredate string, sal double, comm double, deptno int)
row format delimited fields terminated by '\t'
lines terminated by '\n' stored as textfile;

create table dept_etl(deptno int, dname string, loc string)
row format delimited fields terminated by '\t'
lines terminated by '\n' stored as textfile;

create table tmp_result_etl(empno int, ename string, comm double, dname string)
row format delimited fields terminated by '\t'
lines terminated by '\n' stored as textfile;

create table result_etl(empno int, ename string, comm double, dname string)
row format delimited fields terminated by '\t'
lines terminated by '\n' stored as textfile;

 

Import data into Hive

sqoop import --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --table EMP \
  --hive-overwrite --hive-import --hive-table emp_etl \
  --null-string '' --null-non-string '0' \
  --fields-terminated-by '\t' --lines-terminated-by '\n' -m 3;

sqoop import --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --table DEPT \
  --hive-overwrite --hive-import --hive-table dept_etl \
  --null-string '' --null-non-string '0' \
  --fields-terminated-by '\t' --lines-terminated-by '\n' -m 3;
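After the imports finish, it is worth a quick sanity check that the Hive row counts match the Oracle source tables (a hypothetical check, using the table names created above):

```sql
-- Counts should match SELECT COUNT(*) FROM emp / dept in Oracle
SELECT COUNT(*) FROM emp_etl;
SELECT COUNT(*) FROM dept_etl;
```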

 

III. Implementation Method

Analyze and process the data in Hive, export the results to HDFS, and then use Sqoop to load the HDFS results into the database.
1) Extraction: Oracle data is extracted into Hive. See the two steps above.
2) Transformation: insert the query result into the Hive table

INSERT OVERWRITE TABLE result_etl
SELECT a.empno, a.ename, a.comm, b.dname
FROM emp_etl a JOIN dept_etl b ON (a.deptno = b.deptno);
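Note that this inner join silently drops any employee whose deptno has no match in dept_etl. If those rows should be kept, a LEFT OUTER JOIN variant could be used instead (a sketch, not part of the original case; dname becomes NULL for unmatched rows):

```sql
-- Keep all employees, even those without a matching department
INSERT OVERWRITE TABLE result_etl
SELECT a.empno, a.ename, a.comm, b.dname
FROM emp_etl a LEFT OUTER JOIN dept_etl b ON (a.deptno = b.deptno);
```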

3) Transformation: export the data to the HDFS file system

INSERT OVERWRITE DIRECTORY 'RESULT_ETL_HIVE' SELECT * FROM result_etl;
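By default, Hive writes the files produced by INSERT OVERWRITE DIRECTORY with the non-printing \001 (Ctrl-A) character as the field delimiter, which is why the sqoop export below uses --fields-terminated-by '\001'. A minimal local sketch of how such a line splits (the sample row is hypothetical; the real files live under RESULT_ETL_HIVE on HDFS):

```shell
#!/bin/sh
# Build a sample line the way Hive lays it out: fields joined by \001
line=$(printf '7499\001ALLEN\001300.0\001SALES')

# Extract the second field (ename) by splitting on the \001 delimiter
ename=$(printf '%s' "$line" | awk -F "$(printf '\001')" '{print $2}')
echo "$ename"   # prints ALLEN
```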

4) Load: load the data from HDFS into Oracle (the result table must be created manually)

Create an Oracle table to store the ETL results

CREATE TABLE RESULT_ETL2(
  empno NUMBER(4),
  ename VARCHAR2(10),
  comm  NUMBER(7,2),
  dname VARCHAR2(14)
);

 

sqoop export --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --table RESULT_ETL2 \
  --export-dir /user/hadoop/RESULT_ETL_HIVE \
  --fields-terminated-by '\001' \
  -m 2;
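Once the export completes, verify the load on the Oracle side (a quick hypothetical check; the count should match result_etl in Hive):

```sql
-- Run in Oracle as SCOTT
SELECT COUNT(*) FROM RESULT_ETL2;
```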

 

 

Alternatively, put all the steps (except the Oracle table creation) in a shell script and execute them together:

#!/bin/sh
. /etc/profile
set -x

hql="INSERT OVERWRITE TABLE result_etl
     SELECT a.empno, a.ename, a.comm, b.dname
     FROM emp_etl a JOIN dept_etl b ON (a.deptno = b.deptno)"
hive -e "$hql"

hive -e "INSERT OVERWRITE DIRECTORY 'RESULT_ETL_HIVE' SELECT * FROM result_etl"

sqoop export --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --table RESULT_ETL2 \
  --export-dir /user/hadoop/RESULT_ETL_HIVE \
  --fields-terminated-by '\001' \
  -m 2;

 
