ETL stands for extraction, transformation, and loading. In this case we extract data from Oracle, transform and analyze it with Hive, and load the result back into Oracle. The example is purely demo-level.
I. Requirements
Analyze the data in the EMP and DEPT tables and store the result in a result table.
EMP and DEPT are the sample tables that ship with Oracle's SCOTT schema. Their structure is as follows:
EMP table
Column | Type
Empno | Number(4)
Ename | Varchar2(10)
Job | Varchar2(9)
Mgr | Number(4)
Hiredate | Date
Sal | Number(7,2)
Comm | Number(7,2)
Deptno | Number(2)
DEPT table
Column | Type
Deptno | Number(2)
Dname | Varchar2(14)
Loc | Varchar2(13)
Result table
Column | Description
Empno | Employee ID
Ename | Employee name
Comm | Allowance (commission)
Dname | Department name
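In other words, each row of the result table pairs an employee with the name of their department, i.e. the output of a join between EMP and DEPT on DEPTNO. This is the query we will reproduce in Hive later; as a rough preview you could run it directly against Oracle, for example with sqoop eval (a sketch, assuming the same SCOTT/TIGER connection used by the Sqoop commands below):

sqoop eval --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --query "SELECT e.empno, e.ename, e.comm, d.dname FROM emp e JOIN dept d ON e.deptno = d.deptno"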
II. Data Preparation
Create the Hive tables
create table emp_etl(empno int, ename string, job string, mgr int, hiredate string, sal double, comm double, deptno int)
row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

create table dept_etl(deptno int, dname string, loc string)
row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

create table tmp_result_etl(empno int, ename string, comm double, dname string)
row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

create table result_etl(empno int, ename string, comm double, dname string)
row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
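To confirm the tables were created as expected, a quick check from the command line (optional, not part of the ETL flow itself):

hive -e "show tables like '*_etl'; describe emp_etl;"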
Import the data into Hive
sqoop import --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --table EMP \
  --hive-overwrite --hive-import --hive-table emp_etl \
  --null-string '' --null-non-string '0' \
  --fields-terminated-by '\t' --lines-terminated-by '\n' \
  -m 3

sqoop import --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --table DEPT \
  --hive-overwrite --hive-import --hive-table dept_etl \
  --null-string '' --null-non-string '0' \
  --fields-terminated-by '\t' --lines-terminated-by '\n' \
  -m 3
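Once both imports finish, a row-count sanity check against the Hive tables confirms the extraction worked (the stock SCOTT schema has 14 rows in EMP and 4 in DEPT):

hive -e "select count(*) from emp_etl; select count(*) from dept_etl;"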
III. Implementation Method
The data is analyzed and processed in Hive, the result is written out to HDFS, and Sqoop then exports the HDFS files into the Oracle database.
1) Extraction: Oracle data is extracted into Hive (see the two steps above).
2) Transformation: insert the join result into the Hive result table.
INSERT OVERWRITE TABLE result_etl
SELECT a.empno, a.ename, a.comm, b.dname
FROM emp_etl a JOIN dept_etl b ON (a.deptno = b.deptno);
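Optionally, peek at a few rows of result_etl before exporting, just to confirm the join produced what we expect:

hive -e "select * from result_etl limit 5;"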
3) Transformation: export the data to a directory in HDFS. The relative path below resolves under the running user's HDFS home directory, /user/hadoop/RESULT_ETL_HIVE in this setup, which is exactly the directory the load step reads from.
INSERT OVERWRITE DIRECTORY 'RESULT_ETL_HIVE' SELECT * FROM result_etl;
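You can inspect the exported files with the HDFS shell. Note that INSERT OVERWRITE DIRECTORY writes fields separated by the non-printing \001 (Ctrl-A) character by default, which is why the Sqoop export below uses --fields-terminated-by '\001'. The part-file name 000000_0 is typical but may differ on your cluster:

hdfs dfs -ls RESULT_ETL_HIVE
hdfs dfs -cat RESULT_ETL_HIVE/000000_0 | head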
4) Load: load the data from HDFS into Oracle (the Oracle result table must be created manually first).
Create an Oracle table to store the ETL result:
CREATE TABLE RESULT_ETL2(empno INT, ename VARCHAR2(10), comm NUMBER(7,2), dname VARCHAR2(14));
sqoop export --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --table RESULT_ETL2 \
  --export-dir /user/hadoop/RESULT_ETL_HIVE \
  --fields-terminated-by '\001' \
  -m 2
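As a final check, you can count the exported rows in Oracle without leaving the command line, for example with sqoop eval (any SQL client works just as well):

sqoop eval --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --query "SELECT COUNT(*) FROM RESULT_ETL2"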
Alternatively, put all of the steps (except the Oracle table creation) into a shell script and run them together.
#!/bin/sh
. /etc/profile
set -x

# Transform in Hive, then export the result to the HDFS directory read by Sqoop
hql="INSERT OVERWRITE TABLE result_etl
SELECT a.empno, a.ename, a.comm, b.dname
FROM emp_etl a JOIN dept_etl b ON (a.deptno = b.deptno);
INSERT OVERWRITE DIRECTORY 'RESULT_ETL_HIVE'
SELECT * FROM result_etl;"

hive -e "$hql"

sqoop export --connect jdbc:oracle:thin:@192.168.1.107:1521/ORCL \
  --username SCOTT --password TIGER \
  --table RESULT_ETL2 \
  --export-dir /user/hadoop/RESULT_ETL_HIVE \
  --fields-terminated-by '\001' \
  -m 2
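Save the script (the file name etl_oracle_hive.sh here is just an example), make it executable, and run it:

chmod +x etl_oracle_hive.sh
./etl_oracle_hive.sh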