Welcome to the Big Data and AI technical articles published by the WeChat official account Qing Research Academy, where you can study the carefully organized notes of Night White (the author's pen name). Let us make a little progress every day, so that excellence becomes a habit!
I. Introduction to Sqoop:
Sqoop is a data acquisition (data exchange) engine that collects data from relational databases (RDBMS). It is used primarily to transfer data between an RDBMS and HDFS/Hive/HBase. The sqoop import command imports data from the RDBMS into HDFS/Hive/HBase, and the sqoop export command moves data the other way, from files in HDFS (for example, the files behind a Hive table) back into the RDBMS. Features: it acquires data in bulk, it relies on MapReduce programs underneath, and it connects to the relational database through JDBC.
II. Experimental environment for Sqoop:
Experimental environment: a machine running the Windows XP operating system with an Oracle database installed.
Why choose Oracle among the relational databases?
Reasons: 1. It is easier to install an Oracle database on Windows than on Linux. 2. The SH user in the Oracle database owns the sales order table SALES, which contains about 920,000 records, and the SCOTT user owns the ready-made employee table EMP (emp.csv) and department table DEPT (dept.csv).
III. Driver class name and URL format for each database:
Database   Driver class name                  URL format                                  Port number
Oracle     oracle.jdbc.OracleDriver           jdbc:oracle:thin:@ip:1521:orcl              1521
MySQL      com.mysql.jdbc.Driver              jdbc:mysql://ip:3306/dbname?name=value      3306
Hive       org.apache.hive.jdbc.HiveDriver    jdbc:hive2://ip:10000/dbname                10000
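The URL formats in the table above can be assembled mechanically from a host, a port, and a database name (or SID for Oracle). The following is a minimal sketch of that; the function names are hypothetical helpers introduced only for illustration, and the host and database values are placeholders.

```shell
#!/bin/sh
# Build JDBC URLs in the formats listed in the table above.
# These helpers are illustrative and are not part of Sqoop.

# usage: oracle_url HOST SID [PORT]   (default port 1521)
oracle_url() {
    printf 'jdbc:oracle:thin:@%s:%s:%s\n' "$1" "${3:-1521}" "$2"
}

# usage: mysql_url HOST DBNAME [PORT]   (default port 3306)
mysql_url() {
    printf 'jdbc:mysql://%s:%s/%s\n' "$1" "${3:-3306}" "$2"
}

# usage: hive_url HOST DBNAME [PORT]   (default port 10000)
hive_url() {
    printf 'jdbc:hive2://%s:%s/%s\n' "$1" "${3:-10000}" "$2"
}

# Print one URL of each kind using placeholder values.
oracle_url 192.168.182.157 orcl
mysql_url 192.168.182.157 dbname
hive_url 192.168.182.157 dbname
```

The resulting string is what gets passed to the --connect option of a Sqoop command.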
IV. Installing and configuring Sqoop:
Note: You do not need to modify any configuration file.
1. Install Sqoop: tar -zxvf sqoop-1.4.5.bin__hadoop-0.23.tar.gz -C ~/training
2. Configure the SQOOP_HOME environment variable:
export SQOOP_HOME=/root/training/sqoop-1.4.5.bin__hadoop-0.23
export PATH=$SQOOP_HOME/bin:$PATH
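After setting the two variables above, it can be useful to confirm that they took effect in the current shell. The sketch below assumes a POSIX shell; check_sqoop_env is a hypothetical helper name, not a Sqoop command.

```shell
#!/bin/sh
# Sketch: confirm SQOOP_HOME is set and that its bin directory is on PATH.
check_sqoop_env() {
    if [ -z "${SQOOP_HOME:-}" ]; then
        echo "SQOOP_HOME is not set"
        return 1
    fi
    # Wrap PATH in colons so the match works at the start, middle, or end.
    case ":$PATH:" in
        *":$SQOOP_HOME/bin:"*)
            echo "PATH contains $SQOOP_HOME/bin"
            ;;
        *)
            echo "PATH is missing $SQOOP_HOME/bin"
            return 1
            ;;
    esac
}

# Report the result without aborting the calling shell on failure.
check_sqoop_env || true
```

If the check passes, running sqoop version from any directory should then print the installed version.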
V. Using Sqoop commands to collect data from the RDBMS:
1. Import all data from the employee table EMP:
sqoop import --connect jdbc:oracle:thin:@192.168.182.157:1521:orcl --username SCOTT --password tiger --table EMP --target-dir /sqoop/import/emp1
2. Import the specified columns of the employee table EMP:
sqoop import --connect jdbc:oracle:thin:@192.168.182.157:1521:orcl --username SCOTT --password tiger --table EMP --columns ENAME,SAL --target-dir /sqoop/import/emp2
3. Import all data from the sales table SALES (-m 1 runs the import with a single map task):
sqoop import --connect jdbc:oracle:thin:@192.168.182.157:1521:orcl --username SH --password sh --table SALES --target-dir /sqoop/import/sales -m 1
4. Import all the tables owned by the SCOTT user into HDFS:
sqoop import-all-tables --connect jdbc:oracle:thin:@192.168.182.157:1521:orcl --username SCOTT --password tiger
5. Export the data in HDFS into the RDBMS:
sqoop export --connect jdbc:oracle:thin:@192.168.182.157:1521:orcl --username SCOTT --password tiger --table STUDENTS --export-dir /students
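The commands above repeat the same connection arguments each time. One way to keep them in one place is a small wrapper that assembles the command line. The sketch below only prints the commands (a dry run) rather than running them; sqoop_import_cmd and sqoop_export_cmd are hypothetical helper names, and the connection string is the one used in this article.

```shell
#!/bin/sh
# Sketch: assemble Sqoop command lines from shared settings and print them.
# Nothing is executed against the database; this is a dry run.

CONNECT='jdbc:oracle:thin:@192.168.182.157:1521:orcl'

# usage: sqoop_import_cmd USER PASSWORD TABLE TARGET_DIR [MAPPERS]
sqoop_import_cmd() {
    cmd="sqoop import --connect $CONNECT --username $1 --password $2 --table $3 --target-dir $4"
    # -m sets the number of map tasks; append it only when requested.
    if [ -n "${5:-}" ]; then
        cmd="$cmd -m $5"
    fi
    printf '%s\n' "$cmd"
}

# usage: sqoop_export_cmd USER PASSWORD TABLE EXPORT_DIR
sqoop_export_cmd() {
    printf '%s\n' "sqoop export --connect $CONNECT --username $1 --password $2 --table $3 --export-dir $4"
}

# Reproduce three of the commands shown above.
sqoop_import_cmd SCOTT tiger EMP /sqoop/import/emp1
sqoop_import_cmd SH sh SALES /sqoop/import/sales 1
sqoop_export_cmd SCOTT tiger STUDENTS /students
```

Printing first makes it easy to review the exact command before running it. Note that the export command expects the target table (STUDENTS here) to already exist in the database.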
VI. Differences between the Oracle database and the MySQL database:
1. The Oracle database is case-sensitive: the user name, table name, and column names must be written in upper case. The MySQL database is not case-sensitive.
2. The Oracle database has only one database, ORCL, which is created automatically when Oracle is installed; the MySQL database has many databases.
3. In Oracle there are many users, and each table belongs to a user. In MySQL there are many databases, and each table belongs to a database; the database grants different access rights to different users.
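Because of point 1, a lower-case table or column name is an easy mistake when writing Sqoop commands against Oracle. A minimal sketch of one way to guard against it is to upper-case identifiers before building the command; to_upper is a hypothetical helper, not a Sqoop feature.

```shell
#!/bin/sh
# Hypothetical helper: upper-case an identifier before handing it to Sqoop.
to_upper() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Normalize a table name and a column list typed in lower case.
TABLE_ARG=$(to_upper emp)
COLS_ARG=$(to_upper ename,sal)

# Show the arguments as they would appear in a sqoop import command.
echo "--table $TABLE_ARG --columns $COLS_ARG"
```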
VII. Similarities and differences between Sqoop and Flume:
Same: both Sqoop and Flume are data acquisition engines.
Different: Sqoop performs batch data acquisition, whereas Flume performs real-time data acquisition and is mainly used in real-time collection systems.
Li Jinze (AllenLi), currently a master's student at Tsinghua University. Research direction: big data and artificial intelligence.
Big data acquisition engine Sqoop: collecting data from an Oracle database