Sqoop study notes: data migration between relational databases and HDFS

First, Installation:

Upload the Sqoop package to a node of the Hadoop cluster and unzip it; it can then be used directly.
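
For reference, a minimal sketch of the installation steps, assuming a Sqoop 1.4.x tarball (the archive name and target path are placeholders and will differ by version):

# extract the Sqoop package on a Hadoop cluster node (archive name is a placeholder)
tar -zxvf sqoop-1.4.x.tar.gz -C /usr/local/
# the sqoop script used in the examples below lives in the bin/ directory of the unpacked package
cd /usr/local/sqoop-1.4.x/bin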

Second, configuration:

Copy the JDBC driver of the database you need to connect to (such as Oracle or MySQL) into the lib directory under the Sqoop installation directory.
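
For MySQL, for example, this amounts to copying the Connector/J jar into Sqoop's lib directory (the jar name and paths below are placeholders):

# copy the MySQL JDBC driver into Sqoop's lib directory (names/paths are placeholders)
cp mysql-connector-java-5.1.x-bin.jar /usr/local/sqoop-1.4.x/lib/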

Third, configure MySQL remote connection

GRANT ALL PRIVILEGES ON ekp_11.* TO 'root'@'192.168.1.10' IDENTIFIED BY '123456' WITH GRANT OPTION;

FLUSH PRIVILEGES;

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION;

FLUSH PRIVILEGES;
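
To confirm that remote access works, you can try connecting to the MySQL server from the Sqoop node with the mysql command-line client (a quick check, assuming the client is installed there):

# connect to the MySQL server used in the examples below and list its databases
mysql -h 192.168.1.10 -P 3306 -u root -p -e "SHOW DATABASES;"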

Fourth, usage:

Type 1: Import data from a relational database into HDFS

1.1 Use the default output path, specify the columns to export, and set the number of maps to 1:

./sqoop import --connect jdbc:mysql://192.168.1.10:3306/ekp_11 --username root --password 123456 --table job_log --columns 'fd_id, fd_start_time, fd_end_time, fd_subject, fd_success' -m 1

After successful execution, a directory named after the user who ran the command (here, root) is automatically created under the /user/ directory in HDFS. Inside that root directory, a job_log directory with the same name as the table is created, and the data files actually imported into HDFS are in the job_log directory.
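
To verify this, you can list the generated directory in HDFS (assuming the import was run as the root user, as in this example):

# the per-user directory and table directory created by the import
hadoop fs -ls /user/root/job_log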

1.2 Specify the output path and the field delimiter for the imported data, and set the number of maps to 1:

./sqoop import --connect jdbc:mysql://192.168.1.10:3306/ekp_11 --username root --password 123456 --table job_log --target-dir '/sqoop/jl' --fields-terminated-by '\t' -m 1

After successful execution, the specified directory /sqoop/jl is created under the HDFS root, and the data files actually imported into HDFS are inside the jl directory. Viewing the contents of an imported data file shows that the columns are separated by tabs, as we specified.
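
You can inspect an imported file directly, for example (the part file name may differ):

# print one of the imported files; fields should be separated by tabs
hadoop fs -cat /sqoop/jl/part-m-00000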

1.3 Add a WHERE condition. Note: the condition must be enclosed in quotation marks.

./sqoop import --connect jdbc:mysql://192.168.1.10:3306/ekp_11 --username root --password 123456 --table job_log --where 'fd_start_time > "2015-06-00 00:00:00"' --target-dir '/sqoop/jl2' -m 1

1.4 Add a query statement (use \ to wrap the statement across lines)

./sqoop import --connect jdbc:mysql://192.168.1.10:3306/ekp_11 --username root --password 123456 --query 'select * from job_log where fd_start_time > "2015-06-00 00:00:00" and $CONDITIONS' --target-dir '/sqoop/jl3' -m 1

Note: when using --query, pay attention to the WHERE clause: the $CONDITIONS token must be included. Also note the difference between single and double quotes: if the value of --query is wrapped in double quotes, $CONDITIONS must be escaped as \$CONDITIONS.
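
As a sketch, the same import with the query wrapped in double quotes would look roughly like this (note the escaped \$CONDITIONS and the single quotes inside the query):

./sqoop import --connect jdbc:mysql://192.168.1.10:3306/ekp_11 --username root --password 123456 --query "select * from job_log where fd_start_time > '2015-06-00 00:00:00' and \$CONDITIONS" --target-dir '/sqoop/jl3' -m 1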

1.5 Specify the number of maps with -m

./sqoop import --connect jdbc:mysql://192.168.1.10:3306/ekp_11 --username root --password 123456 --table job_log --target-dir '/sqoop/jl4' --fields-terminated-by '\t' -m 4 --split-by fd_start_time

Note: if the number of maps is set to 1 (-m 1), do not add --split-by ${tablename.column}; when the number of maps is greater than 1, the --split-by ${tablename.column} parameter must be added to specify the column by which the work is divided among the map tasks.

Looking at the execution result, there is a small problem: because I split by the time column, some data was lost. You can see that the part-m-00001 file has a size of 0 and contains no data.
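
You can check the sizes of the generated part files to spot this:

# list the output files; here one of the part-m-* files shows a size of 0
hadoop fs -ls /sqoop/jl4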

Specify multiple maps and split by fd_id instead:

./sqoop import --connect jdbc:mysql://192.168.1.10:3306/ekp_11 --username root --password 123456 --table job_log --target-dir '/sqoop/jl4' --fields-terminated-by '\t' -m 4 --split-by fd_id

Type 2: Export data from HDFS to a relational database (don't forget to specify the delimiter)

2.1 First, create an empty table job_log_bak in the relational database:

CREATE TABLE job_log_bak SELECT fd_id, fd_start_time, fd_end_time, fd_subject, fd_success FROM job_log WHERE fd_id = '1';

Then run the export command to load the data from HDFS into the table:

./sqoop export --connect jdbc:mysql://192.168.1.10:3306/ekp_11 --username root --password 123456 --export-dir '/user/root/job_log' --table job_log_bak -m 1 --fields-terminated-by ','
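
To verify the export, you can count the rows that landed in the target table (a quick sanity check using the mysql client):

# count the exported rows in job_log_bak
mysql -h 192.168.1.10 -u root -p -D ekp_11 -e "SELECT COUNT(*) FROM job_log_bak;"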

Type 3: Import data from a relational database into Hive

3.1 Copy the table structure from the database into Hive (this duplicates only the table structure, not the data):

./sqoop create-hive-table --connect jdbc:mysql://192.168.1.10:3306/ekp_11 --table job_log --username root --password 123456 --hive-table job_log

After successful execution, a job_log directory is created under Hive's warehouse directory in HDFS.
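
To verify, list Hive's warehouse directory in HDFS (assuming the default warehouse location /user/hive/warehouse/):

# the directory created for the new Hive table
hadoop fs -ls /user/hive/warehouse/job_log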

3.2 Import data from the relational database into Hive:

./sqoop import --connect jdbc:mysql://192.168.1.10:3306/ekp_11 --username root --password 123456 --table job_log --hive-import -m 1 --fields-terminated-by '\t'

After successful execution, a table directory job_log is first created under Hive's default warehouse path /user/hive/warehouse/. The imported data files, such as part-m-00000, are then generated inside that table directory.

Note: if no field delimiter is specified, Hive's default delimiter '\001' is used.
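
A quick way to confirm the Hive import (assuming the hive CLI is available on the node):

# query a few rows of the imported Hive table
hive -e "SELECT * FROM job_log LIMIT 5;"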

Type 4: Export Hive data to a relational database

./sqoop export --connect jdbc:mysql://192.168.1.10:3306/ekp_11 --username root --password 123456 --table job_log_bak --export-dir /user/hive/warehouse/job_log/ -m 1 --input-fields-terminated-by '\t' --fields-terminated-by ','

Note: the --input-fields-terminated-by parameter must be added so that Sqoop knows the delimiter used in the files it reads and can parse the fields correctly.

Other uses:

List all databases on the MySQL server:

./sqoop list-databases --connect jdbc:mysql://192.168.1.10:3306/ --username root --password 123456

Connect to MySQL and list the tables in the ekp_11 database:

./sqoop list-tables --connect jdbc:mysql://192.168.1.10:3306/ekp_11 --username root --password 123456

To be continued ...
