Sqoop MySQL Import to HDFS, Hive


Sqoop is an open-source tool for transferring data between Hadoop and relational databases (Oracle, MySQL, SQL Server, and so on). The following uses MySQL and SQL Server as examples of importing data from a relational database into Hadoop (HDFS, Hive) with Sqoop.

# Import command and parameters

## Common parameters

| Parameter | Description |
| --- | --- |
| --connect <jdbc-uri> | JDBC connection string |
| --username <username> | Database user name |
| --password <password> | Database password |
| -P | Read the database password from the console at import time |
| --password-file | Read the database password from the specified file |
| --verbose | Print more log output while importing |
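
As a minimal sketch of how these connection parameters fit together (the host, database, and table names below are placeholders), a basic import might look like:

$ sqoop import \
    --connect jdbc:mysql://dbhost:3306/testdb \
    --username sqoop_user \
    -P \
    --table some_table \
    --target-dir /tmp/some_table \
    --verbose

Here -P prompts for the password on the console instead of exposing it on the command line.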

## Control parameters

| Parameter | Description |
| --- | --- |
| --append | Append data to an existing dataset in HDFS. With this parameter, Sqoop first imports the data into a temporary directory and then renames the files into the target directory, avoiding conflicts with files that already exist there |
| --as-avrodatafile | Import the data as Avro data files |
| --as-sequencefile | Import the data as SequenceFiles |
| --as-textfile | Import the data as plain text files; once generated, the results can be queried in Hive with SQL statements |
| --boundary-query | Boundary query: a SQL query is run before the import and the imported data is limited to its result set. Example: --boundary-query 'select id,no from t where id = 3' imports only the records with id = 3; a min()/max() style query is also common. Note that the queried columns must not be of string type, or an error is raised |
| --columns <col,col> | Specify the columns to import, e.g. --columns id,username |
| --direct | Direct import mode, using the import/export tool that ships with the relational database. The official documentation says this mode is faster |
| --direct-split-size | Used together with direct mode; splits the imported stream by bytes. In particular, when importing from PostgreSQL in direct mode, a file that reaches the given size is split into several separate files |
| --inline-lob-limit | Set the maximum size for inline large-object (LOB) columns |
| -m, --num-mappers | Start n map tasks to import data in parallel; the default is 4. It is best not to set this higher than the number of nodes in the cluster |
| --query, -e | Import the result set of a SQL query. Must be used together with --target-dir or --hive-table, and the query must contain a WHERE clause with $CONDITIONS in it. Example: --query 'select * from t where $CONDITIONS' --target-dir /tmp/t --hive-table t |
| --split-by | Column of the table used to split the work units, typically the primary key id |
| --table | Name of the relational database table to read data from |
| --delete-target-dir | Delete the target directory if it already exists |
| --target-dir | HDFS destination path |
| --warehouse-dir | Cannot be used together with --target-dir; specifies the parent directory for the import. Suitable for HDFS imports, not for importing into a Hive directory |
| --where | WHERE condition applied when importing from the relational database, e.g. --where "id = 2" |
| -z, --compress | Enable compression (data is not compressed by default); uses the gzip codec and applies to SequenceFile, text, and Avro files |
| --compression-codec | Hadoop compression codec; the default is gzip |
| --null-string | Optional; the string written for NULL values in string columns. If not specified, the string "null" is used |
| --null-non-string | Optional; the string written for NULL values in non-string columns. If not specified, the string "null" is used |
| --fields-terminated-by | Custom field (column) separator |
| --lines-terminated-by | Custom line (row) separator |
| --hive-drop-import-delims | Drop Hive's default delimiters if they appear in the imported data |
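
For illustration, a free-form query import that combines several of the control parameters above might look like the following sketch (connection details, query, and paths are placeholders):

$ sqoop import \
    --connect jdbc:mysql://dbhost:3306/testdb \
    --username sqoop_user -P \
    --query 'select id, username from t where $CONDITIONS' \
    --split-by id \
    -m 4 \
    --null-string '\\N' \
    --null-non-string '\\N' \
    --fields-terminated-by '\t' \
    --target-dir /tmp/t \
    --delete-target-dir

The single quotes keep the shell from expanding $CONDITIONS, which Sqoop replaces with each mapper's split range; --split-by is required here because the import runs with more than one mapper against a free-form query.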
## Import data to HDFS

### MySQL as the source

$ sqoop import --connect jdbc:mysql://ip:port/ --username USER --password 'PASSWD' --table t_sqoop_test --where "$conditions" --target-dir 'mydir' --delete-target-dir


### SQL Server as the source

$ sqoop import --connect 'jdbc:sqlserver://ip:1433;username=name;password=pwd;database=dataname' --table T_SQOOP_TEST --where "$conditions" --target-dir 'mydir' --delete-target-dir
## Import to Hive

Add the --hive-import parameter to the import command to import the data into Hive.

$ sqoop import --hive-import --connect jdbc:mysql://ip:port/ --username USER --password 'PASSWD' --table t_sqoop_test -m 1 --target-dir 'mydir' --delete-target-dir
Parameter description:

| Parameter | Description |
| --- | --- |
| --hive-import | Import the data into Hive |
| --hive-overwrite | Overwrite data that already exists in the Hive table |
| --create-hive-table | If set, the import job fails when the target table already exists in Hive. Default is false |
| --hive-table <table-name> | Name of the Hive table to import into |
| --hive-drop-import-delims | When importing into Hive, drop the \n (newline), \r (carriage return), and \01 (start-of-heading) characters from string fields |
| --hive-delims-replacement | When importing into Hive, replace the \n, \r, and \01 characters in string fields with a user-defined string |
Note when importing: Hive uses the \01 character as its default field (column) delimiter and \n and \r as row delimiters, so imported string fields that contain these characters cause problems. For example, if row 2 of the t_sqoop_test table has a Name value containing a newline, the data in Hive is corrupted after the import. Use the --hive-drop-import-delims parameter to remove \n, \r, and \01 from the imported data, or use --hive-delims-replacement to replace them with another string.
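
Putting the Hive parameters together, a sketch of an import that overwrites the target table and strips the problem delimiters might be (connection details and table names are placeholders):

$ sqoop import \
    --connect jdbc:mysql://dbhost:3306/testdb \
    --username sqoop_user -P \
    --table t_sqoop_test \
    --hive-import \
    --hive-overwrite \
    --hive-table t_sqoop_test \
    --hive-drop-import-delims \
    -m 1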
