Sqoop is an open-source tool for transferring data between Hadoop and relational databases (Oracle, MySQL, SQL Server, and so on). The following is an example of using Sqoop to import data from MySQL and SQL Server into Hadoop (HDFS, Hive).

# Import command and parameters

Common parameters
| Parameter | Description |
| --- | --- |
| --connect <jdbc-uri> | JDBC connection string |
| --username <username> | Database user name |
| --password <password> | Database password |
| -P | Read the database password from the console at import time |
| --password-file | Read the database password from the specified file |
| --verbose | Print more log output while importing |
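As a minimal sketch of how the common parameters fit together (ip:port, dbname, USER, and the table name t_sqoop_test are placeholders; --table and --target-dir are control parameters described in the next table; -P prompts for the password instead of passing it on the command line):

$ sqoop import --verbose --connect jdbc:mysql://ip:port/dbname --username USER -P --table t_sqoop_test --target-dir 'mydir'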
Control parameters

| Parameter | Description |
| --- | --- |
| --append | Appends data to a dataset that already exists in HDFS. With this parameter, Sqoop first imports the data into a temporary directory and then renames the files into the target directory, so they do not clash with files already present there. |
| --as-avrodatafile | Import the data as Avro data files |
| --as-sequencefile | Import the data as SequenceFiles |
| --as-textfile | Import the data as plain text files; once generated, the results can be queried from Hive with SQL statements. |
| --boundary-query | Boundary query: Sqoop runs this SQL query before the import to obtain the boundary values used to split the work, instead of computing them from the split column itself, e.g. --boundary-query 'select min(id), max(id) from t'. Note that the boundary columns cannot be of a string data type, or the import will fail. |
| --columns <col,col> | Specify the columns to import, e.g. --columns id,username |
| --direct | Direct import mode, which uses the import/export tools that ship with the relational database. The official documentation says this mode is faster. |
| --direct-split-size | Used with direct mode above to split the imported stream by bytes; in particular, when importing from PostgreSQL in direct mode, a file that reaches the configured size is split into several separate files. |
| --inline-lob-limit | Set the maximum size for an inline large object (LOB) column |
| -m,--num-mappers | Start n map tasks to import the data in parallel; the default is 4. It is best not to set this higher than the number of nodes in the cluster. |
| --query,-e | Import the results of an SQL query. Must be used together with --target-dir (and --hive-table when importing into Hive); the query must contain a WHERE clause that includes $CONDITIONS. Example: --query 'select * from t where $CONDITIONS' --target-dir /tmp/t --hive-table t (see the sketch after this table). |
| --split-by | Table column used to split the work units, typically the primary key id |
| --table | Relational database table name; the data is read from this table |
| --delete-target-dir | Delete the target directory if it already exists |
| --target-dir | Specify the HDFS target path |
| --warehouse-dir | Cannot be used together with --target-dir; specifies the parent directory under which the imported data is stored. Suitable for plain HDFS imports, not for imports into Hive directories. |
| --where | Filter condition applied when importing from the relational database, e.g. --where "id = 2" |
| -z,--compress | Enable compression; the data is not compressed by default. This parameter compresses the data with the gzip codec and applies to SequenceFile, text, and Avro files. |
| --compression-codec | Hadoop compression codec to use; the default is gzip |
| --null-string | The string to write for a null value in a string column; if not specified, the string "null" is used |
| --null-non-string | The string to write for a null value in a non-string column; if not specified, the string "null" is used |
| --fields-terminated-by | Custom field (column) separator |
| --lines-terminated-by | Custom line (row) separator |
| --hive-drop-import-delims | Remove Hive's default delimiters if they appear in the imported data |
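As a sketch of how several control parameters combine, the command below imports the results of a free-form query in parallel; the table t, its numeric id column, and the 4-way parallelism are illustrative assumptions rather than part of the original example:

$ sqoop import --connect jdbc:mysql://ip:port/dbname --username USER --password 'PASSWD' --query 'select * from t where $CONDITIONS' --split-by id -m 4 --target-dir /tmp/t --delete-target-dir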
## Import data into HDFS
## MySQL as the data source
$ sqoop import --connect jdbc:mysql://ip:port/dbname --username USER --password 'PASSWD' --table t_sqoop_test --where "$conditions" --target-dir 'mydir' --delete-target-dir
## SQL Server as the data source
$ sqoop import --connect 'jdbc:sqlserver://ip:1433;username=name;password=pwd;database=dataname' --table T_SQOOP_TEST --where "$conditions" --target-dir 'mydir' --delete-target-dir
### Import into Hive
Add the --hive-import parameter to the import command to import the data into Hive.
$ sqoop import --hive-import --connect jdbc:mysql://ip:port/dbname --username USER --password 'PASSWD' --table t_sqoop_test -m 1 --target-dir 'mydir' --delete-target-dir
Parameter description:
| Parameter | Description |
| --- | --- |
| --hive-import | Import the data into Hive |
| --hive-overwrite | Overwrite data that already exists in the Hive table |
| --create-hive-table | If this parameter is set, the import job fails when the target table already exists in Hive. The default is false. |
| --hive-table <table-name> | Specify the name of the Hive table to import into |
| --hive-drop-import-delims | When importing into Hive, remove \n (newline), \r (carriage return), and \01 (Ctrl-A) characters from string fields |
| --hive-delims-replacement | When importing into Hive, replace the \n, \r, and \01 characters in string fields with a user-defined string |
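A sketch combining these Hive parameters with the earlier import command; the Hive table name t_sqoop_test and the decision to overwrite existing data are assumptions for illustration:

$ sqoop import --hive-import --hive-overwrite --hive-table t_sqoop_test --connect jdbc:mysql://ip:port/dbname --username USER --password 'PASSWD' --table t_sqoop_test -m 1 --target-dir 'mydir' --delete-target-dir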
Things to be aware of when importing into Hive: Hive uses the \01 character as the field (column) delimiter by default, and \n and \r as row delimiters. If the character fields in the imported data contain these characters, the resulting rows are broken. For example, if the row with id 2 in the t_sqoop_test table has a newline character in its name field, the data looks corrupted after the import into Hive. Use the --hive-drop-import-delims parameter to remove the \n, \r, and \01 characters from the imported data, or --hive-delims-replacement to replace them with another string, as shown below.
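A minimal sketch of the delimiter handling described above, using the same placeholder connection settings as before:

$ sqoop import --hive-import --hive-drop-import-delims --connect jdbc:mysql://ip:port/dbname --username USER --password 'PASSWD' --table t_sqoop_test -m 1 --target-dir 'mydir' --delete-target-dir

To keep the affected rows readable instead of stripping the characters, replace --hive-drop-import-delims with --hive-delims-replacement ' ' (a single space is just one possible replacement string).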