Sqoop MySQL Import to HDFS, Hive

Source: Internet
Author: User
Tags: relational database, table, mysql, import, sqoop

Sqoop is an open source tool for transferring data between Hadoop and relational databases (Oracle, MySQL, and so on). The following uses MySQL and SQL Server as examples and shows how to import data from them into Hadoop (HDFS and Hive) with Sqoop.

# Import command and parameters

## Common parameters

| Parameter | Description |
| --- | --- |
| --connect <jdbc-uri> | JDBC connection string |
| --username <username> | Database user name |
| --password <password> | Database password |
| -P | Read the database password from the console at import time |
| --password-file | Read the database password from the specified file (see the example after this table) |
| --verbose | Print more log information during the import |
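For example, the password can be kept off the command line with --password-file. The following is a minimal sketch; the database name dbname and the HDFS path /user/hadoop/sqoop.pwd are placeholder assumptions rather than values from the examples below:

```
# Create a password file without a trailing newline (Sqoop reads the entire file content)
echo -n 'PASSWD' > sqoop.pwd
hdfs dfs -put sqoop.pwd /user/hadoop/sqoop.pwd

# Import a table, reading the password from the file instead of passing --password
# (dbname and /user/hadoop/sqoop.pwd are placeholders for this sketch)
sqoop import \
  --connect jdbc:mysql://ip:port/dbname \
  --username USER \
  --password-file /user/hadoop/sqoop.pwd \
  --table t_sqoop_test \
  --verbose
```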

## Control parameters

| Parameter | Description |
| --- | --- |
| --append | Append data to a dataset that already exists in HDFS. With this parameter, Sqoop first imports the data into a temporary directory and then renames the files into the target directory, avoiding clashes with files already present there. |
| --as-avrodatafile | Import data as Avro data files |
| --as-sequencefile | Import data as SequenceFiles |
| --as-textfile | Import data as plain text files; once the text files are generated, the results can be queried in Hive with SQL statements |
| --boundary-query | Boundary query: a SQL query is run before the import, and the imported data is bounded by its result set. For example, --boundary-query 'select id, no from t where id = 3' means that the records with id = 3 are imported; another form is select min(), max() from .... Note that the queried fields cannot be of a string data type, otherwise an error is raised. |
| --columns <col,col> | Specify the fields to import, in the form --columns id,username |
| --direct | Direct import mode, using the import/export tool that ships with the relational database. The official documentation says this kind of import is faster. |
| --direct-split-size | Used on top of direct mode above: split the imported stream by bytes. In particular, when importing data from PostgreSQL in direct mode, a file that reaches the configured size is split into several separate files. |
| --inline-lob-limit | Set the maximum size for large object (LOB) data types |
| -m,--num-mappers | Start n map tasks to import data in parallel; the default is 4. It is best not to set this higher than the number of nodes in the cluster. |
| --query,-e | Import the result of a free-form query. Must be specified together with --target-dir / --hive-table; the query statement must contain a WHERE clause, and the WHERE clause must include \$CONDITIONS. Example: --query 'select * from t where \$CONDITIONS' --target-dir /tmp/t --hive-table t (see the example after this table). |
| --split-by | Column of the table used to split the units of work, typically the primary key id |
| --table | Name of the relational database table to read data from |
| --delete-target-dir | Delete the target directory if it already exists |
| --target-dir | Specify the HDFS path for the imported data |
| --warehouse-dir | Cannot be used together with --target-dir; specifies the parent directory for the imported data. Suitable for importing into HDFS, not suitable for importing into a Hive directory. |
| --where | Filter condition when importing data from the relational database, for example --where "id = 2" |
| -z,--compress | Compression parameter; data is not compressed by default. This parameter compresses the data with the gzip compression algorithm and applies to SequenceFiles, text files, and Avro files. |
| --compression-codec | Hadoop compression codec; the default is gzip |
| --null-string | The string to write for a null value in string columns. Optional; if not specified, the string "null" is used. |
| --null-non-string | The string to write for a null value in non-string columns. Optional; if not specified, the string "null" is used. |
| --fields-terminated-by | Custom field (column) separator |
| --lines-terminated-by | Custom line (row) separator |
| --hive-drop-import-delims | Remove the default Hive delimiters that appear in the imported data |
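As an illustration of how --query, --split-by, and -m work together, here is a sketch that assumes a placeholder database dbname and the t_sqoop_test table used later in this article:

```
# Free-form query import: Sqoop substitutes its split predicates for $CONDITIONS.
# Inside single quotes $CONDITIONS needs no escaping; inside double quotes write \$CONDITIONS.
# dbname is a placeholder database name for this sketch.
sqoop import \
  --connect jdbc:mysql://ip:port/dbname \
  --username USER -P \
  --query 'select id, username from t_sqoop_test where $CONDITIONS' \
  --split-by id \
  -m 4 \
  --target-dir '/tmp/t_sqoop_test' \
  --delete-target-dir
```

With --query, --target-dir is mandatory, and --split-by is required whenever more than one mapper is used.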
## Import data into HDFS

### MySQL as the data source
$ sqoop import --connect jdbc:mysql://ip:port/ --username USER --password 'PASSWD' --table t_sqoop_test --where "$conditions" --target-dir 'mydir' --delete-target-dir
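After the job finishes, the result can be checked directly in HDFS; a quick sketch (the part file name depends on the actual run):

```
# List the generated files and peek at the first map output file
hdfs dfs -ls mydir
hdfs dfs -cat mydir/part-m-00000 | head
```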


### SQL Server as the data source

$ sqoop import --connect 'jdbc:sqlserver://ip:1433;username=name;password=pwd;database=dataname' --table T_SQOOP_TEST --where "$conditions" --target-dir 'mydir' --delete-target-dir
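The control parameters above apply to the SQL Server connector as well; for instance, a sketch that imports only selected columns and rows (the column names id,username are assumed here for illustration):

```
# Import two columns and only the rows matching the --where filter, with a single mapper
# (id,username are assumed column names for this sketch)
sqoop import \
  --connect 'jdbc:sqlserver://ip:1433;username=name;password=pwd;database=dataname' \
  --table T_SQOOP_TEST \
  --columns id,username \
  --where "id > 100" \
  --target-dir 'mydir' \
  --delete-target-dir \
  -m 1
```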
## Import into Hive

Add the --hive-import parameter to the import command to import the data into Hive.

$ sqoop import --hive-import --connect jdbc:mysql://ip:port/ --username USER --password 'PASSWD' --table t_sqoop_test -m 1 --target-dir 'mydir' --delete-target-dir
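Once the job completes, the imported table can be inspected from the Hive CLI (assuming the table lands in the default database):

```
# Confirm the table exists and sample a few rows
hive -e 'show tables; select * from t_sqoop_test limit 10;'
```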
Parameter description:

| Parameter | Description |
| --- | --- |
| --hive-import | Import the data into Hive |
| --hive-overwrite | Overwrite data that already exists in the Hive table |
| --create-hive-table | If this parameter is set, the import task fails when the target table already exists in Hive. The default is false. |
| --hive-table <table-name> | Specify the name of the Hive table to import into |
| --hive-drop-import-delims | When importing into Hive, remove the \n (newline), \r (carriage return), and \01 (Ctrl-A, Start of Heading) characters from character fields |
| --hive-delims-replacement | When importing into Hive, replace the \n, \r, and \01 characters in character fields with a user-defined string |
Notes on importing: Hive uses the \01 character as its default column (field) delimiter, and \n and \r as row delimiters. If the imported character fields contain these characters, there will be problems. For example, if the Name value of the row with id 2 in the t_sqoop_test table contains a newline character, the data is corrupted after it is imported into Hive. You can use the --hive-drop-import-delims parameter to remove the \n, \r, and \01 characters from the imported data, or use --hive-delims-replacement to replace \n, \r, and \01 with another string, as shown in the sketch below.
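For example, a sketch of the Hive import above with the delimiter characters replaced by a space instead of being dropped (the connection details are the same placeholders as before):

```
# Replace \n, \r and \01 in character fields with a space during the Hive import
sqoop import --hive-import \
  --connect jdbc:mysql://ip:port/ \
  --username USER --password 'PASSWD' \
  --table t_sqoop_test \
  --hive-delims-replacement ' ' \
  -m 1 \
  --target-dir 'mydir' \
  --delete-target-dir
```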
