Sqoop is a tool that connects relational databases to a Hadoop environment, acting as a bridge to Hadoop's storage systems. It can import data from many relational sources into Hive, HDFS, and HBase. Typically, the relational tables to be imported live in a backup of the production database and need to be imported every day. If the volume of data generated each day is small, Sqoop can simply re-import the full table; for larger volumes, Sqoop also provides an incremental import mechanism.
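For instance, a daily full-table import might look like the following sketch. The connection URL matches the example used later in this article; the username, table name, and target directory are hypothetical.

```
# A minimal sketch of a daily full-table import.
# The username, table name and target directory below are illustrative only.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user \
  --password '******' \
  --table orders \
  --target-dir /user/hadoop/orders/2016-01-01
```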
The following describes several common Sqoop commands and some of their parameters:
| Serial number | Command | Class | Description |
| --- | --- | --- | --- |
| 1 | import | ImportTool | Import data from a relational database (from a table or query statement) to HDFS |
| 2 | export | ExportTool | Export data from HDFS to a relational database |
| 3 | codegen | CodeGenTool | Generate the Java class that represents a database table and package it into a jar |
| 4 | create-hive-table | CreateHiveTableTool | Create a Hive table based on a database table definition |
| 5 | eval | EvalSqlTool | Run a SQL statement and view its results |
| 6 | import-all-tables | ImportAllTablesTool | Import all tables in a database to HDFS |
| 7 | job | JobTool | Define and manage saved jobs |
| 8 | list-databases | ListDatabasesTool | List all database names |
| 9 | list-tables | ListTablesTool | List all tables in a database |
| 10 | merge | MergeTool | Merge two datasets, e.g. an incremental import with an existing dataset |
| 11 | metastore | MetastoreTool | Run a standalone Sqoop metastore |
| 12 | help | HelpTool | View help |
| 13 | version | VersionTool | View the version |
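As a quick illustration of how these tools are invoked from the command line (the connection URL, credentials, and SQL below are placeholders):

```
# List the databases on a MySQL server.
sqoop list-databases \
  --connect jdbc:mysql://localhost/ \
  --username sqoop_user --password '******'

# Run an ad-hoc SQL statement and print its result (eval).
sqoop eval \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password '******' \
  --query "SELECT COUNT(*) FROM orders"
```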
In addition to the command-specific parameters of the 13 commands above, Sqoop has a set of common parameters, which fall into the following groups:

- Common arguments
- Incremental import arguments
- Output line formatting arguments
- Input parsing arguments
- Hive arguments
- HBase arguments
- Generic Hadoop command-line arguments
1. Common arguments: common parameters, mainly used to connect to the relational database.
| Serial number | Parameter | Description | Example |
| --- | --- | --- | --- |
| 1 | --connect | JDBC URL for connecting to the relational database | jdbc:mysql://localhost/sqoop_datas |
| 2 | --connection-manager | Connection manager class; rarely needed | |
| 3 | --driver | JDBC driver class to use | |
| 4 | --hadoop-home | Hadoop home directory | /home/hadoop |
| 5 | --help | View help information | |
| 6 | --password | Password for connecting to the relational database | |
| 7 | --username | User name for connecting to the relational database | |
| 8 | --verbose | Print more detailed information (effectively lowers the log level); this parameter takes no value | |
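These common arguments are shared by every Sqoop tool. A sketch combining them with an import follows; the driver class shown is the standard MySQL JDBC driver, while the credentials and table name are hypothetical.

```
# Common connection arguments used with an import; --verbose takes no value.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --driver com.mysql.jdbc.Driver \
  --username sqoop_user \
  --password '******' \
  --table orders \
  --verbose
```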
Import control arguments:

| Argument | Description |
| --- | --- |
| --append | Append data to an existing dataset in HDFS |
| --as-avrodatafile | Imports data to Avro data files |
| --as-sequencefile | Imports data to SequenceFiles |
| --as-textfile | Imports data as plain text (default) |
| --boundary-query <statement> | Boundary query to use for creating splits |
| --columns <col,col,col…> | Columns to import from the table |
| --direct | Use the direct import fast path |
| --direct-split-size <n> | Split the input stream every n bytes when importing in direct mode |
| --inline-lob-limit <n> | Set the maximum size for an inline LOB |
| -m, --num-mappers <n> | Use n map tasks to import in parallel |
| -e, --query <statement> | Import the results of statement |
| --split-by <column-name> | Column of the table used to split work units |
| --table <table-name> | Table to read |
| --target-dir <dir> | HDFS destination directory |
| --warehouse-dir <dir> | HDFS parent directory for the table destination |
| --where <where clause> | WHERE clause to use during import |
| -z, --compress | Enable compression |
| --compression-codec <c> | Use a Hadoop codec (default gzip) |
| --null-string <null-string> | The string to be written for a null value in string columns |
| --null-non-string <null-string> | The string to be written for a null value in non-string columns |
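When importing with a free-form --query, Sqoop requires the query to contain a $CONDITIONS placeholder and, for parallel imports, a --split-by column. A hedged sketch (table, columns, and paths are hypothetical):

```
# Free-form query import; $CONDITIONS is replaced by Sqoop with each mapper's range condition.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password '******' \
  --query 'SELECT id, name, amount FROM orders WHERE $CONDITIONS' \
  --split-by id \
  --target-dir /user/hadoop/orders_query \
  --num-mappers 4 \
  --compress
```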
Incremental import arguments:

| Argument | Description |
| --- | --- |
| --check-column (col) | Specifies the column to be examined when determining which rows to import |
| --incremental (mode) | Specifies how Sqoop determines which rows are new; legal values for mode are append and lastmodified |
| --last-value (value) | Specifies the maximum value of the check column from the previous import |
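For example, an incremental import in append mode only brings in rows whose check column is greater than the --last-value recorded from the previous run. The table, column, and value below are hypothetical.

```
# Incremental append import: only rows with id > 100000 are fetched this time.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password '******' \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental append \
  --check-column id \
  --last-value 100000
```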
Output line formatting arguments:

| Argument | Description |
| --- | --- |
| --enclosed-by <char> | Sets a required field enclosing character |
| --escaped-by <char> | Sets the escape character |
| --fields-terminated-by <char> | Sets the field separator character |
| --lines-terminated-by <char> | Sets the end-of-line character |
| --mysql-delimiters | Uses MySQL's default delimiter set: fields `,`, lines `\n`, escaped-by `\`, optionally-enclosed-by `'` |
| --optionally-enclosed-by <char> | Sets a field enclosing character |
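A sketch that sets explicit delimiters for the imported text files (table and path are hypothetical):

```
# Write tab-separated fields and newline-terminated records.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password '******' \
  --table orders \
  --target-dir /user/hadoop/orders_tsv \
  --fields-terminated-by '\t' \
  --lines-terminated-by '\n'
```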
Hive arguments:

| Argument | Description |
| --- | --- |
| --hive-home <dir> | Override $HIVE_HOME |
| --hive-import | Import tables into Hive (uses Hive's default delimiters if none are set) |
| --hive-overwrite | Overwrite existing data in the Hive table |
| --create-hive-table | If set, the job will fail if the target Hive table exists; false by default |
| --hive-table <table-name> | Sets the table name to use when importing to Hive |
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive |
| --hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive |
| --hive-partition-key | Name of the Hive field that imported data is partitioned on |
| --hive-partition-value <v> | String value that serves as the partition key for data imported into Hive in this job |
| --map-column-hive <map> | Override the default mapping from SQL type to Hive type for configured columns |
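A sketch of importing a table straight into Hive; the Hive database and table names are hypothetical.

```
# Import into Hive, overwriting previous data and stripping delimiter characters
# (\n, \r, \01) from string fields so they don't break Hive's line/field parsing.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password '******' \
  --table orders \
  --hive-import \
  --hive-table mydb.orders \
  --hive-overwrite \
  --hive-drop-import-delims
```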
HBase arguments:

| Argument | Description |
| --- | --- |
| --column-family <family> | Sets the target column family for the import |
| --hbase-create-table | If specified, create missing HBase tables |
| --hbase-row-key <col> | Specifies which input column to use as the row key |
| --hbase-table <table-name> | Specifies an HBase table to use as the target instead of HDFS |
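A sketch of importing into HBase instead of HDFS; the HBase table, column family, and row key column are hypothetical.

```
# Import into an HBase table, creating it if it does not exist.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password '******' \
  --table orders \
  --hbase-table orders \
  --column-family cf \
  --hbase-row-key id \
  --hbase-create-table
```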
Code generation arguments:

| Argument | Description |
| --- | --- |
| --bindir <dir> | Output directory for compiled objects |
| --class-name <name> | Sets the generated class name; overrides --package-name. When combined with --jar-file, sets the input class |
| --jar-file <file> | Disable code generation; use the specified jar |
| --outdir <dir> | Output directory for generated code |
| --package-name <name> | Put auto-generated classes in this package |
| --map-column-java <m> | Override the default mapping from SQL type to Java type for configured columns |
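A sketch of the codegen tool, which generates the Java class Sqoop uses to represent a table's rows; the paths and class name below are hypothetical.

```
# Generate the Java source and compiled jar for the 'orders' table.
sqoop codegen \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password '******' \
  --table orders \
  --outdir /tmp/sqoop/src \
  --bindir /tmp/sqoop/classes \
  --class-name Orders
```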