Sqoop usage and introduction

Source: Internet
Author: User
Tags: hadoop, ecosystem, sqoop

Sqoop connects to relational databases from within a Hadoop environment and acts as a bridge to the Hadoop storage systems. It supports importing from multiple relational data sources into Hive, HDFS, and HBase. The relational tables usually live in a backup copy of the production database, and their data needs to be imported every day. If the daily volume of new data is small, Sqoop can simply re-import the entire table each time; for larger volumes, Sqoop also provides an incremental import mechanism.

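As a rough illustration, a minimal full-table import might look like the following sketch; the connection URL, credentials, table name, and target path below are placeholders, not values taken from this article:

    # Import the entire "orders" table from MySQL into HDFS
    # (URL, user, password, table, and target directory are all hypothetical)
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user \
      --password sqoop_pass \
      --table orders \
      --target-dir /user/hadoop/orders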
The following describes several common Sqoop commands and some of their parameters:

 

1. import (ImportTool): Import data from a relational database (from a table or a query statement) to HDFS
2. export (ExportTool): Export data from HDFS to a relational database
3. codegen (CodeGenTool): Generate a Java class from a database table and package it into a jar
4. create-hive-table (CreateHiveTableTool): Create a Hive table
5. eval (EvalSqlTool): Evaluate a SQL statement and view the results
6. import-all-tables (ImportAllTablesTool): Import all tables in a database to HDFS
7. job (JobTool): Work with saved jobs
8. list-databases (ListDatabasesTool): List all database names
9. list-tables (ListTablesTool): List all tables in a database
10. merge (MergeTool): Merge the results of two datasets, such as successive incremental imports
11. metastore (MetastoreTool): Run a standalone metastore that stores saved job definitions
12. help (HelpTool): View help
13. version (VersionTool): View the version

 
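Before setting up an import, a few of these commands are handy for probing the source database. A rough sketch, reusing the same placeholder MySQL connection details as above:

    # List the databases visible to this account
    sqoop list-databases \
      --connect jdbc:mysql://localhost/ \
      --username sqoop_user --password sqoop_pass

    # List the tables in one database, then preview a few rows with eval
    sqoop list-tables \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass
    sqoop eval \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --query "SELECT * FROM orders LIMIT 5"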

Next come the common parameters of Sqoop; each of the 13 commands above also has its own parameters. The common parameters fall into the following groups:

Common arguments

Incremental import arguments

Output line formatting arguments

Input parsing arguments, Hive arguments

HBase arguments

Generic Hadoop command-line arguments

 

1. Common arguments: general parameters, mainly for connecting to the relational database

--connect: JDBC URL of the relational database to connect to. Example: jdbc:mysql://localhost/sqoop_datas

--connection-manager: Connection manager class; generally not used

--driver: JDBC driver class

--hadoop-home: Hadoop home directory. Example: /home/hadoop

--help: View help information

--password: Password for connecting to the relational database

--username: User name for connecting to the relational database

--verbose: Print more information while working (effectively lowers the log level); this parameter takes no value

 
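As a sketch, these common arguments are usually combined on a single command line; the host, database, and account below are placeholders:

    # Typical combination of common connection arguments;
    # --verbose enables more detailed logging
    sqoop list-tables \
      --connect jdbc:mysql://db-backup-host/sqoop_datas \
      --username sqoop_user \
      --password sqoop_pass \
      --verbose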

Import control arguments:

--append: Append data to an existing dataset in HDFS

--as-avrodatafile: Imports data to Avro data files

--as-sequencefile: Imports data to SequenceFiles

--as-textfile: Imports data as plain text (default)

--boundary-query <statement>: Boundary query to use for creating splits

--columns <col,col,col...>: Columns to import from the table

--direct: Use the direct import fast path

--direct-split-size <n>: Split the input stream every n bytes when importing in direct mode

--inline-lob-limit <n>: Set the maximum size for an inline LOB

-m, --num-mappers <n>: Use n map tasks to import in parallel

-e, --query <statement>: Import the results of the statement

--split-by <column-name>: Column of the table used to split work units

--table <table-name>: Table to read

--target-dir <dir>: HDFS destination directory

--warehouse-dir <dir>: HDFS parent directory for the table destination

--where <where clause>: WHERE clause to use during import

-z, --compress: Enable compression

--compression-codec <c>: Use a Hadoop codec (default gzip)

--null-string <null-string>: The string to be written for a null value in string columns

--null-non-string <null-string>: The string to be written for a null value in non-string columns

 

 
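Putting several of these options together, a parallel import of a query result might look like the following sketch; the query, columns, and paths are hypothetical:

    # Import the result of a free-form query with 4 parallel map tasks.
    # When --query is used, the WHERE clause must contain the $CONDITIONS token,
    # and --split-by and --target-dir must be given explicitly.
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --query 'SELECT id, name, amount FROM orders WHERE $CONDITIONS' \
      --split-by id \
      --num-mappers 4 \
      --target-dir /user/hadoop/orders_query \
      --compress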

Incremental import arguments:

--check-column (col): Specifies the column to be examined when determining which rows to import

--incremental (mode): Specifies how Sqoop determines which rows are new; legal values for mode are append and lastmodified

--last-value (value): Specifies the maximum value of the check column from the previous import

 

 
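A minimal incremental import in append mode, assuming an auto-increment id column (all names and values below are placeholders):

    # Import only rows whose id is greater than the last value seen previously
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --target-dir /user/hadoop/orders \
      --incremental append \
      --check-column id \
      --last-value 10000

At the end of the run, Sqoop reports the value to pass as --last-value for the next run; defining the command as a saved job lets Sqoop track that value itself.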

Output line formatting arguments:

--enclosed-by <char>: Sets a required field-enclosing character

--escaped-by <char>: Sets the escape character

--fields-terminated-by <char>: Sets the field separator character

--lines-terminated-by <char>: Sets the end-of-line character

--mysql-delimiters: Uses MySQL's default delimiter set: fields , ; lines \n ; escaped-by \ ; optionally-enclosed-by '

--optionally-enclosed-by <char>: Sets a field-enclosing character

 

 
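For example, to write tab-separated text output with explicit line endings (the delimiters here are chosen arbitrarily for illustration):

    # Import as text with tab-separated fields and newline-terminated records
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --target-dir /user/hadoop/orders_tsv \
      --fields-terminated-by '\t' \
      --lines-terminated-by '\n'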

Hive arguments:

--hive-home <dir>: Override $HIVE_HOME

--hive-import: Import tables into Hive (uses Hive's default delimiters if none are set)

--hive-overwrite: Overwrite existing data in the Hive table

--create-hive-table: If set, the job fails if the target Hive table already exists; by default this property is false

--hive-table <table-name>: Sets the table name to use when importing into Hive

--hive-drop-import-delims: Drops \n, \r, and \01 from string fields when importing into Hive

--hive-delims-replacement: Replaces \n, \r, and \01 in string fields with a user-defined string when importing into Hive

--hive-partition-key: Name of the Hive field on which the imported data is partitioned

--hive-partition-value <v>: String value that serves as the partition key for the data imported into Hive in this job

--map-column-hive <map>: Override the default mapping from SQL type to Hive type for configured columns

 

 
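A typical Hive import built from these options might look like the following sketch; the database and table names are placeholders:

    # Import a MySQL table straight into a Hive table, replacing existing data
    # and stripping delimiter characters that would break Hive's text format
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --hive-import \
      --hive-table orders \
      --hive-overwrite \
      --hive-drop-import-delims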

HBase arguments:

--column-family <family>: Sets the target column family for the import

--hbase-create-table: If specified, create missing HBase tables

--hbase-row-key <col>: Specifies which input column to use as the row key

--hbase-table <table-name>: Specifies an HBase table to use as the target instead of HDFS

 

 
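An HBase-targeted import using these options might look like this sketch; the table, column family, and row key column are assumptions:

    # Write rows into the HBase table "orders", keyed by the id column,
    # creating the table if it does not already exist
    sqoop import \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --hbase-table orders \
      --column-family cf \
      --hbase-row-key id \
      --hbase-create-table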

Code generation arguments:

--bindir <dir>: Output directory for compiled objects

--class-name <name>: Sets the generated class name; this overrides --package-name and, when combined with --jar-file, sets the input class

--jar-file <file>: Disable code generation; use the specified jar

--outdir <dir>: Output directory for generated code

--package-name <name>: Put auto-generated classes in this package

--map-column-java <m>: Override the default mapping from SQL type to Java type for configured columns

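The codegen command uses these same arguments to regenerate the ORM class for a table without running an import; the class name and output paths below are placeholders:

    # Generate the Java class and a compiled jar for the "orders" table
    sqoop codegen \
      --connect jdbc:mysql://localhost/sqoop_datas \
      --username sqoop_user --password sqoop_pass \
      --table orders \
      --class-name com.example.Orders \
      --outdir /tmp/sqoop-codegen/src \
      --bindir /tmp/sqoop-codegen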

