Sqoop is a tool for data transfer between Hadoop and an RDBMS. The configuration is relatively simple.
Download the latest Sqoop package from the Apache website:
http://www.apache.org/dist/sqoop/1.99.1/
Decompress the package on the server. The server requires JDK, Hadoop, and Hive.
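As a rough sketch, the download and extraction might look like the following; the archive file name is an assumption, so use the actual file listed on the mirror above:
# archive name is an assumption; pick the actual file from the mirror
wget http://www.apache.org/dist/sqoop/1.99.1/sqoop-1.99.1.tar.gz
tar -xzf sqoop-1.99.1.tar.gz -C /home/hadoop/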
Configuration:
conf/sqoop-env.sh
# Set path to where bin/hadoop is available
export HADOOP_HOME=/home/hadoop/hadoop-0.20.205.0
# Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive-0.8.1
Now we can run a test. We interact with Sqoop mainly through Hive: data is submitted from a relational database into Hive and stored on HDFS for big data computation.
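A quick way to verify the setup is a hedged sketch like the following; the JDBC URL and credentials are placeholders:
# print the installed version
./sqoop version
# list the databases reachable through the JDBC connection (placeholder credentials)
./sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username dbuser --password dbpassword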
Sqoop mainly provides the following commands:
codegen             Generate code to interact with database records
create-hive-table   Import a table definition into Hive
eval                Evaluate a SQL statement and display the results
export              Export an HDFS directory to a database table
help                List available commands
import              Import a table from a database to HDFS
import-all-tables   Import tables from a database to HDFS
job                 Work with saved jobs
list-databases      List available databases on a server
list-tables         List available tables in a database
merge               Merge results of incremental imports
metastore           Run a standalone Sqoop metastore
version             Display version information
Here, we mainly use the import command; the syntax of the export command is similar.
Example
./sqoop import --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --table tablename --hive-import --hive-table hivedb.hivetable --hive-drop-import-delims --hive-overwrite --num-mappers 6
The preceding command imports the tablename table from the local database dbname into the hivetable table of the Hive database hivedb.
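Since the export syntax is similar, the reverse direction might look like the sketch below; the HDFS path and the field delimiter are assumptions that depend on how the Hive table is stored:
# export an assumed Hive warehouse directory back to the tablename table
./sqoop export --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --table tablename --export-dir /user/hive/warehouse/hivedb.db/hivetable --input-fields-terminated-by '\001' --num-mappers 6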
The common parameters of the import command above are not explained here; the Hive-related ones are:
--hive-import marks the import target as Hive.
--hive-table specifies the target table in Hive.
--hive-drop-import-delims is important because the data is imported from the database into HDFS, and if it contains special characters, MapReduce parsing becomes problematic. For example, text fields in the database may contain \t, \n, and similar characters; with this parameter added, such special characters are handled automatically.
--hive-overwrite: if the Hive table already exists, it is overwritten.
--num-mappers specifies the number of map tasks used to run the import.
Another important parameter, --direct, imports through the database's own dump utility and gives better performance than the previous example; however, it cannot be combined with --hive-drop-import-delims. Choose the command form that fits your database and data.
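If the special characters should be kept rather than dropped, a hedged variant of the earlier import uses --hive-delims-replacement (listed in the table below); the connection details remain placeholders:
# replace \n, \r, and \01 in string fields with a space instead of dropping them
./sqoop import --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --table tablename --hive-import --hive-table hivedb.hivetable --hive-delims-replacement ' ' --hive-overwrite --num-mappers 6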
The following are the arguments of the sqoop import command.
Common arguments:
Argument | Description
--connect | Specify JDBC connect string
--connection-manager | Specify connection manager class to use
--driver | Manually specify JDBC driver class to use
--hadoop-home | Override $HADOOP_HOME
--help | Print usage instructions
-P | Read password from console
--password | Set authentication password
--username | Set authentication username
--verbose | Print more information while working
--connection-param-file | Optional properties file that provides connection parameters
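For example, a hedged sketch that reads the password from the console instead of putting it on the command line (connection details are placeholders):
# -P prompts for the password interactively
./sqoop import --connect jdbc:mysql://localhost:3306/dbname --username dbuser -P --table tablename --hive-import --hive-table hivedb.hivetable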
Hive arguments:
Argument | Description
--hive-home | Override $HIVE_HOME
--hive-import | Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite | Overwrite existing data in the Hive table
--create-hive-table | If set, the job fails if the target Hive table already exists; by default this property is false
--hive-table | Sets the table name to use when importing to Hive
--hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key | Name of the Hive field the partitions are sharded on
--hive-partition-value | String value that serves as the partition key for the data imported into Hive in this job
--map-column-hive | Override default mapping from SQL type to Hive type for configured columns
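For example, a hedged sketch of a partitioned Hive import; the partition column dt, its value, and the column mapping are illustrative assumptions:
# load into an assumed Hive partition dt='2012-06-01' and force the id column to STRING
./sqoop import --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --table tablename --hive-import --hive-table hivedb.hivetable --hive-partition-key dt --hive-partition-value '2012-06-01' --map-column-hive id=STRING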
The following are some examples.
Specify a query condition
sqoop import --table test --columns "id,name" --where "id > 400"
Use the database dump (direct mode)
sqoop import --connect jdbc:mysql://server.foo.com/db --table bar --direct -- --default-character-set=latin1
Redefine column types
sqoop import ... --map-column-java id=String,value=Integer
Define delimiters
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --fields-terminated-by '\t' --lines-terminated-by '\n' --optionally-enclosed-by '\"'
Original article: Exploring Sqoop, a tool for data exchange between HDFS and relational databases. Thanks to the original author for sharing.