A powerful tool for data exchange between HDFS and relational databases: a preliminary study of Sqoop

Sqoop is a tool for data transfer between Hadoop and an RDBMS. Its configuration is relatively simple: download the latest Sqoop package from the Apache website (http://www.apache.org/dist/sqoop/1.99.1/) and decompress it on the server. The server requires JDK, Hadoop, and Hive.

Configuration: conf/sqoop-env.sh
# Set path to where bin/hadoop is available
export HADOOP_HOME=/home/hadoop/hadoop-0.20.205.0
# Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive-0.8.1
At this point we can run a test. We mainly interact with Hive: in effect, we pull data from a relational database into Hive, storing it on HDFS for big-data computation.
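A quick way to verify the setup is to ask Sqoop to list the databases on the source server. This is a minimal sketch; the host, user, and password are placeholder assumptions, not values from this article:

./sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username dbuser --password dbpassword

If the connection works, the command prints the database names visible to that user.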

Sqoop mainly includes the following commands or functions.

codegen             Generate code to interact with database records
create-hive-table   Import a table definition into Hive
eval                Evaluate a SQL statement and display the results
export              Export an HDFS directory to a database table
help                List available commands
import              Import a table from a database to HDFS
import-all-tables   Import tables from a database to HDFS
job                 Work with saved jobs
list-databases      List available databases on a server
list-tables         List available tables in a database
merge               Merge results of incremental imports
metastore           Run a standalone Sqoop metastore
version             Display version information

Here, we mainly use the import function; the export command has similar syntax.
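Before running an import, eval is handy for previewing the source data. A minimal sketch, where the connection string, credentials, and query are assumptions for illustration:

./sqoop eval --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --query "SELECT * FROM tablename LIMIT 10"

This runs the statement on the source database and prints the result to the console.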

Example

./sqoop import --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --table tablename --hive-import --hive-table hivedb.hivetable --hive-drop-import-delims --hive-overwrite --num-mappers 6

The preceding command imports the tablename table from the local database dbname into the hivetable table of the Hive database hivedb.
The common parameters are self-explanatory; the Hive-related ones are explained below.

--hive-import marks the import target as Hive.
--hive-table specifies the target table in Hive.
--hive-drop-import-delims is important because the data passes from the database into HDFS; if it contains special characters, MapReduce parsing breaks. For example, text-type fields in the database may contain \t, \n, and similar characters. With this parameter, such special characters are stripped automatically.
--hive-overwrite: if the Hive table already exists, its contents are overwritten.
--num-mappers specifies the number of mapper tasks used to perform the import.

Another important parameter is --direct, which imports through the database's dump utility and gives better performance than the previous example; however, it cannot be combined with --hive-drop-import-delims. Choose which form to use based on your database and data.
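As noted above, export mirrors the import syntax. A minimal sketch of the reverse direction, assuming the Hive table sits at the default warehouse path and uses Hive's default \001 field delimiter (all names and paths here are illustrative, not from the article):

./sqoop export --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --table tablename --export-dir /user/hive/warehouse/hivedb.db/hivetable --input-fields-terminated-by '\001' --num-mappers 6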

The following are the arguments of Sqoop's import command:

Common arguments:

Argument                             Description
--connect <jdbc-uri>                 Specify JDBC connect string
--connection-manager <class-name>    Specify connection manager class to use
--driver <class-name>                Manually specify JDBC driver class to use
--hadoop-home <dir>                  Override $HADOOP_HOME
--help                               Print usage instructions
-P                                   Read password from console
--password <password>                Set authentication password
--username <username>                Set authentication username
--verbose                            Print more information while working
--connection-param-file <filename>   Optional properties file that provides connection parameters
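Note that --password exposes the credential on the command line and in shell history; -P prompts for it interactively instead. A sketch with assumed connection details:

./sqoop import --connect jdbc:mysql://localhost:3306/dbname --username dbuser -P --table tablename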
Hive arguments:

Argument                     Description
--hive-home <dir>            Override $HIVE_HOME
--hive-import                Import tables into Hive (uses Hive's default delimiters if none are set).
--hive-overwrite             Overwrite existing data in the Hive table.
--create-hive-table          If set, the job fails if the target Hive table already exists. By default this property is false.
--hive-table <table-name>    Sets the table name to use when importing to Hive.
--hive-drop-import-delims    Drops \n, \r, and \01 from string fields when importing to Hive.
--hive-delims-replacement    Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive.
--hive-partition-key         Name of the Hive field on which partitions are sharded.
--hive-partition-value <v>   String value that serves as the partition key for the data imported into Hive in this job.
--map-column-hive <map>      Override default mapping from SQL type to Hive type for configured columns.
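To illustrate the partition arguments, here is a hedged sketch that loads one day of data into a Hive partition; the partition key dt and its value are assumptions for illustration:

./sqoop import --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --table tablename --hive-import --hive-table hivedb.hivetable --hive-partition-key dt --hive-partition-value '2012-06-01'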

The following are some examples.

Import with a WHERE condition:
sqoop import --table test --columns "id,name" --where "id > 400"
Use the database's dump utility (direct mode):
sqoop import --connect jdbc:mysql://server.foo.com/db --table bar --direct -- --default-character-set=latin1
Redefine column types:
sqoop import ... --map-column-java id=String,value=Integer
Define delimiters:
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --fields-terminated-by '\t' --lines-terminated-by '\n' --optionally-enclosed-by '\"'
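Incremental import in append mode (related to the merge command listed earlier); the check column and starting value below are illustrative assumptions:
sqoop import --connect jdbc:mysql://localhost:3306/dbname --table tablename --incremental append --check-column id --last-value 400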
