Sqoop is a tool for data transfer between Hadoop and an RDBMS. The configuration is relatively simple.
Download the latest Sqoop package from the Apache website:
http://www.apache.org/dist/sqoop/1.99.1/
Decompress the package on the server. The server requires JDK, Hadoop, and Hive.
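As a rough sketch, the download and extraction might look like the following; the archive file name is an assumption, so use the actual file listed on the mirror above:
# archive name is an assumption; pick the actual file from the mirror
wget http://www.apache.org/dist/sqoop/1.99.1/sqoop-1.99.1.tar.gz
tar -xzf sqoop-1.99.1.tar.gz -C /home/hadoop/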
Configuration:
conf/sqoop-env.sh
# Set path to where bin/hadoop is available
export HADOOP_HOME=/home/hadoop/hadoop-0.20.205.0
# Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive-0.8.1
Now we can run a test. We interact with Sqoop mainly through Hive: data is submitted from a relational database into Hive and stored on HDFS for big data computation.
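A quick way to verify the setup is a hedged sketch like the following; the JDBC URL and credentials are placeholders:
# print the installed version
./sqoop version
# list the databases reachable through the JDBC connection (placeholder credentials)
./sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username dbuser --password dbpassword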
Sqoop mainly provides the following commands:
codegen             Generate code to interact with database records
create-hive-table   Import a table definition into Hive
eval                Evaluate a SQL statement and display the results
export              Export an HDFS directory to a database table
help                List available commands
import              Import a table from a database to HDFS
import-all-tables   Import tables from a database to HDFS
job                 Work with saved jobs
list-databases      List available databases on a server
list-tables         List available tables in a database
merge               Merge results of incremental imports
metastore           Run a standalone Sqoop metastore
version             Display version information
Here, we mainly use the import command; the syntax of the export command is similar.
Example
./sqoop import --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --table tablename --hive-import --hive-table hivedb.hivetable --hive-drop-import-delims --hive-overwrite --num-mappers 6
The preceding command imports the tablename table from the local database dbname into the hivetable table of the Hive database hivedb.
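Since the export syntax is similar, the reverse direction might look like the sketch below; the HDFS path and the field delimiter are assumptions that depend on how the Hive table is stored:
# export an assumed Hive warehouse directory back to the tablename table
./sqoop export --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --table tablename --export-dir /user/hive/warehouse/hivedb.db/hivetable --input-fields-terminated-by '\001' --num-mappers 6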
The common parameters of the import command above are not explained here; the Hive-related ones are:
--hive-import marks the import target as Hive.
--hive-table specifies the target table in Hive.
--hive-drop-import-delims is important because the data is imported from the database into HDFS, and if it contains special characters, MapReduce parsing becomes problematic. For example, text fields in the database may contain \t, \n, and similar characters; with this parameter added, such special characters are handled automatically.
--hive-overwrite: if the Hive table already exists, it is overwritten.
--num-mappers specifies the number of map tasks used to run the import.
Another important parameter, --direct, imports through the database's own dump utility and gives better performance than the previous example; however, it cannot be combined with --hive-drop-import-delims. Choose the command form that fits your database and data.
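If the special characters should be kept rather than dropped, a hedged variant of the earlier import uses --hive-delims-replacement (listed in the table below); the connection details remain placeholders:
# replace \n, \r, and \01 in string fields with a space instead of dropping them
./sqoop import --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --table tablename --hive-import --hive-table hivedb.hivetable --hive-delims-replacement ' ' --hive-overwrite --num-mappers 6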
The following are the arguments of the sqoop import command.
Common arguments:
Argument | Description
--connect | Specify JDBC connect string
--connection-manager | Specify connection manager class to use
--driver | Manually specify JDBC driver class to use
--hadoop-home | Override $HADOOP_HOME
--help | Print usage instructions
-P | Read password from console
--password | Set authentication password
--username | Set authentication username
--verbose | Print more information while working
--connection-param-file | Optional properties file that provides connection parameters
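For example, a hedged sketch that reads the password from the console instead of putting it on the command line (connection details are placeholders):
# -P prompts for the password interactively
./sqoop import --connect jdbc:mysql://localhost:3306/dbname --username dbuser -P --table tablename --hive-import --hive-table hivedb.hivetable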
Hive arguments:
Argument | Description
--hive-home | Override $HIVE_HOME
--hive-import | Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite | Overwrite existing data in the Hive table
--create-hive-table | If set, the job fails if the target Hive table already exists; by default this property is false
--hive-table | Sets the table name to use when importing to Hive
--hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key | Name of the Hive field the partitions are sharded on
--hive-partition-value | String value that serves as the partition key for the data imported into Hive in this job
--map-column-hive | Override default mapping from SQL type to Hive type for configured columns
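For example, a hedged sketch of a partitioned Hive import; the partition column dt, its value, and the column mapping are illustrative assumptions:
# load into an assumed Hive partition dt='2012-06-01' and force the id column to STRING
./sqoop import --connect jdbc:mysql://localhost:3306/dbname --username dbuser --password dbpassword --table tablename --hive-import --hive-table hivedb.hivetable --hive-partition-key dt --hive-partition-value '2012-06-01' --map-column-hive id=STRING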
The following are some examples.
Specify a query condition
sqoop import --table test --columns "id,name" --where "id > 400"
Use the database dump (direct mode)
sqoop import --connect jdbc:mysql://server.foo.com/db --table bar --direct -- --default-character-set=latin1
Redefine column types
sqoop import ... --map-column-java id=String,value=Integer
Define delimiters
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --fields-terminated-by '\t' --lines-terminated-by '\n' --optionally-enclosed-by '\"'
Original article: Exploring Sqoop, a tool for data exchange between HDFS and relational databases. Thanks to the original author for sharing.