Sqoop Common Commands Summary

These notes are compiled from the official Sqoop documentation, version 1.4.3; if there is a mistake, please correct me.
1. Import data with Sqoop
sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST
2. Username and password
sqoop import --connect jdbc:mysql://database.example.com/employees \
    --username aaron --password 12345
3. Specifying a driver
sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDriver \
    --connect <connect-string> ...
4. Importing with a free-form SQL query
sqoop import \
  --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' \
  --split-by a.id --target-dir /user/foo/joinresults
If you want the import to run sequentially, use a single map task:
sqoop import \
  --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' \
  -m 1 --target-dir /user/foo/joinresults
If you need single quotes inside the WHERE clause, wrap the query in double quotes and escape $CONDITIONS: "SELECT * FROM x WHERE a='foo' AND \$CONDITIONS"
5. Sqoop 1.4.3 does not support overly complex free-form queries, for example a WHERE clause containing OR.
6. --split-by <column-name>
The default split column is the primary key. Assuming the id column ranges from 0 to 1000, Sqoop executes queries of the form SELECT * FROM sometable WHERE id >= lo AND id < hi, dividing (lo, hi) into four imports: (0, 250), (250, 500), (500, 750), (750, 1001). If that column does not partition the data evenly, you can split on a different column. If the table has no indexed column, or only a composite primary key, you must choose a split column manually (see the sketch after section 7).
7. --direct takes advantage of fast import/export tools that the database itself provides, such as MySQL's mysqldump, which performs better than going through JDBC but cannot handle large-object columns. When you use it, the client binaries of those fast tools must be in the shell path of the task processes.
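A minimal sketch of choosing the split column manually, per section 6 (the connect string and the dept_id column are assumptions for illustration):
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --split-by dept_id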
8. Importing into an HDFS directory. This command writes the data under the /shared/foo/ directory:
sqoop import --connect <connect-str> --table foo --warehouse-dir /shared \
    ...
Or
sqoop import --connect <connect-str> --table foo --target-dir /dest \
    ...
9. Passing arguments to the fast import tool: arguments given after a standalone -- on the command line are passed through to the underlying tool. The following command tells MySQL to use Latin1 as the default character set:
sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
    --direct -- --default-character-set=latin1
10. Type mapping
--map-column-java <mapping>   override the default mapping to Java data types
--map-column-hive <mapping>   override the default mapping to Hive data types
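For example, a sketch forcing the id column to Java String and the value column to Integer (the column names are placeholders):
sqoop import ... --map-column-java id=String,value=Integer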
11. Incremental import
--check-column (col)   specifies the column to be examined when determining which rows to import.
--incremental (mode)   specifies how Sqoop determines which rows are new; legal values for mode are append and lastmodified.
--last-value (value)   specifies the maximum value of the check column from the previous import.
Incremental import supports two modes, append and lastmodified, selected with --incremental.
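A minimal incremental-append sketch (the connect string, table, and last value are assumptions for illustration):
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --incremental append --check-column id --last-value 100000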
12. Importing large objects. BLOB and CLOB columns require special handling. Large objects smaller than 16 MB are stored inline with the rest of the data; larger ones are stored in files in the _lobs subdirectory of the import target, in a format optimized for large objects that can hold up to 2^63 bytes each. The --inline-lob-limit parameter sets the maximum size, in bytes, of a LOB that is kept inline; if set to 0, all large objects use external storage.
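For instance, a sketch that forces every LOB into external storage (the connect string and table name are assumptions):
sqoop import --connect jdbc:mysql://db.foo.com/corp --table CLOBDATA \
    --inline-lob-limit 0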
13. Delimiters and escape characters
Given input records such as:
Some string, with a comma.
Another "string with quotes"
importing with this command:
$ sqoop import --fields-terminated-by , --escaped-by \\ --enclosed-by '\"' ...
produces this result:
"Some string, with a comma.","1","2","3"...
"Another \"string with quotes\"","4","5","6"...
while importing with this command:
$ sqoop import --optionally-enclosed-by '\"' (the rest as above)...
produces:
"Some string, with a comma.",1,2,3...
"Another \"string with quotes\"",4,5,6...
14. Hive import parameters
--hive-home <dir>            override $HIVE_HOME
--hive-import                import the table into Hive, using Hive's default delimiters
--hive-overwrite             overwrite existing data in the Hive table
--create-hive-table          create the table; the job fails with an error if the table already exists
--hive-table <table-name>    set the Hive table name
--hive-drop-import-delims    drop \n, \r, and \01 from string fields when importing into Hive
--hive-delims-replacement    replace \n, \r, and \01 in string fields with a user-defined string when importing into Hive
--hive-partition-key         key of the Hive partition
--hive-partition-value <v>   value of the Hive partition
--map-column-hive <map>      type mapping, from SQL type to Hive type
15. Hive NULL handling. Sqoop imports NULL as the string "null" by default, whereas Hive uses \N to represent NULL, so values written the default way are not recognized by Hive as NULLs. Use --null-string and --null-non-string to control the representation, escaping \N as \\N:
sqoop import ... --null-string '\\N' --null-non-string '\\N'
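Putting the Hive options from sections 14 and 15 together, a hedged sketch (the connect string and table names are assumptions for illustration):
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --hive-import --hive-table employees --hive-overwrite \
    --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N'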
16. Importing into HBase. Adding --hbase-table makes Sqoop import into HBase instead of HDFS; by default the split column (the primary key) is used as the row key. You can override the row key with --hbase-row-key, and the column family is specified with --column-family; --direct is not supported. If you don't want to create the table or column family by hand, add the --hbase-create-table parameter (a sketch follows section 17 below).
17. Code generation parameters (can be skipped)
--bindir <dir>          output directory for compiled objects
--class-name <name>     set the generated class name; overrides --package-name. Combined with --jar-file, sets the input class.
--jar-file <file>       disable code generation; use the specified jar
--outdir <dir>          output directory for generated code
--package-name <name>   put auto-generated classes in this package
--map-column-java <m>   override the default mapping from SQL type to Java type for the configured columns
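The HBase import described in section 16 might look like this minimal sketch (the connect string, table, row key, and column family are assumptions):
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --hbase-table employees --hbase-row-key employee_id \
    --column-family info --hbase-create-table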
18. Common parameters can be configured in conf/sqoop-site.xml:
<property>
    <name>property.name</name>
    <value>property.value</value>
</property>
If a parameter is not configured there, you can pass it on the command line like this:
sqoop import -D property.name=property.value ...
19. Two special parameters. sqoop.bigdecimal.format.string controls how BigDecimal columns are formatted when stored as a String: true (the default) stores them without an exponent (e.g. 0.0000001), while false may use scientific notation (e.g. 1E-7). sqoop.hbase.add.row.key controls whether the column used as the row key is also added to the row data; the default is false.
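A sketch passing one of these as a -D property; note that generic -D options must come right after the tool name (the connect string, table, and column family are assumptions):
sqoop import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://db.foo.com/corp \
    --table EMPLOYEES --hbase-table employees --column-family info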
20. Examples
# select specific columns
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --columns "employee_id,first_name,last_name,job_title"
# use 8 parallel tasks
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    -m 8
# fast (direct) mode
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --direct
# store as SequenceFiles
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --class-name com.foocorp.Employee --as-sequencefile
# set delimiters
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --fields-terminated-by '\t' --lines-terminated-by '\n' \
    --optionally-enclosed-by '\"'
# import into Hive
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --hive-import
# filter rows with a condition
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --where "start_date > '2010-01-01'"
# use dept_id as the split column
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --split-by dept_id
# incremental append import
$ sqoop import --connect jdbc:mysql://db.foo.com/somedb --table sometable \
    --where "id > 100000" --target-dir /incremental_dataset --append
21. Import all tables with sqoop-import-all-tables. Each table must have a primary key, and you cannot filter with a WHERE condition.
sqoop import-all-tables --connect jdbc:mysql://db.foo.com/corp
22. Export. We use sqoop-export to insert data; if a row already exists, the insertion fails.
If we use --update-key, every record is treated as an update to an existing row. For example, the statement:
sqoop-export --table foo --update-key id --export-dir /path/to/data --connect ...
causes Sqoop to run UPDATE statements such as:
UPDATE foo SET msg='this is a test', bar=42 WHERE id=0;
UPDATE foo SET msg='some more data', bar=100 WHERE id=1;
...
If a row to update is not found, no error is raised.
23. Update if the row exists, otherwise insert
Add the --update-mode allowinsert parameter.
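For instance, a hedged upsert sketch (the connect string, key column, and paths are assumptions for illustration):
sqoop export --connect jdbc:mysql://db.example.com/foo --table bar \
    --update-key id --update-mode allowinsert --export-dir /results/bar_data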
24. Transaction handling
Export inserts up to 100 records per statement and commits the current transaction every 100 statements, that is, a commit every 10,000 rows.
25. Examples
$ sqoop export --connect jdbc:mysql://db.example.com/foo --table bar \
    --export-dir /results/bar_data

$ sqoop export --connect jdbc:mysql://db.example.com/foo --table bar \
    --export-dir /results/bar_data --validate

$ sqoop export --connect jdbc:mysql://db.example.com/foo --call barproc \
    ...

26. Validate compares the row counts of the source and the target data. It has three interfaces:

Validator
  Property: validator
  Description: driver for validation; must implement org.apache.sqoop.validation.Validator
  Supported values: a fully qualified class name
  Default value: org.apache.sqoop.validation.RowCountValidator

Validation Threshold
  Property: validation-threshold
  Description: drives the decision based on whether the validation meets the threshold; must implement org.apache.sqoop.validation.ValidationThreshold
  Supported values: a fully qualified class name
  Default value: org.apache.sqoop.validation.AbsoluteValidationThreshold

Validation Failure Handler
  Property: validation-failurehandler
  Description: responsible for handling failures; must implement org.apache.sqoop.validation.ValidationFailureHandler
  Supported values: a fully qualified class name
  Default value: org.apache.sqoop.validation.LogOnFailureHandler
27. Validate examples

$ sqoop import --connect jdbc:mysql://db.foo.com/corp \
    --table EMPLOYEES --validate

$ sqoop export --connect jdbc:mysql://db.example.com/foo --table bar \
    --export-dir /results/bar_data --validate

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --validate --validator org.apache.sqoop.validation.RowCountValidator \
    --validation-threshold \
          org.apache.sqoop.validation.AbsoluteValidationThreshold \
    --validation-failurehandler \
          org.apache.sqoop.validation.LogOnFailureHandler

28. sqoop job saves frequently used jobs so they can be invoked quickly later.

--create <job-id> create a new job

--delete <job-id> delete a job

--exec <job-id> execute a job

--show <job-id> show a job's parameters

--list list all jobs

29. Examples

# create a job
$ sqoop job --create myjob -- import --connect jdbc:mysql://example.com/db \
    --table mytable
# list all jobs
$ sqoop job --list
# show a job's parameters
$ sqoop job --show myjob
 Job: myjob
 Options:
 ----------------------------
 direct.import = false
 codegen.input.delimiters.record = 0
 hdfs.append.dir = false
 db.table = mytable
 ...
# execute the job
$ sqoop job --exec myjob
10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation ...
# override saved parameters at execution time
$ sqoop job --exec myjob -- --username someuser -P
Enter password:
...

30. Other common tools

sqoop-metastore

sqoop-merge

# merge two directories
$ sqoop merge --new-data newer --onto older --target-dir merged \
    --jar-file datatypes.jar --class-name Foo --merge-key id

