Sqoop explained with usage examples


Original blog address: http://blog.csdn.net/evankaka

Abstract: This article walks through some examples from the author's experience using Sqoop.

I. Overview and basic principles

The Apache Sqoop (SQL-to-Hadoop) project is designed for efficient bulk data transfer between relational databases (RDBMS) and Hadoop. With Sqoop, users can easily import data from a relational database into Hadoop and its related systems (such as HBase and Hive), and just as easily extract data from Hadoop and export it into a relational database. Sqoop can therefore be seen as a bridge connecting relational databases and Hadoop. One of Sqoop's highlights is that it imports data from a relational database into HDFS through Hadoop MapReduce. Its architecture is simple: it integrates with Hive, HBase, and Oozie and moves data through MapReduce tasks, which provides both parallelism and fault tolerance. The basic workflow of Sqoop is shown below:

From a relational database to Hive/HBase
From Hive/HBase to a relational database
When importing, Sqoop needs the --split-by parameter to be set. Sqoop splits the data according to the values of the split-by column and assigns the resulting ranges to different map tasks. Each map task processes the rows it fetches from the database and writes them into HDFS (which also means the import transaction is scoped to an individual mapper task). The splitting method depends on the type of the split-by column. For a simple int column, Sqoop takes the maximum and minimum values of that column and divides the range into as many regions as the value passed in --num-mappers. For example, if select max(split_by), min(split_by) from table returns 1000 and 1, and --num-mappers is 2, the range is split into the two regions (1, 500) and (501, 1000), and two SQL statements are sent to the two map tasks: select xxx from table where split_by >= 1 and split_by <= 500, and select xxx from table where split_by >= 501 and split_by <= 1000. Each map task then runs its own SQL statement and imports the rows it retrieves.
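
To make the split concrete, here is a minimal sketch of an import that asks for 2 mappers on an integer primary key; the connection details, table, and column names are placeholders and not from the original article:

#!/bin/sh
# Hypothetical example: table demo_table has an integer primary key "id"
# with min(id)=1 and max(id)=1000, and we ask for 2 map tasks.
sqoop import \
--connect jdbc:mysql://xx.xx.xx.xx:3306/testdb \
--username xxxx --password xxxx \
--table demo_table \
--split-by id \
--num-mappers 2 \
--target-dir /tmp/demo_table
# With these boundaries, Sqoop generates roughly the two range queries
# described above, one per map task:
#   select ... from demo_table where id >= 1   and id <= 500
#   select ... from demo_table where id >= 501 and id <= 1000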

II. Usage examples

The following examples show how Sqoop is used.

1. Creating a table

Sqoop can automatically map the table structure of a MySQL table (or a table in another relational database) onto a Hive table. The mapped table is stored in TextFile format. Let's start with a script.

#!/bin/sh
. ~/.bashrc

host='xx.xx.xx.xx'
database='cescore'
user='xxxx'
password='xxxx'
mysqlTable='yyyyyyy'
hiveDB='ods_uba'
hiveTable='yyyyyy'

sqoop create-hive-table \
--connect jdbc:mysql://${host}:3306/${database} --username ${user} --password ${password} \
--table ${mysqlTable} \
--hive-table ${hiveDB}.${hiveTable} \
--hive-overwrite \
--hive-partition-key req_date

rm *.java

--connect specifies the JDBC connection string; any common relational database can be used as the source

--table specifies the name of the source table

--hive-table specifies the name of the Hive table to be created

--hive-overwrite indicates that the insert should overwrite existing data (all of the original partition data is deleted first, then the new data is inserted)

--hive-partition-key specifies the field to use as the partition key

We can then go into Hive and look at the corresponding table structure:
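
For example, assuming the placeholder database and table names from the script above, the structure can be checked from the shell:

# Inspect the table that sqoop create-hive-table generated,
# including its storage format and partition key
hive -e "use ods_uba; describe formatted yyyyyy;"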


2. Importing data from MySQL into Hive

Sqoop can import data from a relational database into Hive, and also export data from Hive back into a relational database. Let's first look at an example that goes from MySQL to Hive. The sample script is as follows.

#!/bin/sh
. ~/.bashrc

host='xx.xx.xx.xx'
database='xxxcc'
user='xxxx'
password='xxxx'
mysqlTable='sys_approve_reject_code'
hiveDB='ods_uba'
hiveTable='sys_approve_reject_code'
tmpDir='/user/hive/warehouse/ods_uba.db/'${hiveTable}

sqoop import --connect jdbc:mysql://${host}:3306/${database} --username ${user} --password ${password} \
--query "select * from ${database}.${mysqlTable} where 1 = 1 and \$CONDITIONS" \
--hive-import --hive-table ${hiveDB}.${hiveTable} --target-dir ${tmpDir} --delete-target-dir \
--split-by approve_reject_code \
--hive-overwrite \
--null-string '\\N' --null-non-string '\\N'

In general, if the table does not yet exist in Hive, you can skip the first step of creating it separately: when Sqoop finds that the Hive table does not exist, it creates it for you, and the structure of the Hive table matches that of the MySQL table. Note, however, that if you created the Hive table yourself from the interactive Hive command line and set its storage format to ORC, the Sqoop import into that table will appear to succeed, but you will find that the data cannot be queried, or comes back garbled. The storage format can only be TextFile.
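
If you do pre-create the table yourself, a safer approach is to declare it as TextFile explicitly. A minimal sketch, with purely illustrative column names, might be:

# Pre-create the Hive target table as TEXTFILE; the columns here are made up.
# The field delimiter must match whatever the Sqoop import writes
# (for example, pass --fields-terminated-by '\001' to sqoop import).
hive -e "
create table if not exists ods_uba.sys_approve_reject_code (
  approve_reject_code string,
  approve_reject_desc string
)
row format delimited fields terminated by '\001'
stored as textfile;"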

The script above imports the full table into Hive without partitioning, and each import deletes the existing data first. This is done because the table is a dictionary table. Let's look at what each option means:

--query specifies the SQL query to run; note that the author appends "and \$CONDITIONS", which is required whenever the query has a where clause

--hive-table specifies the target Hive table name

--target-dir specifies the HDFS path of the target table

--delete-target-dir deletes the data at the target HDFS path before importing

--split-by specifies the column used to split the data across map tasks, usually the primary key

--hive-overwrite deletes the old data before inserting the new data

--null-string specifies how NULL values in string columns are handled; they are mapped to NULL in Hive

--null-non-string specifies how NULL values in non-string columns are handled; they are mapped to NULL in Hive
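
A simple sanity check after such an import is to compare row counts on both sides; the host, credentials, and names below are the same placeholders used in the script above:

# Count rows in the MySQL source table and in the imported Hive table
mysql -h xx.xx.xx.xx -u xxxx -pxxxx -e "select count(*) from xxxcc.sys_approve_reject_code;"
hive -e "select count(*) from ods_uba.sys_approve_reject_code;"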

3. Importing data from Phoenix into Hive

Here is a more sophisticated script that imports data from Phoenix and splits one day's data into 24 parts:

#!/bin/sh
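
Since the rest of that script is not shown above, the following is only a rough, hypothetical sketch of the idea; the Phoenix ZooKeeper quorum, table name, columns, and Hive target are all placeholders, not the author's actual values:

#!/bin/sh
# Hypothetical sketch: import one day of Phoenix data into Hive in 24 hourly slices.
# Assumes the Phoenix client jar is on Sqoop's classpath.
day='2016-01-01'
for hour in $(seq 0 23); do
  sqoop import \
  --driver org.apache.phoenix.jdbc.PhoenixDriver \
  --connect jdbc:phoenix:zk1,zk2,zk3:2181 \
  --query "select * from USER_EVENT where EVENT_DAY = '${day}' and EVENT_HOUR = ${hour} and \$CONDITIONS" \
  --hive-import --hive-table ods_uba.user_event \
  --hive-partition-key req_hour --hive-partition-value ${hour} \
  --target-dir /tmp/user_event_${hour} --delete-target-dir \
  -m 1
done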
