Sqoop 1.99.4 Java API Practice


Software versions:

Sqoop 1.99.4; JDK 1.7; Hadoop 2.2

Environment:

1. Hadoop cluster:

node12: NameNode, ResourceManager, JobHistoryServer, DataNode, NodeManager; 1.5 GB memory + 10 GB HDD + 1 core

node13: SecondaryNameNode, DataNode, NodeManager; 1.5 GB memory + 10 GB HDD + 1 core

2. Sqoop server: the Sqoop server is deployed on node13.
3. IntelliJ IDEA configuration: download the Sqoop 1.99.4 compressed package and unzip it.

(1) Locate sqoop-1.99.4-bin-hadoop200\server\webapps\sqoop.war and unzip the sqoop.war file to find the directory

sqoop-1.99.4-bin-hadoop200\server\webapps\sqoop\WEB-INF\lib; copy the lib packages under it to the project build path (not required to run);


(2) Locate the directory sqoop-1.99.4-bin-hadoop200\shell\lib and copy the lib packages under it to the project build path (required to run).


Basic Sqoop concepts:

1. Sqoop connector:

After the Sqoop server starts, enter the command show connector in the client to see the connectors currently available; in this deployment they are the hdfs-connector (ID 1) and the generic-jdbc-connector (ID 2), as used by the code below.
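The same listing can be produced programmatically. Here is a minimal sketch, assuming the 1.99.4 client's getConnectors(), getUniqueName(), and getVersion() accessors behave as in the client documentation; the server URL is the one used in the full code later in this article:

import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MConnector;

public class ShowConnectors {
    public static void main(String[] args) {
        // same Sqoop server URL as in the full example below
        SqoopClient client = new SqoopClient("http://node13:12000/sqoop/");
        // programmatic equivalent of the shell command "show connector"
        for (MConnector connector : client.getConnectors()) {
            System.out.println(connector.getPersistenceId() + "\t"
                    + connector.getUniqueName() + "\t"
                    + connector.getVersion());
        }
    }
}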


2. Sqoop link:

A link is backed by a connector, so creating a link requires specifying a connector (by its ID, as the code below shows). A link is a thin wrapper over a connector on which parameters can be set: for MySQL, you can set the JDBC driver class name, connection string, username, password, and so on; for HDFS, you can set the URI. The parameters that can be set are the following:

HDFS:

Link configuration:
  linkConfig.uri, HDFS URI: null

FROM Job configuration:
  fromJobConfig.inputDirectory, Input directory: null

TO Job configuration:
  toJobConfig.outputFormat, Output format: null
  toJobConfig.compression, Compression format: null
  toJobConfig.customCompression, Custom compression format: null
  toJobConfig.outputDirectory, Output directory: null


JDBC:

Link configuration:
  linkConfig.jdbcDriver, JDBC Driver Class: null
  linkConfig.connectionString, JDBC Connection String: null
  linkConfig.username, Username: null
  linkConfig.password, Password: null
  linkConfig.jdbcProperties, JDBC Connection Properties: null

FROM database configuration:
  fromJobConfig.schemaName, Schema name: null
  fromJobConfig.tableName, Table name: null
  fromJobConfig.sql, Table SQL statement: null
  fromJobConfig.columns, Table column names: null
  fromJobConfig.partitionColumn, Partition column name: null
  fromJobConfig.allowNullValueInPartitionColumn, Null value allowed for the partition column: null
  fromJobConfig.boundaryQuery, Boundary query: null

TO database configuration:
  toJobConfig.schemaName, Schema name: null
  toJobConfig.tableName, Table name: null
  toJobConfig.sql, Table SQL statement: null
  toJobConfig.columns, Table column names: null
  toJobConfig.stageTableName, Stage table name: null
  toJobConfig.shouldClearStageTable, Should clear stage table: null

A link created here is not yet designated as source or destination; every Sqoop transfer runs from a source to a destination (FROM and TO), and that designation is made when the job is created.

3. Sqoop job:

Creating a Sqoop job requires specifying two things. One is the FROM and TO sides, for which only the IDs of the corresponding links are needed. The other is the concrete FROM and TO parameters (here "parameters" means the variable parts, such as the input directory for HDFS, or the table name and column string for JDBC).

Sqoop Java API:

0. Create a connector (this does not need to be created; connectors are available once Sqoop is configured)

1. Create a link

1) Before creating a link, first check the existing links in the Sqoop client:

show link — you can see that there are no links yet;
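If you prefer to check from code rather than the shell, the sketch below lists all links; it assumes the 1.99.4 client exposes getLinks(), while getConnectorId() is the accessor already used in the full code later on:

import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MLink;

public class ShowLinks {
    public static void main(String[] args) {
        SqoopClient client = new SqoopClient("http://node13:12000/sqoop/");
        // programmatic equivalent of the shell command "show link"
        for (MLink link : client.getLinks()) {
            System.out.println(link.getPersistenceId() + "\t" + link.getName()
                    + "\t(connector " + link.getConnectorId() + ")");
        }
    }
}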

2) Create a link to HDFS:

        long connectorId = 1;
        MLink link = createHdfsLink(client, connectorId);
3) Create a JDBC link (taking MySQL as an example):

        long connectorId = 2;
        MLink link = createMysqlLink(client, connectorId);
4) View the links again: show link

2. Create the job:

0) View existing jobs: show job — you can see that there are none (the default).

1) Create a job:

        // set up the job
        long fromLinkId = 1;
        long toLinkId = 2;
        String input = "/user/fansy/name.data";
        String tableName = "test_name_age";
        String columns = "name,age";
        int mappers = 2;
        MJob job = createJob(client, fromLinkId, toLinkId, input, tableName, columns, mappers);
        System.out.println(job.getEnabled());
2) View the jobs:


3. Start the job:

        // start the job
        long jobId = 1;
        startJob(client, jobId);

After you start a job, you can view its status with the command show job --jid <id>, watch it on port 8088 (the Hadoop ResourceManager web UI), or call the status-checking function from the code below.
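For unattended runs it can be convenient to poll until the job leaves the running state. The following is a sketch that builds on the checkJobStatus idea from the full code below; the 10-second polling interval is an arbitrary choice:

import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MSubmission;

public class WaitForJob {
    // poll the job status until the submission is no longer running
    static MSubmission waitForJob(SqoopClient client, long jobId) throws InterruptedException {
        MSubmission submission = client.getJobStatus(jobId);
        while (submission.getStatus().isRunning()) {
            if (submission.getProgress() != -1) {
                System.out.println(String.format("Progress: %.2f %%", submission.getProgress() * 100));
            }
            Thread.sleep(10 * 1000); // arbitrary polling interval
            submission = client.getJobStatus(jobId);
        }
        System.out.println("Final status: " + submission.getStatus().name());
        return submission;
    }

    public static void main(String[] args) throws InterruptedException {
        SqoopClient client = new SqoopClient("http://node13:12000/sqoop/");
        waitForJob(client, 1); // job id 1, as in the walkthrough above
    }
}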


The complete code is as follows:

package sqoop;

import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.*;
import org.apache.sqoop.submission.counter.Counter;
import org.apache.sqoop.submission.counter.CounterGroup;
import org.apache.sqoop.submission.counter.Counters;
import org.apache.sqoop.validation.Status;

import java.util.List;
import java.util.ResourceBundle;

/**
 * Created by fansy on 2016/3/7.
 */
public class Sqoop1_99_4_test {
    public static void main(String[] args) {
        String url = "http://node13:12000/sqoop/";
        SqoopClient client = new SqoopClient(url);

        // create a link
//        long connectorId = 1;
//        MLink link = createHdfsLink(client, connectorId);
        long connectorId = 2;
//        MLink link = createMysqlLink(client, connectorId);
        listInputAndOutputParameters(client, connectorId);

        // set up a job
//        long fromLinkId = 1;
//        long toLinkId = 2;
//        String input = "/user/fansy/name.data";
//        String tableName = "test_name_age";
//        String columns = "name,age";
//        int mappers = 2;
//        MJob job = createJob(client, fromLinkId, toLinkId, input, tableName, columns, mappers);
//        System.out.println(job.getEnabled());

        // start a job
//        long jobId = 1;
//        startJob(client, jobId);
    }

    static void describe(List<MConfig> configs, ResourceBundle resource) {
        for (MConfig config : configs) {
            System.out.println(resource.getString(config.getLabelKey()) + ":");
            List<MInput<?>> inputs = config.getInputs();
            for (MInput input : inputs) {
                System.out.println(input.getName() + ","
                        + resource.getString(input.getLabelKey()) + ":" + input.getValue());
            }
            System.out.println();
        }
    }

    /**
     * Print the parameters of a connector
     * @param client
     * @param connectorId
     */
    static void listInputAndOutputParameters(SqoopClient client, long connectorId) {
//        String url = "http://node13:12000/sqoop/";
//        SqoopClient client = new SqoopClient(url);
//        long connectorId = link.getConnectorId();
        // link config for the connector
        describe(client.getConnector(connectorId).getLinkConfig().getConfigs(),
                client.getConnectorConfigBundle(connectorId));
        // from job config for the connector
        describe(client.getConnector(connectorId).getFromConfig().getConfigs(),
                client.getConnectorConfigBundle(connectorId));
        // to job config for the connector
        describe(client.getConnector(connectorId).getToConfig().getConfigs(),
                client.getConnectorConfigBundle(connectorId));
    }

    /**
     * Create a MySQL link
     * @param client
     * @param connectorId
     * @return
     */
    static MLink createMysqlLink(SqoopClient client, long connectorId) {
        // create a placeholder for the link
        MLink link = client.createLink(connectorId);
        link.setName("mysql");
        link.setCreationUser("fansy");
        MLinkConfig linkConfig = link.getConnectorLinkConfig();
        // fill in the link config values
        linkConfig.getStringInput("linkConfig.connectionString").setValue("jdbc:mysql://192.168.0.1/test_fz");
        linkConfig.getStringInput("linkConfig.jdbcDriver").setValue("com.mysql.jdbc.Driver");
        linkConfig.getStringInput("linkConfig.username").setValue("fansy");
        linkConfig.getStringInput("linkConfig.password").setValue("fansy");
        // save the filled-in link object
        Status status = client.saveLink(link);
        if (status.canProceed()) {
            System.out.println("Created link with link id: " + link.getPersistenceId());
        } else {
            System.out.println("Something went wrong creating the link");
        }
        return link;
    }

    /**
     * Create an HDFS link
     * @param client
     * @param connectorId
     * @return
     */
    static MLink createHdfsLink(SqoopClient client, long connectorId) {
        MLink link = client.createLink(connectorId);
        link.setName("hdfs");
        link.setCreationUser("fansy");
        MLinkConfig linkConfig = link.getConnectorLinkConfig();
        // fill in the link config values
        linkConfig.getStringInput("linkConfig.uri").setValue("hdfs://node12:8020");
        // save the filled-in link object
        Status status = client.saveLink(link);
        if (status.canProceed()) {
            System.out.println("Created link with link id: " + link.getPersistenceId());
        } else {
            System.out.println("Something went wrong creating the link");
        }
        return link;
    }

    /**
     * Create an HDFS-to-MySQL job
     * @param client
     * @param fromLinkId
     * @param toLinkId
     * @param input
     * @param tableName
     * @param columns
     * @param mappers
     * @return
     */
    static MJob createJob(SqoopClient client, long fromLinkId, long toLinkId,
                          String input, String tableName, String columns, int mappers) {
        MJob job = client.createJob(fromLinkId, toLinkId);
        job.setName("hdfs to mysql");
        job.setCreationUser("fansy");
        // set the "FROM" link job config values
        MFromConfig fromJobConfig = job.getFromJobConfig();
        fromJobConfig.getStringInput("fromJobConfig.inputDirectory").setValue(input);
        // set the "TO" link job config values
        MToConfig toJobConfig = job.getToJobConfig();
        toJobConfig.getStringInput("toJobConfig.tableName").setValue(tableName);
        toJobConfig.getStringInput("toJobConfig.columns").setValue(columns);
        // set the driver config values
        MDriverConfig driverConfig = job.getDriverConfig();
//        driverConfig.getStringInput("throttlingConfig.numExtractors").setValue(String.valueOf(mappers));
        driverConfig.getIntegerInput("throttlingConfig.numExtractors").setValue(mappers);
        Status status = client.saveJob(job);
        if (status.canProceed()) {
            System.out.println("Created job with job id: " + job.getPersistenceId());
        } else {
            System.out.println("Something went wrong creating the job");
        }
        return job;
    }

    /**
     * Start a job
     * @param client
     * @param jobId
     */
    static void startJob(SqoopClient client, long jobId) {
        MSubmission submission = client.startJob(jobId);
        System.out.println("Job submission status: " + submission.getStatus());
        if (submission.getStatus().isRunning() && submission.getProgress() != -1) {
            System.out.println("Progress: " + String.format("%.2f %%", submission.getProgress() * 100));
        }
        System.out.println("Hadoop job id: " + submission.getExternalId());
        System.out.println("Job link: " + submission.getExternalLink());
        Counters counters = submission.getCounters();
        if (counters != null) {
            System.out.println("Counters:");
            for (CounterGroup group : counters) {
                System.out.print("\t");
                System.out.println(group.getName());
                for (Counter counter : group) {
                    System.out.print("\t\t");
                    System.out.print(counter.getName());
                    System.out.print(": ");
                    System.out.println(counter.getValue());
                }
            }
        }
        if (submission.getExceptionInfo() != null) {
            System.out.println("Exception info: " + submission.getExceptionInfo());
        }
    }

    /**
     * Check job status
     * @param client
     * @param jobId
     */
    static void checkJobStatus(SqoopClient client, long jobId) {
        // check job status for a running job
        MSubmission submission = client.getJobStatus(jobId);
        if (submission.getStatus().isRunning() && submission.getProgress() != -1) {
            System.out.println("Progress: " + String.format("%.2f %%", submission.getProgress() * 100));
        }
        System.out.println("Job status: " + submission.getStatus().name());
    }
}

Test:

1. HDFS data:

11kate,3
fansy,22
12kate,3
tom,32
1kate,3
2kate,3
2. MySQL:

CREATE TABLE `test_name_age` (
  `name` varchar(255) DEFAULT NULL,
  `age` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

3. After the run finishes, review the job and the database:



Summary:

1. Missing data:

You can see that the name column data in the database is missing a character at both the beginning and the end, while the data read in the logs viewed on port 8088 is actually complete:


2. The delimiter cannot be specified; Sqoop 1.99.4 has only one default delimiter implementation:

CSVIntermediateDataFormat

If you want to use a different delimiter, you have to implement it yourself. Sqoop 1.99.6 (the latest version at the time of writing) additionally supports JSON and Avro implementations.



All in all, the 1.99 line of Sqoop still has quite a few bugs; in practice, it may be better to go back to the 1.4 line!

