Hadoop 2.0 Code: A Brief Analysis of the Client Code


1. Overview

The following describes how Hadoop submits user-written MR programs in the form of jobs.

It mainly involves four Java class files:

org.apache.hadoop.mapreduce:

Job.java, JobSubmitter.java

org.apache.hadoop.mapred:

YARNRunner.java, ResourceMgrDelegate.java

 

2. Code Analysis and Execution Flow

1) The user writes a driver class that holds the implementations of the map and reduce functions and configures the job:

Job job = new Job(new Configuration());
job.setJarByClass(MyJob.class);

// Specify various job-specific parameters
job.setJobName("myjob");
FileInputFormat.addInputPath(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));
job.setMapperClass(MyJob.MyMapper.class);
job.setReducerClass(MyJob.MyReducer.class);

// Submit the job, then poll for progress until the job is complete
job.waitForCompletion(true);

 

2) The client program calls the waitForCompletion() function on the Job it has built.

/**
 * Submit the job to the cluster and wait for it to finish.
 * @param verbose print the progress to the user
 * @return true if the job succeeded
 * @throws IOException thrown if the communication with the
 *         <code>JobTracker</code> is lost
 */
public boolean waitForCompletion(boolean verbose
                                 ) throws IOException, InterruptedException,
                                          ClassNotFoundException {
  if (state == JobState.DEFINE) {
    submit();
  }
  if (verbose) {
    monitorAndPrintJob();
  } else {
    // get the completion poll interval from the client.
    int completionPollIntervalMillis =
      Job.getCompletionPollInterval(cluster.getConf());
    while (!isComplete()) {
      try {
        Thread.sleep(completionPollIntervalMillis);
      } catch (InterruptedException ie) {
      }
    }
  }
  return isSuccessful();
}

If the Job is still in the DEFINE state, submit() is called first. Then, if verbose is true, monitorAndPrintJob() is called to monitor and print the status of the Job and its Tasks; otherwise the method enters a loop that polls at a fixed interval to check whether the submitted Job has completed. Once the Job is complete and the loop exits, isSuccessful() is called to return the final result.
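The poll interval used by the else branch is read from the client configuration via Job.getCompletionPollInterval(). As a minimal sketch (the key name mapreduce.client.completion.pollinterval and its 5000 ms default are assumptions about the standard configuration, not something stated in this article), a client could shorten the interval like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PollIntervalExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed key: poll the job status once per second instead of the default 5000 ms.
    conf.setInt("mapreduce.client.completion.pollinterval", 1000);
    Job job = Job.getInstance(conf, "myjob");
    // ... set jar, mapper, reducer and input/output paths as in step 1) ...
    // job.waitForCompletion(false) would now sleep 1000 ms between isComplete() checks.
  }
}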

 

3) waitForCompletion() calls submit(), so we enter the submit() function.

/**
 * Submit the job to the cluster and return immediately.
 * @throws IOException
 */
public void submit()
       throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);
  setUseNewAPI();
  connect();
  final JobSubmitter submitter =
      getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run() throws IOException, InterruptedException,
    ClassNotFoundException {
      return submitter.submitJobInternal(Job.this, cluster);
    }
  });
  state = JobState.RUNNING;
  LOG.info("The url to track the job: " + getTrackingURL());
}

submit() first calls connect() to establish the connection and obtain the required ClientProtocol, storing this information in the Cluster object. It then calls submitJobInternal() of the JobSubmitter class, saves the returned status, sets the Job state to RUNNING, and returns.
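Which ClientProtocol connect() ends up with is decided by the Cluster object based on the configuration. A minimal sketch, assuming the standard mapreduce.framework.name switch ("yarn" selects the YARNRunner through YarnClientProtocolProvider, "local" selects the local runner):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class FrameworkSelectionExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "yarn" makes Cluster pick YARNRunner, so submitJobInternal() will reach
    // YARNRunner.submitJob() as analyzed in step 5) below.
    conf.set("mapreduce.framework.name", "yarn");
    Job job = Job.getInstance(conf, "myjob");
    // job.submit() would now go through YARNRunner and ResourceMgrDelegate.
  }
}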

 

4) Next we enter the submitJobInternal() function in the JobSubmitter class.

/**
 * Internal method for submitting jobs to the system.
 */
JobStatus submitJobInternal(Job job, Cluster cluster)
throws ClassNotFoundException, InterruptedException, IOException {
  //validate the jobs output specs
  checkSpecs(job);

  Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster,
                                                   job.getConfiguration());
  //configure the command line options correctly on the submitting dfs
  Configuration conf = job.getConfiguration();
  InetAddress ip = InetAddress.getLocalHost();
  if (ip != null) {
    submitHostAddress = ip.getHostAddress();
    submitHostName = ip.getHostName();
    conf.set(MRJobConfig.JOB_SUBMITHOST, submitHostName);
    conf.set(MRJobConfig.JOB_SUBMITHOSTADDR, submitHostAddress);
  }
  JobID jobId = submitClient.getNewJobID();
  job.setJobID(jobId);
  Path submitJobDir = new Path(jobStagingArea, jobId.toString());
  JobStatus status = null;
  try {
    conf.set("hadoop.http.filter.initializers",
        "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
    conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
    LOG.debug("Configuring job " + jobId + " with " + submitJobDir
        + " as the submit dir");
    // get delegation token for the dir
    TokenCache.obtainTokensForNamenodes(job.getCredentials(),
        new Path[] { submitJobDir }, conf);

    populateTokenCache(conf, job.getCredentials());
    copyAndConfigureFiles(job, submitJobDir);
    Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);

    // Create the splits for the job
    LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
    int maps = writeSplits(job, submitJobDir);
    conf.setInt(MRJobConfig.NUM_MAPS, maps);
    LOG.info("number of splits:" + maps);

    // write "queue admins of the queue to which job is being submitted"
    // to job file.
    String queue = conf.get(MRJobConfig.QUEUE_NAME,
        JobConf.DEFAULT_QUEUE_NAME);
    AccessControlList acl = submitClient.getQueueAdmins(queue);
    conf.set(toFullPropertyName(queue,
        QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

    // removing jobtoken referrals before copying the jobconf to HDFS
    // as the tasks don't need this setting, actually they may break
    // because of it if present as the referral will point to a
    // different job.
    TokenCache.cleanUpTokenReferral(conf);

    // Write job file to submit dir
    writeConf(conf, submitJobFile);

    //
    // Now, actually submit the job (using the submit name)
    //
    printTokens(jobId, job.getCredentials());
    status = submitClient.submitJob(
        jobId, submitJobDir.toString(), job.getCredentials());
    if (status != null) {
      return status;
    } else {
      throw new IOException("Could not launch job");
    }
  } finally {
    if (status == null) {
      LOG.info("Cleaning up the staging area " + submitJobDir);
      if (jtFs != null && submitJobDir != null)
        jtFs.delete(submitJobDir, true);
    }
  }
}

submitJobInternal() mainly performs the following operations:

  • Validate the Job's output specification, obtain the configuration and the address of the submitting host, generate a JobID, determine the job's submit (staging) directory (also the directory the MRAppMaster will later read from), and set the information needed during execution.
  • Copy the required jar files and configuration files to that submit directory on HDFS so that every node can read them.
  • Compute the input splits to determine the number of map tasks (see the sketch after this list).
  • Call the submitJob() function of the YARNRunner class to submit the Job, passing the required parameters (such as the JobID and the submit directory).
  • Wait for submitJob() to return the Job status; if no status comes back, the finally block deletes the submit directory to clean up the staging area.
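A minimal sketch of the split calculation mentioned above (this mirrors what writeSplits() does internally; calling TextInputFormat directly like this and the input path "in" are illustrative assumptions, and the path must exist for getSplits() to succeed):

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitCountSketch {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "split-count");
    FileInputFormat.addInputPath(job, new Path("in"));
    // The job's InputFormat computes the splits; their count becomes the number
    // of map tasks (stored by submitJobInternal() as MRJobConfig.NUM_MAPS).
    List<InputSplit> splits = new TextInputFormat().getSplits(job);
    System.out.println("number of splits: " + splits.size());
  }
}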

 

 

5) The submitJob() function in the YARNRunner class

@Override
public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
throws IOException, InterruptedException {

  /* check if we have a hsproxy, if not, no need */
  MRClientProtocol hsProxy = clientCache.getInitializedHSProxy();
  if (hsProxy != null) {
    // JobClient will set this flag if getDelegationToken is called, if so, get
    // the delegation tokens for the HistoryServer also.
    if (conf.getBoolean(JobClient.HS_DELEGATION_TOKEN_REQUIRED,
        DEFAULT_HS_DELEGATION_TOKEN_REQUIRED)) {
      Token hsDT = getDelegationTokenFromHS(hsProxy, new Text(
          conf.get(JobClient.HS_DELEGATION_TOKEN_RENEWER)));
      ts.addToken(hsDT.getService(), hsDT);
    }
  }

  // Upload only in security mode: TODO
  Path applicationTokensFile =
      new Path(jobSubmitDir, MRJobConfig.APPLICATION_TOKENS_FILE);
  try {
    ts.writeTokenStorageFile(applicationTokensFile, conf);
  } catch (IOException e) {
    throw new YarnException(e);
  }

  // Construct necessary information to start the MR AM
  ApplicationSubmissionContext appContext =
      createApplicationSubmissionContext(conf, jobSubmitDir, ts);

  // Submit to ResourceManager
  ApplicationId applicationId = resMgrDelegate.submitApplication(appContext);

  ApplicationReport appMaster = resMgrDelegate
      .getApplicationReport(applicationId);
  String diagnostics =
      (appMaster == null ?
          "application report is null" : appMaster.getDiagnostics());
  if (appMaster == null
      || appMaster.getYarnApplicationState() == YarnApplicationState.FAILED
      || appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
    throw new IOException("Failed to run job : " + diagnostics);
  }
  return clientCache.getClient(jobId).getJobStatus(jobId);
}
submitJob() mainly does the following:

  • Set the necessary configuration and build the ApplicationSubmissionContext; this context describes the resources required by the MRAppMaster and the command used to launch it (a hedged sketch of such a command follows this list).
  • Call the submitApplication() method of ResourceMgrDelegate with the context to submit the Job to the ResourceManager; the call returns the ApplicationId (which was generated together with the JobID).
  • Finally, check the returned ApplicationReport for failure, obtain the Job status from the client cache, and return it.
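As a rough illustration of the launch command that createApplicationSubmissionContext() places in the container spec for the ApplicationMaster (the arguments below are assumptions for illustration; only the MRAppMaster class name is taken from the MapReduce framework):

import java.util.Arrays;
import java.util.List;

public class AmLaunchCommandSketch {
  public static void main(String[] args) {
    // Hedged sketch: roughly the kind of command line used to start the MRAppMaster
    // inside the container allocated for the ApplicationMaster.
    List<String> command = Arrays.asList(
        "$JAVA_HOME/bin/java",
        "-Xmx1024m",
        "org.apache.hadoop.mapreduce.v2.app.MRAppMaster",
        "1><LOG_DIR>/stdout",
        "2><LOG_DIR>/stderr");
    System.out.println(String.join(" ", command));
  }
}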

 

 

6) The submitApplication() function in the ResourceMgrDelegate class

public ApplicationId submitApplication(
    ApplicationSubmissionContext appContext)
throws IOException {
  appContext.setApplicationId(applicationId);
  SubmitApplicationRequest request =
      recordFactory.newRecordInstance(SubmitApplicationRequest.class);
  request.setApplicationSubmissionContext(appContext);
  applicationsManager.submitApplication(request);
  LOG.info("Submitted application " + applicationId + " to ResourceManager" +
      " at " + rmAddress);
  return applicationId;
}

 

This function is straightforward:

  • Set the ApplicationId in the Application submission context,
  • Wrap the context in the SubmitApplicationRequest to be sent,
  • Finally, use Hadoop RPC to remotely call the submitApplication() method of the ClientRMService class on the ResourceManager, handing the request that contains the Application context over to the ResourceManager side (a sketch of how this RPC proxy is obtained follows this list).
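A minimal sketch of how the applicationsManager proxy used above can be obtained (an assumption based on the YARN RPC layer, not quoted from this article; the protocol class was ClientRMProtocol in Hadoop 2.0-alpha and was later renamed ApplicationClientProtocol, and the host/port below are placeholders):

import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.ClientRMProtocol;
import org.apache.hadoop.yarn.ipc.YarnRPC;

public class RmClientProxySketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Placeholder ResourceManager client address (yarn.resourcemanager.address).
    InetSocketAddress rmAddress = new InetSocketAddress("rm-host", 8032);
    YarnRPC rpc = YarnRPC.create(conf);
    // The returned proxy is what ResourceMgrDelegate calls "applicationsManager";
    // invoking submitApplication(request) on it is the remote call that reaches
    // ClientRMService.submitApplication() inside the ResourceManager.
    ClientRMProtocol applicationsManager =
        (ClientRMProtocol) rpc.getProxy(ClientRMProtocol.class, rmAddress, conf);
  }
}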