Hadoop JobTracker Analysis


1. After the client has set the job's configuration parameters, it calls the job.waitForCompletion(true) method to submit the job to the JobTracker and waits for the job to complete.
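For context, a typical driver that ends with this call looks roughly like the sketch below. This is a generic example rather than code from this analysis; WordCountMapper and WordCountReducer are placeholder class names you would supply yourself.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");          // Hadoop 1.x style constructor
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountMapper.class);      // placeholder mapper class
    job.setReducerClass(WordCountReducer.class);    // placeholder reducer class
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Submit the job and block until it finishes.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Internally, waitForCompletion(true) calls submit(), which is where the analysis picks up: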

public void submit() throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);               // check the job state
  setUseNewAPI();                             // check and set whether to use the new MapReduce API
  // Connect to the JobTracker and submit the job
  connect();                                  // connect to the JobTracker
  info = jobClient.submitJobInternal(conf);   // submit the job information
  super.setJobID(info.getID());
  state = JobState.RUNNING;                   // change the job state
}

The preceding code consists of two steps: connecting to the JobTracker and submitting the job information. The connect() method instantiates a JobClient object, which sets up the JobConf and calls init():

public void init(JobConf conf) throws IOException {
  // Read the configuration to determine whether the job runs in local standalone mode or distributed mode
  String tracker = conf.get("mapred.job.tracker", "local");
  tasklogtimeout = conf.getInt(TASKLOG_PULL_TIMEOUT_KEY, DEFAULT_TASKLOG_TIMEOUT);
  this.ugi = UserGroupInformation.getCurrentUser();
  if ("local".equals(tracker)) {
    // Standalone mode: use a LocalJobRunner
    conf.setNumMapTasks(1);
    this.jobSubmitClient = new LocalJobRunner(conf);
  } else {
    this.jobSubmitClient = createRPCProxy(JobTracker.getAddress(conf), conf);
  }
}
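In other words, the branch taken depends only on the mapred.job.tracker property. A minimal sketch of setting it (the host and port are placeholders):

import org.apache.hadoop.mapred.JobConf;

public class TrackerModeDemo {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Standalone mode: JobClient.init() will create a LocalJobRunner.
    conf.set("mapred.job.tracker", "local");
    // Distributed mode (placeholder host/port): JobClient.init() will instead
    // create an RPC proxy to the JobTracker.
    // conf.set("mapred.job.tracker", "jobtracker.example.com:9001");
    System.out.println(conf.get("mapred.job.tracker"));
  }
}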

In distributed mode, an RPC proxy is created:

public static VersionedProtocol getProxy(
    Class<? extends VersionedProtocol> protocol,
    long clientVersion, InetSocketAddress addr, UserGroupInformation ticket,
    Configuration conf, SocketFactory factory, int rpcTimeout) throws IOException {
  if (UserGroupInformation.isSecurityEnabled()) {
    SaslRpcServer.init(conf);
  }
  VersionedProtocol proxy =
      (VersionedProtocol) Proxy.newProxyInstance(
          protocol.getClassLoader(), new Class[] { protocol },
          new Invoker(protocol, addr, ticket, conf, factory, rpcTimeout));
  long serverVersion = proxy.getProtocolVersion(protocol.getName(),
                                                clientVersion);
  if (serverVersion == clientVersion) {
    return proxy;
  } else {
    throw new VersionMismatch(protocol.getName(), clientVersion,
                              serverVersion);
  }
}

From the code above, we can see that Hadoop implements its remote procedure calls on top of Java's dynamic proxy API (java.lang.reflect.Proxy).
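As a self-contained illustration of that mechanism (plain JDK code, not Hadoop's RPC classes), a dynamic proxy routes every call on an interface through a single InvocationHandler; an RPC framework's handler is where the method name and arguments would be serialized and sent to the server:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

interface Greeter {
  String greet(String name);
}

public class ProxyDemo {
  public static void main(String[] args) {
    InvocationHandler handler = new InvocationHandler() {
      @Override
      public Object invoke(Object proxy, Method method, Object[] methodArgs) {
        // A real RPC invoker would marshal method.getName() and methodArgs,
        // send them over the network, and unmarshal the server's reply.
        System.out.println("would send RPC call: " + method.getName());
        return "hello " + methodArgs[0];
      }
    };
    Greeter g = (Greeter) Proxy.newProxyInstance(
        Greeter.class.getClassLoader(), new Class<?>[] { Greeter.class }, handler);
    System.out.println(g.greet("world"));   // every call goes through invoke()
  }
}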

After initialization, the job is submitted:

info = jobClient.submitJobInternal(conf);   // submit the job information

The submitJobInternal() method does the following:

1. Replace the directory names in conf with the corresponding HDFS paths.

2. Check whether the output specification is valid: for example, whether the output path is set and whether it already exists.

3. Divide the input data into multiple splits, place the split information on HDFS, and write the job configuration to the job.xml file.

4. Call the JobTracker's submitJob() method.

This method creates a JobInProgress object, checks the access permissions and whether the system parameters can satisfy the job, and finally calls addJob():

private synchronized JobStatus addJob(JobID jobId, JobInProgress job)
    throws IOException {
  totalSubmissions++;
  synchronized (jobs) {
    synchronized (taskScheduler) {
      jobs.put(job.getProfile().getJobID(), job);
      for (JobInProgressListener listener : jobInProgressListeners) {
        listener.jobAdded(job);
      }
    }
  }
  myInstrumentation.submitJob(job.getJobConf(), jobId);
  job.getQueueMetrics().submitJob(job.getJobConf(), jobId);
  LOG.info("Job " + jobId + " added successfully for user '"
           + job.getJobConf().getUser() + "' to queue '"
           + job.getJobConf().getQueueName() + "'");
  AuditLogger.logSuccess(job.getUser(),
      Operation.SUBMIT_JOB.name(), jobId.toString());
  return job.getStatus();
}

totalSubmissions records how many jobs clients have submitted to the JobTracker. jobs is the mapping table of all jobs managed by the JobTracker:

Map<JobID, JobInProgress> jobs = Collections.synchronizedMap(new TreeMap<JobID, JobInProgress>());
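A side note on that choice: Collections.synchronizedMap() only makes individual calls atomic; compound check-then-act sequences and iteration still need external locking, which is why addJob() above wraps its updates in synchronized (jobs). A small self-contained illustration (the job IDs are made up):

import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

public class SynchronizedMapDemo {
  public static void main(String[] args) {
    // TreeMap keeps entries ordered by key; synchronizedMap makes single calls thread-safe.
    Map<String, String> jobs =
        Collections.synchronizedMap(new TreeMap<String, String>());
    jobs.put("job_201101010000_0001", "PREP");   // individual put/get calls are already safe

    // A check-then-act sequence is a compound operation, so it needs an explicit lock,
    // just like the synchronized (jobs) block in addJob().
    synchronized (jobs) {
      if (!jobs.containsKey("job_201101010000_0002")) {
        jobs.put("job_201101010000_0002", "PREP");
      }
    }

    // Iteration over a synchronized collection must also hold the lock.
    synchronized (jobs) {
      for (Map.Entry<String, String> e : jobs.entrySet()) {
        System.out.println(e.getKey() + " -> " + e.getValue());
      }
    }
  }
}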

taskScheduler encapsulates the job scheduling policy. [Figure: TaskScheduler class diagram]

Hadoop's job scheduling mechanism:

public enum SchedulingMode {
  FAIR, FIFO
}
1. Fair scheduling (FairScheduler)
Distributed resources are shared fairly among users, and each user has a job pool. If one user currently occupies too many resources, which would be unfair to the others, the scheduler kills some of that user's tasks to free resources for the other users.
2. Capacity scheduling (JobQueueTaskScheduler)
The cluster maintains multiple queues. Each queue has a certain capacity, jobs within a queue are scheduled with a FIFO policy, and a queue may contain sub-queues.

Both schedulers must implement TaskScheduler's method public synchronized List<Task> assignTasks(TaskTracker tracker), which computes and returns the list of tasks that can be assigned to the given TaskTracker.
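To make the fair-sharing idea concrete, here is a deliberately simplified, self-contained sketch of dividing map slots evenly among job pools and picking the pool that is furthest over its share (the candidate whose tasks would be preempted). It is a conceptual illustration only, not the FairScheduler's actual algorithm; the pool names and numbers are made up.

import java.util.HashMap;
import java.util.Map;

public class FairShareDemo {
  public static void main(String[] args) {
    int totalMapSlots = 12;

    // Slots currently used by each user's pool (illustrative numbers).
    Map<String, Integer> runningPerPool = new HashMap<String, Integer>();
    runningPerPool.put("alice", 9);
    runningPerPool.put("bob", 2);
    runningPerPool.put("carol", 1);

    // Equal fair share per pool; the real scheduler also honours weights and minimum shares.
    double fairShare = (double) totalMapSlots / runningPerPool.size();

    String mostOverShare = null;
    double worstExcess = 0;
    for (Map.Entry<String, Integer> e : runningPerPool.entrySet()) {
      double excess = e.getValue() - fairShare;
      System.out.printf("pool %-6s running=%d fairShare=%.1f excess=%.1f%n",
          e.getKey(), e.getValue(), fairShare, excess);
      if (excess > worstExcess) {
        worstExcess = excess;
        mostOverShare = e.getKey();
      }
    }
    // A pool far above its fair share is the one whose tasks the scheduler
    // would consider killing to free slots for starved pools.
    System.out.println("candidate for preemption: pool " + mostOverShare);
  }
}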

Next, let's look at the JobTracker's side of the work:

First, the restart count is updated via the recovery manager, retrying until it succeeds:

while (true) {
  try {
    recoveryManager.updateRestartCount();
    break;
  } catch (IOException ioe) {
    LOG.warn("Failed to initialize recovery manager. ", ioe);
    // wait for some time
    Thread.sleep(FS_ACCESS_RETRY_PERIOD);
    LOG.warn("Retrying...");
  }
}

Then the job scheduler is started (here it is the FairScheduler):

taskScheduler.start();

This initializes several management objects, such as the PoolManager that manages the job pools:

// Initialize other pieces of the scheduler
jobInitializer = new JobInitializer(conf, taskTrackerManager);
taskTrackerManager.addJobInProgressListener(jobListener);
poolMgr = new PoolManager(this);
poolMgr.initialize();
loadMgr = (LoadManager) ReflectionUtils.newInstance(
    conf.getClass("mapred.fairscheduler.loadmanager",
        CapBasedLoadManager.class, LoadManager.class), conf);
loadMgr.setTaskTrackerManager(taskTrackerManager);
loadMgr.setEventLog(eventLog);
loadMgr.start();
taskSelector = (TaskSelector) ReflectionUtils.newInstance(
    conf.getClass("mapred.fairscheduler.taskselector",
        DefaultTaskSelector.class, TaskSelector.class), conf);
taskSelector.setTaskTrackerManager(taskTrackerManager);
taskSelector.start();
JobInitializer holds a fixed-size ExecutorService thread pool; each worker thread initializes one job.
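That is the standard fixed-size thread pool pattern from java.util.concurrent; a minimal self-contained sketch of the pattern (the job names and the body of run() are placeholders, not Hadoop code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class InitPoolDemo {
  public static void main(String[] args) throws InterruptedException {
    // Fixed number of worker threads, like JobInitializer's pool.
    ExecutorService pool = Executors.newFixedThreadPool(4);

    for (int i = 0; i < 10; i++) {
      final String jobName = "job-" + i;   // placeholder job identifier
      pool.submit(new Runnable() {
        @Override
        public void run() {
          // Stand-in for job.initTasks(): each queued job is initialized by one pool thread.
          System.out.println(Thread.currentThread().getName() + " initializing " + jobName);
        }
      });
    }

    pool.shutdown();                       // stop accepting new work
    pool.awaitTermination(1, TimeUnit.MINUTES);
  }
}

The actual per-job work done by each thread, job.initTasks() plus listener notification, is the excerpt below: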
try {
  JobStatus prevStatus = (JobStatus) job.getStatus().clone();
  LOG.info("Initializing " + job.getJobID());
  job.initTasks();
  // Inform the listeners if the job state has changed
  // Note : that the job will be in PREP state.
  JobStatus newStatus = (JobStatus) job.getStatus().clone();
  if (prevStatus.getRunState() != newStatus.getRunState()) {
    JobStatusChangeEvent event =
        new JobStatusChangeEvent(job, EventType.RUN_STATE_CHANGED, prevStatus,
            newStatus);
    synchronized (JobTracker.this) {
      updateJobInProgressListeners(event);
    }
  }
}

Initialization mainly generates the job's tasks and notifies the other listeners so that they can do their own processing. initTasks() handles the following:

// Record the running-job information submitted by the user
try {
  userUGI.doAs(new PrivilegedExceptionAction<Object>() {
    @Override
    public Object run() throws Exception {
      JobHistory.JobInfo.logSubmitted(getJobID(), conf, jobFile,
          startTimeFinal, hasRestarted());
      return null;
    }
  });
} catch (InterruptedException ie) {
  throw new IOException(ie);
}

// Set and record the job priority
setPriority(this.priority);

// Generate the keys (tokens) required by each task
generateAndStoreTokens();

Next, it reads the job's split metadata. Each entry contains the following fields:

private TaskSplitIndex splitIndex;   // location index of the split
private long inputDataLength;        // length of the split's input data
private String[] locations;          // hosts where the split data is stored

numMapTasks is then computed from the number of split metadata entries, and the split storage addresses are verified to be reachable.

Next, the map tasks and reduce tasks are created:

maps = new TaskInProgress[numMapTasks];
for (int i = 0; i < numMapTasks; ++i) {
  inputLength += splits[i].getInputDataLength();
  maps[i] = new TaskInProgress(jobId, jobFile,
                               splits[i],
                               jobtracker, conf, this, i, numSlotsPerMap);
}
Inside the TaskInProgress constructor:

this.jobFile = jobFile;
this.splitInfo = split;
this.jobtracker = jobtracker;
this.job = job;
this.conf = conf;
this.partition = partition;
this.maxSkipRecords = SkipBadRecords.getMapperMaxSkipRecords(conf);
this.numSlotsRequired = numSlotsRequired;
setMaxTaskAttempts();
init(jobid);

In addition to storing the JobTracker, split, and job information for the task, the constructor also sets:

maxSkipRecords: the maximum number of bad records that may be skipped while the task executes.
setMaxTaskAttempts(): sets the maximum number of times the task may be attempted. After a task has failed twice, it is re-executed in skip mode, which records the bad records; on the fourth attempt those bad records are skipped.
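Both limits can be tuned per job. The snippet below shows the kind of settings involved, assuming the standard Hadoop 1.x JobConf and SkipBadRecords setters are available; the values are illustrative only.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipModeConfig {
  public static void main(String[] args) {
    JobConf conf = new JobConf();

    // Upper bound on attempts before the job gives up on the task entirely.
    conf.setMaxMapAttempts(4);
    conf.setMaxReduceAttempts(4);

    // Number of failed attempts after which skip mode kicks in (illustrative value).
    SkipBadRecords.setAttemptsToStartSkipping(conf, 2);

    // Maximum number of bad records a mapper may skip around a crash.
    SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
  }
}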

The process of creating the reduce tasks is similar.
