JobHistory Search Intelligence

Preface

Anyone who has troubleshot a Hadoop cluster has probably used JobHistory. It is a very useful "sharp weapon". Why? As its name suggests, it helps you find information about jobs that have already run, and the records are very detailed, from job to task to task attempt. If a job suddenly fails and you want to find out why, clicking the details link in the JobHistory web interface will usually reveal the reason. But this seemingly perfect job analysis tool still has quite a few inconveniences, so we decided to make some improvements to make it more usable. Hopefully they will also help other JobHistory users.


Deficiencies of the Existing JobHistory

From the time I started using this tool until now, one thing that has always bothered me is how hard it is to quickly look up information about slightly older jobs. Sometimes I want to compare jobs from the same period, including running time, task failures and other metrics, so I need to pull up yesterday's or the day before yesterday's data. Many people's first reaction is to show more history by increasing the number of jobs that the JobHistory page displays by default. That is simple and convenient: change the following configuration item, restart JobHistory, and it takes effect immediately:

<property>
  <name>mapreduce.jobhistory.joblist.cache.size</name>
  <value>20000</value>
  <description>size of the job list cache</description>
</property>

The default value displays 20,000 jobs. Don't assume that is a lot: when your cluster runs tens of thousands of jobs per day, this value is clearly too small, covering only the last two or three days of data. If you suddenly want to look at last week's data, you find that you can't, which is simply a nightmare. We were in exactly this situation, so we raised the value to 100,000 so that data would be kept for a longer period. But that exposed another problem: the page loads far too slowly; the JobHistory main page could not be displayed within a minute. If you read the JobHistory page-rendering code, you can see that the whole page is rendered in full before anything is displayed; there is no per-page loading of only part of the data. Asking for 100,000 records means 100,000 records come back at once, forming a huge HTML page that is returned to the browser. We later found that most of the time was spent downloading the page, not in the backend assembling the job list. But there was no way around it: to see more history, we could only sacrifice the user experience. I'm sure some readers have run into the same problem.

After all this description, the goals we ultimately want to achieve are: first, stop displaying so much job data, keep only the last day or so, so that the page opens quickly; second, still be able to find historical data. Obviously the original JobHistory cannot do this, so we can modify it and make the tool a bit more "intelligent".


Current JobHistory Usage

Let's first look at how Hadoop developers currently use JobHistory. They typically rely on the following search box:


This is a very wide search box. After the job page information has loaded, you can type the name of the job you are looking for and the filtered results appear immediately; when you clear the query string, the job list is restored to its original state. Clearly this is a very simple and direct search: it can only show or hide the information that has already been loaded into the page. So to achieve the optimization goals mentioned in the previous section, we have to improve the search capability. Of course, we keep the current search box unchanged.


JobHistory Search Optimization Goals

From the above we know that making JobHistory "intelligent" means reworking the search function. So what are the concrete search scenarios we are most likely to run into?

First, search by job name: we know the name of the failed job and query with it.

Second, search by job ID: we get the failed job's ID from the logs or elsewhere, search for it directly, and jump to its details page.

And there is one crucial premise: the search above must not depend on the job list currently displayed on the front page. For example, even if the page only shows 10 jobs, I should still be able to find a job that failed a week ago. A few points follow from this:

1. The front-end and back-end job-cache sizes in JobHistory need to be controlled by separate configuration items; today they share the same configuration, which is exactly what causes the problem. A sketch of the separated configuration is shown after this list.

2. The second requirement is easier to satisfy than the first: since we already have a job ID, we can assemble the link directly and resolve it with a simple redirect. Every job details page follows the same URL template, so the implementation does not need to be complicated.

3. The first requirement needs the backend to filter by the job name passed in and then return the result to the front end for display, filtering out the vast majority of irrelevant jobs. In an ordinary business system this would be trivial; inside Hadoop, the logic is not quite so simple and direct to implement.
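
To make point 1 concrete, the idea is to end up with two separate knobs like the following. The first property already exists; the second is the new one introduced later in this article, so treat the exact name and value here only as a preview of that section:

<!-- back-end: how many jobs the JobHistory server keeps cached in memory -->
<property>
  <name>mapreduce.jobhistory.joblist.cache.size</name>
  <value>100000</value>
</property>

<!-- front-end: how many jobs the web UI actually renders (new) -->
<property>
  <name>mapreduce.jobhistory.joblist.cache-displayed.size</name>
  <value>1000</value>
</property>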


Specific JobHistory Code Changes

With the goal and the approach settled, the changes finally have to be made at the code level, so we need to understand how the current JobHistory home page is produced: where the data comes from, and whether the front-end page is written directly in a ready-made .html file. Speculating in a vacuum is useless; only digging into the source code gives the answer. In fact, the page is implemented in the HsJobsBlock.java class, and the page-rendering logic lives in its render method.

  /*
   * (non-Javadoc)
   * @see org.apache.hadoop.yarn.webapp.view.HtmlBlock#render(org.apache.hadoop.yarn.webapp.view.HtmlBlock.Block)
   */
  @Override protected void render(Block html) {
    TBODY<TABLE<Hamlet>> tbody = ...
    ...
  }

The method that obtains the historical job information is the appContext.getAllJobs() call that appears in the for loop inside render. It triggers the backend service's job-information lookup. But AppContext is only a base interface, so we need to find the corresponding implementation class. The inheritance hierarchy of AppContext is as follows:


Obviously, the method we want is in the JobHistory class, which calls the following:

  @Override
  public Map<JobId, Job> getAllJobs() {
    return storage.getAllPartialJobs();
  }
This eventually ends up calling the following method:

  @Override
  public Map<JobId, Job> getAllPartialJobs() {
    LOG.debug("Called getAllPartialJobs()");
    SortedMap<JobId, Job> result = new TreeMap<JobId, Job>();
    try {
      for (HistoryFileInfo mi : hsManager.getAllFileInfo()) {
        if (mi != null) {
          JobId id = mi.getJobId();
          result.put(id, new PartialJob(mi.getJobIndexInfo(), id));
        }
      }
    } catch (IOException e) {
      LOG.warn("Error trying to scan for all FileInfos", e);
      throw new YarnRuntimeException(e);
    }
    return result;
  }
All the .jhist files that hold job run information are managed by the HistoryFileManager class, which is the hsManager in the code above. It returns the following object:

  public Collection<HistoryFileInfo> getAllFileInfo() throws IOException {
    ...
    return jobListCache.values();
  }
So the records that are ultimately returned to the front end come from this jobListCache object. Its size is set when it is created, using the configuration item mentioned earlier in the article:

  protected JobListCache createJobListCache() {
    return new JobListCache(conf.getInt(
        JHAdminConfig.MR_HISTORY_JOBLIST_CACHE_SIZE,
        JHAdminConfig.DEFAULT_MR_HISTORY_JOBLIST_CACHE_SIZE), maxHistoryAge);
  }
So we have to split this configuration item so that the number of jobs shown on the front end is controlled separately from the back-end cache. With the overall flow clear, let's return to the first requirement: to implement job-name filtering we need to add a new interface. We can imitate the existing getAllJobs method and create getDisplayedJobs with one parameter (String filterName). First add the interface definition in AppContext:

  Map<JobId, Job> getDisplayedJobs(String filterName);
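
Every class that implements AppContext then needs to provide this method. As a purely illustrative sketch (not the final code from the patch), a do-nothing stub would look like this:

  @Override
  public Map<JobId, Job> getDisplayedJobs(String filterName) {
    // placeholder; only the history-server path needs real filtering
    return null;
  }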
The default implementation added to all the implementing classes simply returns null at first. Then add a similar interface in HistoryStorage.java; this is the method that JobHistory needs to call:

  /**
   * Get a partial, displayed subset of the cached jobs.
   * @param filterName the job name filter
   * @return the cached jobs to be displayed
   */
  Map<JobId, Job> getPartialDisplayedJobs(String filterName);
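
Inside the JobHistory class (the AppContext implementation we saw earlier), the new method can then simply delegate to this storage method, mirroring the existing getAllJobs. A sketch of that wiring (the exact code is in the patch):

  @Override
  public Map<JobId, Job> getDisplayedJobs(String filterName) {
    // delegate to the history storage, just like getAllJobs() does
    return storage.getPartialDisplayedJobs(filterName);
  }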
Then provide the implementation in CachedHistoryStorage. Before implementing it, though, we also need to control how many jobs are displayed rather than blindly showing every job in the cache, so a new configuration item has to be added for this class. Let's give it the following name:

<property>
  <name>mapreduce.jobhistory.joblist.cache-displayed.size</name>
  <value>1000</value>
  <description>The size of the job list cache displayed in the JobHistory web UI.
  </description>
</property>
Then read this configuration item into a variable:

  @SuppressWarnings("serial")
  private void createLoadedJobCache(Configuration conf) {
    ...

    cacheDisplayedSize =
        conf.getInt(JHAdminConfig.MR_HISTORY_JOBLIST_CACHE_DISPLAYED_SIZE,
            JHAdminConfig.DEFAULT_MR_HISTORY_JOBLIST_CACHE_DISPLAYED_SIZE);

    ...
  }
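
The two JHAdminConfig constants referenced above are new as well. A sketch of how they might be declared there, assuming names that mirror the new property (treat the exact declarations as illustrative; the real ones are in the patch):

  /** Size of the job list cache displayed in the JobHistory web UI. */
  public static final String MR_HISTORY_JOBLIST_CACHE_DISPLAYED_SIZE =
      MR_HISTORY_PREFIX + "joblist.cache-displayed.size";
  public static final int DEFAULT_MR_HISTORY_JOBLIST_CACHE_DISPLAYED_SIZE = 1000;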
The design of this new configuration item is the key step in controlling the number of jobs shown on the front end. The filtering logic is then implemented in the concrete job lookup method as follows:

  @Override
  public Map<JobId, Job> getPartialDisplayedJobs(String filterName) {
    LOG.debug("Called getPartialDisplayedJobs()");
    String jobName;
    int cacheJobSize = 0;
    SortedMap<JobId, Job> result = new TreeMap<JobId, Job>();
    try {
      for (HistoryFileInfo mi : hsManager.getAllFileInfo()) {
        if (mi != null) {
          cacheJobSize++;
          if (cacheJobSize > cacheDisplayedSize) {
            LOG.info("getPartialDisplayedJobs operation ends"
                + ", allFileInfo size is more than cacheDisplayedSize: "
                + cacheDisplayedSize);
            break;
          }
          JobId id = mi.getJobId();
          jobName = mi.getJobIndexInfo().getJobName();
          if (filterName == null || filterName.length() == 0) {
            result.put(id, new PartialJob(mi.getJobIndexInfo(), id));
          } else if (jobName != null && jobName.length() > 0
              && jobName.contains(filterName)) {
            result.put(id, new PartialJob(mi.getJobIndexInfo(), id));
          }
        }
      }
    } catch (IOException e) {
      LOG.warn("Error trying to scan for all FileInfos", e);
      throw new YarnRuntimeException(e);
    }
    return result;
  }
Compared with the original method that directly fetched the job cache, this adds both the count limit and the name filtering. That covers the backend changes; the front-end changes are described briefly below.

New parameter:

/**
 * Params constants for the AM webapp and the webapp.
 */
public interface AMParams {
  ....
  static final String JOBFILTER_NAME = "jobfilter.name";
}
Then modify the parts of HsWebApp.java that handle the link routes and the link in the navigation bar, passing an empty string as the default job-name filter:

route(pajoin("/app", JOBFILTER_NAME), HsController.class);

li().a(url("app", ""), "Jobs")._()._();
Finally, change the AppContext call in the page code to read as follows:

  /*
   * (non-Javadoc)
   * @see org.apache.hadoop.yarn.webapp.view.HtmlBlock#render(org.apache.hadoop.yarn.webapp.view.HtmlBlock.Block)
   */
  @Override protected void render(Block html) {
    ...
    String filterName = $(JOBFILTER_NAME);
    for (Job j : appContext.getDisplayedJobs(filterName).values()) {
      JobInfo job = new JobInfo(j);
      jobsTableData.append("[\"")
        .append(dateFormat.format(new Date(job.getSubmitTime()))).append("\",\"")
        .append(dateFormat.format(new Date(job.getStartTime()))).append("\",\"")
        .append(dateFormat.format(new Date(job.getFinishTime()))).append("\",\"")
        .append("<a href='").append(url("job", job.getId())).append("'>")
        .append(job.getId()).append("</a>\",\"")
        .append(StringEscapeUtils.escapeJavaScript(StringEscapeUtils.escapeHtml(
            job.getName()))).append("\",\"")
      ...
OK, that completes the first part of the intelligent search. The other requirement is comparatively simple, because it can be handled directly on the page: write some front-end JS code that constructs the link.

    String jobIdSearchClickMethod =
        "function jobsearch() {\n"
        + "  var jobId = $('.jobid').val()\n"
        + "  window.location = '/jobhistory/job/' + jobId\n"
        + "}\n";
Finally, add the two corresponding search buttons and embed the JS code in the page.

  @Override protected void render(Block html) {
    ...
    String jobIdSearchClickMethod =
        "function jobsearch() {\n"
        + "  var jobId = $('.jobid').val()\n"
        + "  window.location = '/jobhistory/job/' + jobId\n"
        + "}\n";

    String jobNameSearchClickMethod =
        "function jobnamesearch() {\n"
        + "  var filterName = $('.jobname').val()\n"
        + "  window.location = '/jobhistory/app/' + filterName\n"
        + "}\n";

    html.script().$type("text/javascript")
        ._("var jobsTableData=" + jobsTableData + "\n" + jobIdSearchClickMethod
            + jobNameSearchClickMethod)._();
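
The two search inputs and buttons themselves are not shown in the snippet above; however they are emitted from the page-rendering code, the resulting markup needs to roughly match the jQuery selectors used in the JS, for example (an illustrative sketch only, not the patch's exact output):

<input type="text" class="jobid"/>
<button onclick="jobsearch()">search by job id</button>
<input type="text" class="jobname"/>
<button onclick="jobnamesearch()">search by job name</button>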
The final rendered page looks as follows; the two search buttons in the upper left corner of the table are the new ones. The page looks a bit ugly, but it works.



Search Function Test

First: search by a specified job ID


After submitting the search, we are taken to the corresponding details page:


This link is assembled by the JS code we added.

Second: search by a specified job name

For example, I have run several word count test jobs here. Start by entering an unrelated filter, hello (note the browser URL):


No results are returned, as expected.

Then enter word to match:


We get two matching records, which meets our requirement.

I have also tested the front-end job-count control and it passes; I won't include screenshots here. Anyone interested in this part can apply the changes to their own Hadoop code; the patch link is below.


RELATED LINKS

GitHub patch link: https://github.com/linyiqun/open-source-patch/tree/master/mapreduce/MAPREDUCE-hsSearch
