Anatomy of the Configuration class in Hadoop

Source: Internet
Author: User
Tags: iterable

Configuration is a class shared by Hadoop's five components, which is why it lives in the core module as org.apache.hadoop.conf.Configuration. It holds the configuration information for a job: any setting that a job uses must be passed through a Configuration object, because that is what allows information to be shared among multiple mapper and reducer tasks.

Class diagram

Description: Configuration implements two interfaces, Iterable and Writable. Iterable lets callers iterate over all name-value pairs that the Configuration object has loaded into memory. Writable satisfies the serialization requirement of the Hadoop framework, so the in-memory name-value pairs can be serialized to disk. Both implementations are straightforward and are not discussed further here.

The rest of this article analyzes how Configuration works: how configuration files are loaded, how configuration values are retrieved, and what to watch out for when using the class.

The study of any class begins with its constructors; even singletons and objects obtained from static factories depend on one. Configuration has three constructors:

    public Configuration() {
      this(true);
    }

    /** A new configuration where the behavior of reading from the default
     * resources can be turned off.
     *
     * If the parameter {@code loadDefaults} is false, the new instance
     * will not load resources from the default files.
     * @param loadDefaults specifies whether to load from the default files
     */
    public Configuration(boolean loadDefaults) {
      this.loadDefaults = loadDefaults;
      updatingResource = new HashMap<String, String>();
      synchronized (Configuration.class) {
        REGISTRY.put(this, null);
      }
    }

    /**
     * A new configuration with the same settings cloned from another.
     *
     * @param other the configuration from which to clone settings.
     */
    @SuppressWarnings("unchecked")
    public Configuration(Configuration other) {
      this.resources = (ArrayList) other.resources.clone();
      synchronized (other) {
        if (other.properties != null) {
          this.properties = (Properties) other.properties.clone();
        }
        if (other.overlay != null) {
          this.overlay = (Properties) other.overlay.clone();
        }
        this.updatingResource = new HashMap<String, String>(other.updatingResource);
      }
      this.finalParameters = new HashSet<String>(other.finalParameters);
      synchronized (Configuration.class) {
        REGISTRY.put(this, null);
      }
    }

1. Configuration()
2. Configuration(boolean loadDefaults)
3. Configuration(Configuration other)

The first two constructors follow the typical telescoping-constructor pattern: the no-argument constructor creates a Configuration object that loads the default configuration files, and the loadDefaults parameter of Configuration(boolean loadDefaults) is the flag that controls whether the object being constructed loads them. Had I designed the class, I would have used two static factory methods instead, getConfigurationWithDefault() and getConfiguration(), so that developers could tell the two kinds of object apart from the names alone.

When loadDefaults is false, the Configuration object does not load into memory the configuration files registered through addDefaultResource(String resource), but it still loads the files added through addResource(...). How is this implemented? In the constructor, this.loadDefaults = loadDefaults records whether the default configuration files should be loaded. After constructing a Configuration object, the usual next step is to call one of the getType(String name, Type defaultValue) methods to obtain the value for a name, so the getter path is where the flag takes effect. getInt serves as the example here.
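
As an aside on the constructor design, the static-factory alternative suggested above might look roughly like the following. This is only a sketch: ConfigurationSketch is a hypothetical stand-in class that models nothing but the loadDefaults flag, and the two factory names are the ones proposed in this article, not methods of Hadoop's Configuration.

```java
// Hypothetical stand-in for Configuration; only the loadDefaults flag is modeled.
final class ConfigurationSketch {
    private final boolean loadDefaults;

    // Private constructor: callers must go through one of the named factories.
    private ConfigurationSketch(boolean loadDefaults) {
        this.loadDefaults = loadDefaults;
    }

    /** Plays the role of new Configuration() or new Configuration(true). */
    static ConfigurationSketch getConfigurationWithDefault() {
        return new ConfigurationSketch(true);
    }

    /** Plays the role of new Configuration(false). */
    static ConfigurationSketch getConfiguration() {
        return new ConfigurationSketch(false);
    }

    boolean loadsDefaults() {
        return loadDefaults;
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        System.out.println(ConfigurationSketch.getConfigurationWithDefault().loadsDefaults());
        System.out.println(ConfigurationSketch.getConfiguration().loadsDefaults());
    }
}
```

The call site then states the intent by name instead of relying on the reader to remember what a bare true or false means.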

getInt(String name, int defaultValue)

    public int getInt(String name, int defaultValue) {
      String valueString = get(name);
      if (valueString == null)
        return defaultValue;
      try {
        String hexString = getHexDigits(valueString);
        if (hexString != null) {
          return Integer.parseInt(hexString, 16);
        }
        return Integer.parseInt(valueString);
      } catch (NumberFormatException e) {
        return defaultValue;
      }
    }

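The first line of the method, String valueString = get(name), is the key. get(String name) in turn reads from the Properties object returned by getProps(). The rest of getInt only parses the string: a hexadecimal value is parsed in base 16, anything else in base 10, and the default is returned when the name is missing or the value is not a number.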
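
The parsing half of getInt can be tried in isolation with the sketch below. The lookup is replaced by a plain Map, and the private hexDigits helper is an assumption about what a getHexDigits-style method does (strip an optional sign and a 0x/0X prefix); it is not copied from Hadoop.

```java
import java.util.HashMap;
import java.util.Map;

class IntLookupSketch {
    private final Map<String, String> values = new HashMap<>();

    void set(String name, String value) {
        values.put(name, value);
    }

    // Assumed behaviour: return the hex digits (with the sign kept) if the
    // value carries a 0x/0X prefix, otherwise null.
    private static String hexDigits(String value) {
        boolean negative = value.startsWith("-");
        String body = negative ? value.substring(1) : value;
        if (body.startsWith("0x") || body.startsWith("0X")) {
            String digits = body.substring(2);
            return negative ? "-" + digits : digits;
        }
        return null;
    }

    int getInt(String name, int defaultValue) {
        String valueString = values.get(name);
        if (valueString == null) {
            return defaultValue;              // name not configured
        }
        try {
            String hex = hexDigits(valueString);
            if (hex != null) {
                return Integer.parseInt(hex, 16);
            }
            return Integer.parseInt(valueString);
        } catch (NumberFormatException e) {
            return defaultValue;              // configured, but not a number
        }
    }
}

class IntLookupDemo {
    public static void main(String[] args) {
        IntLookupSketch conf = new IntLookupSketch();
        conf.set("buffer.size", "0x1000");
        System.out.println(conf.getInt("buffer.size", -1));
        System.out.println(conf.getInt("not.set", -1));
    }
}
```

Note that a malformed value falls back to the default silently, so a typo in a configuration file is indistinguishable from a missing entry.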

getProps()

    private synchronized Properties getProps() {
      if (properties == null) {
        properties = new Properties();
        loadResources(properties, resources, quietmode);
        if (overlay != null) {
          properties.putAll(overlay);
          for (Map.Entry<Object,Object> item : overlay.entrySet()) {
            updatingResource.put((String) item.getKey(), UNKNOWN_RESOURCE);
          }
        }
      }
      return properties;
    }

Every getType call follows the same path, getInt -> get -> getProps (with the corresponding getter in place of getInt), but inside getProps() the path forks. The first time a getType method is used, properties == null holds, so loadResources(properties, resources, quietmode) is executed. On later calls properties is no longer null and the loading code is skipped.
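
The load-on-first-read behaviour just described can be reproduced with a few lines. This is a minimal model, not Hadoop code: LazyProps and its load method are made-up names, and the single hard-coded entry stands in for whatever the configuration files would contain.

```java
import java.util.Properties;

class LazyProps {
    private Properties properties;   // stays null until the first read
    int loadCount = 0;               // how many times the "files" were parsed

    // Stand-in for loadResources(): pretend to parse configuration files.
    private void load(Properties target) {
        loadCount++;
        target.setProperty("io.sort.mb", "100");   // sample entry only
    }

    private synchronized Properties getProps() {
        if (properties == null) {    // true only before the first load
            properties = new Properties();
            load(properties);
        }
        return properties;
    }

    String get(String name) {
        return getProps().getProperty(name);
    }
}

class LazyPropsDemo {
    public static void main(String[] args) {
        LazyProps conf = new LazyProps();
        conf.get("io.sort.mb");
        conf.get("io.sort.mb");
        System.out.println("loads: " + conf.loadCount);
    }
}
```

Constructing the object costs nothing; the parsing cost is paid once, on the first getter call.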

loadResources(properties, resources, quietmode)

    private void loadResources(Properties properties,
                               ArrayList resources,
                               boolean quiet) {
      if (loadDefaults) {
        for (String resource : defaultResources) {
          loadResource(properties, resource, quiet);
        }

        // support the hadoop-site.xml as a deprecated case
        if (getResource("hadoop-site.xml") != null) {
          loadResource(properties, "hadoop-site.xml", quiet);
        }
      }

      for (Object resource : resources) {
        loadResource(properties, resource, quiet);
      }
    }

Here loadDefaults finally appears: the flag set in the constructor to control loading of the default configuration files. The files listed in defaultResources are loaded only when loadDefaults is true, whereas the configuration files stored in resources are always loaded. So there are two containers that hold configuration files, defaultResources and resources:

    /**
     * List of configuration resources.
     */
    private ArrayList<Object> resources = new ArrayList<Object>();

    /**
     * List of default Resources. Resources are loaded in the order of the list
     * entries
     */
    private static final CopyOnWriteArrayList<String> defaultResources =
      new CopyOnWriteArrayList<String>();

One is the list of configuration resources, the other the list of default resources. How does a resource come to be treated as a default? The Configuration class offers several ways to add a configuration file: addDefaultResource(String name), the overloaded addResource(...) methods, and addResourceObject(Object resource). Because the addResource(...) overloads are all implemented by calling addResourceObject, the question comes down to the difference between addDefaultResource(String name) and addResourceObject(Object resource).

addDefaultResource(String name)

    public static synchronized void addDefaultResource(String name) {
      if (!defaultResources.contains(name)) {
        defaultResources.add(name);
        for (Configuration conf : REGISTRY.keySet()) {
          if (conf.loadDefaults) {
            conf.reloadConfiguration();
          }
        }
      }
    }

addResourceObject(Object resource)

    private synchronized void addResourceObject(Object resource) {
      resources.add(resource);                      // add to resources
      reloadConfiguration();
    }

The difference is now visible. addDefaultResource(String name) adds the file name to the defaultResources container through defaultResources.add(name), while addResourceObject(Object resource) adds the resource to the resources container through resources.add(resource). In other words, the default configuration files are those registered through addDefaultResource(String name) and kept in defaultResources; a file stored in resources is never treated as a default. Because addDefaultResource is static, it also walks REGISTRY and reloads every registered Configuration whose loadDefaults flag is set, whereas addResourceObject affects only the instance it is called on. Both methods trigger reloadConfiguration().
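
The split between the shared default list and the per-instance list can be modelled as follows. This is a simplified sketch: RegistrySketch, its reloads counter and effectiveResources() are invented for illustration, and the use of a WeakHashMap for the registry is an assumption, since the listings above do not show how REGISTRY is declared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class RegistrySketch {
    // Shared by every instance, like defaultResources.
    private static final List<String> DEFAULTS = new CopyOnWriteArrayList<>();
    // Assumption: weak keys, so registration does not keep an instance alive.
    private static final Map<RegistrySketch, Object> REGISTRY = new WeakHashMap<>();

    private final boolean loadDefaults;
    private final List<String> resources = new ArrayList<>();  // per instance
    int reloads = 0;                                           // reload counter

    RegistrySketch(boolean loadDefaults) {
        this.loadDefaults = loadDefaults;
        synchronized (RegistrySketch.class) {
            REGISTRY.put(this, null);
        }
    }

    // Static: a new default affects every registered instance that loads defaults.
    static synchronized void addDefaultResource(String name) {
        if (!DEFAULTS.contains(name)) {
            DEFAULTS.add(name);
            for (RegistrySketch conf : REGISTRY.keySet()) {
                if (conf.loadDefaults) {
                    conf.reload();
                }
            }
        }
    }

    // Instance method: only this object is touched.
    synchronized void addResource(String name) {
        resources.add(name);
        reload();
    }

    private synchronized void reload() {
        reloads++;
    }

    /** Names that would be loaded for this instance, defaults first. */
    synchronized List<String> effectiveResources() {
        List<String> all = new ArrayList<>();
        if (loadDefaults) {
            all.addAll(DEFAULTS);
        }
        all.addAll(resources);
        return all;
    }
}
```

With one instance created with true and another with false, registering a default name reloads only the first, and only the first lists that name among its effective resources.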

reloadConfiguration()

    /**
     * Reload configuration from previously added resources.
     *
     * This method will clear all the configuration read from the added
     * resources, and final parameters. This will make the resources to
     * be read again before accessing the values. Values that are added
     * via set methods will overlay values read from the resources.
     */
    public synchronized void reloadConfiguration() {
      properties = null;                            // trigger reload
      finalParameters.clear();                      // clear site-limits
    }

Setting properties = null and calling finalParameters.clear() discards the name-value pairs held in memory, so the configuration files must be loaded into memory again the next time a getType method is called. For this reason, avoid calling addDefaultResource(String resource) and addResourceObject(Object resource) while a job is running: each call causes the configuration files to be reloaded.

The finalParameters field needs some explanation. It is a Set that stores the names of the name-value pairs declared final. A pair marked final cannot be overwritten by configuration files loaded later, yet a program can still change it through set(String name, String value). The reason for this is not clear to me; it seems odd that an administrator cannot modify such a value through a configuration file while a user can modify it in code.

As for the third constructor, its parameter and implementation make it plain that the Configuration object it creates is a copy of the one passed in.

It is now clear how the loading of the default configuration files is controlled when a Configuration object is constructed, and how the getType methods work.

Sequence diagram: calling a getType method

Now look at the setType methods. setType(String name, Type value) internally calls set(String name, String value), mirroring the relationship between getType(String name, Type defaultValue) and get(String name). This raises a question. As described above, addDefaultResource(...) and addResourceObject(...) clear the in-memory name-value pairs; those that came from configuration files can be reloaded, so they are not lost. Values set through setType(), however, are never written to a configuration file; they exist only in memory.
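
Before turning to set(), the final-parameter rule described above can be made concrete with a small model. It is a sketch under assumptions: FinalParamSketch and loadEntry are invented names, loadEntry stands in for reading one property from a configuration file, and the property name and values are sample data only.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

class FinalParamSketch {
    private final Properties properties = new Properties();
    private final Set<String> finalParameters = new HashSet<>();

    // Stand-in for reading one property from a configuration file.
    void loadEntry(String name, String value, boolean isFinal) {
        if (finalParameters.contains(name)) {
            return;                       // an earlier file marked it final
        }
        properties.setProperty(name, value);
        if (isFinal) {
            finalParameters.add(name);
        }
    }

    // Programmatic set: as described above, finalParameters is not consulted.
    void set(String name, String value) {
        properties.setProperty(name, value);
    }

    String get(String name) {
        return properties.getProperty(name);
    }
}

class FinalParamDemo {
    public static void main(String[] args) {
        FinalParamSketch conf = new FinalParamSketch();
        conf.loadEntry("dfs.replication", "3", true);   // first file, final
        conf.loadEntry("dfs.replication", "1", false);  // later file, ignored
        System.out.println(conf.get("dfs.replication"));
        conf.set("dfs.replication", "2");               // code still wins
        System.out.println(conf.get("dfs.replication"));
    }
}
```

The asymmetry the article points out is visible here: the later file entry is dropped, while the programmatic set goes through.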

    public void set(String name, String value) {
      getOverlay().setProperty(name, value);
      getProps().setProperty(name, value);
      this.updatingResource.put(name, UNKNOWN_RESOURCE);
    }

getProps() returns the Properties object that holds all name-value pairs. A pair set through set(String name, String value) is placed in that Properties object in memory and is never written to a file. So when addDefaultResource(...) or addResourceObject(...) resets properties to null, are the pairs added through set(String name, String value) simply discarded?

Note the other key statement in set(String name, String value): getOverlay().setProperty(name, value). getOverlay() returns the overlay field, which is also of type Properties. At first this looks odd, because set(String name, String value) adds the name-value pair to two Properties objects. It is certain, then, that every pair set through set(String name, String value) is present in both the overlay field and the properties field. With that in mind, look at getProps() again:

    private synchronized Properties getProps() {
      if (properties == null) {
        properties = new Properties();
        loadResources(properties, resources, quietmode);
        if (overlay != null) {
          properties.putAll(overlay);
          for (Map.Entry<Object,Object> item : overlay.entrySet()) {
            updatingResource.put((String) item.getKey(), UNKNOWN_RESOURCE);
          }
        }
      }
      return properties;
    }

When properties is null, getProps() not only loads the name-value pairs from the configuration files but also checks whether overlay is non-null and, if so, copies the pairs in overlay into properties. This does not conflict with reloadConfiguration(), because reloadConfiguration() sets only properties to null; overlay is left untouched. The role of overlay is thus to keep the user-set name-value pairs as a backup of the in-memory part of properties. The pairs configured by the system and the administrator are backed by the configuration files, and the pairs set later by the user are backed by overlay, so properties loses no information for as long as the Configuration object is alive.

Both the setType and the getType methods can trigger loadResources() to load name-value pairs into the Properties object. Once properties has been populated, however, further setType or getType calls do not trigger loadResources() again unless addDefaultResource(...) or addResourceObject(...) has been called in between.

Summary:

1. Do not call addDefaultResource(...) or addResourceObject(...) to load resources while a job is running, because doing so forces the Properties object to be rebuilt. Use setType(...) instead.
2. Configuration is used very frequently throughout MapReduce: the JobTracker and TaskTracker processes use a Configuration object when they start, and HDFS uses Configuration objects as well. That is why I think it is important to understand its basic workings.
3. Configuration can be used to share information between MapReduce tasks. The shared information must, of course, be set in the job's configuration: once a map or reduce task of the job has started, its Configuration object is completely independent. So set shared information in the job.
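
The reload behaviour behind the first point can be reproduced with a small model of properties and overlay. This is a sketch, not the Hadoop implementation: OverlaySketch and loadFromFiles are invented names, and the single file-backed entry is sample data.

```java
import java.util.Properties;

class OverlaySketch {
    private Properties properties;                        // rebuilt on demand
    private final Properties overlay = new Properties();  // values from set()

    // Stand-in for loadResources(): values that come from configuration files.
    private void loadFromFiles(Properties target) {
        target.setProperty("fs.default.name", "file:///");
    }

    private synchronized Properties getProps() {
        if (properties == null) {
            properties = new Properties();
            loadFromFiles(properties);
            properties.putAll(overlay);   // re-apply everything set in code
        }
        return properties;
    }

    void set(String name, String value) {
        overlay.setProperty(name, value);      // survives a reload
        getProps().setProperty(name, value);   // visible immediately
    }

    String get(String name) {
        return getProps().getProperty(name);
    }

    synchronized void reloadConfiguration() {
        properties = null;                // overlay is deliberately left alone
    }
}

class OverlayDemo {
    public static void main(String[] args) {
        OverlaySketch conf = new OverlaySketch();
        conf.set("my.job.threshold", "5");
        conf.reloadConfiguration();
        System.out.println(conf.get("my.job.threshold"));
        System.out.println(conf.get("fs.default.name"));
    }
}
```

Because the overlay is applied after the files on every rebuild, a value set in code also keeps overriding a file value with the same name across reloads.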
