1. Brief Introduction
This document describes some of the most important and commonly used Hadoop On Demand (HOD) configuration options. Options can be specified in two ways: in an INI-style configuration file, or as command-line options to the hod shell in the format --section.option[=value]. If the same option is specified in both places, the value on the command line overrides the value in the configuration file.
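The override rule above can be sketched in Python. This is a minimal illustration, not HOD's own parsing code: the helper names parse_override and effective_config are hypothetical, the sample file contents are made up, and the handling of an option given without "=value" (returned as None) is an assumption.

```python
import configparser

# Hypothetical configuration file contents. Section and option names follow
# this document; the values are made up for illustration.
HODRC = """
[hod]
debug = 3

[ringmaster]
debug = 3
temp-dir = /tmp/hod
"""


def parse_override(arg):
    """Split '--section.option[=value]' into (section, option, value)."""
    if not arg.startswith("--"):
        raise ValueError("not an option: %r" % arg)
    name, sep, value = arg[2:].partition("=")
    # Split on the first dot only, so option values may contain dots.
    section, dot, option = name.partition(".")
    if not dot or not section or not option:
        raise ValueError("expected --section.option[=value]: %r" % arg)
    # Assumption: a missing '=value' is reported as None.
    return section, option, (value if sep else None)


def effective_config(file_text, cli_args):
    """Load the INI text, then let command-line values override it."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(file_text)
    config = {s: dict(parser.items(s)) for s in parser.sections()}
    for arg in cli_args:
        section, option, value = parse_override(arg)
        config.setdefault(section, {})[option] = value
    return config


config = effective_config(HODRC, ["--ringmaster.debug=4"])
print(config["ringmaster"]["debug"])  # value taken from the command line
print(config["hod"]["debug"])         # value taken from the file
```

Only the ringmaster section is affected by the override; the hod section keeps the value from the file.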
You can obtain a brief description of all the configuration options with the following command:
$ hod --verbose-help
2. Sections
The HOD configuration file is divided into the following sections:
hod: Options for the HOD client.
resource_manager: Options that specify which resource manager to use, plus additional parameters required to use that resource manager.
ringmaster: Options for the ringmaster process.
hodring: Options for the hodring processes.
gridservice-mapred: Options for the Map/Reduce daemons.
gridservice-hdfs: Options for the HDFS daemons.
3. HOD Configuration Items
This section first describes the options that are common to most HOD configuration sections, and then the options that are specific to each section.
3.1 General Configuration Items
Some options are defined in several sections of the HOD configuration. An option defined in a section is used by the processes that the section applies to. These options have the same meaning everywhere, but they can have different values in different sections.
temp-dir: The temporary directory used by the HOD processes. Make sure that the users who run HOD have permission to create subdirectories in this directory. To use a different temporary directory for each allocation, you can use the environment variables that the resource manager makes available to the HOD processes. For example, in a Torque setup, --ringmaster.temp-dir=/tmp/hod-temp-dir.$PBS_JOBID lets the ringmaster use a different temporary directory for each allocation; Torque expands the environment variable before the ringmaster starts.
debug: A number from 1 to 4. 4 produces the most log information.
log-dir: The directory where log files are stored. The default is <install-location>/logs/. The restrictions and notes for temp-dir apply here as well.
xrs-port-range: A range of ports from which an available port is chosen to run the XML-RPC server.
http-port-range: A range of ports from which an available port is chosen to run the HTTP server.
java-home: The location of the Java installation used by Hadoop.
syslog-address: The address to which the syslog daemon is bound, in the format host:port. If this option is set, HOD log messages are sent to syslog at this address.
3.2 hod Configuration Items
cluster: A descriptive name for the cluster. For Torque, this value is specified as a 'node property' on every node in the cluster. HOD uses it to compute the number of available nodes.
client-params: A comma-separated list of Hadoop configuration parameters, given as key-value pairs. They are used to generate a hadoop-site.xml on the submit node, which is then used to run Map/Reduce jobs.
job-feasibility-attr: A regular expression that specifies whether and how to check job feasibility against resource manager or scheduler limits. The current implementation uses the 'comment' attribute of the Torque job, and the check is disabled by default.
When this option is set, HOD uses it to determine which type of limit has been violated. HOD then deallocates the cluster if the request itself exceeds the maximum limits, or leaves it in the queued state if the cumulative usage has crossed the maximum limits. The Torque comment attribute can be updated periodically by an external mechanism. For example, the comment attribute can be updated by running the checklimits.sh script in the hod/support directory; setting job-feasibility-attr to the value of TORQUE_USER_LIMITS_COMMENT_FIELD, "User-limits exceeded. Requested:([0-9]*) Used:([0-9]*) MaxLimit:([0-9]*)", then makes HOD behave accordingly.
3.3 resource_manager Configuration Items
queue: The name of the queue, configured in the resource manager, to which jobs are submitted.
batch-home: The installation directory whose 'bin' subdirectory contains the resource manager executables.
env-vars: A comma-separated list of key-value pairs, in the form key=value, that is passed to the jobs launched on the compute nodes. For example, if Python is not installed in a standard location, the user can set the environment variable 'HOD_PYTHON_HOME' to the path of the Python executable. The HOD processes launched on the compute nodes can then use this variable.
3.4 ringmaster Configuration Items
work-dirs: A comma-separated list of paths that serve as the root of the directories HOD generates and passes to Hadoop for storing DFS and Map/Reduce data. For example, this is where the DFS data blocks are stored. Typically, you specify as many paths as there are disks, so that all disks are used. The restrictions and notes for temp-dir apply here as well.
max-master-failures: The number of times a Hadoop master daemon can fail to start before HOD fails the cluster allocation altogether.
In a HOD cluster, there are sometimes one or a few "bad" nodes, caused by problems such as a missing Java installation or a missing or incorrect version of Hadoop. When this option is set to a positive integer, the ringmaster returns an error to the client only when the number of times a Hadoop master (JobTracker or NameNode) fails to start on such bad nodes exceeds the configured value. If the value is not exceeded, the next hodring that requests a command to run is given the same Hadoop master again. In this way, HOD does its best to make the allocation succeed even when the cluster contains a few bad nodes.
3.5 gridservice-hdfs Configuration Items
external: If set to false, HOD must itself bring up an HDFS cluster on the nodes it allocates through the allocate command. Note that in this case, when the cluster is deallocated, the HDFS cluster is shut down and all its data is lost. If set to true, HOD tries to connect to an externally configured HDFS system. Because job input is typically placed in HDFS before the job runs, and job output in HDFS needs to persist, an internal HDFS cluster is of little value in a production environment.
host: The hostname of the externally configured NameNode.
fs_port: The port to which the NameNode RPC server is bound.
info_port: The port to which the NameNode web UI server is bound.
pkgs: The installation directory that contains the bin/hadoop executable. It can be used to run a preinstalled version of Hadoop on the cluster.
server-params: A comma-separated list of Hadoop configuration parameters, given as key-value pairs. They are used to generate the hadoop-site.xml file used by the NameNode and DataNodes.
final-server-params: Same as server-params, except that the parameters are marked as final.
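As an illustration of how server-params and final-server-params could map onto a hadoop-site.xml file, here is a minimal Python sketch. It is not HOD's actual generator: the function names parse_params and build_site_xml are hypothetical, the parameter values are examples, and the sketch assumes that no value contains a comma.

```python
import xml.etree.ElementTree as ET


def parse_params(text):
    """Turn 'k1=v1,k2=v2' into an ordered list of (key, value) pairs."""
    pairs = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % item)
        pairs.append((key.strip(), value.strip()))
    return pairs


def build_site_xml(server_params="", final_server_params=""):
    """Render both parameter lists as Hadoop <property> elements."""
    root = ET.Element("configuration")
    for params, final in ((server_params, False), (final_server_params, True)):
        for key, value in parse_params(params):
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = key
            ET.SubElement(prop, "value").text = value
            if final:
                # Only parameters from final-server-params get <final>true</final>.
                ET.SubElement(prop, "final").text = "true"
    return ET.tostring(root, encoding="unicode")


xml_text = build_site_xml(
    server_params="dfs.replication=2",
    final_server_params="dfs.block.size=134217728",
)
print(xml_text)
```

The only difference between the two lists in this sketch is the extra final element, which Hadoop uses to prevent a property from being overridden by later configuration.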
3.6 gridservice-mapred Configuration Items
external: If set to false, HOD must itself bring up a Map/Reduce cluster on the nodes it allocates through the allocate command. If set to true, HOD tries to connect to an externally configured Map/Reduce system.
host: The hostname of the externally configured JobTracker.
tracker_port: The port to which the JobTracker RPC server is bound.
info_port: The port to which the JobTracker web UI server is bound.
pkgs: The installation directory that contains the bin/hadoop executable.
server-params: A comma-separated list of Hadoop configuration parameters, given as key-value pairs. They are used to generate the hadoop-site.xml file used by the JobTracker and TaskTrackers.
final-server-params: Same as server-params, except that the parameters are marked as final.
3.7 hodring Configuration Items
mapred-system-dir-root: A directory in the DFS under which HOD creates subdirectories, passing the full path to the Hadoop daemons as the value of the 'mapred.system.dir' parameter. The full path has the format value-of-this-option/userid/mapredsystem/cluster-id. Note that if permissions are enabled in HDFS, all users must be able to create subdirectories under the path specified here. Setting this option to /user makes HOD use the user's home directory to generate the mapred.system.dir value.
log-destination-uri: A URL that identifies a path in an external, static DFS or in the local file system of a cluster node. When the cluster is deallocated, HOD uploads the Hadoop logs to this path. To specify a DFS path, use the format 'hdfs://path'. To specify a local file system path on a cluster node, use the format 'file://path'. When HOD deallocates a cluster, the Hadoop logs are deleted as part of HOD's cleanup process; use this option to keep them. The path has the format value-of-this-option/userid/hod-logs/cluster-id. Note that all users must be able to create subdirectories in the directory specified here.
Setting this value to hdfs://user causes the logs to be placed in the user's home directory in DFS.
pkgs: The installation directory that contains the bin/hadoop executable. HOD uses it to upload logs when an HDFS URL is specified in log-destination-uri. Note that this option is useful when users run from a tarball whose version differs from that of the external, static HDFS.
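The two path formats given in this section, value-of-this-option/userid/mapredsystem/cluster-id and value-of-this-option/userid/hod-logs/cluster-id, can be sketched as follows. The helper names, the user id, the cluster id, and the NameNode address are all hypothetical, and the sketch only shows how the path components are composed, not how HOD itself builds or creates the directories.

```python
def compose(base, *parts):
    """Join a base path or URI with further components using '/'."""
    return base.rstrip("/") + "/" + "/".join(parts)


def mapred_system_dir(root, userid, cluster_id):
    """Format: value-of-this-option/userid/mapredsystem/cluster-id."""
    return compose(root, userid, "mapredsystem", cluster_id)


def log_destination(uri, userid, cluster_id):
    """Format: value-of-this-option/userid/hod-logs/cluster-id."""
    return compose(uri, userid, "hod-logs", cluster_id)


# Hypothetical values for illustration only.
userid = "alice"
cluster_id = "1234.torque-master"

print(mapred_system_dir("/mapredsystem", userid, cluster_id))
print(log_destination("hdfs://namenode:8020/hod-logs", userid, cluster_id))
print(log_destination("file:///var/hod-logs", userid, cluster_id))
```

With a root of /user, the composed path starts with /user/alice, which is the conventional location of a user's home directory in HDFS.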