Overview
Hadoop on Demand (HOD) is a system for provisioning and managing independent Map/Reduce and Hadoop Distributed File System (HDFS) instances on a shared cluster of nodes. HOD makes it easy for administrators and users to quickly set up and use Hadoop. HOD is also useful for Hadoop developers and testers who need to share a physical cluster to test their own versions of Hadoop.
HOD relies on a resource manager (RM) to allocate the nodes on which it runs the Hadoop instances. At present, HOD works with the Torque resource manager.
The basic HOD system architecture contains the following components:
- A resource manager (possibly together with a scheduler)
- Various HOD components
- Hadoop Map/Reduce and HDFS daemons
By interacting with the above components, HOD provisions and maintains Hadoop Map/Reduce and, optionally, HDFS instances on a given cluster. The nodes of a cluster can be thought of as comprising two sets:
- Submit nodes: Users use the HOD client on these nodes to allocate clusters, and then use the Hadoop client to submit Hadoop jobs.
- Compute nodes: Using the resource manager, HOD components run on these nodes to provision the Hadoop daemons. Hadoop jobs then run on these nodes.
The following is a brief description of the sequence of operations involved in allocating a cluster and running jobs on it; a short example of the user-facing commands follows the list.
- The user uses the HOD client on a submit node to allocate a cluster of the required number of nodes and provision Hadoop on them.
- The HOD client uses a resource manager interface (qsub, in Torque) to submit a HOD process called the RingMaster as a resource manager job, requesting the desired number of nodes. This job is submitted to the central server of the resource manager (pbs_server, in Torque).
- On the compute nodes, the resource manager slave daemons (pbs_moms, in Torque) accept and run the jobs assigned to them by the central server (pbs_server, in Torque). The RingMaster process starts on one of the compute nodes (the mother superior, in Torque).
- The RingMaster then uses another resource manager interface (pbsdsh, in Torque) to run the second HOD component, HodRing, as distributed tasks on all of the allocated compute nodes.
- After initializing, the HodRings communicate with the RingMaster to obtain Hadoop commands and run them accordingly. Once the Hadoop daemons are started, they register with the RingMaster, providing information about the daemons.
- All the configuration files needed by the Hadoop instances are generated by HOD itself, with some options taken from the user's settings in the configuration file.
- The HOD client keeps communicating with the RingMaster to find out where the JobTracker and HDFS daemons are located.
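For example, from a submit node a user might allocate a cluster, run a job against it, and release it with commands roughly like the following; the cluster directory, node count, and job jar are illustrative, and the HOD User's Guide describes the exact usage:

hod allocate -d ~/hod-clusters/test -n 3
hadoop --config ~/hod-clusters/test jar <your-job.jar> <args>
hod deallocate -d ~/hod-clusters/test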
The rest of this document describes how to set up HOD on the nodes of a physical cluster.
Prerequisites
To use HOD, your system should include the following hardware and software:
Operating system: HOD is currently tested on RHEL4.
Nodes: HOD requires at least 3 nodes managed by a resource manager.
Software
The following components must be installed on all nodes before using HOD:
- Torque: resource manager
- Python: HOD requires Python 2.5.1
The following components are optional; installing them gives HOD additional capabilities:
- Twisted Python: This can be used to improve the scalability of HOD. If the module is detected to be installed, HOD uses it; otherwise it falls back to the default modules.
- Hadoop: HOD can automatically distribute Hadoop to all nodes of the cluster. However, if Hadoop is already installed on all nodes, HOD can use that installation instead. HOD currently supports Hadoop 0.15 and later versions.
Note: HOD configuration requires that these components be installed at the same location on all nodes of the cluster. Configuration is also simpler if the installation locations are the same on the submit nodes.
Resource Manager
Currently, HOD uses the Torque resource manager to allocate nodes and submit jobs. Torque is an open-source resource manager from Cluster Resources, a community effort based on the PBS project. It provides control over batch jobs and distributed compute nodes. You can download Torque from here.
All Torque-related documentation can be found under the Torque Resource Manager section here. You can find the wiki documentation here. To subscribe to the Torque mailing lists or view the question archives, visit here.
To use HOD with Torque (a consolidated command example follows this list):
- Install the Torque components: pbs_server on one node (the head node), pbs_mom on all compute nodes, and the PBS clients on all compute nodes and submit nodes. Perform at least a basic configuration so that the Torque system is up and running, that is, so that pbs_server knows which machines to talk to. See here for basic configuration. For advanced configuration, see here.
- Create a job submission queue on the pbs_server. The name of the queue must be the same as the HOD configuration parameter resource-manager.queue. The HOD client uses this queue to submit the RingMaster process as a Torque job.
- Specify a cluster name as a property on all nodes of the cluster. This can be done with the qmgr command, for example: qmgr -c "set node node properties=cluster-name". The cluster name must be the same as the HOD configuration parameter hod.cluster.
- Make sure that jobs can be submitted to the nodes. This can be done with the qsub command, for example: echo "sleep 30" | qsub -l nodes=3
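As a concrete sketch, the queue creation, node property, and test submission described above could look roughly like the following on the pbs_server node; the queue name "hod", node name "node01", and cluster name "cluster-name" are illustrative and should be adapted to your site:

qmgr -c "create queue hod queue_type=execution"
qmgr -c "set queue hod enabled=true"
qmgr -c "set queue hod started=true"
qmgr -c "set node node01 properties=cluster-name"
echo "sleep 30" | qsub -q hod -l nodes=3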
Install HOD
Now that the resource manager is set up, you can download and install HOD.
If you are getting HOD from the Hadoop tar package, it is available under the 'contrib' directory, in a directory called 'hod'. If you are building from source, you can run ant tar in the Hadoop root directory to generate the Hadoop tar package, and then get HOD from it as described above. Distribute all the files under this directory to all nodes in the cluster; note that the location the files are copied to should be the same on all nodes. Also note that building Hadoop also builds HOD and sets the correct permissions on all HOD script files.
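A minimal sketch of the distribution step, assuming the HOD directory from the unpacked Hadoop tar package sits in ~/hadoop-0.17.0/contrib/hod, the same path is used on every node, and ~/cluster-nodes lists one hostname per line (all of these paths and names are illustrative):

# copy the HOD files to the same location on every node in the cluster
for node in $(cat ~/cluster-nodes); do
    rsync -a ~/hadoop-0.17.0/contrib/hod/ $node:~/hadoop-0.17.0/contrib/hod/
done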
Configure HOD
After installing HOD, you can configure it. The minimal configuration needed to run HOD is described below; more advanced configuration options are explained in the HOD Configuration Guide.
Minimum Configuration
To run HOD, the following minimal configuration is required:
On the node from which you want to run HOD, edit the hodrc file in the <install dir>/conf directory. This file contains the minimal set of values necessary to run HOD.
Specify values appropriate for your environment for the variables defined in this configuration file (an example fragment follows this list). Note that some variables appear more than once in the file.
- ${JAVA_HOME}: location of Java for Hadoop. Hadoop supports Sun JDK 1.5.x and above.
- ${CLUSTER_NAME}: name of the cluster, as specified in the 'node property' mentioned in the resource manager configuration.
- ${HADOOP_HOME}: location of the Hadoop installation on the compute and submit nodes.
- ${RM_QUEUE}: job submission queue set up in the resource manager configuration.
- ${RM_HOME}: location of the resource manager installation on the compute and submit nodes.
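As a rough illustration, the relevant parts of a filled-in hodrc could look like the fragment below. The section and option names follow the sample hodrc shipped with HOD, but you should verify them against the file in your own <install dir>/conf; all paths and names here are illustrative:

[hod]
# ${JAVA_HOME}: Java used by Hadoop
java-home = /usr/java/jdk1.5.0
# ${CLUSTER_NAME}: must match the Torque node property
cluster = cluster-name

[resource_manager]
# ${RM_QUEUE}: the Torque queue created for HOD jobs
queue = hod
# ${RM_HOME}: resource manager installation location
batch-home = /usr/local/torque

[gridservice-mapred]
# ${HADOOP_HOME}: Hadoop installation on the compute and submit nodes
pkgs = /opt/hadoop

[gridservice-hdfs]
pkgs = /opt/hadoop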
The following environment variables may need to be set, depending on your environment. These variables must be defined where you run the HOD client, and must also be specified in the HOD configuration file as the value of resource_manager.env-vars. Multiple variables can be specified as a comma-separated list of key=value pairs, as in the example below.
- HOD_PYTHON_HOME: If Python is installed in a non-default location on the compute nodes or submit nodes, this value must be set to the actual path of the Python executable.
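For instance, such an entry in the hodrc might look like the following; the Python path is purely illustrative:

[resource_manager]
# points HOD at a non-default Python on the compute and submit nodes
env-vars = HOD_PYTHON_HOME=/usr/local/python-2.5.1/bin/python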
Advanced Configuration
You can review and modify other configuration options to suit your specific needs. For more information about HOD configuration, refer to the Configuration Guide.
Run HOD
Once HOD is configured, you can run it. For more information, refer to the HOD User's Guide.
Support Tools and Utilities
This section describes supporting tools and utilities that can be used to manage HOD deployments.
logcondense.py - Manage Log Files
As mentioned in the HOD User's Guide, HOD can be configured to upload Hadoop logs to a statically configured HDFS instance. Over time, the number of uploaded log files keeps growing. logcondense.py helps administrators clean up the log files uploaded to HDFS.
Run logcondense.py
logcondense.py is available in the hod_install_location/support folder. You can run it with Python, for example python logcondense.py, or grant it execute permission and run it directly. If permissions are enabled on HDFS, logcondense.py must be run by a user with sufficient privileges to delete files from the directories into which users' logs are uploaded. For example, as mentioned in the Configuration Guide, logs can be configured to be placed in a user's home directory on HDFS; in that case, you need superuser privileges to run logcondense.py so that it can remove log files from all users' home directories.
logcondense.py Command-Line Options
logcondense.py supports the following command-line options:
- -p / --package: full path of the hadoop script. The version of Hadoop must be the same as the version of the HDFS instance that holds the logs. Example: /usr/bin/hadoop
- -d / --days: delete log files older than the specified number of days. Example: 7
- -c / --config: path to the Hadoop configuration directory, under which hadoop-site.xml resides. The hadoop-site.xml must point to the NameNode of the HDFS instance from which logs are to be removed. Example: /home/foo/hadoop/conf
- -l / --logs: an HDFS path; this must be the same HDFS path as specified for log-destination-uri (as mentioned in the Configuration Guide), without the hdfs:// URI string. Example: /user
- -n / --dynamicdfs: if true, logcondense.py deletes HDFS logs in addition to Map/Reduce logs; otherwise it deletes only Map/Reduce logs, which is the default behavior when this option is not specified. This option is useful when a dynamic HDFS instance is provisioned by HOD and a static HDFS instance is used to collect log files, perhaps a common scenario in test clusters. Example: false
For example, to delete all log files older than 7 days, with hadoop-site.xml stored in ~/hadoop-conf and Hadoop installed in ~/hadoop-0.17.0, you could run:
python logcondense.py -p ~/hadoop-0.17.0/bin/hadoop -d 7 -c ~/hadoop-conf -l /user
checklimits.sh - Monitor Resource Limits
checklimits.sh is a HOD tool for Torque/Maui environments (the Maui Cluster Scheduler is an open-source job scheduler for clusters and supercomputers, from Cluster Resources). The checklimits.sh script updates the Torque comment field when a newly submitted job violates or exceeds the user limits set up in the Maui scheduler. It uses qstat to make one pass over Torque's job list to determine which jobs are queued or unfinished, runs the Maui tool checkjob on each job to check whether it violates user limit settings, and then runs Torque's qalter utility to update the job's 'comment' attribute. Currently, it updates the comment of jobs that violate limits to: User-limits exceeded. Requested:([0-9]*) Used:([0-9]*) MaxLimit:([0-9]*). HOD then uses this comment to act accordingly.
Run checklimits.sh
checklimits.sh is available under the hod_install_location/support directory. After execute permission is granted, this shell script can be run directly as sh checklimits.sh or ./checklimits.sh. The Torque and Maui binaries should be installed on the machine where the tool is run and must be on the path of the shell script's process. In order to update the comment values of jobs from different users, this tool must be run with Torque administrative privileges. The tool must be run repeatedly at regular intervals, for example via cron, to keep the comments of jobs that violate constraints up to date. Note that the resource manager and scheduler commands used in this script can be expensive, so it is best not to run it in a tight loop without sleeping.
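A possible crontab entry for this is sketched below; the five-minute interval, install path, and log file are illustrative assumptions rather than values prescribed by HOD:

# run checklimits.sh every five minutes under an account with Torque administrative privileges
*/5 * * * * /opt/hadoop/contrib/hod/support/checklimits.sh >> /var/log/hod-checklimits.log 2>&1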
verify-account - Script to Verify the Account Used by a User to Submit Jobs
Production systems typically use accounting packages to charge users for the use of shared compute resources. HOD supports a parameter called resource_manager.pbs-account that allows users to specify the account under which to submit jobs. It may be necessary to verify that this account is valid in the accounting system. The script hod-install-dir/bin/verify-account provides a mechanism for plugging in a site-specific script to perform this verification.
Integrating verify-account with HOD
Before allocating a cluster, HOD runs the verify-account script, passing the resource_manager.pbs-account value as an argument, so that a site-specific script can verify the account. Sites can use this mechanism to plug in their own accounting systems. If the script returns a non-zero exit code, HOD fails to allocate the cluster. In case of an error, HOD also prints the error messages generated by the script, so any descriptive error message can be returned to the user from the site-specific script (a minimal example is sketched at the end of this section).
The default script that ships with HOD does not perform any verification and returns an exit code of zero.
If HOD does not find the verify-account script, it assumes that account verification is turned off and continues with the allocation.
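For illustration only, a site-specific verify-account script might look something like the following sketch. The accounts file /etc/hod/valid-accounts and the error message are assumptions made for the example and are not part of HOD:

#!/bin/sh
# Hypothetical verify-account script. HOD passes the value of
# resource_manager.pbs-account as the first argument.
ACCOUNT="$1"

# /etc/hod/valid-accounts is an assumed site-local file listing one
# valid account name per line.
if grep -qx "$ACCOUNT" /etc/hod/valid-accounts; then
    # Exit code 0: the account is valid, so HOD proceeds with allocation.
    exit 0
else
    # Any message printed here is surfaced to the user by HOD.
    echo "Account '$ACCOUNT' is not recognized by the accounting system."
    # A non-zero exit code makes HOD fail the allocation.
    exit 1
fi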