Reference for this process (largely a translation of) this article: https://jabriffa.wordpress.com/2015/02/11/installing-torquepbs-job-scheduler-on-ubuntu-14-04-lts/ and this article: https://linuxcluster.wordpress.com/2012/04/01/enabling-torque-for-email-notification/.
This process sets up the current computer as the server, compute node, scheduler, and submission host all at once.
Step 1: Install Torque from the Ubuntu repositories
apt-get install torque-server torque-client torque-mom torque-pam
This installs the old version, Torque 2.4.16. Answer yes to every prompt.
Step 2: Stop the services that were started by default
/etc/init.d/torque-mom stop
/etc/init.d/torque-scheduler stop
/etc/init.d/torque-server stop
pbs_server -t create
And:
killall pbs_server
This step is important; otherwise the changes below will be overwritten the next time pbs_server restarts.
Step 3: Because panther currently has only an IP address and no FQDN, PANTHER.NCSU is chosen as its domain name.
(Note: according to the referenced blog, you need to choose a two-part domain name of the form server.domain here, or you may run into problems later.)
echo PANTHER.NCSU > /etc/torque/server_name
echo PANTHER.NCSU > /var/spool/torque/server_priv/acl_svr/acl_hosts
echo [email protected] > /var/spool/torque/server_priv/acl_svr/operators
echo [email protected] > /var/spool/torque/server_priv/acl_svr/managers
Then add this line to /etc/hosts:
10.123.32.** PANTHER.NCSU
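To confirm that the entry is in place, something like the following sketch can be used. The helper name has_host_entry is hypothetical and not part of Torque; it only scans a hosts-format file for the name and does not test actual name resolution.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a hosts-format file lists the given name.
# $1 = host name to look for, $2 = file to scan (defaults to /etc/hosts).
has_host_entry() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            # field 1 is the address; the remaining fields are names
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # rest of the line is a comment
                if (tolower($i) == tolower(name)) found = 1
            }
        }
        END { exit !found }
    ' "${2:-/etc/hosts}"
}

if has_host_entry PANTHER.NCSU; then
    echo "PANTHER.NCSU is listed in /etc/hosts"
else
    echo "PANTHER.NCSU is missing from /etc/hosts"
fi
```

The comparison ignores case because host names are case-insensitive.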
Step 4: Register the computer itself as a compute node
echo "PANTHER.NCSU np=4" >/var/spool/torque/server_priv/nodes
Adjust np (the number of processors) to match the actual machine.
Tell the MOM daemon (pbs_mom) the exact location of the compute node:
echo PANTHER.NCSU >/var/spool/torque/mom_priv/config
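As a minimal sketch, the np value for the nodes file could be taken from the machine's core count instead of being typed by hand. This assumes nproc from GNU coreutils is available; the helper name nodes_line is hypothetical. The line is only printed, so nothing is overwritten until you redirect it yourself.

```shell
#!/bin/sh
# Hypothetical helper: print a Torque nodes-file entry for one host.
# $1 = host name, $2 = processor count (defaults to what nproc reports).
nodes_line() {
    host=$1
    np=${2:-$(nproc)}
    printf '%s np=%s\n' "$host" "$np"
}

# Preview the entry for this machine; once it looks right, redirect it
# to /var/spool/torque/server_priv/nodes as in the command above.
nodes_line PANTHER.NCSU
```

Passing the count explicitly, for example nodes_line PANTHER.NCSU 4, reproduces the line used in this step.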
Step 5: Restart the Torque services
/etc/init.d/torque-server start
/etc/init.d/torque-scheduler start
/etc/init.d/torque-mom start
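Before moving on, it may help to check that the three daemons actually came up. The sketch below assumes pgrep (from procps) is installed and that the daemons run under their usual process names pbs_server, pbs_sched, and pbs_mom; the helper name daemon_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a process with exactly this name exists.
daemon_status() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1: running"
    else
        echo "$1: not running"
    fi
}

# The three daemons started by the init scripts above.
for daemon in pbs_server pbs_sched pbs_mom; do
    daemon_status "$daemon"
done
```

If all three are running, pbsnodes -a should list PANTHER.NCSU, and a state of free suggests that the MOM has registered with the server.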
Step 6: Set PBS parameters
qmgr -c 'set server scheduling = true'
qmgr -c 'set server keep_completed = ' # maximum time: 1000 hours
qmgr -c 'set server mom_job_sync = true'
qmgr -c 'create queue std' # create the std queue
qmgr -c 'set queue std queue_type = execution'
qmgr -c 'set queue std started = true'
qmgr -c 'set queue std enabled = true'
qmgr -c 'set queue std resources_default.walltime = 10:00:00'
qmgr -c 'set queue std resources_default.nodes = 1'
qmgr -c 'set server default_queue = std'
Then set up the submission pool:
qmgr -c 'set server submit_hosts = panther'
qmgr -c 'set server allow_node_submit = true'
The domain name chosen above is PANTHER.NCSU; for the submission pool, use only its host part, panther.
Step 8: Submit a test job
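A test submission might look like the sketch below. The job name sleep_test, the temporary file name, and the resource request are illustrative assumptions, not taken from the original post. Run it as an ordinary user, since Torque normally rejects jobs submitted by root.

```shell
#!/bin/sh
# Write a tiny job script; the file name and job name are illustrative.
job_script=$(mktemp /tmp/torque-test.XXXXXX)
cat > "$job_script" <<'EOF'
#!/bin/sh
#PBS -N sleep_test
#PBS -l nodes=1,walltime=00:01:00
sleep 30
EOF

# Submit only when the Torque client tools are actually installed.
if command -v qsub >/dev/null 2>&1; then
    qsub "$job_script"   # prints the ID of the new job
    qstat                # lists jobs known to the server, with their state
else
    echo "qsub not found; job script left in $job_script"
fi
```

Since no queue is named, the job goes to the server's default queue configured in Step 6.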
Results:
Appendix. Set up email notifications with ssmtp: https://help.ubuntu.com/community/EmailAlerts
Errors and Solutions:
1. Error:
Unable to copy file /var/spool/torque/spool/15.panther.ncsu.ou to [email protected]:/home/zjyx/work/tests/pbs/fdm/Oe.15.panther.ncsu
Error from copy
Host Key verification failed.
Lost connection
End Error Output
Output retained on this host in: /var/spool/torque/undelivered/15.panther.ncsu.ou
Solution (from the thread "Host key verification failed": http://torqueusers.supercluster.narkive.com/Ut2n70R1/host-key-verification-failed):
Delete ~/.ssh/known_hosts, then ssh once to every host name that Torque uses, so that each host key is accepted again. In my case, I ran ssh panther.ncsu, ssh localhost, ssh panther, and ssh Panther.
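Deleting the whole file also discards the saved keys of unrelated hosts. A narrower variant, offered here as an assumption rather than what was done above, removes only the entries for the names Torque uses, relying on the -R option of OpenSSH's ssh-keygen. The helper name forget_host_key is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove saved keys for one host name from a known_hosts file.
# $1 = host name, $2 = known_hosts file (defaults to the current user's).
forget_host_key() {
    ssh-keygen -R "$1" -f "${2:-$HOME/.ssh/known_hosts}"
}

# Example (left commented out): forget each name, then reconnect once
# so the current host key is offered for acceptance again.
# for host in panther.ncsu localhost panther; do
#     forget_host_key "$host" && ssh "$host" true
# done
```

ssh-keygen keeps a backup of the previous file next to it with an .old suffix, so the change can be undone.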