$ scontrol show node
NodeName=mycentos6x Arch=x86_64 CoresPerSocket=2
CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=0.55 Features=(null)
Gres=(null)
NodeAddr=mycentos6x NodeHostName=mycentos6x Version=14.11
OS=Linux RealMemory=1000 AllocMem=0 Sockets=2 Boards=1
State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1
BootTime=2015-07-21T09:19:03 SlurmdStartTime=2015-07-21T09:19:32
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Reason=Low RealMemory [[email protected]2015-07-20T21:23:33]
Finally, the cause may be that an earlier job ran into a problem and is stuck in the "CG" (completing) state, which leaves the node unavailable.
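To confirm that a job really is stuck in the completing state, the queue can be filtered by state and node. The following is a minimal sketch, assuming squeue is on the PATH; list_completing_jobs is a hypothetical helper name, and the output columns (job ID, user, state, elapsed time) are just one reasonable choice.

```shell
# Hypothetical helper: list jobs in the completing (CG) state on one node.
# Assumes squeue is on the PATH; the node name is passed as the first argument.
list_completing_jobs() {
    node="$1"
    # -h      suppress the header line
    # -t CG   show only jobs in the COMPLETING state
    # -w      restrict the listing to the given node
    # -o      print job ID, user, state and elapsed time
    squeue -h -t CG -w "$node" -o "%i %u %T %M"
}
```

For example, list_completing_jobs mycentos6x prints one line per job that is still completing on that node, and prints nothing if there are none.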
Solution
Run the following commands in order:
# scontrol update NodeName=<node> State=DOWN Reason=hung_completing
# /etc/init.d/slurm restart
# scontrol update NodeName=<node> State=RESUME
Then check the status again:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 1 idle mycentos6x
$ scontrol show node
NodeName=mycentos6x Arch=x86_64 CoresPerSocket=2
CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=0.17 Features=(null)
Gres=(null)
NodeAddr=mycentos6x NodeHostName=mycentos6x Version=14.11
OS=Linux RealMemory=1000000 AllocMem=0 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
BootTime=2015-07-21T09:19:03 SlurmdStartTime=2015-07-21T09:23:15
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
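The recovery steps above can be wrapped in a small function so they always run in the same order. This is only a sketch under a few assumptions: reset_node is a hypothetical name, it is run as root on the affected node, and the init script lives at /etc/init.d/slurm as in the commands above (the SLURM_INIT variable is an invented override for systems where it does not).

```shell
# Hypothetical helper: recover a node blocked by a hung completing job.
# Assumes it runs as root on the affected node.
# SLURM_INIT is an illustrative override for the init script location.
reset_node() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: reset_node <node>" >&2
        return 1
    fi
    # 1. Mark the node DOWN and record why.
    scontrol update NodeName="$node" State=DOWN Reason=hung_completing || return 1
    # 2. Restart the Slurm daemons on this host.
    "${SLURM_INIT:-/etc/init.d/slurm}" restart || return 1
    # 3. Put the node back into service.
    scontrol update NodeName="$node" State=RESUME || return 1
    # Show the resulting state for this node only.
    sinfo -n "$node"
}
```

Calling reset_node mycentos6x stops at the first step that fails, so the node is not resumed if the restart did not succeed.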
When reprinting, please cite the source as a link.
Source: http://blog.csdn.net/kongxx/article/details/48193333
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.