Small-scale jobs run on the fat-node queues. Configuration:
The nodes in all three queues are identical: 16 cores per node (four quad-core Xeon X7350 processors at 2.93 GHz), with memory in the GB range.
x64_small: 2 nodes in total, 1-8 cores, 6 hours
x64_3950: 5 nodes in total, 1-64 cores, 6 hours
x64_3950_long: 11 nodes in total, 1-64 cores, 144 hours
x64_small is for small jobs.
For a slightly larger job, use x64_3950 or x64_3950_long. In practice, these two queues are usually less busy than x64_small.
Large-scale jobs run in the blade queue (x64_blades), which is meant for jobs of more than 64 cores.
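The queue limits above can be turned into a small selection helper. This is only a sketch: pick_queue is a hypothetical function, the limits are hard-coded from this post, hours are assumed to be whole numbers, and the blade queue's time limit is not stated here, so it is not checked.

```shell
#!/bin/bash
# pick_queue CORES HOURS
# Hypothetical helper: prints a queue name based on the limits listed in this post.
pick_queue() {
    local cores=$1 hours=$2
    if [ "$cores" -gt 64 ]; then
        # More than 64 cores: blade queue (its time limit is not given in this post).
        echo "x64_blades"
    elif [ "$hours" -gt 144 ]; then
        echo "no fat-node queue allows more than 144 hours" >&2
        return 1
    elif [ "$hours" -gt 6 ]; then
        # Only x64_3950_long allows more than 6 hours.
        echo "x64_3950_long"
    elif [ "$cores" -le 8 ]; then
        echo "x64_small"
    else
        echo "x64_3950"
    fi
}

pick_queue 2 1      # a 2-core, 1-hour job
pick_queue 16 100   # a 16-core, 100-hour job
```

Since x64_3950 and x64_3950_long tend to be less busy, a small job can also be sent there by hand; the helper simply picks the tightest queue that fits.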
Example: a job with a run limit of 1 minute that needs two CPU cores, one core per node, submitted to the x64_small queue. Standard output goes to zlt.out, standard error goes to zlt.err, and the program to run is comm:
[scwangj@LB270108 zjl]$ bsub -W 1 -a intelmpi -n 2 -R "span[ptile=1]" -q x64_small -o zlt.out -e zlt.err mpirun.lsf ./comm
Job <78607> is submitted to queue <x64_small>.
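To make the role of each option in that command explicit, here is a minimal sketch that assembles the same command line from named arguments and only prints it. bsub_cmd is a hypothetical helper, not part of LSF, and it assumes an Intel MPI program started through mpirun.lsf as in the example above.

```shell
#!/bin/bash
# bsub_cmd MINUTES CORES CORES_PER_NODE QUEUE BASENAME PROGRAM
# Hypothetical helper: prints the bsub command instead of running it.
#   -W  run limit (minutes, or hours:minutes)
#   -a  application-specific wrapper, here intelmpi
#   -n  total number of cores requested
#   -R  "span[ptile=N]" places N cores on each node
#   -q  queue name
#   -o / -e  files for standard output / standard error
bsub_cmd() {
    printf 'bsub -W %s -a intelmpi -n %s -R "span[ptile=%s]" -q %s -o %s.out -e %s.err mpirun.lsf %s\n' \
        "$1" "$2" "$3" "$4" "$5" "$5" "$6"
}

# Reproduces the command used above.
bsub_cmd 1 2 1 x64_small zlt ./comm
```

Printing first and pasting the result is a cheap way to check the options before anything reaches the scheduler.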
To submit several jobs at once, write a bash script, submit.sh (this is not strictly necessary; stacking the commands directly on the command line works just as well):
#!/bin/bash
# Submit one job per core count; output and error files are named after the core count.
for i in 50 60 70 80 90 100
do
    bsub -W 6 -a intelmpi -n $i -R "span[ptile=1]" -q x64_blades -o $i.out -e $i.err ./matrix
done
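Before submitting a whole batch, it can help to see exactly which commands the loop will issue. The sketch below wraps the same loop in a hypothetical submit_sweep function with a DRY_RUN switch (both names are illustrative, not from the original script); with DRY_RUN=1, the default here, the commands are printed and nothing is submitted.

```shell
#!/bin/bash
# DRY_RUN=1 prints the bsub commands; DRY_RUN=0 actually submits them.
DRY_RUN=${DRY_RUN:-1}

# submit_sweep QUEUE CORES...
# Hypothetical wrapper around the loop in submit.sh.
submit_sweep() {
    local queue=$1 run="" n
    shift
    # In dry-run mode every command is prefixed with echo.
    [ "$DRY_RUN" = 1 ] && run=echo
    for n in "$@"; do
        $run bsub -W 6 -a intelmpi -n "$n" -R "span[ptile=1]" \
            -q "$queue" -o "$n.out" -e "$n.err" ./matrix
    done
}

submit_sweep x64_blades 50 60 70 80 90 100
```

Running it with DRY_RUN=0 in the environment submits the jobs for real.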
View job:
[scwangj@LB270210 zl]$ bjobs -u scwangj
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
82504   scwangj PEND  x64_blades lb270210                ./matrix   Jul 4 12:47
82505   scwangj PEND  x64_blades lb270210                ./matrix   Jul 4 12:47
82506   scwangj PEND  x64_blades lb270210                ./matrix   Jul 4 12:47
82507   scwangj PEND  x64_blades lb270210                ./matrix   Jul 4 12:47
82487   scwangj PEND  x64_small  lb270210                ./matrix   Jul 4 12:35
[scwangj@LB270210 zl]$
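With many jobs in the list, a quick tally by state is easier to read than the full table. A minimal sketch, assuming the default bjobs column layout shown above (STAT in the third column); count_by_stat is a hypothetical helper name.

```shell
#!/bin/bash
# count_by_stat: read bjobs output on stdin and print the number of jobs per state.
# Only lines whose first field is a numeric job ID are counted, which skips the
# header and the continuation lines bjobs prints for multi-host jobs.
count_by_stat() {
    awk '$1 ~ /^[0-9]+$/ { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Demo on two rows copied from the listing above.
count_by_stat <<'EOF'
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
82504   scwangj PEND  x64_blades lb270210                ./matrix   Jul 4 12:47
82487   scwangj PEND  x64_small  lb270210                ./matrix   Jul 4 12:35
EOF
```

On the cluster, the same function would be used as: bjobs -u scwangj | count_by_stat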
Appendix:
[scwangj@v3903 20x20x100]$ cat submit.sh
#!/bin/bash
for i in 1 2 3 4 5 6 7 8
do
    bsub -W 5:40 -a intelmpi -n $i -R span[ptile=2] -q x64_small -o $i.out -e $i.err mpirun.lsf ./simple
done
[scwangj@v3903 20x20x100]$ cd ..
[scwangj@v3903 ddm]$ ls
10.err  18.err  1.err  20x20x100  2.out  9.out  bsubmpi   ddm.sh  fluid.grd   serial  solveuss.F  solvewss.F  stagsimple.F  submit.log  tdma.F    uc.fun  variable.mod
10.out  18.out  1.out  2.err      9.err  a.sh   bsub.txt  del.sh  ppoisson.F  simple  solvevss.F  s.sh        submit2.sh    submit.sh   time.dat  uc.nam
[scwangj@v3903 ddm]$ cat submit.sh
#!/bin/bash
for i in 1 2 3 4 5 6 7 8
do
    bsub -W 5:40 -a intelmpi -n $i -R span[ptile=2] -q x64_small -o $i.out -e $i.err mpirun.lsf ./simple
done
[scwangj@v3903 ddm]$ cat submit2.sh
#!/bin/bash
for i in 9 10 11 12 13 14 15 16 17 18
do
    bsub -W 5:40 -a intelmpi -n $i -R span[ptile=9] -q x64_3950 -o $i.out -e $i.err mpirun.lsf ./simple
done
[scwangj@v3903 ddm]$ bjobs -u scwangj
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
82725   scwangj RUN   x64_3950   v3903       9*t3701     * ./simple Jul 4 20:39
                                             2*t3601
82726   scwangj RUN   x64_3950   v3903       9*t3802     * ./simple Jul 4 20:39
                                             3*t4102
82727   scwangj RUN   x64_3950   v3903       9*t3701     * ./simple Jul 4 20:39
                                             4*t3802
82728   scwangj RUN   x64_3950   v3903       9*t3601     * ./simple Jul 4 20:39
                                             5*t4102
82729   scwangj RUN   x64_3950   v3903       9*t3701     * ./simple Jul 4 20:39
                                             6*t3802
82730   scwangj RUN   x64_3950   v3903       9*t3601     * ./simple Jul 4 20:39
                                             7*t4102
82731   scwangj RUN   x64_3950   v3903       9*t3701     * ./simple Jul 4 20:39
                                             8*t3802
82745   scwangj RUN   x64_3950   v3903       9*t3701     * ./simple Jul 4 20:41
82634   scwangj RUN   x64_small  v3903       t4601       * ./simple Jul 4 16:59
82635   scwangj RUN   x64_small  v3903       1*t4601     * ./simple Jul 4 16:59
                                             1*t3701
82710   scwangj RUN   x64_small  v3903       t4601       * ./simple Jul 4 20:37
82711   scwangj RUN   x64_small  v3903       2*t3701     * ./simple Jul 4 20:37
82746   scwangj PEND  x64_3950   v3903                   * ./simple Jul 4 20:41
82619   scwangj PEND  x64_small  lb270210                * ./matrix Jul 4 16:53
[scwangj@v3903 ddm]$
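The EXEC_HOST column in the listing above shows how span[ptile=9] splits a job: 9 cores on the first node and the remainder on a second one (for example 9*t3701 plus 2*t3601 for an 11-core job). The arithmetic can be sketched as follows; hosts_needed is a hypothetical helper, and it assumes every node except the last is filled with exactly ptile cores.

```shell
#!/bin/bash
# hosts_needed CORES PTILE
# Hypothetical helper: number of nodes a job occupies when each node takes
# PTILE cores and the last node takes the remainder, i.e. ceil(CORES / PTILE).
hosts_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Core counts from submit2.sh with ptile=9.
for n in 9 10 11 12 13 14 15 16 17 18; do
    echo "-n $n ptile=9: $(hosts_needed "$n" 9) node(s)"
done
```

Only the 9-core job fits on a single node here; 10 to 18 cores all need two, which matches the two-host jobs in the bjobs output.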