NQS Introduction Guide
NQS stands for Network Queuing System. It is a system to manage (large)
programs that cannot be run interactively on a machine as they require
too much CPU-time, memory or other system resources. For that reason,
those large programs have to be run in batch (batch jobs).
NQS takes care of that batch management; based on the
job specifications NQS will start execution of jobs
when there are enough system resources available for the job to complete.
Until that time, a job request will be queued.
You need to make a batch job which contains all the
job specifications and the instructions to run your
program. In fact it is similar to a shell script, except for those extra
job specifications.
See also the examples of several job files.
Once you have created such a job file, submit it to
the NQS system with the qsub command. NQS will take care of
the job from there on.
Once the job is finished, it will disappear from the queue and you will
find its output and/or error files in the directory where you executed the
qsub command.
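As a minimal sketch of the steps above, the following shell fragment writes a small job file and submits it. The file name myjob.job, the job name myjob and the program name my_program are made up for illustration; the #QSUB- lines use options described elsewhere in this guide, and the limit values are arbitrary. The qsub call is guarded so the fragment also runs on a machine without NQS.

```shell
#!/bin/sh
# Write a small NQS job file. The #QSUB- lines are the job
# specifications; everything else is an ordinary shell script.
# "myjob.job", "myjob" and "my_program" are hypothetical names.
cat > myjob.job <<'EOF'
#QSUB-q LP
#QSUB-lT 1000
#QSUB-eo
#QSUB-r myjob
#QSUB-s /bin/sh

echo "Job started"
date
./my_program
EOF

# Submit the job file only if qsub is available on this machine.
if command -v qsub >/dev/null 2>&1; then
    qsub myjob.job
else
    echo "qsub not found; myjob.job was written but not submitted"
fi
```

Because the job specifications start with #, the shell treats them as comments, so the same file remains a valid shell script.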
These are the NQS user commands:
- qsub: Submit a batch job to the NQS system
- qdel: Delete a batch job from the NQS queue
- qstat: See the status of jobs in the NQS queue
- csstat (ORCA only): See the status of jobs in the NQS queue on the CRAY
The job parameters define the status of the job. NQS recognizes them
by their form; each one must be written as follows:
#QSUB-option value [entity]
as in
#QSUB-lt 1000
which means a per-process time limit of 1000 CPU seconds.
The options are the same as those that can be specified as arguments to the
qsub command. Here are a few
examples of frequently used options (see also the
machine specific limits):
#QSUB-q LP
- Job queue; possible queues are HP (high priority), MP (medium), LP (low)
and NI (night);
The CRAY has a VL (very low priority) queue.
On ORCA there are also special queues HPC, MPC, LPC and NIC to
submit jobs to the CRAY.
See the machine specific queues
#QSUB-lt 1000
- Time limit for each individual process in the job
#QSUB-lT 1000
- Time limit for the whole job
#QSUB-lf 100
- Permanent File Size Limit for each individual process in the job
#QSUB-lF 100
- Permanent File Size Limit for the whole job
#QSUB-lm 64
- CRAY only - Memory Size Limit for each individual process in the job
#QSUB-lM 64
- CRAY only - Memory Size Limit for the whole job
#QSUB-ls 128
- Stack size limit for each individual process in the job
#QSUB-lS 128
- Stack size limit for the whole job
#QSUB-lw 128
- ORCA only - Working set size limit for each individual process in the job
#QSUB-ld 64
- Data segment size limit for each individual process in the job
#QSUB-lD 64
- Data segment size limit for the whole job
#QSUB-eo
- Redirect stdout and stderr to the same file
#QSUB-r jobname
- Name of the job; also the name of the job output file
(named jobname.o99999, where 99999 is the job-id)
#QSUB-s /bin/sh
- shell used to process commands
Limits specified with a lowercase letter apply to each individual
process in the job, whereas limits specified with a capital letter
apply to the job as a whole (-lt vs -lT).
For example: #QSUB-lt 100 means a process
in a job can use at most 100 CPU seconds before being terminated by
the system;
#QSUB-lT 1000 means the total CPU time
of the job cannot exceed 1000 CPU seconds.
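To make the distinction concrete, here is a small illustration in shell. It is not part of NQS and does not show how NQS enforces limits; check_limits is a hypothetical helper that compares a list of per-process CPU times (in seconds) with a per-process limit (as set with -lt) and a per-job limit (as set with -lT), assuming the job limit applies to the sum over all processes.

```shell
#!/bin/sh
# Illustration only: check_limits is a hypothetical helper, not an NQS command.
# Usage: check_limits <per-process limit> <per-job limit> <cpu seconds>...
check_limits() {
    lt=$1; lT=$2; shift 2
    total=0
    for t in "$@"; do
        # A single process may not exceed the per-process limit (-lt).
        if [ "$t" -gt "$lt" ]; then
            echo "process limit exceeded"
            return 0
        fi
        # The sum over all processes may not exceed the job limit (-lT).
        total=$((total + t))
        if [ "$total" -gt "$lT" ]; then
            echo "job limit exceeded"
            return 0
        fi
    done
    echo "within limits"
}

check_limits 100 1000 90 80 70   # prints "within limits"
check_limits 100 1000 90 150     # prints "process limit exceeded"
check_limits 100 200 90 80 70    # prints "job limit exceeded"
```

In the last call every process stays below 100 seconds, but together they use 240 seconds, which is more than the job limit of 200.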
You can find explanations of these and other job parameters in the
qsub man page.
Though NQS has been installed on CRAY and ORCA, there are
some slight differences in their usage. The main differences are:
- Specifying memory usage:
  On the CRAY, the memory limit is set with the Memory
  Size Limit, whereas on ORCA the Working Set Size Limit
  is used. The units also differ: MegaWords on the CRAY
  versus Megabytes on ORCA (1 Mw = 8 Mb). Each machine
  therefore uses a different parameter:
  - CRAY
    - #QSUB-lm 1Mw
    - #QSUB-lM 1Mw
  - ORCA
    - #QSUB-lw 4Mb
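When moving a job file from one machine to the other, the memory limit has to be converted between the two units. The sketch below uses only the relation 1 Mw = 8 Mb given above; mw_to_mb and mb_to_mw are hypothetical helper names, not NQS commands.

```shell
#!/bin/sh
# Convert between CRAY MegaWords and ORCA Megabytes, using 1 Mw = 8 Mb.
# mw_to_mb and mb_to_mw are hypothetical helper names.
mw_to_mb() { echo $(( $1 * 8 )); }

# Round up, so the converted limit is never smaller than the original.
mb_to_mw() { echo $(( ($1 + 7) / 8 )); }

mw_to_mb 1    # prints 8
mb_to_mw 4    # prints 1
mb_to_mw 64   # prints 8
```

For example, a limit of 1Mw on the CRAY corresponds to 8Mb on ORCA.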
With the command qstat,
one can follow the progress of a batch job. It is most frequently invoked as
qstat -a