NQS Introduction Guide


What is NQS?

NQS stands for Network Queuing System. It manages (large) programs that cannot be run interactively on a machine because they require too much CPU time, memory or other system resources. Such programs have to be run in batch, as batch jobs.

NQS takes care of that batch management: based on the job specifications, NQS starts a job when enough system resources are available for it to complete. Until that time, the job request is queued.

How do I use NQS?

You need to create a job file that contains all the job specifications and the instructions to run your program. It is similar to a shell script, except for the extra job specifications. See also the examples of several job files.

Once you have created such a job file, you have to submit it to the NQS system with the qsub command. NQS will take care of the job from there on.

Once the job is finished, it will disappear from the queue and you will find its output and/or error files in the directory where you executed the qsub command.
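
The steps above can be sketched as a short shell session. This is only an illustration: the file name myjob.sh and the job name myjob are made up, and the #QSUB lines use options described in the section on job parameters below. The script writes the job file and submits it only if a qsub command is found on the machine.

```shell
#!/bin/sh
# Sketch: write a small job file, then submit it if qsub is available.
# "myjob.sh" and "myjob" are made-up example names.
jobfile=myjob.sh

# The quoted 'EOF' keeps the job file contents literal (nothing is expanded here).
cat > "$jobfile" <<'EOF'
#QSUB-q LP
#QSUB-lT 1000
#QSUB-eo
#QSUB-r myjob
#QSUB-s /bin/sh
# Ordinary shell commands follow the job specifications.
echo "job started"
date
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub "$jobfile"    # hand the job file to NQS
    qstat -a           # look at the queue to see the job
else
    echo "qsub not found; $jobfile was written but not submitted"
fi
```

Because the #QSUB lines start with #, the shell treats them as comments; only NQS reads them as job specifications.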

NQS commands

These are the NQS user commands:
qsub
Submit a batch job to the NQS system
qdel
Delete a batch job from the NQS queue
qstat
See the status of jobs in the NQS queue
csstat
ORCA only: See the status of jobs in the NQS queue on the CRAY

Defining the NQS job parameters

The job parameters define the properties and resource limits of the job. NQS recognizes them because they are written in the following form:
#QSUB-option value [entity]
as in

#QSUB-lt 1000

which means a per-process time limit of 1000 CPU seconds.

The options are the same as those that can be specified as arguments to the qsub command. Here are a few examples of frequently used options (also see the machine specific limits):

#QSUB-q LP
Job queue; possible queues are HP (high priority), MP (medium), LP (low) and NI (night);
The CRAY has a VL (very low priority) queue.
On ORCA there are also special queues HPC, MPC, LPC and NIC to submit jobs to the CRAY.
See the machine specific queues
#QSUB-lt 1000
Time limit for each individual process in the job
#QSUB-lT 1000
Time limit for the whole job
#QSUB-lf 100
Permanent File Size Limit for each individual process in the job
#QSUB-lF 100
Permanent File Size Limit for the whole job
#QSUB-lm 64
CRAY only - Memory Size Limit for each individual process in the job
#QSUB-lM 64
CRAY only - Memory Size Limit for the whole job
#QSUB-ls 128
Stack size limit for each individual process in the job
#QSUB-lS 128
Stack size limit for the whole job
#QSUB-lw 128
ORCA only - Working set size limit for each individual process in the job
#QSUB-ld 64
Data segment size limit for each individual process in the job
#QSUB-lD 64
Data segment size limit for the whole job
#QSUB-eo
redirect stdout and stderr to the same file
#QSUB-r jobname
name of the job; it is also used in the name of the job output file (jobname.o99999, where 99999 is the job-id)
#QSUB-s /bin/sh
shell used to process commands

Limits specified with a lowercase letter (-lt) apply to each individual process in the job; limits specified with a capital letter (-lT) apply to the job as a whole. For example: #QSUB-lt 100 means that a process in the job can use at most 100 CPU seconds before being terminated by the system;
#QSUB-lT 1000 means that the total CPU time of the job cannot exceed 1000 CPU seconds.

You can find explanations of these and other job parameters in the qsub man page.
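
Since the #QSUB options are the same as the qsub arguments, a job file can be read back as the command line it stands for. The sketch below is a hypothetical helper (qsub_equivalent is not an NQS command) that only prints that command line; it does not submit anything, and it assumes every job parameter sits on its own line starting with #QSUB.

```shell
#!/bin/sh
# Print the qsub command line that corresponds to the #QSUB lines of a job file.
# qsub_equivalent is a made-up helper name; it only reads the file.
qsub_equivalent() {
    jobfile=$1
    # Keep only lines starting with "#QSUB", strip that prefix,
    # and join the remaining "-option value" parts with spaces.
    opts=$(sed -n 's/^#QSUB\(-.*\)$/\1/p' "$jobfile" | tr '\n' ' ')
    printf 'qsub %s%s\n' "$opts" "$jobfile"
}

# Small demonstration with a made-up job file.
cat > demo_job.sh <<'EOF'
#QSUB-q LP
#QSUB-lt 1000
echo "hello from the job"
EOF

qsub_equivalent demo_job.sh    # prints: qsub -q LP -lt 1000 demo_job.sh
```

This can be handy for checking a job file for typing errors in the parameters before submitting it.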

Differences between NQS on CRAY and ORCA

Though NQS has been installed on CRAY and ORCA, there are some slight differences in their usage. The main differences are:
Specifying memory usage
On the CRAY, the limit on memory usage is defined by the Memory Size Limit, whereas on ORCA the Working Set Size Limit is used. Furthermore, the unit of this parameter differs between the two machines: megawords on the CRAY versus megabytes on ORCA (1 Mw = 8 Mb). To set the limit, you therefore have to use a different parameter on each machine:

CRAY
#QSUB-lm 1Mw
#QSUB-lM 1Mw
ORCA
#QSUB-lw 4Mb
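
When moving a job between the two machines, the relation 1 Mw = 8 Mb can be used to translate the limit. The sketch below shows the arithmetic; the function names mw_to_mb and mb_to_mw are made up for this example, and rounding up in mb_to_mw is a choice made here so that the converted limit is never smaller than the original one.

```shell
#!/bin/sh
# Convert memory limits between CRAY megawords and ORCA megabytes.
# Uses 1 Mw = 8 Mb; both function names are made up for this example.

# Megawords to megabytes.
mw_to_mb() {
    echo $(( $1 * 8 ))
}

# Megabytes to megawords, rounded up to a whole megaword.
mb_to_mw() {
    echo $(( ($1 + 7) / 8 ))
}

mw_to_mb 1    # prints 8
mb_to_mw 4    # prints 1
```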

NQS Limits at the Computing Centre

Following the job progress

With the command qstat, one can follow the progress of a batch job. It is most frequently invoked as
	qstat -a

Examples of NQS jobs


VUB/ULB Computing Centre, 10 April 1998
Email: User Support Group.