bsub.1
NAME
bsub - submit a job for batched execution by the LSF system
SYNOPSIS
bsub [ -h ] [ -V ] [ -H ] [ -x ] [ -r ] [ -N ] [ -B ] [ -I |
-Ip | -Is | -K ]
[ -T time_event ]
[ [ -X "exception_cond([params])::action" ] ... ]
[ -w depend_cond ] [ -q queue_name ... ]
[ -m host_name[+[pref_level]] ... ]
[ -n min_proc[,max_proc] ] [ -R res_req ]
[ -J job_name_spec ] [ -b begin_time ] [ -t term_time ]
[ -i in_file ] [ -o out_file ] [ -e err_file ]
[ -u mail_user ] [ [ -f "lfile op [ rfile ]" ] ... ]
[ -E "pre_exec_command [ argument ... ]" ]
[ -c cpu_limit[/host_spec ] ]
[ -W run_limit[/host_spec ] ] [ -F file_limit ]
[ -M mem_limit ] [ -D data_limit ] [ -S stack_limit ]
[ -C core_limit ] [ -k "chkpnt_dir [ chkpnt_period ]" ]
[ -L login_shell ] [ -P project_name ]
[ -G user_group ] [ command [ argument ... ] ]
DESCRIPTION
Submit a job for batched execution on host(s) that satisfy
the resource requirements of the job and can provide a fast
turnaround time. If the load of all the candidate hosts is
too high, or some specific conditions configured in LSF are
not satisfied, the job will be executed later when system
resources become available and the conditions are satisfied.
This allows the system to restrict the number of jobs that
are executed simultaneously so as to keep system overhead
low, and to adjust the number of started jobs based on the
current system load. Jobs are started and suspended accord-
ing to the current system load.
The job is submitted to a batch job queue configured in the
LSF system in the local cluster. To get information about
the queues, see bqueues(1). LSF may automatically select an
appropriate queue for a job if a specific queue is not
specified (see option -q below). If the job is successfully
submitted, then a unique job ID (a positive number) is
printed together with the queue to which the job has been
submitted. This job ID can be used to operate on the job.
The batch job can be specified by the command line argument
command, or through the standard input if the command is not
present on the command line. The command can be anything
that is provided to a UNIX Bourne shell (see sh(1)). Command
is assumed to begin with the first word that is not part of
a bsub option. All arguments that follow command are pro-
vided as the arguments to the command.
If the batch job is not given on the command line, bsub
reads the job commands from standard input. If the standard
input is a controlling terminal, the user is prompted with
"bsub>" for the commands of the job. The input is ter-
minated by entering CTRL-D on a new line. You can submit
multiple commands through standard input. The commands are
executed in the order in which they are given. bsub options
can also be specified in the standard input if the line
begins with #BSUB; e.g., "#BSUB -x". If an option is given
on both the bsub command line, and in the standard input,
the command line option overrides the option in the standard
input. The user can specify the shell to run the commands
by specifying the shell pathname in the first line of the
standard input; e.g., "#! /bin/csh". If the shell is not
given in the first line, the Bourne shell is used. The
standard input facility can be used to spool a user's job
script; e.g., "bsub < script". See EXAMPLES below for exam-
ples of specifying commands through standard input.
The user's execution environment, including the current
working directory, file creation mask, and all the environ-
ment variables, is set for the batch job. In addition, a
number of LSF environment variables are set before starting
the batch job (see the section EXECUTION ENVIRONMENT VARI-
ABLES below).
By default LSF assumes that uniform user name and user ID
spaces exist among all the hosts in the cluster. That is, a
job submitted by a given user will run under the same user's
account on the execution host. For situations where non-
uniform user ID/user name space exists, account mapping must
be used to determine the account used to run a job. See the
section ACCOUNT MAPPING below.
When a job is executed, command line and stdout/stderr
buffers are stored in the directory home_directory/.lsbatch
on the execution host. The directory given in the
/etc/passwd file on the execution host is used as the job's
home directory. If this directory is not accessible,
/tmp/.lsbtmp<userId> is used as the job's home directory.
If the current working directory is under the home directory
on the submission host, then the current working directory
is also set to be the same relative directory under the home
directory on the execution host. The job is run in /tmp if
the current working directory is not accessible on the exe-
cution host.
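The working directory rule above can be sketched in Python as
follows. This is only an illustration, not LSF code: the function
name map_working_dir and the example paths are hypothetical, and
the sketch assumes that a directory outside the home directory is
tried under the same path. The fallback to /tmp needs an
accessibility check on the execution host and is not modeled.

```python
import posixpath


def map_working_dir(sub_home, sub_cwd, exec_home):
    """Return the directory a job would try to run in on the execution host.

    Hypothetical helper: if the submission cwd lies under the submission
    home directory, keep the same relative directory under the execution
    home directory; otherwise keep the path unchanged (an assumption).
    """
    sub_home = posixpath.normpath(sub_home)
    sub_cwd = posixpath.normpath(sub_cwd)
    if sub_cwd == sub_home or sub_cwd.startswith(sub_home + "/"):
        rel = posixpath.relpath(sub_cwd, sub_home)
        return posixpath.normpath(posixpath.join(exec_home, rel))
    return sub_cwd


print(map_working_dir("/home/alice", "/home/alice/proj/run1",
                      "/export/home/alice"))
```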
Parallel jobs are typically submitted with the -n option
having max_proc greater than one. This allows the job to
initially use multiple processors. The job is dispatched to,
and started on, the first host chosen by the LSF system, and
the environment variable LSB_HOSTS contains the list of
chosen host names.
LSF Administrators can configure external submission and
execution time executables to perform site-specific actions
on jobs (see esub(8) and eexec(8)).
OPTIONS
-h Print command usage to stderr and exit.
-V Print LSF release version to stderr and exit.
-H Hold the job in the PSUSP state when the job is submit-
ted. The job will not be scheduled until bresume is
issued. When the job is finished, the job is requeued
to the PEND state if the job is repetitive. If the job
is non-repetitive, the job will be requeued to the
PSUSP state.
-x Exclusive execution mode. The job runs exclusively on
a host, i.e., prior to the execution of the job, no
other batch jobs are started on the host. The host
status becomes closed_Excl, as shown by bhosts(1). No
other jobs, with the exception of jobs being forced to
run by brun(1), are dispatched to the host until the
job completes. In addition, the host status is shown
to be lockU by lsload(1), i.e., the host will not be
selected by LIM in response to any placement requests
made by lsplace(1), lsrun(1), lsgrun(1), or any other
load sharing application until the job completes.
However, a user can still override the LIM's decision
by explicitly specifying the locked host, using the -m
option of lsrun(1) and lsgrun(1) for example, to force
an interactive job to run on the host.
You cannot submit an exclusive job unless the queue is
configured to allow exclusive jobs.
-T time_event
A time event is used to define a job which runs repeti-
tively based on a particular set of days and time
within a day. The time_event is of the form:
[cal_name[@user_name]:]hour:minute[%dur]
cal_name is an existing calendar name defined in the
system and is used to select the days on which the job
will run. If cal_name is not specified a daily calen-
dar is assumed. If user_name is not specified, the sys-
tem will try to find a calendar owned by the current
user with the name cal_name. If not found, it will look
for a calendar owned by the system (SYS). A set of
calendar commands are available for users to define,
view and manipulate calendars (see bcal(1), bcadd(1),
bcmod(1), bcdel(1) and bchist(1)). hour:minute specifies
the time within each calendar day when a time event
begins. hour and minute can each be a number, a comma
separated list of numbers, or two numbers separated by
- to indicate a range. For hour, the numbers must be
between 0 and 23. For minute, the numbers must be
between 0 and 59. The wildcard character * can be used
in the hour and minute fields to indicate every hour or
minute, respectively. dur is the duration of the time
event specified in minutes. The default duration is 3
minutes. The job should be scheduled and start within
the duration of the time event. If a calendar referred
to in the time event is removed, then the job will not
be scheduled until the time event is modified (see
bmod(1)).
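To make the time_event syntax concrete, the following Python
sketch splits a specification into its parts. It is not how LSF
parses time events; parse_time_event and expand_field are
hypothetical names, ranges are assumed to be inclusive, and a
field that mixes a comma list with a range is not handled.

```python
def expand_field(text, low, high):
    """Expand '*', 'a-b' or 'a,b,c' into a list of integers (sketch)."""
    if text == "*":
        return list(range(low, high + 1))
    if "-" in text:
        start, end = (int(part) for part in text.split("-"))
        values = list(range(start, end + 1))  # inclusive range is assumed
    else:
        values = [int(part) for part in text.split(",")]
    for value in values:
        if not low <= value <= high:
            raise ValueError("%d is outside %d-%d" % (value, low, high))
    return values


def parse_time_event(spec, default_dur=3):
    """Split [cal_name[@user_name]:]hour:minute[%dur] into a dict."""
    body, _, dur = spec.partition("%")
    fields = body.split(":")
    if len(fields) not in (2, 3):
        raise ValueError("expected [cal:]hour:minute, got %r" % spec)
    cal_name, user_name = None, None
    if len(fields) == 3:
        cal_name, _, user = fields[0].partition("@")
        user_name = user or None
    return {
        "cal_name": cal_name,
        "user_name": user_name,
        "hours": expand_field(fields[-2], 0, 23),
        "minutes": expand_field(fields[-1], 0, 59),
        "dur": int(dur) if dur else default_dur,
    }


print(parse_time_event("payrollDay:8:0%15"))
```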
The following are examples of time_event :
"payrollDay:8:0%15"
"backupDay:*:0%50"
"*:0,5,10,15,20,25,30,35,40,45,50,55"
"weekend:0-8:0%30"
-X "exception_cond([params])::action"
Specify exception handlers for a job. An exception
handler tells how the system should respond when an
exceptional condition occurs on a job. It consists of
an exception condition name and action pair. When the
exception condition specified is detected, the action
is performed. Multiple -X options can be specified to
handle different exceptions.
exception_cond is the name of an exception condition
that can be detected by the system. The valid exception
condition names are: missched, overrun, underrun,
abend, startfail, cantrun, and hostfail. Some exception
conditions require parameters to be specified.
The missched exception occurs when a job has not been
scheduled within the time event specified in the -T
option. No parameters are required for the missched
exception.
The overrun exception occurs when the job has been run-
ning too long. It requires a single integer parameter,
maxtime which represents the maximum time in minutes.
If the job has not finished after maxtime minutes of
starting, the overrun exception condition is detected.
The underrun exception takes a single integer parame-
ter, mintime, that represents the minimum running time
for the job. If the job finishes within mintime minutes
of starting, the underrun exception is detected.
The abend exception is used to specify an abnormal
termination condition for a job based on exit code. The
parameters consist of one or more exit code values
separated by commas. Two exit code values separated by
'-', indicating a range of values, can be used in place
of a single value. If the job exits with one of the
values specified, the abend condition is detected.
The startfail exception condition is detected if the
system fails to start a job due to a lack of the system
resources (e.g., process, memory, user account)
necessary to set up a job for execution.
The cantrun exception is set when it is determined that
the dependency condition specified by the -w option is
invalid or if the startfail exception occurs 20 times
in a row. For jobs submitted with a time event via the
-T option, the cantrun exception condition can be
detected once in each time event. If the reason for the
cantrun exception is due to failure to start the job 20
times, the job is also suspended.
The hostfail exception is detected when the host on
which the job is running is determined to be unavail-
able.
One of the following actions can be specified for each
exception condition: alarm(alarm_severity,
alarm_name), setexcept(event_name), rerun, kill. The
alarm action interfaces with the LSF Alarm Management
system to record a new incident of the alarm with name
alarm_name. The alarm management system will invoke the
configured notification method. The alarm_severity is a
number indicating the severity of the alarm. The alarm
name and severity together with information about the
job are passed to the alarm management system. (See
raisalarm(1) and balarms(1) for creating, viewing,
acknowledging and resolving an alarm incident.)
The action setexcept causes the exception event
event_name to be set. Other jobs waiting on the excep-
tion event event_name specified through the -w option
can be triggered. event_name is an arbitrary string.
The action rerun causes the job to be rescheduled for
execution. Any dependencies associated with the job
must be satisfied before re-execution takes place. The
rerun action can only be specified for the abend and
hostfail exception conditions. The startfail exception
condition automatically triggers the rerun action.
The action kill causes the current execution of the
job to be terminated. This action can only be specified
for the overrun exception condition.
There can only be one handler for each exception in a
job. However, the abend exception can have multiple
handlers to specify different actions for different
exit codes.
The following are examples of exception handlers:
abend(2)::alarm(5,pageAdmin)
abend(1,10-20,25)::rerun
abend(83)::setexcept(dataBaseErr)
missched()::alarm(1,pageAdmin)
overrun(60)::kill
underrun(5)::rerun
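The handler strings above have a regular shape, which the
following Python sketch takes apart. It is an illustration only:
parse_handler and abend_matches are hypothetical names, not part
of LSF, and exit code ranges are assumed to be inclusive.

```python
def parse_handler(spec):
    """Split 'cond(params)::action' into (cond, params, action)."""
    cond_part, sep, action = spec.partition("::")
    if not sep or not action:
        raise ValueError("expected cond(params)::action, got %r" % spec)
    name, _, rest = cond_part.partition("(")
    return name, rest.rstrip(")"), action


def abend_matches(params, exit_code):
    """Check an exit code against a list such as '1,10-20,25'."""
    for item in params.split(","):
        if "-" in item:
            low, high = (int(part) for part in item.split("-"))
            if low <= exit_code <= high:  # inclusive range is assumed
                return True
        elif int(item) == exit_code:
            return True
    return False


name, params, action = parse_handler("abend(1,10-20,25)::rerun")
print(name, action, abend_matches(params, 15))
```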
-w depend_cond
The logic expression depend_cond specifies the condi-
tions that the submitted job depends on. Only when
depend_cond is satisfied (TRUE), will the job be con-
sidered for dispatch. The successful dispatching of the
job is also subject to batch scheduling policies con-
figured.
The dependency logic expression evaluates to a binary
value of either TRUE or FALSE and is composed of job or
system conditions (see below for the definition) using
AND/OR/NOT ('&&', '||' and '!') logic operators.
Parentheses '(' and ')' can be used to alter the pre-
cedence of the logical operations.
Because the shell interprets many of the logical
operators and parentheses, a valid dependency condition
should in general be quoted with (") or ('), except
when depend_cond is a single job ID or job name (see
Job status conditions below).
The job-dependency conditions may be job status,
calendar conditions, file event conditions and user
event conditions, as described in detail below.
Job status conditions:
started( jobId | [group_spec/][job_name] )
If the specified batch job has started run-
ning or has already finished, the condition
is TRUE; otherwise FALSE. If the job is
dependent on "group/", it is equal to
"group/*" which depends on all jobs under the
group.
done( jobId | [group_spec/][job_name] )
If the specified batch job has finished suc-
cessfully and thus is in the DONE state, the
condition is TRUE, otherwise FALSE.
exit( jobId | [group_spec/][job_name] )
If the specified batch job is in the EXIT
state, the condition is TRUE, otherwise
FALSE.
exit( jobId | [group_spec/][job_name], exit_code )
If the specified batch job is in the EXIT
state, and the job exited with exit_code, the
condition is TRUE, otherwise FALSE. The
reserved exit_code of 512 indicates the job
exited while pending.
exit( jobId | [group_spec/][job_name], op exit_code )
If the specified batch job is in the EXIT
state, and the job's exit value is op
exit_code, the condition is TRUE, otherwise
FALSE. op is one of >, >=, <, <=, ==, or !=.
The reserved exit_code of 512 indicates the
job exited while pending.
ended( jobId | job_name )
If the specified batch job has finished, the
condition is TRUE, otherwise FALSE.
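The three forms of the exit condition can be modeled with a small
Python sketch. It only mirrors the truth rules stated above; the
function exit_condition and its arguments are hypothetical, and
the job state and exit value would in practice come from LSF.

```python
import operator

# The comparison operators listed for "op" above.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def exit_condition(job_state, job_exit_code, op="==", exit_code=None):
    """Evaluate exit(job), exit(job, code) or exit(job, op code)."""
    if job_state != "EXIT":
        return False
    if exit_code is None:  # plain exit(job): the EXIT state is enough
        return True
    return OPS[op](job_exit_code, exit_code)


print(exit_condition("EXIT", 3, ">", 2))
```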
The job_name can be preceded by a group specification
to indicate a dependency on a job belonging to a par-
ticular group. See bgadd(1) for the group specification
syntax. If the group_spec is not specified, the job
must exist under the root group, /. The -J option
allows you to submit a job to a particular group.
If only the jobId|[group_spec/]job_name is specified,
the system assumes it means
done(jobId | [group_spec/]job_name). Note that a
numeric job name should be doubly quoted, e.g. -w
"'210'", since the shell treats -w "210" the same as
-w 210, which is treated as a job with jobId 210.
While jobId may be used to specify jobs of any user,
job_name can only be used to specify the user's own
jobs. job_name* indicates all the job names that begin
with the same string job_name (e.g. job_name,
job_name1, job_name_test, job_name.log).
If any one of the specified jobId or job_name is non-
existent in the system, the job submission will be
rejected.
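The effect of double quoting a numeric job name can be seen with
Python's shlex module, which follows POSIX shell quoting rules.
This sketch only shows what argument a command would receive; the
command name myjob is a placeholder and bsub itself is not run.

```python
import shlex

# One level of quoting: the shell strips the quotes, leaving a bare number.
plain = shlex.split('bsub -w "210" myjob')

# Two levels: the inner single quotes survive and reach the command.
nested = shlex.split('''bsub -w "'210'" myjob''')

print(plain)   # ['bsub', '-w', '210', 'myjob']
print(nested)  # ['bsub', '-w', "'210'", 'myjob']
```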
File event conditions:
file( file_condition_expression )
File event conditions are defined as a logical expres-
sion in terms of the following four file status func-
tions:
arrival( file_location_spec )
TRUE when file specified by file_location_spec
arrives. The arrival of a file refers to the tran-
sition from non-existence to existence of the
file. The format of file_location_spec is dis-
cussed below.
exist( file_location_spec )
If file specified by file_location_spec exists,
return TRUE; otherwise return FALSE. The format of
file_location_spec is discussed below.
size( file_location_spec )
If file specified by file_location_spec exists,
return the size of file in bytes; otherwise return
0. The format of file_location_spec is discussed
below.
age( file_location_spec )
If file specified by file_location_spec exists,
return the file age in minutes since the last
modification; otherwise return 0. The format of
file_location_spec is discussed below.
file_location_spec takes the form
[hostName:]absolute_path/fileName, where hostName is
the name of the host on which the file can be accessed.
Note that hostName does not have to be the name of the
host on which the job executes. If hostName is not
specified in file_location_spec, then the system
assumes that the file is accessible from any host.
The file condition expression can be composed of the
above file status functions and numbers using logic
operators '&&', '||' and '!'. Here are some examples of
file condition expressions:
file (exist(/data/tmp.log) && size(/data/tmp.log) >= 3.5M)
file (size(/data/tmp.log) > 3M || age(/data/tmp.log) > 60)
User event conditions:
event( event_spec ) or ev( event_spec )
Either a file or a user event condition can be used to
specify a dependency on an external event the LSF system
administrator has installed for the system. This function is
available only if the site has installed an external event
daemon (see eeventd(8)). The format of event_spec is site
specific and has to be designed by the site. The whole
string event_spec is passed to and processed by the external
event daemon. This function allows the site to define site
dependent events (e.g., tape silo status) that will be used
by the system in job scheduling. A separate command exists
for viewing events (see bevents(1) ).
The time event specification described in the -T option can
be used in the dependency condition to specify a time window
which can be used in combination with other events to trigger
a job only if the event occurs within the time window.
Exception conditions:
exception( event_name )
An exception event can be used to specify a dependency on an
exception in another job. The exception handler of
another job belonging to the same user must set event_name
using the setexcept action (see the -X option). When an exception
handler sets the exception event, the event will be set to
TRUE for all jobs waiting on the event. event_name is an
arbitrary string specified by the user.
Job group conditions:
active(group_spec)
TRUE if the group is in the ACTIVE state.
inactive(group_spec)
TRUE if the group is in the INACTIVE state.
hold(group_spec)
TRUE if the group is in the HOLD state.
numrun(group_spec, op num)
TRUE if the number of jobs in RUN state satisfies
the test.
numpend(group_spec, op num)
TRUE if the number of jobs in PEND state satisfies
the test.
numdone(group_spec, op num)
TRUE if the number of jobs in DONE state satisfies
the test.
numexit(group_spec, op num)
TRUE if the number of jobs in EXIT state satisfies
the test.
numended(group_spec, op num)
TRUE if the total number of jobs in the DONE or
EXIT state satisfies the test.
numstart(group_spec, op num)
TRUE if the total number of jobs in the RUN, USUSP
or SSUSP state satisfies the test.
Job group conditions can be used to specify a dependency on
the job group state or on the value of the counters associ-
ated with a job group. The counters keep track of the
number of jobs in various states. group_spec is the path-
name of a group. See bgadd(1) for the group specification
syntax. op is one of >, >=, <, <=, ==, or !=. num is a
positive integer or the wildcard character * to indicate the
total number of jobs within the group.
Job array conditions:
numrun(array_jobId, op num)
TRUE if the number of jobs in RUN state satisfies
the test.
numpend(array_jobId, op num)
TRUE if the number of jobs in PEND state satisfies
the test.
numdone(array_jobId, op num)
TRUE if the number of jobs in DONE state satisfies
the test.
numexit(array_jobId, op num)
TRUE if the number of jobs in EXIT state satisfies
the test.
numended(array_jobId, op num)
TRUE if the total number of jobs in the DONE or
EXIT state satisfies the test.
numhold(array_jobId, op num)
TRUE if the number of jobs in PSUSP state satisfies
the test.
numstart(array_jobId, op num)
TRUE if the total number of jobs in the RUN, USUSP
or SSUSP state satisfies the test.
Job array conditions can be used to specify a dependency on
the value of the counters associated with a job array.
The counters keep track of the number of jobs in various
states. array_jobId is the job ID of an array job. See the -J
option for creating an array job. op is one of >, >=, <, <=,
==, or !=. num is a positive integer or the wildcard
character * to indicate the total number of jobs within the
array. If * is specified, op should be omitted.
The following are examples of depend_cond:
"done(1351) && ended(job1) && (started(job2) || exit(job3))"
"1351 || job2 || started(job3)"
"done(job_name*) && ended(test_name*)"
"event(ready(Tape1)) && event (ready(Tape3))"
"event(swap_great_100M)"
"file(exist(/tmp/event.log)&&size(</tmp/event.log>)>=3.5M)"
"done(simu)&&file(arrival(klee:/u/database/report))"
"exception(dataBaseErr)"
numended(/daily/backup,*)
active(/daily/reports)
-r Specify that the job can be rerun. If the execution
host of the job is considered to be unavailable, LSF
requeues this job in the same job queue, and reruns
it from its start when a suitable host is found, as if
the job were submitted as a new job. The same job ID
is used. The user who submitted the failed job
receives mail reporting the failure and the
requeueing of the job.
For a job that is checkpointed (see -k option and
bchkpnt(1)) before the execution host becomes unavail-
able, the job is restarted from the last checkpoint.
The restarted job is requeued for execution in the same
manner as for a job that is restarted using the bres-
tart command (see brestart(1)). In order for the job
to be successfully restarted, the job's checkpoint
directory must reside in a shared file system accessi-
ble to the previous host and the host receiving the
restarted job.
-N Send the job report to the submitter by mail when the
job finishes. If the -o option is not given, the
information that would otherwise be stored in the file
out_file specified through -o option will be included
in the mail.
If neither -N nor -o is specified, the job report and
the information that would otherwise be stored in the
file out_file is sent by mail to the submitter unless
the job is an LSF Batch interactive job (see the -I
option) in which case the output is sent to the user's
terminal.
-B Send mail to the submitter when the job is dispatched
and begins execution. The default is not to send such
mail.
-I Submit an LSF Batch interactive job. Unlike a non-
interactive job, bsub blocks until the job is ter-
minated.
Terminal support is available for a batch interactive
job. If the -i option is not given, the user can
interact with the job's standard input via the termi-
nal. If the -o (or -e) option is not given, the job's
standard output (or standard error) is sent to the
user's terminal instead of by mail. No job report is
sent to the user by mail unless the -N option is given.
-Ip Submit a batch interactive job and create a pseudo-
terminal when the job starts. Some applications (e.g.,
vi) require a pseudo-terminal in order to run
correctly. See the -I option for the description of an
interactive job.
-Is Submit a batch interactive job and create a pseudo-
terminal with shell mode support when the job starts.
This option should be specified for submitting interac-
tive shells, or applications which redefine the ctrl-C
and ctrl-Z keys (e.g., jove). See the -I option for
the description of a batch interactive job.
-K Submit a batch job and wait for the job to complete.
In case the job needs to be rerun due to transient
failures, the command will return after the job fin-
ishes. The bsub command returns the same value as the
job upon completion. The bsub command exits with value
126 if the job was terminated while pending.
-q queue_name ...
Submit the job to one of the queues specified by
queue_name .... This can be either a single queue
name, or a list of queue names defined in the LSF sys-
tem. In the latter case, the list must be enclosed by
quotation marks (" " or ' '). Queues are usually named
to correspond to the type of jobs usually submitted to
them, or to the type of services they provide.
When a list of queue names is specified, LSF selects an
appropriate queue in the list for the job based on the
job's parameters, job's resource limits and other res-
trictions, such as the requested host(s), user's acces-
sibility to a queue, queue status (closed or open),
whether a queue can accept exclusive jobs, etc. The
order in which the queues are considered in selection
is the same order in which these queues are listed; the
queue listed first is considered first.
If this option is absent, the user default queue list
specified by the user's environment variable
LSB_DEFAULTQUEUE is used for the queue selection. If
neither this option nor LSB_DEFAULTQUEUE is present,
the system default queue list specified by the LSF
administrator in the lsb.params configuration file is
used (see lsb.params(5) for parameter DEFAULT_QUEUE).
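The order in which the queue list is chosen can be summarized in a
short Python sketch. It is an illustration under assumptions:
candidate_queues is a hypothetical name, queue lists are assumed
to be separated by white space, and the value of DEFAULT_QUEUE is
passed in rather than read from lsb.params.

```python
def candidate_queues(cli_queues=None, env=None, system_default=""):
    """Return the queue list considered for a job, first source wins.

    Order: the -q option, then LSB_DEFAULTQUEUE from the user's
    environment, then the system default queue list.
    """
    env = env or {}
    for source in (cli_queues, env.get("LSB_DEFAULTQUEUE"), system_default):
        if source:
            return source.split()
    return []


print(candidate_queues(None, {"LSB_DEFAULTQUEUE": "night"}, "default"))
```

Within the returned list, the queue listed first is considered
first, as described above.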
-m host_name[+[pref_level]] ...
Limit the candidate hosts for executing this job to
those specified by host_name ... . This can be either a
single host name, or a list of host names or host group
names defined by the LSF system. In the latter case,
the list must be enclosed by quotation marks (" " or '
'). You can find membership of a host group using the
bmgroup command.
The + after the host name is used to specify the
preference for dispatching a job to that host. If the
host preference is not given, hosts are ordered by load
if several hosts can satisfy the resource requirement
of a job. pref_level is a positive number specifying
the preference level of that host. The larger the
number, the higher the preference for that host or host
group. The special host_name, "others", can be used to
refer to other hosts not listed. E.g., -m "orange+
others" specifies that orange is preferred over all
other hosts in the queue.
If a job queue is specified using the -q option, then
the host list of that queue (see bqueues(1)) must
include all the hosts specified by this option for the
job to be acceptable. The default is to use the hosts
of the queue that satisfy the -R option as candidates.
-n min_proc[,max_proc]
The minimum and maximum numbers of processors requested
to run the (parallel) job. min_proc and max_proc are
integers. If only one integer is specified, it applies
to both min_proc and max_proc. The default of this
option assumes only one processor is requested.
If the max_proc is greater than PROCLIMIT of a queue to
which a job is submitted, LSF will reject the job (see
lsb.queues(5)). In an LSF MultiCluster environment,
if a queue exports jobs to remote clusters (see
SNDJOBS_TO in lsb.queues(5)), then PROCLIMIT is not
imposed on jobs submitted to this queue.
After accepting a parallel job, LSF searches for hosts
that both meet the resource requirements of the job and
are lightly loaded. Once the min_proc number of such
processors is available (some may be on the same mul-
tiprocessor host), the job is dispatched to the first
host selected, with the list of selected host names for
the job specified in the environment variable
LSB_HOSTS. The job itself is expected to start paral-
lel components on these hosts and establish communica-
tion among them, optionally using the Remote Execution
Server (RES) of LSF.
-R res_req
Resource requirement. If this option is not specified,
LSF tries to obtain resource requirement information
for the command from the remote task list that is main-
tained by the load sharing library (see lsfintro(1)).
Any run queue length specific resource, such as `r15s',
`r1m' or `r15m', specified in the res_req refers to the
effective queue length. If the command is not listed
in the remote task list or the specified resource
requirement res_req does not contain any host type or
model related specifiers, such as type or model, or
boolean host or model resources (see lsfintro(1)), the
default resource requirement is to run command on a
host or hosts that are of the same host type (see
lshosts(1)) as the submission host.
-J job_name_spec
Assign the character string specified by job_name_spec
to the job. You can later use this job_name_spec to
identify this job. The default job_name is the com-
mand. The job name need not be unique.
To place a job in a particular job group, the job_name
can include a group specification before the job name.
See bgadd(1) for the syntax of group specification.
Specifying /a/b/myJob submits the job with name myJob
under the group /a/b.
job_name_spec is either a job_name or job_name followed
by an index list index_list which is used to submit a
job array. A job array is submitted using the syntax:
job_name_spec[index | start_index-end_index:step, ...],
where index, start_index, end_index and step are
integers. The characters '[' and ']' are reserved
characters that cannot be part of the job name. If the
start_index is omitted, 1 is assumed.
All jobs of an array in one submission share the same
job Id and parameters. Each array job is distinguished
by its array index. Since job names are not unique,
multiple job arrays may have the same name with a dif-
ferent or same set of indices.
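The index list syntax can be illustrated with a Python sketch that
expands a list into individual array indices. It is not LSF code:
expand_index_list is a hypothetical name, end_index is assumed to
be inclusive, and a missing step is assumed to mean 1.

```python
def expand_index_list(index_list):
    """Expand 'index | start_index-end_index:step, ...' into integers."""
    indices = []
    for item in index_list.split(","):
        item = item.strip()
        if "-" in item:
            bounds, _, step = item.partition(":")
            start, _, end = bounds.partition("-")
            first = int(start) if start else 1  # omitted start_index is 1
            stride = int(step) if step else 1   # assumed default step
            indices.extend(range(first, int(end) + 1, stride))
        else:
            indices.append(int(item))
    return indices


print(expand_index_list("1-10:2, 15"))  # [1, 3, 5, 7, 9, 15]
```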
-b begin_time
Dispatch the job for execution on or after begin_time.
begin_time is in the form of [[month:]day:]hour:minute
where month is 1-12, day is 1-31, hour is 0-23, and
minute is 0-59. The time refers to the next matching
wall clock time. The default is to start the job as
soon as possible. If -b is used, then at least two
fields must be given. These fields are assumed to be
hour:minute. If three fields are given, they are
assumed to be day:hour:minute, and four fields are
assumed to be month:day:hour:minute.
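The way the fields of begin_time are assigned can be sketched in
Python. The helper parse_begin_time is hypothetical and only
labels and range-checks the fields; choosing the next matching
wall clock time is left out.

```python
def parse_begin_time(text):
    """Label the fields of [[month:]day:]hour:minute from the right."""
    parts = [int(part) for part in text.split(":")]
    if not 2 <= len(parts) <= 4:
        raise ValueError("expected 2 to 4 fields, got %r" % text)
    names = ["month", "day", "hour", "minute"][-len(parts):]
    fields = dict(zip(names, parts))
    limits = {"month": (1, 12), "day": (1, 31),
              "hour": (0, 23), "minute": (0, 59)}
    for name, value in fields.items():
        low, high = limits[name]
        if not low <= value <= high:
            raise ValueError("%s=%d is outside %d-%d" % (name, value, low, high))
    return fields


print(parse_begin_time("5:20:30"))  # {'day': 5, 'hour': 20, 'minute': 30}
```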
-t term_time
The job termination deadline. If a UNIX job is still
running at term_time, it is sent a SIGUSR2 signal, and
killed if it does not terminate within 10 minutes. If a
Windows NT job is still running at term_time, it is
killed immediately. For a detailed description of how
these jobs are killed, refer to the default behaviour
of the bkill command. In the queue definition, a
TERMINATE action can be configured to override the above
default action (see the description of JOB_CONTROLS in
lsb.queues(5)). term_time is in the same form as
begin_time for the -b option.
-i in_file
The batch job gets its standard input from file
in_file. in_file is a file path name. Default is
/dev/null (no input). If the file in_file is not found
on the execution host, the file is copied from the sub-
mission host to a temporary file in the user's
$HOME/.lsbatch directory on the execution host. This
file is removed when the job completes. The file copy
can be performed only if RES is running on the submis-
sion host, or if the user has allowed rcp access (see
rcp(1)).
If the special characters %J are specified in in_file
then they are replaced by the jobId of the job. If the
special characters %I are specified in in_file then
they are replaced by the index of the job in the array
if the job is a member of an array and 0 otherwise. See
-J option for creating array jobs.
-o out_file
Store the standard output of the job to the file
out_file. If the out_file file already exists, the job
output is appended to it. out_file is a file path name.
If -e is not present, the standard error of the job
will also be stored to file out_file. If -N is not
present, the job report is output as the header of file
out_file.
If neither -o nor -N is specified, the job report and
the information that would otherwise be stored in the
file out_file is sent by mail to the submitter unless
the job is a batch interactive job (see the -I option)
in which case the output is sent to the user's
terminal. For other jobs, what happens to this
information depends on whether the job is repetitive.
For a non-repetitive job, the information that would
otherwise be stored in the file out_file is sent by
mail to the submitter. For a repetitive job, this
information is lost.
If the special characters %J are specified in out_file
then they are replaced by the jobId of the job. If the
special characters %I are specified in out_file then
they are replaced by the index of the job in the array
if the job is a member of an array and 0 otherwise. See
-J option for creating array jobs.
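The %J and %I substitution described above can be sketched as follows. This is an illustrative Python fragment, not part of LSF; the function name expand_file_pattern is hypothetical, and the sketch assumes a plain textual replacement.

```python
def expand_file_pattern(path: str, job_id: int, array_index: int = 0) -> str:
    """Replace %J with the job ID and %I with the array index.

    array_index defaults to 0, the value used for a job that is
    not a member of an array.
    """
    return path.replace("%J", str(job_id)).replace("%I", str(array_index))


print(expand_file_pattern("out.%J.%I", 1351, 7))  # out.1351.7
print(expand_file_pattern("out.%J.%I", 1352))     # out.1352.0
```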
-e err_file
Store the standard error output of the job to the file
err_file. For the default, see the -o option. If the
special characters %J are specified in err_file then
they are replaced by the jobId of the job. If the
special characters %I are specified in err_file then
they are replaced by the index of the job in the array
if the job is a member of an array and 0 otherwise. See
-J option for creating array jobs.
-u mail_user
Send mail to a specified email destination. If -u is
not present, the default destination LSB_MAILTO in
lsf.conf will be used.
-f "lfile op [ rfile ]"
Copy a file between the local (submission) host and the
remote (execution) host. lfile/rfile can be an abso-
lute or a relative path name of a file that is avail-
able on the local/remote host. If one is not speci-
fied, it defaults to the other, which must be given.
Use multiple -f options to specify multiple files.
op is an operator that specifies whether the file is
copied to the remote host, or whether it is copied back
from the remote host. op must be surrounded by white
space. The following describes the op operators:
`>' copy lfile to rfile before job starts. rfile
is overwritten if it exists.
`<' copy rfile to lfile after the job completes.
lfile is overwritten if it exists.
`<<' append rfile to lfile after the job com-
pletes. lfile is created if it does not exist.
`><' and `<>' : equivalent to performing `>' and
then the `<' operation. `<>' is the same as `><'.
The stdin file is copied to a temporary file on the
remote host at execution time if it is not found on
that host (see the -i option description). The stdout
and stderr files must be explicitly specified using the
-f option if the user wants those files to be copied
back to the submission host when job execution com-
pletes.
If the local and remote hosts have different file name
spaces, you must always specify relative path names.
If the local and remote hosts do not share the same
file system, you must ensure that the directory con-
taining rfile exists. It is recommended that only the
file name be given for rfile when running in hetero-
geneous file systems; this places the file in the job's
current working directory. If the file is shared
between the submission and execution hosts, then no
file copy is performed.
LSF uses the lsrcp (see lsrcp(1)) command to transfer
files. lsrcp contacts the RES on the remote host to
perform the file transfer. If the RES is not available,
rcp (see rcp(1)) is used. The user must ensure that
the rcp binary is in the user's $PATH on the execution
host.
Jobs that are submitted from LSF client hosts should
specify the -f option only if rcp is allowed. Simi-
larly, rcp must be allowed if account mapping is used.
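The form of the -f argument can be illustrated with a small parser. This is a sketch only, not LSF code: the names FileTransfer and parse_transfer are hypothetical, and file names containing white space are not handled. It relies on the rules stated above: op is surrounded by white space, and an omitted rfile defaults to lfile.

```python
from typing import NamedTuple

# The operators listed in the description of the -f option.
OPERATORS = {">", "<", "<<", "><", "<>"}


class FileTransfer(NamedTuple):
    lfile: str
    op: str
    rfile: str


def parse_transfer(spec: str) -> FileTransfer:
    """Split an 'lfile op [ rfile ]' string into its parts."""
    fields = spec.split()
    if len(fields) not in (2, 3) or fields[1] not in OPERATORS:
        raise ValueError(f"not a valid file transfer spec: {spec!r}")
    lfile, op = fields[0], fields[1]
    # When rfile is omitted it defaults to lfile.
    rfile = fields[2] if len(fields) == 3 else lfile
    return FileTransfer(lfile, op, rfile)


print(parse_transfer("input > tmp"))
print(parse_transfer("results <"))
```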
-E "pre_exec_command [ argument ... ]"
Execute the pre_exec_command on the host to which the
batch job is dispatched to run (or on the first host
selected for the parallel batch job) before actually
running the batch job. If the pre_exec_command exits
with 0, then the real job is started on the host, oth-
erwise the job goes back to PEND status and is
rescheduled later.
LSF assumes that the pre_exec_command can be run many
times without having side effects.
Standard input and output are directed to the same
files as for the job. The pre_exec_command is run
under the same user ID, environment, and home and work-
ing directory as the batch job. If the pre_exec_command
is not in the user's normal execution path (the $PATH
variable), the full path name of the command must be
specified.
-c cpu_limit[/host_spec]
Set the total CPU time limit to cpu_limit for the batch
job. The default is no limit. This option is useful
for preventing erroneous jobs from running away or
using up too many resources. When the total CPU time
for the whole job reaches the limit, a SIGXCPU signal
is first sent to the job, followed by SIGKILL.
cpu_limit is in the form of [hour:]minute, where minute
can be greater than 59. So, three and a half hours can
either be specified as 3:30, or 210. Optionally, a
host name or a host model name defined in LSF can be
provided as host_spec following cpu_limit and a `/'
character. (See lsinfo(1) to get host model informa-
tion.) host_spec is also used in the option -W. In its
absence, the system default is assumed (see the
description of DEFAULT_HOST_SPEC in lsb.queues(5)); if
the system default is not defined, the host model of
the local machine is assumed. The CPU time limit
actually enforced on the execution host is cpu_limit
multiplied by the CPU scaling factor of host_spec and
divided by the CPU scaling factor of the execution
host, using the factors defined in LSF for the
specified host or host model.
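The [hour:]minute form and the CPU factor scaling can be worked through with a short sketch. The Python below is illustrative only and is not part of LSF; the function names are hypothetical and the CPU factors 2.0 and 4.0 are made-up values.

```python
def parse_limit(spec: str) -> int:
    """Convert a limit written as [hour:]minute into minutes."""
    hours, sep, minutes = spec.rpartition(":")
    return (int(hours) if sep else 0) * 60 + int(minutes)


def scaled_limit(limit_minutes: float, spec_factor: float,
                 exec_factor: float) -> float:
    """Multiply by the factor of host_spec, divide by the factor
    of the execution host."""
    return limit_minutes * spec_factor / exec_factor


print(parse_limit("3:30"))  # 210
print(parse_limit("210"))   # 210

# Hypothetical factors: host_spec has factor 2.0, the execution host 4.0,
# so the faster execution host gets half the CPU time.
print(scaled_limit(parse_limit("3:30"), 2.0, 4.0))  # 105.0
```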
-W run_limit[/host_spec]
Set the wall-clock run time limit of this batch job.
The default is no limit. If the accumulated time the
job has spent in RUN state exceeds this limit, it is
terminated. A UNIX job is sent a SIGUSR2 signal, then
killed if it does not terminate within 10 minutes. A
Windows NT job is killed immediately. For a detailed
description of how these jobs are killed, refer to the
default behaviour of the bkill command.
run_limit is in the same form as cpu_limit of the -c
option. host_spec is the same as in the -c option. If
the job also has a termination time specified (see the -t
option), LSF determines whether the job can actually
run for the specified length of time allowed by the run
limit before the termination time. If not, then the
job will be aborted. To override this behavior, use
the IGNORE_DEADLINE queue option (see lsb.queues(5)).
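The deadline test described above amounts to a simple comparison, sketched below in Python. This is not LSF code: the function name is hypothetical, and the sketch assumes that a job fitting exactly is accepted and ignores host_spec scaling, neither of which is stated here.

```python
from datetime import datetime, timedelta


def fits_before_termination(start: datetime, run_limit_minutes: int,
                            term_time: datetime) -> bool:
    """True if a job starting at 'start' can run for its full run
    limit before the termination time."""
    return start + timedelta(minutes=run_limit_minutes) <= term_time


start = datetime(2000, 1, 1, 12, 0)
term = datetime(2000, 1, 1, 13, 0)
print(fits_before_termination(start, 45, term))  # True
print(fits_before_termination(start, 90, term))  # False
```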
-F file_limit
Set a per-process (soft) file size limit for each of
the processes that belong to this batch job (see
getrlimit(2)). The default is no soft limit. If a
process of this job attempts to write to a file such
that the file size would exceed file_limit KBytes,
that process is sent a SIGXFSZ signal. This condition
normally terminates the process, but may be caught. On
HP-UX, the file size limit cannot be set, so this
option has no effect.
-M mem_limit
Set the total process resident set size limit to
mem_limit KBytes for the whole job. The default is no
limit. Exceeding the limit causes the job to ter-
minate.
-D data_limit
Set a per-process (soft) data segment size limit for
each of the processes that belong to this batch job
(see getrlimit(2)). The default is no soft limit. An
sbrk call to extend the data segment beyond data_limit
KBytes will return an error. On HP-UX, a data size
limit cannot be set, so this option has no effect.
-S stack_limit
Set a per-process (soft) stack segment size limit for
each of the processes that belong to this batch job
(see getrlimit(2)). The default is no soft limit. On
HP-UX, a stack size limit cannot be set, so this option
has no effect.
-C core_limit
Set a per-process (soft) core file size limit for all
the processes that belong to this batch job (see
getrlimit(2)). The default is no soft limit. If a
process of this job attempts to create a core file
larger than core_limit KBytes, then, depending on the
platform's UNIX system, either that process is sent a
SIGXFSZ signal or the writing of the core file stops
at this limit. On HP-UX, a core file
size limit cannot be set, so this option has no effect.
-k chkpnt_dir [ chkpnt_period ]
The job is specified as checkpointable. Optionally, a
checkpoint period of chkpnt_period minutes may be
specified. Quotation marks (") or (') must surround
chkpnt_dir and chkpnt_period if the checkpoint period
is given, e.g., -k "job1chkdir 10". The checkpoint
directory can be a relative or absolute pathname, and
is used for restarting the job (see brestart(1)).
Multiple jobs can checkpoint into the same directory.
When a job is checkpointed, the checkpoint information
is stored in chkpnt_dir/job_ID. The chkpnt_period must
be a positive integer.
The running job is checkpointed automatically every
chkpnt_period minutes if chkpnt_period is given. The
checkpoint period can be changed using bchkpnt(1).
Because checkpointing is a heavyweight operation, it is
suggested that the checkpoint period be greater than
half an hour.
Process checkpointing is not available on all host
types, and may require linking programs with a special
library (see libckpt.a(3)). If this option is not
specified, the job is considered as non-checkpointable.
LSF invokes the echkpnt (see echkpnt(8)) executable
found in LSF_SERVERDIR to checkpoint the job. A user
can override the default echkpnt executable for the job
by defining the environment variable $ECHKPNTDIR to
point to the user's own echkpnt executable. This allows a
user to use other checkpointing facilities, including
application level checkpointing.
-w depend_cond
The logical expression depend_cond specifies the
conditions that the submitted job depends on. The job
is considered for dispatch only when depend_cond is
satisfied (TRUE). Successful dispatch of the job is
also subject to the configured batch scheduling
policies.
The dependency logic expression evaluates to a binary
value of either TRUE or FALSE and is composed of job
conditions (see below for the definition) using
AND/OR/NOT ('&&', '||' and '!') logic operators.
Parentheses '(' and ')' can be used to alter the pre-
cedence of the logical operations.
Because the shell interprets many of the logical
operators and parentheses, a valid dependency condition
should in general be quoted with (") or ('), except when
depend_cond is a single job ID or job name (see Job
status conditions below).
The job-dependency conditions are job status conditions
or job array conditions, as described in detail below.
Job status conditions:
started( jobId | job_name )
If the specified batch job has started run-
ning or has already finished, the condition
is TRUE; otherwise FALSE.
done( jobId | job_name )
If the specified batch job has finished suc-
cessfully and thus is in the DONE state, the
condition is TRUE, otherwise FALSE.
exit( jobId | job_name )
If the specified batch job is in the EXIT
state, the condition is TRUE, otherwise
FALSE.
exit( jobId | job_name, exit_code )
If the specified batch job is in the EXIT
state, and the job exited with exit_code, the
condition is TRUE, otherwise FALSE. The
reserved exit_code of 512 indicates the job
exited while pending.
exit( jobId | job_name, op exit_code )
If the specified batch job is in the EXIT
state, and the job's exit value is op
exit_code, the condition is TRUE, otherwise
FALSE. op is one of >, >=, <, <=, ==, or !=.
The reserved exit_code of 512 indicates the
job exited while pending.
ended( jobId | job_name )
If the specified batch job has finished, the
condition is TRUE, otherwise FALSE.
If only the jobId | job_name is specified, the system
assumes it means done(jobId | job_name). Note that a
numeric job name should be doubly quoted, e.g. -w
"'210'", since the shell treats -w "210" the same as
-w 210, which is treated as a job with jobId 210.
While a jobId may be used to specify jobs of any user,
a job name can only be used to specify the user's own
jobs. If more than one job uses the same job name, the
last submitted job is assumed. job_name* indicates all
job names that begin with the string job_name
(e.g. job_name, job_name1, job_name_test,
job_name.log).
If any one of the specified jobId or job_name is non-
existent in the system, the job submission will be
rejected.
Job array conditions:
numrun(array_jobId, op num)
TRUE if the number of jobs in RUN state
satisfies the test.
numpend(array_jobId, op num)
TRUE if the number of jobs in PEND state
satisfies the test.
numdone(array_jobId, op num)
TRUE if the number of jobs in DONE state
satisfies the test.
numexit(array_jobId, op num)
TRUE if the number of jobs in EXIT state
satisfies the test.
numended(array_jobId, op num)
TRUE if the total number of jobs in the DONE
or EXIT state satisfies the test.
numstart(array_jobId, op num)
TRUE if the total number of jobs in the RUN,
USUSP or SSUSP state satisfies the test.
Job array conditions can be used to specify a
dependency on the value of the counters associated with
a job array. The counters keep track of the number of
jobs in various states. array_jobId is the job ID of
an array job. See -J option for creating an array job.
op is one of >, >=, <, <=, ==, or !=. num is a
positive integer or the wildcard character * to indicate
the total number of jobs within the array. If * is
specified, op should be omitted.
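The test applied to a job array counter can be sketched as follows. This Python fragment is illustrative and is not how LSF is implemented; the function name counter_test is hypothetical, and it assumes that * means the counter must equal the total number of jobs in the array.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def counter_test(count: int, total: int, test: str) -> bool:
    """Evaluate the 'op num' part of a job array condition.

    count is the current value of the counter, total is the number
    of jobs in the array, and test is either '*' or 'op num'.
    """
    test = test.strip()
    if test == "*":
        # Assumption: '*' stands for "all jobs in the array".
        return count == total
    # Try two-character operators before one-character ones.
    for symbol in (">=", "<=", "==", "!=", ">", "<"):
        if test.startswith(symbol):
            return OPS[symbol](count, int(test[len(symbol):]))
    raise ValueError(f"unrecognised test: {test!r}")


# numdone(1334,*) for a 20-job array with 20 jobs done
print(counter_test(20, 20, "*"))      # True
# numdone(1335,>= 10) with 7 jobs done
print(counter_test(7, 20, ">= 10"))   # False
```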
The following are examples of depend_cond :
"done(1351) && ended(job1) && (started(job2) ||
exit(job3))"
"1351 || job2 || started(job3)"
"done(job_name*) && ended(test_name*)"
"numdone(1334,*) && numdone(1335,>= 10)"
-L login_shell
The name of the login shell (must specify an absolute
path) used to initialize the execution environment.
Note that this is not the shell under which the job
will be executed. If this option is specified, LSF
will start login_shell as though it were the login
shell; thus the system and user startup files will be
sourced and the job will run under this environment.
After sourcing the startup files, the login_shell pro-
cess will be overlaid by the process of the job file.
For example, given -L csh, csh is started as a login
shell and /etc/login, ~/.cshrc and ~/.login are
sourced, and then the csh process is overlaid when the job
starts. The default is not to start a login shell but
just run the job file under the execution environment
from which the job was submitted.
Note that the environment variable LSB_QUEUE is set by
LSF so that shell scripts (say the user's .profile or
.cshrc script) can test for batch job execution when
appropriate, and not (for example) perform any setting
of terminal characteristics, since a batch job is not
connected to a terminal. For example, if your login
shell is C-shell, the following .login file prevents
stty and tset from being run during batch jobs.
if (! $?LSB_QUEUE) then
stty erase ^H kill ^U
tset -S
endif
If your login shell is Bourne shell, the following
.profile file has the same effect.
if test "$LSB_QUEUE" = "" ; then
stty erase ^H kill ^U
tset -S
fi
-P project_name
The name of the project that resources consumed by this
job will be charged to. If this option is not specified
then bsub(1) will use the default project. The default
project is determined as follows:
If this option is absent, the user default project
specified by the user's environment variable
LSB_DEFAULTPROJECT is used for job submission. If nei-
ther this option nor LSB_DEFAULTPROJECT is present, the
system default project specified by the LSF administra-
tor in the lsb.params configuration file is used (see
lsb.params(5) for parameter DEFAULT_PROJECT).
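The order in which the project name is chosen can be summarized in a few lines of Python. This is a sketch of the precedence described above, not LSF code; the function name resolve_project is hypothetical, and system_default stands for the value of DEFAULT_PROJECT.

```python
import os
from typing import Mapping, Optional


def resolve_project(option: Optional[str],
                    environ: Mapping[str, str] = os.environ,
                    system_default: Optional[str] = None) -> Optional[str]:
    """Pick the project: -P value, then LSB_DEFAULTPROJECT, then
    the system default."""
    if option:
        return option
    return environ.get("LSB_DEFAULTPROJECT") or system_default


env = {"LSB_DEFAULTPROJECT": "projB"}
print(resolve_project("projA", env, "sysproj"))  # projA
print(resolve_project(None, env, "sysproj"))     # projB
print(resolve_project(None, {}, "sysproj"))      # sysproj
```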
Project names are logged in the lsb.acct(5) file and are
used by bacct(1). On IRIX 6 the user must be a member
of the project as listed in /etc/project(4). If the
user is a member of the project then the /etc/projid(4)
file is consulted to map the project name to a numeric
project ID. Before the submitted job begins execution a
new array session (newarraysess(2)) is created and the
project ID is assigned to it using setprid(2).
-G user_group
The name of the LSF user group (see lsb.users(5)) to
which the job will belong. The job's user must be a
direct member of the specified user group. This option
allows a user to specify a particular user group for
the job if the user belongs to multiple user groups
defined for fairshare scheduling. If this option is
not given, and the user belongs to multiple user
groups, the job will be scheduled under the user group
that allows the job to be dispatched as soon as
possible. This option has no effect if fairshare is
not defined for the user groups to which the user
belongs.
EXECUTION ENVIRONMENT VARIABLES
The following environment variables are set before a job is
started.
LSB_JOBID
This is the ID of the job assigned by LSF, as shown by
bjobs(1).
LSB_JOBINDEX
For a job belonging to a job array, LSB_JOBINDEX
denotes the index of the job within the job array.
LSB_HOSTS
This is a list of hosts that are used to run the batch
job. For sequential jobs, it contains only one host
name. For parallel jobs, it contains multiple host
names separated by spaces. A host name may be repeated
if multiple components of a parallel job are allocated
on this host.
LSB_QUEUE
This is the name of the queue from which the job is
dispatched.
LSB_JOB_NAME
This is the name of the job. The name of a job can be
specified explicitly when the user submits the job. If
the user does not specify a job name, the job name will
be up to the last 60 characters of the job's command
line.
LSB_EXIT_PRE_ABORT
The queue-level (see lsb.queues(5)) or job-level (see
-E option above) pre_exec_command can exit with this
value if it wants the job to be aborted instead of being
requeued or executed.
LSB_JOB_STARTER
This variable is defined if a job starter command is
defined for the queue (see lsb.queues(5)).
LSB_RESTART
This variable is set to "Y" if the job is a restarted
job (see brestart(1)) or if the job has been migrated
(see bmig(1)). Otherwise, this variable is not
defined.
LSB_EXIT_REQUEUE
This variable is set to the REQUEUE_EXIT_VALUES parame-
ter of the queue in which the job is running (see
lsb.queues(5)). This variable is not defined if
REQUEUE_EXIT_VALUES is not defined in the queue.
LSB_INTERACTIVE
This variable is set to "Y" if the job is an interac-
tive job. An interactive job is submitted using the -I
option. This variable is not defined if the job is not
interactive.
LSB_EVENT_ATTRIB
The attributes of external events that were specified
in the job's dependency condition (see bevents(1) and
eeventd(8)). This variable is of the format
"event_name1 attribute1 event_name2 attribute2 ...".
LS_JOBPID
This is the process id of the job.
LS_SUBCWD
This is the directory on the submission host in which
the job was submitted.
LSB_SUB_HOST
This variable is set to the host name where the job was
submitted.
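A job can inspect these variables at run time. As an illustration, the Python sketch below counts how many times each host appears in LSB_HOSTS. The function name slots_per_host and the host names are hypothetical.

```python
import os
from collections import Counter
from typing import Mapping


def slots_per_host(environ: Mapping[str, str] = os.environ) -> Counter:
    """Count the occurrences of each host name in LSB_HOSTS.

    Host names are separated by spaces; a name is repeated when
    several components of a parallel job run on that host.
    """
    return Counter(environ.get("LSB_HOSTS", "").split())


# Hypothetical value, as a parallel job might see it.
example = {"LSB_HOSTS": "hostA hostA hostB"}
print(dict(slots_per_host(example)))  # {'hostA': 2, 'hostB': 1}
```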
ACCOUNT MAPPING
Support for non-uniform user name/user ID spaces is provided
for the execution of batch jobs through the
$HOME/.lsfhosts file. The .lsfhosts file is similar to the
.rhosts file used by rlogin(1) and rsh(1). Whereas the
.rhosts file only specifies which users are allowed to run
under a given account, .lsfhosts also specifies which
account to use when a given host is selected to run the job.
The .lsfhosts file contains multiple lines of the following
format:
hostname username [send|recv]
A hostname or username of '+' indicates any host or user,
respectively. Blank lines and lines beginning with '#' are
ignored. The keyword send indicates that, if the job is
sent to execute on hostname, then the account username
should be used. The keyword recv indicates that user user-
name who submitted a job from hostname is allowed to run
under the local account. If neither send nor recv is speci-
fied, then the account can both send jobs to and receive
jobs from username on hostname.
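The line format above can be illustrated with a small parser. The Python below is a sketch, not the parser used by LSF: the names HostEntry and parse_lsfhosts are hypothetical, leading white space before '#' is tolerated, and raising an error on a malformed line is a choice made for this example.

```python
from typing import List, NamedTuple


class HostEntry(NamedTuple):
    hostname: str   # '+' means any host
    username: str   # '+' means any user
    send: bool
    recv: bool


def parse_lsfhosts(text: str) -> List[HostEntry]:
    """Parse 'hostname username [send|recv]' lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        fields = line.split()
        if len(fields) not in (2, 3) or (
                len(fields) == 3 and fields[2] not in ("send", "recv")):
            raise ValueError(f"malformed .lsfhosts line: {line!r}")
        direction = fields[2] if len(fields) == 3 else None
        # With no keyword, the entry applies in both directions.
        entries.append(HostEntry(fields[0], fields[1],
                                 send=direction in (None, "send"),
                                 recv=direction in (None, "recv")))
    return entries


sample = "# example\nhostB user2 send\n+ user3 recv\nhostC user4\n"
for entry in parse_lsfhosts(sample):
    print(entry)
```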
The .lsfhosts file is consulted at job submission time and
at execution time. At submission time all information about
accounts to use on potential execution hosts is extracted
and saved as part of the job. At execution time the system
will determine which account to use on the execution host
and verify that the submission user is authorized to run
under that account.
When using account mapping, the job is always run under a
login shell so that the startup files of the execution
account are read.
If the user attempts to map to an account for which he has
no permission, the job is put into PSUSP state. The user can
modify the .lsfhosts file for the execution account to give
appropriate permissions and resume the job. Since the
.lsfhosts file of the submission user is only read at sub-
mission time, changes will not affect the account used for
already submitted jobs. Subsequently submitted jobs will
pick up the modified .lsfhosts.
The .lsfhosts file must have its permissions set so that it
is read/written only by the user (600). Otherwise it will
be ignored.
Note that, when using account mapping, the
bpeek(1) command will not work.
EXAMPLES
% bsub sleep 100
Submit the UNIX command sleep together with its argu-
ment 100 as a batch job.
% bsub -q short -o my_output_file "pwd; ls"
Submit the UNIX commands pwd and ls as a batch job to
the queue named short and store the job output in
my_output_file.
% bsub -m "host1 host3 host8 host9" my_program
Submit my_program to run on one of the candidate hosts:
host1, host3, host8 and host9.
% bsub -q "queue1 queue2 queue3" -c 5 my_program
Submit my_program to one of the candidate queues:
queue1, queue2, and queue3 which are selected according
to the CPU time limit specified by -c 5.
% bsub -I ls
Submit a batch interactive job which displays the out-
put of ls at the user's terminal.
% bsub -Ip vi myfile
Submit a batch interactive job to edit myfile.
% bsub -Is csh
Submit a batch interactive job that starts up csh as an
interactive shell.
% bsub -b 20:00 -J my_job_name my_program
Submit my_program to run after 8pm and assign it the
job name my_job_name.
% bsub my_script
Submit my_script as a batch job. Since my_script is
specified as a command line argument, the my_script
file is not spooled. Later changes to the my_script
file before the job completes may affect this job.
% bsub < default_shell_script
where default_shell_script contains:
sim1.exe
sim2.exe
The file default_shell_script is spooled, and the com-
mands will be run under the Bourne shell since a shell
specification is not given in the first line of the
script.
% bsub < csh_script
where csh_script contains:
#! /bin/csh
sim1.exe
sim2.exe
csh_script is spooled and the commands will be run
under /bin/csh.
% bsub -q night < my_script
where my_script contains:
#! /bin/sh
#BSUB -q test
#BSUB -o outfile -e errfile # my default stdout and stderr files
#BSUB -m "host1 host2" # my default candidate hosts
#BSUB -f "input > tmp" -f "output << tmp"
#BSUB -D 200 -c 10/host1
#BSUB -t 13:00
#BSUB -k "dir 5"
sim1.exe
sim2.exe
The job is submitted to the night queue instead of
the test queue, as the command line options
override both options in the script and options in the
option file.
% bsub -b 20:00 -J my_job_name
bsub> sleep 1800
bsub> my_program
bsub> CTRL-D
The job commands are entered interactively.
RESTRICTIONS
When using account mapping the command bpeek(1) will not
work.
File transfer via the -f option to bsub(1) requires rcp(1)
to be working between the submission and execution hosts.
Use the -N option to request mail, and/or the -o and -e
options to specify an output file and error file, respec-
tively.
SEE ALSO
bjobs(1), bqueues(1), bhosts(1), bmgroup(1), bchkpnt(1),
brestart(1), sh(1), getrlimit(2), sbrk(2), libckpt.a(3),
lsb.users(5), lsb.alarms(5), lsb.calendars(5),
lsb.queues(5), lsb.params(5), lsb.hosts(5), mbatchd(8)