
Introduction to LSF

This page gives a brief overview of submitting and monitoring your jobs on the Orchestra cluster. There is a slightly more advanced page at IntermediateLSF. For more complete LSF documentation, consult the full LSF documentation set. The Running Jobs with Platform LSF document is likely the most useful reference for end-users.

Introduction to the Introduction

Learning a new computer system can be intimidating, and many biologists are unfamiliar with UNIX, the command line, and clusters. But don't worry! Literally thousands of biologists have learned how to use the Orchestra cluster. You only need to learn a few things to get started, and if you run into trouble, you can always contact Research Computing for help.

The Orchestra cluster is a collection of hundreds of computers with thousands of processing cores. LSF (Load Sharing Facility) is a system for ensuring that the hundreds of Orchestra users "fairly" share the processors and memory in the cluster.

The basic process of running jobs

  1. You log in to orchestra.med.harvard.edu.
  2. If necessary, you copy your data to the cluster from your desktop or another location (see FileTransfer). You may also want to copy large data inputs to the scratch filesystem (see Filesystems) for faster processing.
  3. You submit your job - for example, a program to map DNA reads to a genome - specifying how long your job will take to run and what queue to run in. You can modify your job submission in many ways, like requesting a large amount of memory, as described below.
  4. Your job sits in a queue (job status PEND), and when it's your turn to run, LSF finds a computer that's not too busy.
  5. Your job runs on that computer (a "compute node"), and goes to status RUN. While it's running, you don't interact with the program. (If you're running a program that requires user input or pointing and clicking, see Interactive Jobs below.)
  6. The job finishes running (status DONE, or EXIT if it had an error). You get an email when the job is finished.
  7. If necessary, you might want to copy data back from the scratch filesystem to a backed-up location, or to your desktop.

Definitions

Here's some of the terminology that you'll find used in relation to LSF in this and other documents:

  • host, node, and server are all just fancy words for a computer
  • core: Cluster nodes can run multiple jobs at a time. Many nodes on Orchestra have 12 cores, meaning they can run 12 simple jobs at a time. (Some jobs use more than one core.)
  • master host: The system that performs overall coordination of the LSF cluster. In our case we have a primary server named ozawa and one backup.
  • submission host: When logging in to orchestra.med.harvard.edu, you'll actually end up on a "login node" called balcony or mezzanine. You submit your LSF jobs from there.
  • execution host: (or "compute node") A system where LSF jobs will actually run. In Orchestra, compute nodes are named after musical instruments with numbers after the name. Currently our compute nodes are named clarinet, bassoon, and flute. You will not log into these machines directly.
  • queue: The basic container for LSF jobs. Queues limit the type of jobs that can be run through them, what resources those jobs can access, who can submit jobs to a given queue, and so forth.
  • filesystem: From your perspective, just a fancy word for a big disk drive. The different filesystems Orchestra sees have different characteristics, in terms of the speed of reading/writing data, whether they're backed up, etc. See Filesystems.

Submitting Jobs

Jobs are submitted from the Orchestra command line. See NewUserIntroduction for help with logging in to Orchestra.

The bsub command

You submit jobs to LSF using the bsub command, followed by the command to run your cluster job. When you submit the job, bsub will give you a numeric JobID that you can later use to monitor your job. The LSF scheduler will then find a computer that has an open slot matching any specifications you gave (see below on requesting resources), and tell that computer to run your job.

bsub also accepts optional arguments to configure your job, and Orchestra users should specify all of the following with each job, at a minimum:

  • the queue (using -q)
  • a runtime limit, i.e., the maximum number of minutes (-W 15) or hours and minutes (-W 2:30) the job will run. The job will be killed if it runs longer than this limit, so it's better to overestimate.

You can run practically any valid UNIX command on Orchestra with bsub. Here's an example of running a simple Matlab command (without opening the graphical interface):

mfk8@balcony:~$ bsub -q short -W 10:0 matlab -nojvm -nodisplay -r "2+2"
Job <1464234> is submitted to queue <short>.
mfk8@balcony:~$ 

LSF reports a jobid for the job, in this case 1464234. That jobid will come in handy later for keeping track of your job.

Most users will submit jobs to the short, long, priority, or interactive queues; some will use the mcore, mpi, or other queues. We have a page, ChoosingAQueue, that will quickly help you choose a queue, and another with more information, including an extensive list of queues.

More bsub

You can also specify all of the bsub options in a file and submit that file:
[22:11 ac427@clarinet002-147 ~]$ cat myjob.txt
#!/bin/bash
# Use bash as the shell; almost everyone on Orchestra does.
#BSUB -n 1               # number of cores the job runs on
#BSUB -q short           # queue name; the short queue is for jobs up to 12 hours
#BSUB -W 00:30           # wall-clock runtime limit (hrs:mins); jobs are killed if they run longer
#BSUB -J myjob           # job name
#BSUB -o myjob.%J.out    # output file name, in which %J is replaced by the job ID
#BSUB -e myjob.%J.err    # error file name, in which %J is replaced by the job ID
sleep 60                 # user command to run; this example just waits 60 seconds

[22:20 ac427@balcony ~]$ bsub < myjob.txt
Job <8231402> is submitted to queue <short>.
[22:20 ac427@balcony ~]$ 

A standard job on Orchestra will use 1 core, but some jobs need multiple cores. Keep reading for more information on what the different flags above mean.

bsub option quick reference

The bsub command can take many, many different flags (options). This is just a quick description of the most popular flags. They are described in more detail further down the page. Also, more flags are described on the IntermediateLSF page.

Only -q and -W are required. If you are running a multi-threaded or parallel job, which splits a job in pieces over multiple cores/threads/processors to run faster, -n is required. If you are running a parallel MPI job, -a is also required.

-a openmpi
What kind of MPI job is this? (use with mpirun. See LSFParallelJobs)
-e errfile
Send errors (stderr) to file errfile. If file exists, add to it. [Note 1]
-Is
Open an interactive session
-n 4
Run on four cores (Some programs use the term "processors" or "threads" or "CPUs" for this idea)
-N
Send email when job finishes, even if using -e/-o [Note 1]
-o outfile
Send screen output (stdout) to file outfile. If file exists, add to it. [Note 1]
-R "rusage[mem=10000]"
Resource request. Reserve 10,000 MB of memory
-R "select[transfer]"
Resource request. Only run on "transfer" computers
-W 30
Runlimit. Job will be killed if it runs longer than 30 minutes (5:30 means five hours and thirty minutes)

[1] By default, you will get an email when jobs finish. If you use -e, -eo, -o, or -oo, you will not get such an email unless you also use -N.
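
Putting several of these flags together, a single submission might look like the following sketch (my_program stands in for your own command):

mfk8@balcony:~$ bsub -q short -W 2:30 -n 4 -N -o myjob.out -R "rusage[mem=4000]" my_program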

Monitoring Jobs and Hosts

There are several commands you may wish to use to see the status of your jobs:

  • bjobs lists information about your jobs, including jobids and what status they're in. Status will usually be PEND or RUN, or sometimes SSUSP (suspended). Finished jobs will be shown with DONE or EXIT (error) status for an hour after they finish. bjobs -l 1464234 will give information only about a single job, including the command that's running, how much memory the job used, and when it started running.

  • bhist displays historical information about your jobs, such as how long they ran or were suspended (use -a to include both running and finished jobs). bhist will show jobs that finished in the last few hours, or further back if you use -n.

  • bhosts lists information about LSF hosts.
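
For example, a bjobs listing looks something like this (the values shown are illustrative):

mfk8@balcony:~$ bjobs
JOBID   USER  STAT  QUEUE  FROM_HOST  EXEC_HOST     JOB_NAME  SUBMIT_TIME
1464234 mfk8  RUN   short  balcony    flute000-170  myjob     Jun 12 10:10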

For much more information about these commands, or almost any UNIX or LSF command, type, for example, man bjobs. You can then use spacebar to go down a page, 'b' to go back a page, or 'q' to get back to the command line.

Job Suspension

In order to keep the multi-day jobs in the long queue from monopolizing compute nodes, jobs in the long queue can be suspended by jobs in the short queue. The long job will pause, its status will be changed to SSUSP for a few minutes or hours, and then it will automatically resume. See TroubleshootingLSFJobs if your job has been suspended for a very long time.

Terminating Jobs Before Completion

The bkill command can be used to kill LSF jobs before they complete normally. If you decide partway through the run that it's not worth completing a job, you can kill it early. Run bkill 1234 to kill job 1234.

In rare cases, for example if a compute node crashes while running your job, a job may be listed in the ZOMBIE state. If this happens, you can force that job's removal from the queue by using bkill -r.
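
For example (the job ID here is just a placeholder):

mfk8@balcony:~$ bkill 1234
Job <1234> is being terminated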

Job Completion

By default, you will receive an execution report by e-mail when a job you have submitted to LSF completes. Here is a sample LSF job report:

Subject: Job 1682: <hostname> Done
From: LSF <lsfadmin@orchestra.med.harvard.edu>
Date: 25 Oct 2004 14:22:05 -0000 (Mon 10:22 EDT)
To: mfk8@orchestra.med.harvard.edu

Job <hostname> was submitted from host <balcony.med.harvard.edu> by user <mfk8>.
Job was executed on host(s) <flute000-170.orchestra>, in queue <long>, as user <mfk8>.
</home/mfk8> was used as the home directory.
</home/mfk8> was used as the working directory.
Started at Mon Oct 25 10:22:04 2004
Results reported at Mon Oct 25 10:22:05 2004

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
hostname
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time   :      0.01 sec.
    Max Memory :         2 MB
    Max Swap   :         4 MB

    Max Processes  :         1
    Max Threads    :         1

The output (if any) follows:

flute000-170

Troubleshooting jobs

This job report is quite useful if your job exits with an error. If a job fails, the subject line will (usually) say "Exit" instead of "Done". If you get a TERM_CPULIMIT error, the CPU time reported will be greater than the -W limit you specified for your job. Similarly, if you get a TERM_MEMLIMIT, the Max Memory will be too high, and you will need to rerun the job, requesting more memory. (See "Requesting resources" below.)

For much more information on figuring out why jobs don't start or don't finish, see the separate TroubleshootingLSFJobs page.

If you are contacting Research Computing because a job did not behave as expected, it's often helpful to include the job report in your request. Just attach it to the email, or paste it into the form.

Job output handling and email size limits

If the size of your job output exceeds 20 MB, it is too large for a standard mail server and won't be emailed; instead, the output will be placed in a file in the .lsbatch directory in your home directory, and you will receive an email with the exact location of the job output file. These files are subject to the file system quota imposed on your home directory (usually 100 GB).

If you are expecting very large output from your job, direct the output to files using one of the following methods:

1. Specify the -o and -e options to bsub:

  • $ bsub -o myjob.out myjob

The job report will be located at the end of the file myjob.out. (If myjob.out existed before the job ran, your job report will be appended. Use bsub -oo instead of bsub -o to overwrite an existing file.)

If you want to get the email with just the job report (i.e., whether it completed successfully), but you want the program output to go to a file, use both -o and -N:

  • $ bsub -N -o myjob.out myjob

If you are familiar with UNIX stderr and stdout, you can separate the two by specifying both -o and -e. If you're not familiar, don't worry about it.
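
For example, to send program output to one file and errors to another:

  • $ bsub -o myjob.out -e myjob.err myjob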

2. Use command line redirection:

  • $ bsub 'myjob > myjob.out'
  • As in this example, the command and the redirection must have single quotes around them so that the output of the job is redirected rather than the one-line output of bsub itself.

Interactive Jobs

You can submit "interactive" jobs under LSF, which allows you to actually log in to a compute node. You can then test commands interactively, watch your code compile, run programs that require user input while the command is running, or use a graphical interface. (Matlab and R can run either in batch mode or graphical mode, for example.)

mfk8@balcony ~ % bsub -Is -q interactive bash
Job <1672> is submitted to queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on flute000-170.orchestra>>
mfk8@flute000-170:~$

You can then run your commands, and logout when you're done. The interactive queue has a limit of 12 hours. Some individual groups also have their own "int" queues.

You can request extra memory or multiple cores (up to 12) in your interactive session's bsub command.
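
For example, here is a sketch of an interactive session requesting four cores and 8 GB of memory:

mfk8@balcony ~ % bsub -Is -q interactive -n 4 -R "rusage[mem=8000]" bash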

Running graphical programs on Orchestra with X11

A number of programs with graphical user interfaces (e.g., R, Matlab) use the X11 system which lets the program run on an Orchestra computer, but show the graphics on your desktop. To do this, you need to have an X11 server running on your desktop, and your SSH connection needs to have X11 forwarding enabled. See this page for X11 instructions.
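
On Mac or Linux, for example, you would typically enable forwarding with ssh's -X flag (this assumes an X11 server, such as XQuartz on a Mac, is already running on your desktop):

ssh -X mfk8@orchestra.med.harvard.edu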

Running jobs on multiple cores

Modern computers have multiple cores, allowing them to run multiple jobs at once. Many programs allow you to use multiple cores for a single run. (Just to confuse you, different programs talk about running on multiple "processors" or "cores" or "threads" or "CPUs".) This can allow you a substantial speedup, but there are some important issues to be aware of.

Unlike multi-core programs, truly parallel programs use a system like MPI to let you run on more than one computer at a time. For information on that, see LSFParallelJobs.

How many cores?

First, you need to tell LSF the number of cores you want to run on. You do this with the bsub -n flag. The number of cores you request from the program should always be the same as the number of cores you request from LSF. Note that different programs have different options to specify multiple cores. For example, tophat -p 8 asks tophat for eight cores. So you might run bsub -q mcore -W 1:00 -n 8 tophat -p 8 ...

Which queue?

There are special queues set up to improve the waiting time and performance of multi-core jobs. As of mid-2014, you should run in the priority queue if you have just one or two jobs to run, or the mcore queue if you have more. See ChoosingAQueue for details or updates.

Time limits

A job actually has two time limits:

Runtime limit
The maximum amount of time the job can be in the RUN state, also known as "wall clock time". You specify this limit with bsub -W.
CPU limit
The amount of processor time your job can consume, across all of its cores. This limit is set automatically by LSF when you submit the job.

If your program is using 8 cores, then in one hour of wall time, the program will consume 8 hours of CPU time. So LSF automatically sets the CPU limit to Ncores * Runlimit when you submit the job. If you run bsub -W 8:00 tophat -p 8 ... (asking tophat for eight cores but not telling LSF you need multiple cores) then your job will actually be killed after just one hour.
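
To illustrate, using the tophat example from above (trailing arguments elided as before):

# Correct: LSF reserves eight cores, so the CPU limit is 8 times the runtime limit
bsub -q mcore -W 8:00 -n 8 tophat -p 8 ...
# Incorrect: LSF assumes one core, so the job hits its CPU limit after about one hour of wall time
bsub -W 8:00 tophat -p 8 ...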

Requesting resources

You may want to request a node with specific resources for your job. For example, your job may require 4 GB of free memory in order to run. Or you might want one of the nodes set aside for file transfers or other purposes.

You can add resource requirements to bsub or bhosts using the -R option, as shown below.

Memory requirements

Every job requires a certain amount of memory (RAM, "active" memory, not disk storage) to run. If the jobs on a given node use too much memory, this memory exhaustion can adversely impact other running jobs, prevent the dispatch of new jobs, cause nodes to crash, and unfairly benefit a few intensive users at the expense of all other users of that node.

Jobs that use excessive memory without requesting resources may be terminated by Research Computing to allow other jobs to run.

If your job requires more than 2 GB of memory, please override the defaults. For example, to submit a job to the short queue and reserve 8 GB of memory for the duration of the job:

mfk8@balcony:~$ bsub -q short -R "rusage[mem=8000]" your_job

Note that memory reservation values are specified in MB.

For parallel jobs, or jobs requiring multiple slots to run, the memory limit will be the per-core memory request in the -R option multiplied by the number of cores requested with the -n option. For example, if you ask for 4 cores and 4 GB per core, the limit will be 16 GB (about 16000 MB).
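
Here is a minimal sketch of such a submission, reusing the your_job placeholder from above:

mfk8@balcony:~$ bsub -q short -W 2:00 -n 4 -R "rusage[mem=4000]" your_job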

As of 2013, many nodes have 96 GB of memory available, although some have less than that. If you ask for a large amount of memory, you may have to wait longer before your job runs, so look at the memory usage in your job report to choose the limits for your next job. In addition, if you ask for as much memory as a node has, your job will never run, as the node needs a few GB to run system processes. For example, on a 96 GB node, it's best to ask for no more than 90 or 91 GB.

If you need more than 96 GB of RAM, please contact Research Computing.

Filesystem Resources

Filesystem resources make sure that your job is only dispatched to a node that has the appropriate network filesystem mounted and accessible to it. All users are encouraged to use filesystem resources. During planned maintenance events or unplanned filesystem outages, using a filesystem resource requirement will keep your job from being unnecessarily dispatched and subsequently failing.

The filesystem resources are:

scratch1 (/hms/scratch1)
groups (/groups)
log (/log - for web hosting users)
files  (/files - currently unused)
testfs (/testfs - an experimental filesystem, not available to the general community)

Example Usage:

bsub -R "select[scratch1]"

If your job requires multiple filesystems:

bsub -R "select[scratch1]" -R "select[groups]"
or
bsub -R "select[scratch1&&groups]"

Requesting particular nodes for file transfer

As described in more detail on the FileTransfer page, you can request a node specially set aside for larger file transfers using bsub -R "select[transfer]".


Feedback Wanted

We expect to make adjustments to the LSF configuration as we learn more about how Orchestra is used. Your feedback is the best way for us to learn what adjustments we should consider to make the cluster most useful to the widest audience. We're also interested in feedback on what information is most useful for new users, so that we can expand and improve this documentation. Please give us feedback early and often, via the support form on the Research Computing web site.

What to Read Next

There is much more information available on the IntermediateLSF page, including such topics as job arrays, job dependencies, GPUs, and modifying running jobs.
