MSKCC cBio Cluster User Guide
- Overview
- System access
- Storage
- Running jobs
- Managing jobs
- Cleaning up after jobs
- Globally-installed software
- Datasets and repositories
- Frequently asked questions (FAQ)
New cluster head node DNS name: hal.cbio.mskcc.org
The cBio cluster consists of 30 Exxact dual-socket E5-2665 2.4GHz nodes (32 hyperthreads/node) with 256GB of memory. Each node is configured with 4x NVIDIA GTX-680 [18 nodes], 4x GTX-Titan [10 nodes], 4x GTX-980 [1 node], or 4x GTX-TITAN-X [1 node] GPUs.
No PHI is allowed on this cluster.
User accounts are requested internally on the HPC website at MSKCC. For external collaborator account requests, please visit: https://www.mskcc.org/collaborator-access
If you are having problems, please email hpc-request 'at' cbio.mskcc.org.
There are 30 compute nodes in the cluster, each containing
- 2x Intel Xeon E5-2665 2.4GHz CPUs in hyperthreaded mode, providing a total of 32 threads per node
- 256GB memory (16 x 16GB DDR3 1600MHz ECC/Reg server memory)
- 4x NVIDIA GPUs per node: 4GB GTX-680s [18 nodes], 6GB GTX-Titans [10 nodes], 4GB GTX-980s [1 node], or 12GB GTX-TITAN-Xs [1 node]
- 10GE ethernet interfaces [Intel E10G42BTDA Server Adapter X520-DA2 10Gbps PCIe 2.0 x8 2xSFP+ ports] connected to a Dell PowerConnect 8164F 10GE 48-port switch
- 1GE ipKVM and IPMI 2.0-compliant management interfaces connected to a Dell PowerConnect 2848 1GE 48-port switch
The compute nodes are provided by Exxact Corp. See this link for an image of the node chassis.
In addition, there are 2 compute nodes (Dell R820), each containing
- 4x Intel Xeon E5-4640 2.40GHz CPUs in hyperthreaded mode, providing a total of 64 threads per node
- 512GB memory (32 x 16GB DDR3 1600MHz ECC/Reg server memory)
- 10GE ethernet interfaces [Intel E10G42BTDA Server Adapter X520-DA2 10Gbps PCIe 2.0 x8 2xSFP+ ports] connected to a Dell PowerConnect 8164F 10GE 48-port switch
- 1GE ipKVM and IPMI 2.0-compliant management interfaces connected to a Dell PowerConnect 2848 1GE 48-port switch
Fast local filesystem storage for the cluster is provided by a GPFS filesystem hosted on Dell servers. This filesystem is intended for caching local datasets and production storage, rather than long-term data archival. This storage is shared by all cBio groups, and quotas will be enforced after a friendly user period ends.
Home directories are located in /cbio/xxlab/home, where xxlab denotes the laboratory designation (e.g. grlab or jclab). Once the backup server is activated, home directories will be backed up via frequent snapshotted backups. Limited backup space is available, so large project datasets that change frequently should not be stored here, or else your group will rapidly run out of backup space. See "group project directories" below.
Please keep only critical files in your home directory to ensure your home directory usage is below 100GB; otherwise, you will break our frequent remote backups of user home directories. You can check home directory space usage with du -sh ~. Use "group project directories" for larger backed-up storage---there is much more space available there.
Group shared directories are located in /cbio/xxlab/share, and are intended for storing software or datasets that are shared among the group. This directory and its contents are made group-writeable by default. This directory will also be backed up, but less frequently than home directories.
Group project directories are located in /cbio/xxlab/projects, and are intended to host large active research projects that may be shared within the laboratory. These directories will also be backed up, though less frequently than home directories.
MSK collaborators should create their project directories in /cbio/xxlab/projects/collab/.
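For example, a collaborator sponsored by a hypothetical group xxlab could create a project directory (directory name hypothetical) with:
mkdir -p /cbio/xxlab/projects/collab/my_collab_project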
Groups also have non-backed-up group storage located in /cbio/xxlab/nobackup that can host large dataset mirrors that are not irreplaceable. These directories will not be backed up.
The total storage space accessible to all groups is limited. Quotas are not currently enforced, but we will be charging for storage at a rate of $35/TB/month based on the amount of space you use. This is posted and discussed further on the internal HPC website at MSKCC.
We are currently running with un-enforced soft quotas at the user level only. A simple wrapper around the GPFS mmlsquota command is available on HAL if you type:
mskcc-ln1> cbioquota
Please note currently all the quotas below are soft quotas and are not enforced.
This is purely to give users a way of determining their disk usage quickly.
Quotas for uid=9999(username) gid=9999(usergroup)
This is your current usage in GB
|
V
Block Limits
Filesystem type GB quota limit
gpfsdev USR 200 51200 0
GROUP usergroup
Disk quotas for group usergroup (gid 9999):
Block Limits
Filesystem type GB quota limit
gpfsdev GRP no limits
It attempts to present a friendly summary of your user and future group quotas.
Each node has 3.3TB of scratch space accessible in /scratch. Note that /tmp is very limited in space, and only has a few GB free, so you should use /scratch as your temporary directory instead. Node scratch directories are local only to the compute node, are never backed up, are shared amongst all groups, and are only RAID0 (striping) and therefore provide no guarantees of data persistence.
While running a batch job on the system (see "Batch Queue System Overview" for more information), the local scratch space is automatically assigned to the running job and is removed when the job finishes. The environment variable TMPDIR is set to the job's scratch directory.
For example, for job 380886 it is:
TMPDIR=/scratch/380886.mskcc-fe1.local
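A minimal sketch of using the job's scratch directory from inside a batch script, assuming hypothetical program and file names:
# stage input to fast node-local scratch, run there, then copy results back
cp $PBS_O_WORKDIR/input.dat $TMPDIR/
cd $TMPDIR
$PBS_O_WORKDIR/myprog input.dat > output.dat
cp output.dat $PBS_O_WORKDIR/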
Important: If your application writes to /tmp instead of /scratch, it may fail when /tmp fills up. Please do not use /tmp.
- An independent storage system (consisting of a MD3260 disk array connected to node04) stores rsync-based snapshots of the following directories:
- /cbio/xxlab/home: taken every 2 hours, available at /cbio/xxlab/snap_home/, excludes **/.sge, **/nobackup, **/tmp
- /cbio/xxlab/share: taken every 6 hours, available at /cbio/xxlab/snap_share/, excludes **/.sge
- /cbio/xxlab/projects: taken every 12 hours, available at /cbio/xxlab/snap_projects/, excludes **/.sge, **/TCGA*.bam, **/CCLE*.bam
- /cbio/xxlab/nobackup: taken every 24 hours, available at /cbio/xxlab/snap_nobackup/, excludes **/.sge, **/TCGA*.bam, **/CCLE*.bam (only continued as long as there is sufficient backup space)
- The same disk array also stores a crashplan-based backup of the following directories:
- /cbio/grlab/{home,share}: taken continuously, restores available at request from Gunnar Rätsch, excludes ~/tmp, ~/nobackup, *.bam, *.sam*, *.fastq*, *.fq*, core.*, .mozilla
- /cbio/jclab/{home,share}: taken continuously, restores available at request from Gunnar Rätsch, excludes ~/tmp, ~/nobackup
- /cbio/cllab/{home,share}: taken continuously, restores available at request from Gunnar Rätsch, excludes ~/tmp, ~/nobackup, *.bam, *.fastq*, *.sam*
- /cbio/cslab/{home,share}: taken continuously, restores available at request from Gunnar Rätsch, excludes ~/tmp, ~/nobackup, *.bam, *.fastq*, *.sam*
- /cbio/galab/{home,share}: taken continuously, restores available at request from Gunnar Rätsch
- /cbio/jxlab/{home,share}: taken continuously, restores available at request from Gunnar Rätsch
Archives of finished projects and the like can be accessed at /cbio/xxlab/archives. In order to create a new archive, log in to node04, go to /export/archives1/xxlab, and move the archive data to this location. Archives are not necessarily snapshotted (hence, be careful!).
The storage subsystem consists of
- 2x Dell PowerEdge R720 metadata servers containing 15K RPM SAS disks, providing metadata redundancy
- 4x Dell PowerEdge R620 file servers with 24GB RAM and Broadcom 57800 2x10Gb DA/SFP+ network cards
- 4x Dell PowerVault MD3260 SAS disk arrays each containing 60 3TB 7.2K RPM 6Gbps hot-plug SAS hard drives in 10-disk RAID5 configurations to provide fault tolerance to loss of single disks or an entire drawer of disks
Email HPC Request <[email protected]> with an account creation request. The following details will be needed:
- full name
- sponsoring cBio laboratory
- email address
- reason for access (e.g. grant proposal, funded grant, data sharing)
- mobile phone number (in case we have to contact you in emergencies)
- BIC cluster UID/GID (if you have one already)
- ssh public key
Your ssh public key on *nix systems is generally found in your home directory under one of the following filenames:
~/.ssh/id_rsa.pub
~/.ssh/id_dsa.pub
If you do not have such a file, generate an ssh public key:
ssh-keygen -t dsa -N ''
and follow the prompts. Be aware that the -N '' argument creates the key without a passphrase. Consult the ssh-keygen and ssh manpages for adding a passphrase and using it during login.
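For example, to generate an RSA key instead and be prompted to set a passphrase, something like the following should work:
ssh-keygen -t rsa -b 4096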
Password-based login is not currently supported. Login via trusted ssh access is currently the only login method supported. Your ssh public keys for any machines you log in from must be submitted and used for access.
You can ssh to hal.cbio.mskcc.org to log in:
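For example (the username placeholder is an assumption; substitute your own account name):
ssh <username>@hal.cbio.mskcc.org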
If you would like to log into HAL from another machine, you will need to add your SSH public key to your ~/.ssh/authorized_keys file on HAL. You can do this yourself---there is no need to contact the sysadmins to do this for you.
From the new machine you would like to log in from, locate your public key (generally ~/.ssh/id_rsa.pub or ~/.ssh/id_dsa.pub on Linux or OS X systems). Note that you also have private keys (~/.ssh/id_rsa and ~/.ssh/id_dsa) that should not be used here. If you cannot find a public key, you can generate a new one using
ssh-keygen -t dsa -N ''
Your public key will look something like
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDQ8iicpuXcHZn9ppdnDxBSu9VugPXTBSke3eG6tTm+vrAHTDZaCNYAV87ejFfEyrRRISOAFSA5m6xkMki3WlC/ebI3m595GKxHL447Lme37mENZ4IV9K1X/aMkVhvOiFaEIFs6yZWteAi9VakQP5M5DG2ul8i/sJ4NNd/JU+00QvcrHc4D1DhMaNy6vbmlF7USLS6z/8NlQSWjxo+pA5HnSC//azY0ZFZLqVIqwtzZGlTe94e36BUqGyoD3ndGFLFjZbEufHxX3l47RNM+XUaM9BDUSRKeFJtWg1Cx7IWTPiKDBrhozx5yzGZSGVf1dn/Vn4p1SaDEXnbj44aam9U9 username@systemname
or
ssh-dss AAAAB3NzaC1kc3MAAACBAMoBcHWy/pWU1s3c+dgUfZMl1ldu8eXTKL27Kc5yFmQxD0PB3qLwoUb+T6HK5EVL/WM/nKS7umYFeFlNF7YZjOKnzErUVqbGi9U48coprnGl88goBqJiBQAp2mxfbl8EhPvXK58K7LqPwXoWh1ssSOGipC1+hxJUz0SjOR4zQo8hAAAAFQD/qqz9T6K13HaWtylXAuCLsxHpYQAAAIAuQ9ZG5AfmbdGlluH7nMTuyCyJSYsoddXYwL6RJtvLEaqE2vG7341bWG9j9gZzU2LED59rJUnB0HCrP9pXL957wH6ajto82aGkIOmzAzzLZh5oPWPiVkc9o3wZod86IdQwnclu77f6LY/Q+J1wzihgY+lTsqrbKN/ai5JH21BRWgAAAIBzwOaFCni2k+2IdhtMGyKW0iXqVCyXZL4h8NqoVZHv9ys+H15tegT8Hsd36DRMFcqshGl/nQEAuByXwfKy0l/a0uYFi21hyJcUR8wNv02hlM7Z7V4jkfAPd8c/5X91VLdRG18dy6tJkZQ/AZ0jAdjAAW2KdfU13pSgPSJo1F3bjw== username@systemname
Append the key (being careful to preserve it all on one line if using copy-paste) to your ~/.ssh/authorized_keys file on HAL. Do NOT delete any keys already there. You can use a text editor or cat for this purpose:
cat >> ~/.ssh/authorized_keys
{paste your key here, press enter, and press ctrl-D}
Test the new key by trying to log in from the new machine. If you are asked for a password, go back and carefully check you don't have line breaks or other issues with the added key.
If you have issues with logging in, please email the support email alias of hpc-request 'at' cbio.mskcc.org.
While the head node is available to run short, non-memory-intensive tasks (e.g. editor sessions, archival, compilation, etc.), longer-running jobs or jobs that require more resources should be run on the nodes through the batch queue system (either in batch or interactive mode)---see below.
PLEASE DO NOT RUN MEMORY OR CPU INTENSIVE JOBS ON THE HEAD NODE. Use interactive logins to the compute nodes (see below) for this purpose.
Shell processes on the head node are currently restricted to 10 hours of runtime and 4GB of RAM.
The batch queue system now uses the Torque resource manager with the Moab HPC suite from Adaptive Computing. The old SLURM system is no longer in use.
Example single CPU job with 24-hour time limit:
#!/bin/tcsh
# Batch script for single thread CPU job.
#
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=24:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify queue
#PBS -q batch
#
# nodes: number of nodes
# ppn: number of processes per node
#PBS -l nodes=1:ppn=1
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# Launch my program.
./myprog
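To submit a script like the one above (filename hypothetical) and check on it, something like:
qsub myjob.sh
qstat -u $USER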
Example 6-process MPI job on a single node with 12-hour time limit:
#!/bin/tcsh
# Batch script for 6-process MPI CPU job.
#
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=12:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify queue
#PBS -q batch
#
# nodes: number of nodes
# ppn: number of processes per node
#PBS -l nodes=1:ppn=6
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# Launch MPI job.
mpirun -rmk pbs progname
where the -rmk pbs option instructs the hydra mpirun version to use the PBS resource manager kernel to take information about the number of processes to launch from PBS environment variables.
If you want to allow the MPI processes to run on any node (rather than force them to run on the same node), you can change
# nodes: number of nodes
# ppn: number of processes per node
#PBS -l nodes=1:ppn=6
to
# nodes: number of processes
# tpn: number of tasks (processes) per node
#PBS -l nodes=6,tpn=1
Submit an array job with 80 tasks, with 10 active at a time:
qsub -t 1-80%10 script.sh
#!/bin/sh
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=36:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify GPU queue
#PBS -q gpu
#
# nodes: number of nodes
# ppn: number of processes per node
# gpus: number of gpus per node
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=1:ppn=1:gpus=1:shared
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# Launch my program for this array task, passing the array index.
/some_path_to_script/some_executable $PBS_ARRAYID
NOTE: The wallclock limit for the gpu queue is 72 hours (72:00:00).
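Inside an array job script like the one above, $PBS_ARRAYID can be used to select per-task inputs; a minimal sketch, with a hypothetical input file layout:
# pick the input file corresponding to this array task
INPUT=`ls $PBS_O_WORKDIR/inputs/*.dat | sed -n "${PBS_ARRAYID}p"`
/some_path_to_script/some_executable $INPUT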
Using a single GPU is easy, since the CUDA_VISIBLE_DEVICES environment variable will be set automatically so that the CUDA driver only allows access to a single GPU.
#!/bin/tcsh
# Batch script for a single-GPU job on the cbio cluster
#
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=12:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify GPU queue
#PBS -q gpu
#
# nodes: number of nodes
# ppn: number of processes per node
# gpus: number of gpus per node
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=1:ppn=1:gpus=1:shared
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# Launch GPU job.
./myjob
If you need to manually control CUDA_VISIBLE_DEVICES, create a file in your home directory:
touch $HOME/.dontsetcudavisibledevices
This will turn off the automatic setting of CUDA_VISIBLE_DEVICES. You will need to set this environment variable manually even if you use a single GPU, using:
export CUDA_VISIBLE_DEVICES=`cat $PBS_GPUFILE | awk -F"-gpu" '{ printf A$2;A=","}'`
You can add a constraint to your Torque request to specify which class of GPU you want to run on, though limited availability may mean that your job takes longer to start. Available classes are:
- gtx680 : NVIDIA GTX-680 (18 nodes)
- gtxtitan : NVIDIA GTX-TITAN (10 nodes)
- gtx980 : NVIDIA GTX-980 (1 node)
- gtxtitanx : NVIDIA GTX-TITAN-X (1 node)
For example, to request a GTX-TITAN-X for interactive benchmarking, use something like
qsub -I -l walltime=02:00:00,nodes=1:ppn=1:gpus=1:gtxtitanx:exclusive -l mem=4G -q active
Since the GPUs available on different nodes may differ, you will need to manually control CUDA_VISIBLE_DEVICES. Create a file in your home directory:
touch $HOME/.dontsetcudavisibledevices
You should use conda or miniconda to install clusterutils to add some helpful MPI configfile building scripts to your path:
conda install -c omnia clusterutils
You will then need to use build_mpirun_configfile to set CUDA_VISIBLE_DEVICES for each process individually based on the $PBS_GPUFILE contents.
Example running an MPI job across 4 GPUs:
#!/bin/tcsh
# Batch script for MPI GPU job on the cbio cluster
# utilizing 4 GPUs, with one thread/GPU
#
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=12:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify GPU queue
#PBS -q gpu
#
# nodes: number of nodes
# ppn: number of processes per node
# gpus: number of gpus per node
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=1:ppn=4:gpus=4:shared
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# Build the mpirun configfile, which sets CUDA_VISIBLE_DEVICES for each process
build_mpirun_configfile progname args
# Launch MPI job.
mpirun -configfile configfile
If you don't need the GPUs to be on a single node, you can change
# nodes: number of nodes
# ppn: number of processes per node
# gpus: number of gpus per node
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=1:ppn=4:gpus=4:shared
to
# nodes: number of process sets
# tpn: number of process sets to launch on each node
# gpus: number of gpus per process set
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=4,tpn=1,gpus=1:shared
Sometimes it is necessary to specify dependencies on the order in which jobs must be run. For example, if there are two tasks, job_A and job_B, and job B must only be executed after job A is completed, this can be achieved by specifying a job dependency. To do so, keep track of the job IDs and pass these to qsub:
#submit job_A, keep track of its ID so you can add a dependency
job_A_ID=`qsub job_A.sh`;
#submit job_B, specifying that it should only be executed after job_A is completed
qsub -W depend=after:${job_A_ID} job_B.sh
Descriptions of other dependency types that can be specified are available in the qsub man page under the -W option.
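For example, to run job_B only if job_A completes successfully (exit status 0), the afterok dependency type can be used, following the same pattern as above:
job_A_ID=`qsub job_A.sh`
qsub -W depend=afterok:${job_A_ID} job_B.sh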
qstat will print out information about what is running in the queue, and takes typical arguments such as -u username.
Example:
$ qstat
Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
161110.mskcc-fe1 ...o_target-temp [user] 141:01:0 R gpu
161111.mskcc-fe1 ...o_target-temp [user] 140:46:4 R gpu
161112.mskcc-fe1 ...o_target-temp [user] 140:56:2 R gpu
189342.mskcc-fe1 ...ssembly_iso_3 [user] 70:28:15 R gpu
189344.mskcc-fe1 ...target-temp_2 [user] 70:24:51 R gpu
189345.mskcc-fe1 ...target-temp_2 [user] 70:14:16 R gpu
189346.mskcc-fe1 ...target-temp_2 [user] 70:07:30 R gpu
189348.mskcc-fe1 ...target-temp_2 [user] 68:28:11 R gpu
198319.mskcc-fe1 gpu [user] 70:26:31 R batch
198320.mskcc-fe1 gpu [user] 70:22:06 R batch
198664.mskcc-fe1 QLOGIN [user] 04:31:57 R batch
qstat -a prints in an alternative format, with additional info for each job such as node and task counts, and requested memory size.
showstart <job_id> will give you an estimated starting time if the load is high. Keep in mind that the cluster runs on EST.
The Moab command showstate gives a quick overview of the cluster and running jobs.
The Moab command showres -n -g provides a useful reservation summary by node.
> showres -n -g
reservations on Fri Oct 31 13:49:46
NodeName Type ReservationID JobState Task Start Duration StartTime
gpu-1-4 Job 2495114 Running 1 -2:06:43:05 3:00:00:00 Wed Oct 29 07:06:41
gpu-1-4 Job 2497256 Running 1 -2:01:13:54 3:00:00:00 Wed Oct 29 12:35:52
gpu-1-4 Job 2497264 Running 1 -2:01:00:30 3:00:00:00 Wed Oct 29 12:49:16
gpu-1-4 Job 2498684 Running 1 -1:22:30:43 3:00:00:00 Wed Oct 29 15:19:03
gpu-1-4 Job 2499850 Running 4 -1:17:40:57 3:00:00:00 Wed Oct 29 20:08:49
gpu-1-4 Job 2499902 Running 1 -1:14:31:32 3:00:00:00 Wed Oct 29 23:18:14
gpu-1-4 Job 2499904 Running 1 -1:14:26:23 3:00:00:00 Wed Oct 29 23:23:23
gpu-1-4 Job 2499906 Running 1 -1:13:25:38 3:00:00:00 Thu Oct 30 00:24:08
The tracejob utility reads the accounting data file and produces a summary of information about the finished job, including cpu-time, exec_hostname, owner-name, job-name, job-ID, Exit_status, and the resource requirements as specified.
Usage: tracejob -lsm <job_id>
You can use qsub to start interactive jobs using qsub -I -q active, which requests an interactive session (-I) in the special interactive high-priority queue (-q active).
Example: To start an interactive job on one core with time limit of 1 hour:
qsub -I -q active -l walltime=01:00:00 -l nodes=1:ppn=1
Example: To start an interactive job running bash on one core with a time limit of 4 hours and 4GB of memory:
qsub -I -q active -l walltime=04:00:00 -l nodes=1:ppn=1 -l mem=4G bash
Example: To start an interactive job with time limit of 1 hour, requesting one GPU in "shared" mode:
qsub -I -q active -l walltime=01:00:00 -l nodes=1:ppn=1:gpus=1:shared
To request a particular kind of GPU, you can specify one of the GPU classes listed above (e.g. 'gtx680' or 'gtxtitan').
Example: To start an interactive job with time limit of 1 hour, requesting one GTX-680 GPU in "shared" mode:
qsub -I -q active -l walltime=01:00:00 -l nodes=1:ppn=1:gpus=1:shared:gtx680
Sometimes an interactive job contains a graphical component. In order to forward X11 from a job running interactively on a compute node, the following prerequisites must be met.
First, your SSH session to HAL must be doing X11 forwarding properly to your X11 display. An example is below.
ssh -X hal.cbio.mskcc.org
mskcc-ln1 ~]$ echo $DISPLAY
localhost:15.0
mskcc-ln1 ~]$ xeyes
If you do not see big silly eyes looking at your cursor consult the manpage for ssh to make sure you are forwarding X11 properly. Sometimes the "-Y" argument is needed depending on your X11 desktop configuration.
Once that works, interactive qsub sessions support adding the "-X" flag. For example:
qsub -X -I -l walltime=02:00:00,nodes=1:ppn=1 -q active
This should start a job with forwarded X11 on the node it is scheduled on. Please note that forwarding X11 requires you to keep the SSH session to HAL running. You cannot exit and expect graphical applications to appear on your desktop.
Use qdel <jobid> to kill your batch job by job id.
On heavily overloaded nodes, this may take up to half an hour to actually kill and purge the job.
Array jobs have a few extra items that can be done when deleting them.
To delete the entire array job be sure to include the trailing [] syntax:
qdel 282829[] # deletes entire array job
qdel 282829[1] # deletes single entry in array job
qdel -t 2-5 282829[] # deletes range of items in array job
qdel -t 1,5,6 282829[] # deleting specific array job members
There is a Ganglia monitoring system installed on the cluster. On Mac OS, you can access Ganglia via a web browser using these commands (hint: create an alias; on Linux, replace open with firefox or your favorite browser):
ssh -L 8081:localhost:80 140.163.0.215 -f sleep 1000
open http://localhost:8081/ganglia/
If you are located on the MSKCC network ranges, the same data is available at
http://hal.cbio.mskcc.org/ganglia/
Useful Torque and Moab commands for managing and monitoring batch jobs
Normally, most items associated with a completed job are cleaned up when its processes exit; a couple of items that the user does not clean up will be handled as follows.
If a user job leaves a shared memory segment on the system it will persist until a nightly cron job evaluates its impact. If the memory consumed by the shared memory segment is larger than 1GB the user will be emailed about it and the memory segment will be deleted. This will only be done if the shared memory segment has no processes attached to it.
Smaller shared memory segments will be removed without emailing after one week.
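If you want to check for and clean up your own leftover shared memory segments before the cron job does, the standard System V IPC tools can be used (the segment id shown is hypothetical):
ipcs -m          # list shared memory segments and their owners
ipcrm -m 123456  # remove a segment you own, by its shmid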
PROPOSED CHANGE FOLLOWS. THIS IS NOT IMPLEMENTED AT THIS TIME
Torque by default creates a TMPDIR variable and directory for jobs to use automatically if they want. For example, for a submitted job, a directory will automatically be created in /scratch similar to the one below.
TMPDIR=/scratch/7048239.hal-sched1.local
In most cases, that directory will be deleted when the job exits. In some situations we have seen that not happen, and as such we will soon be starting the following cron-based cleaning routines for the /scratch area.
- The directory /scratch/shared is considered a persistent, non-managed place for manually sync'd user data on a node. The cron script will NOT apply age-based deletion to anything in this directory, so if you have placed items in /scratch itself, please consider moving them to /scratch/shared.
- No directory matching the above jobid pattern will be touched.
- The directory /scratch/docker, which is where docker images and state are kept, will not be touched.
- ALL OTHER items in /scratch older than (for a first pass) 365 days will shortly be deleted.
CURRENT POLICY: Currently no automated processes clean the /scratch areas on the nodes, so if your job makes use of /scratch on the nodes, it is up to you to remove leftover items there.
Matlab R2013a is installed as a module. To use it, use
module add matlab
To get an interactive login for Matlab, you can do the following:
qsub -I -l nodes=1:ppn=4:matlab -l mem=40gb -q active
Be sure to specify the active queue, or else your jobs will not start promptly. The active queue should only be used for interactive jobs, and not for running batch jobs. There is a default time limit of two hours on the active queue, though shorter time limits can be specified (e.g. -l walltime=00:30:00 for a 30-minute time limit).
(The attribute matlab in the call will make sure that you are scheduled on a node with an available license.)
You will get an interactive shell. You can start matlab with:
/opt/matlab/R2013a/bin/matlab
There are a limited number of floating and node-locked licenses which may be requested as resources in the batch queue environment. TODO: Document batch queue license requests for Matlab
The Ruby Version Manager (RVM) https://rvm.io/ is installed as a module and gives you many options for dealing with Ruby variations and development needs.
The recommended procedure for using RVM on the cluster is:
module initadd rvm
This will add the rvm binary to your path on subsequent logins. Then it is wise to choose a Ruby version from your .bash_profile along the lines of:
rvm use default
Or a more current version can be chosen after viewing:
rvm list
If a user wants to self-maintain Ruby versions or gems, the rvm user command allows the selection of home-directory-maintained items. Most users will probably want to minimally select:
rvm user gemsets
which then allows the user to easily add the various gems they need.
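As a minimal sketch (gemset and gem names hypothetical), creating and using a per-user gemset might look like:
rvm gemset create myproject
rvm gemset use myproject
gem install bundler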
The cluster makes use of a module environment for managing software packages where it is important to provide users with several different versions. The module environment makes it easy to configure your software versions quickly and conveniently.
To see the various options available, type
> module
To list currently loaded modules and versions:
> module list
Currently Loaded Modulefiles:
1) gcc/4.8.1 2) mpich2_eth/1.5 3) cuda/5.5 4) cmake/2.8.10.2
To list all available modules that can be loaded:
> module avail
To add a new module, use module add:
> module add cuda/5.5
The number that comes after the module name followed by a slash is the version number of the software.
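To unload a module you no longer need, the standard module rm command can be used:
> module rm cuda/5.5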
More information about available modules can be obtained with the module show command:
> module show cuda/5.5
-------------------------------------------------------------------
/etc/modulefiles/cuda/5.5:
module-whatis cuda
module-whatis Version: 5.5 beta
module-whatis Description: cuda toolkit
prepend-path PATH /usr/local/cuda-5.5/bin
prepend-path LD_LIBRARY_PATH /usr/local/cuda-5.5/lib:/opt/cuda-5.5/lib64
-------------------------------------------------------------------
# Datasets and Repositories
## PDB database in pdb format
The PDB database is present at /cbio/jclab/share/pdb
It can be retrieved using the following command:
rsync -rlpt -v -z --delete --port=33444 rsync.wwpdb.org::ftp_data/structures/divided/pdb/ <destdir>
For more information and options (such as other file formats), go to this page on wwPDB
It was last retrieved on 7 Nov 2013.
During non-business hours, Gunnar and John can make emergency interventions by running the following commands on the head node (mskcc-ln1):
* take nodes offline: sudo /opt/torque/bin/pbsnodes -o NodeName
* take nodes out of service and flag them for the sysadmin: sudo /opt/torque/bin/pbsnodes -oN "text for investigation here" NodeName
* purge jobs from the queue: sudo /opt/torque/bin/qdel -p JobID
The purge command will remove all accounting information for that job; tracejob will only show
purging job 9999999.mskcc-fe1 without checking MOM
and no resources used at all.