WORK IN PROGRESS
Using the Unicorn Cluster
The Unicorn cluster uses SchedMD's SLURM scheduler on Ubuntu 24.04.
Email the ITSG for assistance.
Connect
Log into the cluster login node with your Cornell NetID and password, either from on campus or over the Cornell VPN.
Use your favorite SSH client (Terminal, VSCode, MobaXterm, etc.)
ssh netid@unicorn-login-01.coecis.cornell.edu
Replace netid with your NetID
Login Node Notes:
- The login node(s) are designed only to support logging in and submitting jobs to the scheduler.
- Please use a quick interactive job for tasks like compiling code or building conda environments.
- To ensure the login nodes can serve the needs of the community, any process that consumes a large amount of resources is automatically terminated, and the user is notified by email with suggestions of best practices.
Submit Jobs
Interactive job:
Get an interactive shell:
salloc
This uses the cluster defaults: 1 CPU, 1 GB of RAM, and a 4-hour time limit on default_partition.
You can request additional resources like this:
salloc --mem=5g --gres=gpu:1 --cpus-per-task=2
Disconnect from interactive sessions by pressing CTRL-D or typing exit
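If the defaults aren't enough, salloc also accepts a partition and a time limit; the values below are only illustrative (partitions are described under Partitions and Priority):
salloc --partition=gpu --gres=gpu:1 --mem=8g --time=2:00:00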
Scheduled job:
Create a SLURM submission script:
Example: test-gpu.sub
#!/bin/bash
#SBATCH -J test_file                     # Job name
#SBATCH -o test_file_%j.out              # output file (%j expands to jobID)
#SBATCH -e test_file_%j.err              # error log file (%j expands to jobID)
#SBATCH -N 1                             # Total number of nodes requested
#SBATCH -n 1                             # Total number of cores requested
#SBATCH --cpus-per-task=1                # Total number of cores requested per task
#SBATCH --get-user-env                   # retrieve the user's login environment
#SBATCH --mem=2000                       # server memory requested in MB (per node)
#SBATCH -t 2:00:00                       # Time limit (hh:mm:ss)
#SBATCH --partition=default_partition    # Request partition
#SBATCH --gres=gpu:r6000:1               # Type/number of GPUs needed

echo "Hello, world! This is the GPU I'm using:"

# The commands or script to run
nvidia-smi -L
Submit the job:
sbatch --requeue test-gpu.sub
SBATCH Notes
- Use --requeue so your job is resubmitted if it is preempted.
- Do not put blank lines between the top of the script and the last #SBATCH line.
- Use full paths for scripts (/home/netid/test.sh).
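For jobs that don't need a GPU, a submission script follows the same pattern; in this sketch the job name, resource values, and script path are placeholders:
#!/bin/bash
#SBATCH -J cpu_test                      # Job name
#SBATCH -o cpu_test_%j.out               # output file (%j expands to jobID)
#SBATCH -e cpu_test_%j.err               # error log file (%j expands to jobID)
#SBATCH -n 1                             # Total number of cores requested
#SBATCH --mem=1000                       # memory requested in MB
#SBATCH -t 0:10:00                       # Time limit (hh:mm:ss)
#SBATCH --partition=default_partition    # Request partition

/home/netid/test.sh                      # use the full path to your script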
Monitor Jobs
See your active jobs:
squeue --me
To monitor your job, you can log into any node where you have a job running.
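For example, once squeue --me shows your job as running, you can SSH from the login node to the node reported in the NODELIST column and check activity:
ssh <nodename>       # node name from the NODELIST column of squeue
nvidia-smi           # check GPU activity for a GPU job
top                  # check CPU and memory usage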
Cancel a job
scancel <jobid>
Review Jobs
The above example will generate an output file of the form test_file_###.out with contents:

Hello, world! This is the GPU I'm using:
GPU 0: Quadro RTX 6000 (UUID: GPU-a3f29002-16ac-d69e-185e-bec63d41ed44)
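To review the results, view the output (and error) files directly; the job ID below is a placeholder:
cat test_file_123456.out      # replace 123456 with the job ID reported by sbatch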
Important SLURM Commands
Command | Purpose
---|---
srun | Interactive job
squeue -l | List active/pending jobs
scancel | Cancel a job
sinfo | Resource information
sinfo -o "%30N %10c %15m %30G" | GPU information
Resource Limits and Defaults
Default resources (if unspecified):
- 4-hour time limit
- 1 CPU
- 1 GB RAM
- default_partition
Note: Accurately specify the resources you need (memory, CPU, GPU, partition, and time limit) to avoid having your job terminated by the scheduler.
More Information
Directories
- Home directories: /home/<NETID> from group NFS server.
- Shared Data: /share/DATASERVER/ (DATASERVER = your NFS server).
- Shared datasets in /scratch/datasets.
- Temporary storage for active jobs:
  - Tmp directory: /tmp, auto-monitored and cleared.
  - Scratch directories: local storage on GPU servers (/scratch); users manage cleanup.
- Disk quotas are in place for some research groups; check usage with:
quota -s
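As a sketch of how these temporary spaces are typically used together, a job can stage its input on node-local scratch and clean up when it finishes; the file names below are placeholders:
mkdir -p /scratch/$USER                            # node-local scratch; users manage cleanup
cp /share/DATASERVER/input.dat /scratch/$USER/     # stage input from your group's NFS share
# ... run your job against /scratch/$USER/input.dat ...
rm -f /scratch/$USER/input.dat                     # remove staged data when finished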
Software
- Slurm v24.11.4 (workload manager/job scheduler)
- Anaconda3 (Python environment — /share/apps/anaconda3/2022.10)
- Apptainer v1.4.0 (formerly known as Singularity) (Docker compatibility — /share/apps/apptainer/1.4.0)
- OpenMPI 5.0.7 (default MPI capability)
- CUDA v12.8.1 – cuDNN v8.9.7
- CUDA v12.0 – cuDNN v8.8.1 (Default CUDA installation)
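As one example of using the software above, Apptainer can run Docker images directly from an interactive or batch job; this sketch assumes the apptainer binary is on your PATH (otherwise invoke it from the install path listed above), and the image is just an illustration:
apptainer exec docker://ubuntu:24.04 cat /etc/os-release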
Licensed Software
These packages are often installed on specific nodes at the request of a research group:
- Ansys Fluent
- Ansys Lumerical
- Gaussian
- Gurobi Optimizer v12.0.1
- MATLAB R2023a
Additional software
For software not listed above, we recommend using Anaconda (conda) which supports installing different software packages into virtual environments in your home directory for use across the cluster. Search for packages on https://anaconda.org/
Example conda environment creation for PyTorch
Run this the first time you use conda:
/share/apps/software/anaconda3/bin/conda init
Follow the prompts, then launch your shell again (e.g., run bash).
Proceed here if you've already initialized conda:
conda create -p ~/myenv python=3.12
conda activate ~/myenv
conda install pytorch::pytorch
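Once the environment exists, it can also be activated inside a submission script. This is a sketch: the job name, resource values, and train.py are placeholders, and the conda path is the one used in the init step above:
#!/bin/bash
#SBATCH -J pytorch_test                  # Job name
#SBATCH -o pytorch_test_%j.out           # output file (%j expands to jobID)
#SBATCH -n 1                             # Total number of cores requested
#SBATCH --mem=4000                       # memory requested in MB
#SBATCH -t 1:00:00                       # Time limit (hh:mm:ss)
#SBATCH --partition=gpu                  # Request partition
#SBATCH --gres=gpu:1                     # Number of GPUs needed

eval "$(/share/apps/software/anaconda3/bin/conda shell.bash hook)"   # make 'conda activate' available in the batch shell
conda activate ~/myenv
python train.py                          # placeholder for your own script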
Partitions and Priority
- All jobs, including interactive jobs, must be submitted to a specific partition (or queue).
- Partitions are ordered by preemption priority. Please submit your job to the lowest-priority partition that meets your needs (see the example submission after this list).
- default_partition (low priority)
  - For batch and interactive jobs requiring CPUs and/or GPUs, use the "default_partition" partition.
  - This partition can send jobs to any node on the cluster.
- gpu (medium priority, GPU required)
  - For batch and interactive jobs requiring GPUs, use the "gpu" partition.
  - This partition can send jobs to any node with a GPU on the cluster.
  - This partition will preempt any jobs running on a GPU server that were submitted to the low-priority partition.
- High priority (group-owned partitions)
  - For batch and interactive jobs requiring CPUs and/or GPUs, use the priority partition that belongs to your group.
  - Only the servers owned by the faculty to whom the priority partition belongs are available through these partitions.
  - This partition will preempt any jobs running on those servers that were submitted to the low- or medium-priority partitions.
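For example, to send the GPU test script from the Submit Jobs section to the medium-priority gpu partition, override the partition at submission time (command-line options take precedence over the #SBATCH lines in the script):
sbatch --requeue --partition=gpu test-gpu.sub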
Note: when you log into a node to monitor a job, you are limited to the resources your original job requested. For example, if your original job asked for 1 GPU, that is the only GPU you will be able to see and use on that node. Once the job ends, you will no longer be able to access the node directly.