Using the Graphite CPU Environment

These instructions apply to the Graphite CPU environment.  For information on the Graphite GPU environment, please see: https://it.coecis.cornell.edu/graphitegpu/

Introduction

  • The Graphite cluster is a collection of GPU, CPU, and storage servers running Ubuntu 16.04 LTS with the SLURM workload manager.  The original Graphite environment consisted of faculty-owned servers containing GPUs and faculty-owned NAS storage servers for their research groups' storage needs (similar to the Magma NAS environment).
  • To provide an MPI environment similar to the Magma cluster, a number of servers without GPUs were added to the cluster to act as MPI-enabled compute nodes.  The college currently provides several of these servers, each with 8 cores and 32 GB of memory, and faculty and departments can add purchased compute nodes and storage as desired.
  • The structure of the cluster is such that a user uses ssh to log into a login node (graphite-login.coecis.cornell.edu).  Compute nodes and storage servers are attached to a private network, and faculty-owned storage servers export filesystems that are mounted on all compute nodes as well as the head and login nodes.  Users must be associated with a faculty member to access storage on that faculty member's servers.  Jobs are submitted to the compute nodes by scripts submitted to either a default partition (queue) or a high-priority partition.  Jobs submitted to a default partition can be placed on any compute node with the needed resources.  Jobs submitted to a faculty-owned high-priority partition (available only to users associated with that faculty owner) are placed on one of that faculty member's purchased compute nodes, preempting any existing jobs that the default partition placed on that node.  If the preempted jobs were submitted with the --requeue option, they are requeued; otherwise, they are simply terminated.

Using the CPU Environment

  • Users log into the cluster via SSH at graphite-login.coecis.cornell.edu using their Cornell NetID and password.  Users create a submission script (containing SLURM-specific directives) that requests all resources needed, such as number of nodes, CPUs, and amount of memory, and that invokes a shell script (containing the executable and shell commands).  Using the “sbatch” command, the user submits the job to either a default partition (mpi-cpus) or a high-priority partition; the SLURM workload manager then schedules and executes the job and places the results in the location specified by the job.  You must ALWAYS specify a partition, because the cluster-wide default partition is reserved for GPU use only.
  • It is important to tell the scheduler what resources your job will need.  The scheduler does not necessarily use the numbers you give it to control your job, but it uses them to ensure that (if each job accurately requests the resources it needs) jobs are not scheduled on compute nodes that cannot support them or that do not have enough unallocated resources.
  • It is also important to tell your application what resources it can use.  For example, if you do not limit a MATLAB job, it will use every core on every server it runs on.  Either request exclusive access to the node for your job, or tell MATLAB to limit its use (see the sketch after this list).  The scheduler is currently set up to terminate a job that tries to use more memory than it requested.
  • Anaconda is installed in /share/apps; you can use it to configure (in your home directory) the version of Python you want with the packages (such as TensorFlow) you want (see the example after this list).
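
For example, a job's shell script can pass the CPU count requested with --cpus-per-task on to MATLAB so that MATLAB does not spread across every core on the node.  The sketch below is one illustrative way to do this; the script path is a placeholder, and it assumes MATLAB is available on the compute node (SLURM exports the --cpus-per-task value to the job as SLURM_CPUS_PER_TASK).

#!/bin/bash
# Illustrative sketch only: limit MATLAB to the CPUs requested via --cpus-per-task.
# /home/netid/my_analysis.m is a placeholder for your own MATLAB script.
matlab -nodisplay -nosplash -r \
  "maxNumCompThreads(str2double(getenv('SLURM_CPUS_PER_TASK'))); run('/home/netid/my_analysis.m'); exit"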
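
Similarly, one common way to set up a personal Python environment from the shared Anaconda install is sketched below.  The environment name, Python version, package list, and the anaconda3 subdirectory under /share/apps are examples only; check the actual install path on the cluster.

export PATH=/share/apps/anaconda3/bin:$PATH        # assumed install location under /share/apps
conda create --name tf-env python=3.6 tensorflow   # environment is created in your home directory (~/.conda/envs)
source activate tf-env                             # activate it in your job's shell script before running python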

Create a SLURM Submission Script:

Example: mpi_hello_world.sub
#!/bin/bash
#SBATCH -J hello_world                              # Job name
#SBATCH -o /home/netid/output/hello_world_%j.out    # Name of stdout output file (%j expands to jobId)
#SBATCH -e /home/netid/output/hello_world_%j.err    # Name of stderr output file (%j expands to jobId)
#SBATCH --nodes=7                                   # Total number of nodes requested
#SBATCH --ntasks=7                                  # Total number of tasks to be configured for
#SBATCH --ntasks-per-node=1                         # Sets number of tasks to run on each node
#SBATCH --cpus-per-task=1                           # Number of cpus needed by each task (if task is "make -j3" number should be 3)
#SBATCH --get-user-env                              # Tells sbatch to retrieve the user's login environment
#SBATCH -t 00:10:00                                 # Time limit (hh:mm:ss)
#SBATCH --mem-per-cpu=1000M                         # Memory required per allocated CPU
##SBATCH --mem=1000M                                # Memory required per node (alternative to --mem-per-cpu; do not specify both)
#SBATCH --partition=mpi-cpus                        # Which partition/queue it should run on
/home/netid/mpi_hello_world.sh                      # Executable script containing desired commands
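
Note: SLURM will not create the directory for the -o and -e output files.  If the example path /home/netid/output does not already exist, create it before submitting:

mkdir -p /home/netid/output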

Create a shell script to be run on the cluster:

Example: mpi_hello_world.sh
#!/bin/bash
source /etc/profile.d/modules.sh
module load openmpi-4.0.0
mpirun -np 7 /bin/hostname
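
Because the submission script invokes this shell script directly, make sure it is executable before submitting the job:

chmod +x /home/netid/mpi_hello_world.sh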

Submit the job to the SLURM batch queue:

sbatch --requeue mpi_hello_world.sub
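
If the job is accepted, sbatch prints the new job ID (for example, "Submitted batch job 183").  You can then monitor it with squeue, e.g. restricted to your own jobs:

squeue -u <netid>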

SLURM Commands:

  • srun              # When using srun, select the “interactive” partition
  • squeue -l         # Get list of active or recently completed jobs
  • scancel 183       # Cancel an existing job, where 183 is the job number retrieved by the squeue command
  • sinfo -o %N,%P    # Get info on nodes available and their associated partition
  • sinfo             # Get info on compute resources
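
For example, a minimal interactive session on the “interactive” partition could be started as follows (the resource options shown are ordinary srun flags; adjust them to your needs):

srun --partition=interactive --cpus-per-task=1 --mem=1000M --pty /bin/bash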