
Introduction to Slurm

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. (from https://slurm.schedmd.com/overview.html)

If you need to learn to use Slurm, it is because you will be doing computation on a cluster. So learning about cluster computing is a good place to start.

Cluster Computing

What is a cluster?

A cluster consists of a "head node" and a number of "child nodes" or "compute nodes". The head node is a computer that you can access over the network using ssh. You can use the head node for tasks like creating directories, creating files, moving files around, and editing files. The head node is a powerful machine, but it is not designed for computation. Its job is to run a continuously operating program called a "job scheduler". When you are ready to run one or more jobs, you tell the job scheduler what jobs you want to run, how much memory, disk space, and CPU power those jobs need, and any other required information. The job scheduler then finds the most efficient and fair way to allocate nodes on the cluster to run your jobs alongside the ones that others have requested.
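
With Slurm, for example, you describe a job and its resource needs in a short batch script and hand that script to the scheduler with the sbatch command. The following is a minimal sketch only; the resource numbers are arbitrary, and my_analysis.sh stands in for whatever program you actually want to run.

    #!/bin/bash
    #SBATCH --job-name=my_job          # a name for the job
    #SBATCH --time=01:00:00            # requested wall-clock time (1 hour)
    #SBATCH --mem=4G                   # requested memory
    #SBATCH --cpus-per-task=2          # requested CPU cores
    #SBATCH --output=my_job_%j.out     # where to write the job's output (%j = job ID)

    # the actual computation; my_analysis.sh is a placeholder for your own program
    ./my_analysis.sh

You would submit this with sbatch my_job.sbatch, and the scheduler decides when, and on which compute node, it runs.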

Why use a cluster?

  • Randomized simulation
    You are performing a simulation that you run many times, with each run producing randomized output that you will then use for statistical analysis.

    Rather than running the simulations one at a time, you run them all at once, with each simulation running on a different compute node.

  • Parameter sweeps
    You want to run an analysis or simulation with a range of different parameter values and compare the outcomes.

    Rather than running them one at a time, you send them out to run simultaneously on multiple compute nodes.

  • Same analysis on multiple samples
    You have an analysis pipeline you want to run on multiple samples.

    Instead of running them sequentially, you run each sample on a different node (see the job-array sketch after this list).

  • Memory-, disk-, or CPU-intensive jobs
    You have a process that would bog down your workstation, or not run on it at all.

    You can run this process on a single compute node while you continue to do other work.
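
The parameter-sweep and multiple-samples patterns above are commonly handled with Slurm job arrays, where one script launches many numbered tasks. The sketch below assumes a file called samples.txt with one sample name per line; both that file and analyze.sh are placeholders for your own data and pipeline.

    #!/bin/bash
    #SBATCH --job-name=per_sample
    #SBATCH --array=1-10                     # launch 10 tasks, numbered 1 through 10
    #SBATCH --time=02:00:00
    #SBATCH --mem=8G
    #SBATCH --output=per_sample_%A_%a.out    # %A = array job ID, %a = task number

    # each task reads the line of samples.txt that matches its task number
    SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" samples.txt)

    # analyze.sh is a placeholder for your own analysis pipeline
    ./analyze.sh "$SAMPLE"

Each of the ten tasks can be placed on a different compute node, so the samples are processed in parallel rather than one after another.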

How to use a cluster

When you perform basic operations like managing files and directories, or running small tasks, you work the same way you would on any other remote computer.

  • Connect through the internet using ssh
  • Do your work
  • Exit the system
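
A typical session might look like the sketch below; the hostname cluster.example.edu and the username myuser are placeholders for whatever your cluster's administrators give you.

    # connect to the head node over the network
    ssh myuser@cluster.example.edu

    # do your work: manage files, write job scripts, submit them to the scheduler
    mkdir -p project/results
    sbatch my_job.sbatch          # hand a job script to the scheduler
    squeue -u myuser              # check on the status of your jobs

    # exit the system
    exit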

It's the "Do your work" part that you need to learn about.

To begin, go to Lesson 1: Getting to know the cluster