Positive Definite Ecology - Introduction to Hyak

Author

John Best

Published

February 24, 2020

NOTE: This information was accurate as of early 2020, but is now (April 2022) slightly out of date; e.g. the klone cluster is now available and interactive nodes are no longer a separate partition.

Use Hyak when you need additional resources

More processing power
Models or data don’t fit in RAM

Think carefully about how to use Hyak

Do you really need to use it?
How can you use it efficiently?

Parallelization

Which steps are independent?
Which steps are bottlenecks?
How to minimize overhead?

Example: Simulation study

Specify operating and estimation models
Simulate 100 operating models
Fit 300 estimation models
Calculate metrics
Summarize results

Which steps are independent? Which step is the bottleneck? Are there steps that could be done together?

Example: Simulation study

Parallelization strategies

What step takes most time?
Which step is performed most often?
Are there natural save points?
Which computations can be reused?
Are there other limitations?

Checkpointing

Save progress
Restart from saved position
Use the ckpt queue

What is Hyak?

Mox
~~IKT~~

Logging in

ssh jkbest@mox.hyak.uw.edu

Keep your phone handy for 2FA!

How to choose a node type

Request an interactive node

build node:

srun -p build --mem=10G --time=1:00:00 --pty /bin/bash

stf-int node:

srun -p stf-int -A stf --cpus-per-task=28 --mem=120G\
    --time=1:00:00 --pty /bin/bash

Relevant SLURM arguments

srun: Request in interactive node
sbatch: Submit a batch job
-p: Partition; node type to use
-A: Account; usually stf
--cpus-per-task: Threads per program
--tasks-per-node: Processes to run on each node
--nodes: Number of nodes to request
--mem: Amount of RAM to request
--time: Time to reserve
--pty: Program to start

Connecting a second time

To see which node you are using:

squeue -u jkbest

  JOBID PARTITION ... NODELIST(REASON)
1838469     build ... n2232

From login node:

ssh n2232

Now monitor usage with e.g. htop

Modules

Key commands:

module --help: Reminders
module avail: List available modules
module apropos srchstr: Search modules for srchstr
module load abcdef: Attach module abcdef
module unload abcdef: Detach module abcdef

Using a `build` node

Request a build node.
Load the appropriate R module (or have your custom version accessible).
Run R at the command line to open R.
Use install.packages to install what you need.

Using an `stf-int` node

Request an stf-int node (see above).
Load the R module.
Open R and run:

library(TMB)

runExample("simple")

Submit a batch job to `stf`

The “driver script” runs your analysis, including loading packages, reading data, and sourceing other files with functions etc. The SLURM script specifies the resources you need and how to run your driver script. Examples of scripts that I have used:

Note that these are just example scripts, I haven’t made them self-contained so they won’t run for you. That said, the fitsims.slurm file should be a good starting point for your own SLURM scripts. In the directory with the SLURM script, you can submit the job:

sbatch fitsims.slurm

Tips:

Write a single “driver” script
Use Rscript to run the driver script

There are three places to store files

Home

\home\xyz
Persistent, but low performance
Configuration files

GScratch

\gscratch\stf\xyz
Create your own directory.
30-day limit
Scratch directory for computations

Lolo

Long-term tape storage
Must bundle files together

How to transfer files

sftp: Connect to mox.hyak.uw.edu
sshfs: Attach to local file system
git: Code managed in e.g. a GitHub repo

Advanced usage

Add module imports to your .bashrc file
Add other customizations
Set up sshfs
Compile your own R with MKL
Write your own module for everyone to use

Use Hyak when you need additional resources

Think carefully about how to use Hyak

Parallelization

Example: Simulation study

Example: Simulation study

Parallelization strategies

Checkpointing

What is Hyak?

Logging in

How to choose a node type

Request an interactive node

Relevant SLURM arguments

Connecting a second time

Modules

Using a build node

Using an stf-int node

Submit a batch job to stf

There are three places to store files

Home

GScratch

Lolo

How to transfer files

Advanced usage

Using a `build` node

Using an `stf-int` node

Submit a batch job to `stf`