Nextflow acts as a middle layer between the user and the Slurm job scheduler. Nextflow v24.04.0 and newer offers native support for job arrays on grid schedulers, but the feature is marked "experimental" in the documentation and is still under development.
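Enabling the experimental job-array support is a one-line change in the config. A minimal sketch (the array size of 100 is illustrative, not a recommendation):

```groovy
// nextflow.config -- minimal sketch of the experimental job-array feature
// (requires Nextflow >= 24.04.0; the array size here is illustrative)
process {
    executor = 'slurm'
    array = 100   // submit child jobs as Slurm job arrays of up to 100 tasks
}
```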
Below is an example of how a multi-step Nextflow-based workflow interacts with Slurm. Source: USRSE’25 paper

Rapid fairshare score consumption & difficulty of backfilling child jobs
→ increased waiting time
→ increased time for the main job
→ main job crashes due to timeout
The fairshare score drops to zero very quickly, so the child jobs have to rely on Slurm's backfill mechanism to get scheduled.
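One mitigation, sketched below with illustrative values: keep the per-child resource requests small and accurate, since short, small jobs are the easiest for Slurm's backfill scheduler to slot into scheduling gaps:

```groovy
// nextflow.config -- backfill-friendly child jobs (illustrative values)
process {
    executor = 'slurm'
    cpus = 1          // single-core children fit into more backfill gaps
    memory = 4.GB     // request only what the task actually needs
    time = 1.hour     // a short, realistic walltime is what backfill keys on
}
```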

Intolerance of child job failures
→ main job crashes
Below is one example of how a child job can crash the main job (there are many possible ways).
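A hypothetical process sketch showing how to keep one failing child from killing the whole run (the process name and `align_tool` command are made up; the directives are standard Nextflow):

```groovy
// Hypothetical process: retry a failed child job with more resources
// instead of letting its failure abort the whole workflow
process ALIGN_CHUNK {
    errorStrategy 'retry'             // resubmit on failure rather than abort
    maxRetries 3
    memory { 4.GB * task.attempt }    // 4 GB, then 8 GB, then 12 GB

    input:
    path chunk

    script:
    """
    align_tool ${chunk} > ${chunk}.out
    """
}
```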

Use seff to inspect finished child jobs; exit code 140 indicates "not enough resources".
Reduce queueSize and array size during test runs.
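A test-run override might look like this (the values are illustrative):

```groovy
// nextflow.config for test runs (illustrative values): keep the number of
// live child jobs small so a misbehaving workflow cannot flood the queue
executor {
    queueSize = 50    // production run: 10000
}
process {
    array = 10        // production run: 2000
}
```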
Consider Phx for CPU-intensive workflows.
Below is a screenshot of the Phx supercomputer when an efficient Nextflow workflow was running and taking up most of the public CPU nodes.


purple line: without job array
green: with job array

Using nextflow.config to control the workflow settings:

// Nextflow config for lastz jobs
executor {
    queueSize = 10000       // default: 100; the max number of live (pending or running) jobs Slurm will hold at any time for a main job. The Nextflow developers are still fixing some small bugs around this parameter.
    retry.maxAttempt = 3    // default: 3; how many times a failed Slurm submission is retried
    killBatchSize = 1000000 // default: 100
}

process {
    executor = 'slurm'      // using 'local' on a single node mimics a workstation setting
    memory = { 4.GB * task.attempt }  // dynamic allocation: grows on each retry
    time = { 1.hour * task.attempt }  // dynamic allocation: grows on each retry
    queue = 'public'
    cpus = 1
    array = 2000            // number of child jobs a job array will hold; must be an integer literal, not a variable
    maxRetries = 5          // how many times Nextflow retries a failed child job
    errorStrategy = 'retry'
    maxErrors = -1          // total errors a process can tolerate; -1 means unlimited
    clusterOptions = '--signal=USR2@180'  // send SIGUSR2 180 s before the time limit (early stop)
}
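The `--signal=USR2@180` option asks Slurm to deliver SIGUSR2 180 seconds before the walltime limit, so a child job's script can trap it and stop cleanly. A sketch (the `lastz_step.sh` command is hypothetical; depending on the Slurm setup, the `B:` prefix, i.e. `--signal=B:USR2@180`, may be needed for the batch shell itself to receive the signal):

```groovy
// Hypothetical process showing the early-stop signal in action: the bash
// trap gives the task a chance to exit cleanly before Slurm kills it.
process LASTZ_CHUNK {
    clusterOptions '--signal=USR2@180'

    script:
    """
    trap 'echo "SIGUSR2: walltime almost up" >&2; exit 140' USR2
    lastz_step.sh &   # hypothetical long-running command, run in background
    child=\$!         # \$ escaped so bash, not Groovy, expands it
    wait \$child      # wait returns early when the trapped signal arrives
    """
}
```

Exiting with 140 (128 + 12, the SIGUSR2 number on Linux) is non-zero, so with `errorStrategy = 'retry'` and the dynamic `time` setting above, the child is resubmitted with a longer walltime.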
Phx cluster
partition/QOS: public/public, nodes pc[005-337]
total CPUs = 333 nodes × 28 cores = 9324 → set queueSize = 10000
memory per node can support this queueSize (at 4 GB per job):
  122 GB / 4 GB = 30 jobs per node × 293 nodes = 8790 jobs
  249 GB / 4 GB = 62 jobs per node × 36 nodes = 2232 jobs
  184 GB / 4 GB = 46 jobs per node × 4 nodes = 184 jobs
array size = 2000
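The memory-based capacity estimate above can be sanity-checked with a few lines of Groovy (numbers copied from the list above; 4 GB per child job as in the config):

```groovy
// Sanity-check the memory-based capacity estimate (4 GB per child job)
def gbPerJob = 4
def nodeClasses = [
    [mem: 122, count: 293],   // GB per node, node count
    [mem: 249, count: 36],
    [mem: 184, count: 4],
]
def totalJobs = nodeClasses.sum { it.mem.intdiv(gbPerJob) * it.count }
println totalJobs   // 8790 + 2232 + 184 = 11206, comfortably above queueSize
```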
Implementation of make_lastz_chains on the Phx cluster