HPC / SLURM Array Jobs¶
NeuroDAGs processes files independently — each file is self-contained. This maps naturally onto SLURM array jobs: one array task per file, all tasks running in parallel across nodes.
The key parameter is only_index, which restricts a run to specific file indices from the full file list. Combined with $SLURM_ARRAY_TASK_ID, each array task processes exactly one file.
Generating submission scripts¶
Use neurodags slurm-script to generate a ready-to-edit SLURM template populated with your pipeline’s actual derivative names and dependency order:
# per-file (default) — all derivatives per file task
neurodags slurm-script pipeline.yml --output run_array.sh
# flat — file x derivative flat array (max parallelism, independent derivatives only)
neurodags slurm-script pipeline.yml --pattern flat --output run_array_per_deriv.sh
# chained — per-derivative sequential arrays linked via --dependency=afterok
neurodags slurm-script pipeline.yml --pattern chained --output submit_pipeline.sh
The chained pattern writes two files: submit_pipeline.sh (the submission script) and run_one_derivative.sh (the worker). Edit the #SBATCH resource lines before submitting.
Counting files¶
neurodags count prints the number of unique files the pipeline will process — useful in submission scripts:
N=$(neurodags count pipeline.yml)
sbatch --array=0-$((N - 1)) run_array.sh
per-file: One Array Task per File (All Derivatives)¶
Each task runs all derivatives in dependency order for a single file.
N=$(neurodags count pipeline.yml)
sbatch --array=0-$((N - 1)) run_array.sh
#!/bin/bash
# run_array.sh
#SBATCH --job-name=neurodags
#SBATCH --time=02:00:00
#SBATCH --mem=16G
#SBATCH --cpus-per-task=1
#SBATCH --output=logs/neurodags_%A_%a.out
#SBATCH --error=logs/neurodags_%A_%a.err
source activate myenv
python - <<'EOF'
import os
from neurodags.loaders import load_configuration
from neurodags.orchestrators import run_pipeline
config = load_configuration("pipeline.yml")
run_pipeline(
config,
only_index=int(os.environ["SLURM_ARRAY_TASK_ID"]),
raise_on_error=True,
skip_errors=True,
)
EOF
raise_on_error=True makes the SLURM task exit with a non-zero code on failure, so sacct correctly reports failed tasks. skip_errors=True skips files whose previous run wrote a .error marker.
flat: One Array Task per File × Derivative¶
Use this when derivatives are independent (no inter-derivative dependencies) and you want maximum parallelism.
N=$(neurodags count pipeline.yml)
N_DERIVATIVES=3
TOTAL=$(( N * N_DERIVATIVES ))
sbatch --array=0-$((TOTAL - 1)) run_array_per_deriv.sh
#!/bin/bash
# run_array_per_deriv.sh
#SBATCH --job-name=neurodags
#SBATCH --time=02:00:00
#SBATCH --mem=8G
#SBATCH --output=logs/neurodags_%A_%a.out
# Task ID encodes both file index and derivative index
# e.g. 3 derivatives × N files → array size = 3 × N
# SLURM_ARRAY_TASK_ID = file_index * n_derivatives + derivative_index
DERIVATIVES=("Preprocessed" "Spectrum" "BandPower")
N_DERIVATIVES=${#DERIVATIVES[@]}
FILE_INDEX=$(( SLURM_ARRAY_TASK_ID / N_DERIVATIVES ))
DERIV_INDEX=$(( SLURM_ARRAY_TASK_ID % N_DERIVATIVES ))
DERIVATIVE=${DERIVATIVES[$DERIV_INDEX]}
source activate myenv
python - <<EOF
from neurodags.loaders import load_configuration
from neurodags.orchestrators import run_pipeline
config = load_configuration("pipeline.yml")
run_pipeline(
config,
derivatives=["$DERIVATIVE"],
only_index=$FILE_INDEX,
raise_on_error=True,
)
EOF
Note: When derivatives have inter-dependencies (e.g.
SpectrumreadsPreprocessedoutput), this pattern requiresPreprocessedto finish beforeSpectrumstarts. Use SLURM--dependency=afterokbetween derivative-level job arrays, or use Pattern 1 which respects dependency order automatically.
chained: Per-Derivative Sequential Array¶
Run derivatives one at a time in dependency order, with all files parallelised within each derivative. Use SLURM job dependencies to chain them.
#!/bin/bash
# submit_pipeline.sh
N=$(neurodags count pipeline.yml)
ARRAY="0-$((N - 1))"
JOB1=$(sbatch --parsable --array=$ARRAY \
--export=DERIVATIVE=Preprocessed \
run_one_derivative.sh)
JOB2=$(sbatch --parsable --array=$ARRAY \
--dependency=afterok:$JOB1 \
--export=DERIVATIVE=Spectrum \
run_one_derivative.sh)
sbatch --array=$ARRAY \
--dependency=afterok:$JOB2 \
--export=DERIVATIVE=BandPower \
run_one_derivative.sh
#!/bin/bash
# run_one_derivative.sh
#SBATCH --job-name=neurodags
#SBATCH --time=01:00:00
#SBATCH --mem=8G
#SBATCH --output=logs/%x_%A_%a.out
source activate myenv
python - <<EOF
import os
from neurodags.loaders import load_configuration
from neurodags.orchestrators import run_pipeline
config = load_configuration("pipeline.yml")
run_pipeline(
config,
derivatives=["$DERIVATIVE"],
only_index=int(os.environ["SLURM_ARRAY_TASK_ID"]),
raise_on_error=True,
skip_errors=True,
)
EOF
Checking Failed Tasks¶
After a run, find failed files via .error markers:
find derivatives/ -name "*.error" | sort
Each .error file contains the derivative name, step, and exception. To retry only failed files, delete their .error markers and resubmit — or use skip_errors=False (the default) which retries them automatically.
To skip known-bad files and process only new/uncached ones:
run_pipeline(config, only_index=index, skip_errors=True)
Tips¶
Concern |
Recommendation |
|---|---|
Cluster paths differ from local |
Set |
Node has multiple cores |
Set |
Partial reruns |
Caching is automatic — resubmit the full array, cached files are skipped |
Debugging |
Run |
Array size limit |
Most clusters cap at 1000–10000 tasks; split |