Comparison with Other Workflow Managers

NeuroDAGs has two layers with different scopes:

  • Orchestration layer — file discovery, caching, dependency ordering, HPC templates, dataframe assembly. This is fully domain-agnostic. Any per-file analysis pipeline (audio, fMRI, genomics, tabular time series, images) can use it by providing a custom loader and custom nodes.

  • Built-in nodes — target EEG/MEG/ECG via MNE-Python and xarray. These are optional; you do not need them to use the framework.

If your data comes in files and you process each file independently, NeuroDAGs orchestration applies regardless of domain.

Quick Comparison Table

Feature

NeuroDAGs

Snakemake / Snakebids

Pydra

Philosophy

Derivative-centric (Push)

File-centric (Pull)

Task-centric Dataflow

Config Style

YAML + Python nodes

Python-based DSL

Pure Python API

Output Naming

Automatic `@Derivative`

Manual Wildcards / `bids()`

Managed Caching (machine-readable)

BIDS Relation

BIDS-Preserving (Flexible)

BIDS-Aware (Strict)

BIDS-Agnostic (Glue code needed)

Aggregation

Built-in `dataframe`

Manual “Reduce” Rules

Manual “Combiners”

Data Types

MNE / xarray first-class; generic artifacts supported

Generic Files

Generic Python Objects

Cross-file Operations

Not supported — each file is independent

Supported via reduce rules

Supported via combiners

Cluster Scheduling

Template generation (neurodags slurm-script); orchestration is manual

Native SLURM/SGE/PBS profiles

Native SLURM/PBS via Dask/CF

Cache Invalidation

Existence-based (file present = skip)

Content-hash-based

Content-hash-based

Provenance Tracking

Human-readable .error markers; no full provenance graph

Full provenance via DAG

Hash-based provenance

Maturity / Community

Early-stage, small community

Mature, large community

Active, growing community


Why Choose NeuroDAGs?

1. Convention Over Configuration

In general-purpose managers, you spend significant time managing file paths and wildcards (e.g., `{subject}{session}{task}`). NeuroDAGs uses a “Path-Preserving Prefix” strategy. It automatically appends `@DerivativeName` to the original filename. This ensures that your output directory structure perfectly mirrors your input, regardless of how many BIDS entities (run, acquisition, etc.) your data contains, without you ever writing a regex.

2. “Push” vs. “Pull” Logic

Tools like Snakemake are “Pull-based”: you define what output you want, and it works backward. This is powerful for heterogeneous pipelines but can be complex for irregular datasets. NeuroDAGs is “Push-based”: you define a pipeline and “push” your files through it. Derivatives are automatically sorted by dependency order before execution. This is often more intuitive for standard preprocessing and feature extraction workflows.

3. Built-in Data Aggregation

The goal of most signal processing pipelines is a tidy dataframe for statistical analysis. NeuroDAGs includes the `build_derivative_dataframe` utility which understands `xarray` coordinates and metadata. It can crawl your derivatives and assemble them into a CSV/Parquet file in a single step, a process that usually requires manual script-writing in other frameworks.

4. Human-Readable Caching

While Pydra and others use cryptographic hashes for caching (e.g., `_task_f83e2b1c/`), NeuroDAGs creates human-readable files (e.g., `sub-01_task-rest@Preprocessing.nc`). You can browse your derivatives folder and immediately know what is what. Failed runs write a `.error` marker alongside the expected output; successful retries clean it up automatically.

5. Custom Nodes in Plain Python

Nodes are plain Python functions decorated with `@register_node`. They can be defined in external files loaded via `new_definitions` in the YAML — no forking or subclassing required. MNE and xarray are first-class, but the artifact system accepts any writer function, so other file formats work too.


When NOT to Use NeuroDAGs

  • Cross-file or group-level operations mid-pipeline: NeuroDAGs processes each file independently — group ICA, normalization to a group mean, atlas registration using a subject-average template, or any operation that needs to read from multiple files cannot be expressed as a derivative. These require post-processing outside the framework. If cross-file operations are central to your pipeline, use Snakemake or Pydra.

  • Cache invalidation on code change: Caching is existence-based. NeuroDAGs does not detect when a node’s implementation changed and silently reuses stale outputs. You must manually set overwrite: true or delete cached files after modifying node code. If hash-based invalidation matters, use Pydra.

  • Strict BIDS-App compliance required: Snakebids has native BIDS validation and output layout enforcement. NeuroDAGs preserves input paths but does not validate against the BIDS spec.

  • Production-grade cluster scheduling: neurodags slurm-script generates submission templates but does not integrate with cluster schedulers. Snakemake and Pydra have native SLURM/SGE/PBS profiles with automatic job monitoring, resubmission, and resource accounting.

  • Full provenance tracking: NeuroDAGs has no provenance graph — it cannot tell you which version of a node produced a given file. If audit trails matter, Pydra’s hash-based caching or DVC are better fits.

  • Non-file-based pipelines: NeuroDAGs assumes each unit of work is an input file discoverable via a glob pattern. Database records, streaming data, and large-scale generic ETL do not fit this model.

  • Large, mature community support: NeuroDAGs is early-stage with no ecosystem of reusable nodes. Snakemake and Nipype have years of community wrappers, tutorials, and battle-tested HPC configurations.


Concrete BIDS Examples

The same task — smooth every bold file in a BIDS dataset — shown in each framework.

Snakebids

Snakebids wraps Snakemake with a generate_inputs() call that parses the BIDS dataset via pybids. You then write a Snakemake rule using the bids() helper for output naming, and a rule all that calls expand() to iterate over every discovered subject/session/run combination.

# config.yaml
pybids_inputs:
  bold:
    filters:
      suffix: bold
      extension: .nii.gz
      datatype: func
    wildcards:
      - subject
      - task
      - run
# Snakefile
from snakebids import generate_inputs, bids

inputs = generate_inputs(bids_dir=config['bids_dir'],
                         pybids_inputs=config['pybids_inputs'])

rule all:
    input:
        inputs['bold'].expand(
            bids(root='results', fwhm='{fwhm}', suffix='bold.nii.gz',
                 **inputs['bold'].wildcards),
            fwhm=config['fwhm'],
        )

rule smooth:
    input:  inputs['bold'].path
    output: bids(root='results', fwhm='{fwhm}', suffix='bold.nii.gz',
                 **inputs['bold'].wildcards)
    shell:  'fslmaths {input} -s {params.sigma} {output}'

A run.py BIDS-App entry point is also required:

from snakebids import bidsapp, plugins
app = bidsapp.app([plugins.SnakemakeBidsApp(Path(__file__).resolve().parent)])

Verdict: Excellent BIDS-App compliance and strict output validation. Requires learning Snakemake DSL, expand() mechanics, wildcard management, and pybids filter config. Best when strict BIDS-App compliance or heterogeneous file transformations are the primary goal.


Pydra

Pydra uses Python decorators to define tasks. BIDS file discovery relies on external helpers (e.g. Nilearn’s first_level_from_bids). The pipeline is a two-level workflow executed via a concurrent submitter. Results are cached under ~/.cache/pydra/ using cryptographic hashes.

from pydra.mark import python

@python.define(outputs=["smoothed_path"])
def SmoothFile(bold_path: str, fwhm: float) -> str:
    import nibabel as nib
    from nilearn.image import smooth_img
    img = smooth_img(bold_path, fwhm=fwhm)
    out = bold_path.replace('.nii.gz', f'_smooth-{fwhm}.nii.gz')
    img.to_filename(out)
    return out

# BIDS file discovery requires an external call
from nilearn.glm.first_level import first_level_from_bids
models, run_imgs, events, confounds = first_level_from_bids(
    dataset_path=bids_dir, task_label='rest', space_label='MNI152')

# Build and run workflow
wf = pydra.Workflow(name="smooth_wf", input_spec=["bold_files"])
wf.add(SmoothFile(name="smooth", bold_path=wf.lzin.bold_files, fwhm=6.0))

with pydra.Submitter(worker='cf', n_procs=4) as sub:
    results = sub(wf)

Output files are saved under manually constructed paths — the framework manages caching internally but does not enforce human-readable derivative naming.

Verdict: Industrial-strength for complex branching dataflows in pure Python. Significant “glue code” needed to handle BIDS file discovery, human-readable output paths, and aggregation. Hash-based caching is robust but opaque.


NeuroDAGs

File discovery, output naming, dependency ordering, and aggregation are handled by the framework. You write the signal processing logic; the YAML wires it up.

# datasets.yml
my_dataset:
  name: MyStudy
  file_pattern: /data/bids/**/*.vhdr
  derivatives_path: /data/derivatives
# pipeline.yml
DerivativeList:
  - Preprocessed
  - BandPower

DerivativeDefinitions:
  Preprocessed:
    nodes:
      - id: 0
        derivative: SourceFile
      - id: 1
        node: basic_preprocessing
        args:
          mne_object: id.0
          filter_args: {l_freq: 1.0, h_freq: 40.0}

  BandPower:
    for_dataframe: true
    nodes:
      - id: 0
        derivative: Preprocessed.fif   # depends on Preprocessed
      - id: 1
        node: bandpower
        args:
          psd_like: id.0
          bands: {alpha: [8, 13], beta: [13, 30]}
from neurodags.loaders import load_configuration
from neurodags.orchestrators import run_pipeline

config = load_configuration("pipeline.yml")
run_pipeline(config)   # discovers files, sorts by dependency, runs all

Output files are named automatically: sub-01_task-rest.vhdr@Preprocessed.fif, sub-01_task-rest.vhdr@BandPower.nc. No wildcards, no expand(), no bids() helper needed.

Verdict: Lowest orchestration overhead for MNE-Python / xarray signal processing pipelines. Not a BIDS-App; does not validate BIDS compliance. No native cluster backend — parallelism is joblib within a single job.


Nipype

The established standard for wrapping neuroimaging CLIs (FSL, FreeSurfer, ANTs, SPM). Best when you need to call existing command-line tools. NeuroDAGs targets Python-native signal processing rather than CLI wrapping.