pipeline.yml Reference

pipeline.yml is the top-level declarative configuration. It defines all derivative computations, links to datasets, and controls execution.

Top-Level Keys

datasets: datasets.yml          # path to datasets.yml
mount_point: local              # active mount point (matches key in datasets.yml)
new_definitions: custom_nodes.py  # optional: custom node module(s)

n_jobs: null                    # optional: parallelism (null = serial, -1 = all cores)
joblib_backend: loky            # optional: joblib backend
joblib_prefer: processes        # optional: joblib prefer hint

DerivativeDefinitions:
  <DerivativeName>:
    ...

DerivativeList:
  - <DerivativeName>
  - ...

datasets

Path to datasets.yml. Relative paths are resolved from the pipeline YAML location.

mount_point

Selects which environment’s paths to use from datasets.yml. Must match a key in each dataset’s file_pattern / derivatives_path maps.

new_definitions

Path (or list of paths) to Python modules that register custom nodes. Loaded before any derivatives execute.

new_definitions: custom_nodes.py

# or multiple files:
new_definitions:
  - custom_nodes/nodes_a.py
  - /abs/path/to/nodes_b.py

Relative paths resolved from the pipeline YAML location. Each module is executed once on import.


DerivativeDefinitions

Each key is a derivative name (CamelCase by convention). The value is a derivative definition:

DerivativeDefinitions:
  MyDerivative:
    save: True             # default True
    overwrite: False       # default False
    for_dataframe: False   # default False
    nodes:
      - id: 0
        ...
      - id: 1
        ...

Derivative Flags

Key

Type

Default

Description

save

bool

True

Persist output artifacts to disk. False = compute but don’t write.

overwrite

bool

False

Force recompute even if cached output exists.

for_dataframe

bool

False

Include this derivative when calling build_derivative_dataframe.

Node Steps

Each entry in nodes is a step with a unique id. Steps execute in topological order. A step is either a compute step (runs a node function) or a reuse step (loads a derivative from disk).

Compute Step

- id: 1
  node: my_node_name
  args:
    input_data: id.0     # reference to step 0's output
    param_a: 42
    param_b: [1, 40]
  • node: name of a registered node function

  • args: keyword arguments passed to the node; values of the form id.<N> resolve to the artifact produced by step N

Reuse Step (load derivative from disk)

- id: 0
  derivative: CleanedEEG.fif
  • derivative: <DerivativeName>.<ext> — loads the named derivative for the current file from disk

  • Use SourceFile to load the raw input file

id.N References

id.<N> in args resolves to the output of step N. When a step produces multiple artifacts (multiple extensions), the first artifact is returned unless the extension is specified.


DerivativeList

Controls which derivatives execute and in what order. Derivatives not listed here are defined but never run. Comment out entries to skip without removing definitions:

DerivativeList:
  - CleanedEEG
  - CrossSpectralDensity
  - PowerSpectrum
  # - SpectralEntropy     # skip this one
  - BandPower

Complete Example

datasets: datasets.yml
mount_point: local
new_definitions: custom_nodes.py

DerivativeDefinitions:
  CleanedEEG:
    nodes:
      - id: 0
        derivative: SourceFile
      - id: 1
        node: basic_preprocessing
        args:
          mne_object: id.0
          resample: 256
          filter_args:
            l_freq: 0.5
            h_freq: 110

  CrossSpectralDensity:
    nodes:
      - id: 0
        derivative: CleanedEEG.fif
      - id: 1
        node: welch_csd
        args:
          data: id.0
          n_fft: 1024

  Coherence:
    nodes:
      - id: 0
        derivative: CrossSpectralDensity.nc
      - id: 1
        node: compute_coherence
        args:
          csd: id.0
          bands:
            alpha: [8, 12]

  PowerSpectrum:
    for_dataframe: True
    nodes:
      - id: 0
        derivative: CrossSpectralDensity.nc
      - id: 1
        node: extract_auto_spectra
        args:
          csd: id.0

  SpectralEntropy:
    overwrite: True
    nodes:
      - id: 0
        derivative: PowerSpectrum.nc
      - id: 1
        node: spectral_entropy
        args:
          psd: id.0

  BandPower:
    save: False
    nodes:
      - id: 0
        derivative: PowerSpectrum.nc
      - id: 1
        node: extract_bands
        args:
          psd: id.0
          bands:
            alpha: [8, 12]
            beta: [13, 30]

  AlphaNetworkCoupling:
    nodes:
      - id: 0
        derivative: BandPower.nc
      - id: 1
        derivative: Coherence.nc
      - id: 2
        node: correlate_power_connectivity
        args:
          bandpower: id.0
          coherence: id.1
          band: alpha

DerivativeList:
  - CleanedEEG
  - CrossSpectralDensity
  - PowerSpectrum
  # - SpectralEntropy
  - Coherence
  - BandPower
  - AlphaNetworkCoupling

This pipeline corresponds to the following DAG:

SourceFile
  └─ CleanedEEG
       └─ CrossSpectralDensity
            ├─ PowerSpectrum (for_dataframe)
            │    ├─ SpectralEntropy (overwrite=True)
            │    └─ BandPower (save=False)
            │         └─ AlphaNetworkCoupling ←─ Coherence
            └─ Coherence