pipeline.yml Reference¶
pipeline.yml is the top-level declarative configuration. It defines all derivative computations, links to datasets, and controls execution.
Top-Level Keys¶
datasets: datasets.yml # path to datasets.yml
mount_point: local # active mount point (matches key in datasets.yml)
new_definitions: custom_nodes.py # optional: custom node module(s)
n_jobs: null # optional: parallelism (null = serial, -1 = all cores)
joblib_backend: loky # optional: joblib backend
joblib_prefer: processes # optional: joblib prefer hint
DerivativeDefinitions:
  <DerivativeName>:
    ...
DerivativeList:
  - <DerivativeName>
  - ...
datasets¶
Path to datasets.yml. Relative paths are resolved from the pipeline YAML location.
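The resolution rule above can be sketched in a few lines. This is a minimal illustration, not the library's own code; the function name `resolve_from_pipeline` and the example paths are hypothetical.

```python
from pathlib import Path


def resolve_from_pipeline(pipeline_yaml: str, value: str) -> Path:
    """Resolve `value` against the directory that contains the pipeline YAML."""
    candidate = Path(value)
    if candidate.is_absolute():
        # Absolute paths are used unchanged.
        return candidate
    return Path(pipeline_yaml).parent / candidate


# Hypothetical project layout, used only for illustration.
resolved = resolve_from_pipeline("/project/configs/pipeline.yml", "datasets.yml")
print(resolved)
```

Under this rule the working directory does not matter: a relative `datasets` value always lands next to the pipeline file.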
mount_point¶
Selects which environment’s paths to use from datasets.yml. Must match a key in each dataset’s file_pattern / derivatives_path maps.
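As a rough sketch of what "must match a key" implies, the snippet below picks the paths for one dataset entry and fails loudly when the mount point is missing. The dictionary layout mirrors the file_pattern / derivatives_path maps mentioned above, but the function name `select_mount`, the `cluster` environment, and all paths are assumptions made for illustration.

```python
def select_mount(dataset: dict, mount_point: str) -> dict:
    """Pick the per-environment paths for one dataset entry."""
    selected = {}
    for key in ("file_pattern", "derivatives_path"):
        options = dataset[key]
        if mount_point not in options:
            raise KeyError(
                f"mount_point {mount_point!r} missing from {key}; "
                f"available: {sorted(options)}"
            )
        selected[key] = options[mount_point]
    return selected


# Hypothetical dataset entry with two environments.
dataset = {
    "file_pattern": {"local": "/data/eeg/*.fif", "cluster": "/scratch/eeg/*.fif"},
    "derivatives_path": {"local": "/data/derivatives", "cluster": "/scratch/derivatives"},
}
paths = select_mount(dataset, "local")
```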
new_definitions¶
Path (or list of paths) to Python modules that register custom nodes. Loaded before any derivatives execute.
new_definitions: custom_nodes.py
# or multiple files:
new_definitions:
  - custom_nodes/nodes_a.py
  - /abs/path/to/nodes_b.py
Relative paths are resolved from the pipeline YAML location. Each module is executed once on import.
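One way such loading could work is sketched below with the standard `importlib` machinery. It accepts either a single path or a list, resolves relative entries against the pipeline directory, and executes each module body once. The function name `load_definition_modules`, the generated module names, and the `REGISTERED` variable in the demo file are hypothetical; how nodes actually register themselves is not covered here.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_definition_modules(new_definitions, pipeline_dir):
    """Execute each custom-node module once, in the order listed."""
    if new_definitions is None:
        return []
    # A single string and a list of strings are both accepted.
    paths = [new_definitions] if isinstance(new_definitions, str) else list(new_definitions)
    modules = []
    for index, raw in enumerate(paths):
        path = Path(raw)
        if not path.is_absolute():
            path = Path(pipeline_dir) / path
        spec = importlib.util.spec_from_file_location(f"custom_nodes_{index}", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the module body exactly once
        modules.append(module)
    return modules


# Demo with a throwaway module file standing in for custom_nodes.py.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "custom_nodes.py").write_text("REGISTERED = ['my_node_name']\n")
    loaded = load_definition_modules("custom_nodes.py", tmp)
    registered = loaded[0].REGISTERED
```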
DerivativeDefinitions¶
Each key is a derivative name (CamelCase by convention). The value is a derivative definition:
DerivativeDefinitions:
  MyDerivative:
    save: True # default True
    overwrite: False # default False
    for_dataframe: False # default False
    nodes:
      - id: 0
        ...
      - id: 1
        ...
Derivative Flags¶
| Key | Type | Default | Description |
|---|---|---|---|
| save | bool | True | Persist output artifacts to disk. |
| overwrite | bool | False | Force recompute even if cached output exists. |
| for_dataframe | bool | False | Include this derivative when calling |
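The interaction between the defaults and the cache can be sketched as follows. This is an illustrative reading of the flag descriptions, not the engine's actual logic; `effective_flags` and `needs_compute` are hypothetical helper names.

```python
DEFAULT_FLAGS = {"save": True, "overwrite": False, "for_dataframe": False}


def effective_flags(definition: dict) -> dict:
    """Merge a derivative definition's flags over the documented defaults."""
    return {key: definition.get(key, default) for key, default in DEFAULT_FLAGS.items()}


def needs_compute(flags: dict, cached_output_exists: bool) -> bool:
    """Recompute when forced by overwrite or when nothing is cached yet."""
    return flags["overwrite"] or not cached_output_exists


# A definition that only sets overwrite; the other flags fall back to defaults.
flags = effective_flags({"overwrite": True, "nodes": []})
```

With the defaults, a derivative whose output already exists on disk is skipped; setting `overwrite: True` forces it to run again.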
Node Steps¶
Each entry in nodes is a step with a unique id. Steps execute in topological order. A step is either a compute step (runs a node function) or a reuse step (loads a derivative from disk).
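To make "topological order" concrete, the sketch below builds a dependency graph from the `id.<N>` references in each step's args and orders it with the standard-library `graphlib` module (Python 3.9+). It only scans top-level argument values, and the helper name `step_order` is hypothetical; the engine's own scheduler may differ.

```python
import re
from graphlib import TopologicalSorter

ID_REF = re.compile(r"id\.(\d+)")


def step_order(nodes: list) -> list:
    """Return step ids ordered so each step follows the steps it references."""
    graph = {}
    for step in nodes:
        deps = set()
        for value in step.get("args", {}).values():
            match = ID_REF.match(value) if isinstance(value, str) else None
            if match:
                deps.add(int(match.group(1)))
        graph[step["id"]] = deps
    return list(TopologicalSorter(graph).static_order())


# Steps deliberately listed out of order: step 2 consumes steps 0 and 1.
nodes = [
    {"id": 2, "node": "correlate_power_connectivity",
     "args": {"bandpower": "id.0", "coherence": "id.1", "band": "alpha"}},
    {"id": 0, "derivative": "BandPower.nc"},
    {"id": 1, "derivative": "Coherence.nc"},
]
order = step_order(nodes)
```

Step 2 is placed last regardless of where it appears in the list, because both of its inputs must exist first.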
Compute Step¶
- id: 1
  node: my_node_name
  args:
    input_data: id.0 # reference to step 0's output
    param_a: 42
    param_b: [1, 40]
node: name of a registered node function.
args: keyword arguments passed to the node; values of the form id.<N> resolve to the artifact produced by step N.
Reuse Step (load derivative from disk)¶
- id: 0
  derivative: CleanedEEG.fif
derivative: <DerivativeName>.<ext> loads the named derivative for the current file from disk.
Use SourceFile to load the raw input file.
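The two accepted forms of a `derivative:` value can be told apart with a small parser. The sketch assumes derivative names contain no dots (consistent with the CamelCase convention), so everything after the first dot is treated as the extension; `parse_reuse_target` is a hypothetical name.

```python
def parse_reuse_target(value: str):
    """Split a `derivative:` value into (name, extension)."""
    if value == "SourceFile":
        # The raw input file has no derivative extension.
        return ("SourceFile", None)
    name, dot, ext = value.partition(".")
    if not dot or not name or not ext:
        raise ValueError(f"expected <DerivativeName>.<ext> or SourceFile, got {value!r}")
    return (name, ext)


target = parse_reuse_target("CleanedEEG.fif")
source = parse_reuse_target("SourceFile")
```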
id.N References¶
id.<N> in args resolves to the output of step N. When a step produces multiple artifacts (multiple extensions), the first artifact is returned unless the extension is specified.
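A minimal sketch of this lookup is shown below. Each step's output is modeled as an insertion-ordered mapping from extension to artifact, so "first artifact" means the first entry. How an extension is written in the YAML is not specified here, so the sketch takes it as a separate `ext` argument; that parameter, the helper name `resolve_ref`, and the placeholder artifact values are all assumptions.

```python
import re

ID_REF = re.compile(r"id\.(\d+)")


def resolve_ref(value, outputs, ext=None):
    """Return the artifact an `id.<N>` string points at; pass other values through."""
    match = ID_REF.fullmatch(value) if isinstance(value, str) else None
    if match is None:
        return value
    artifacts = outputs[int(match.group(1))]  # {extension: artifact}, insertion-ordered
    if ext is not None:
        return artifacts[ext]
    return next(iter(artifacts.values()))


# Step 0 produced two artifacts; strings stand in for the real objects.
outputs = {0: {"nc": "spectra-array", "json": "spectra-metadata"}}
args = {"psd": "id.0", "bands": {"alpha": [8, 12]}}
resolved = {key: resolve_ref(value, outputs) for key, value in args.items()}
```

Plain parameters such as `bands` pass through untouched; only strings that are exactly of the form `id.<N>` are replaced.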
DerivativeList¶
Controls which derivatives execute and in what order. Derivatives not listed here are defined but never run. Comment out entries to skip without removing definitions:
DerivativeList:
  - CleanedEEG
  - CrossSpectralDensity
  - PowerSpectrum
  # - SpectralEntropy # skip this one
  - BandPower
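The relationship between the two sections can be sketched as a simple selection step: only listed names run, in listed order, and defined-but-unlisted names are skipped. Rejecting a listed name that has no definition is an assumption of this sketch, as is the helper name `plan_run`.

```python
def plan_run(definitions: dict, derivative_list: list) -> list:
    """Return the derivatives to execute, in DerivativeList order."""
    unknown = [name for name in derivative_list if name not in definitions]
    if unknown:
        raise KeyError(f"listed but not defined: {unknown}")
    return list(derivative_list)


# Definitions are stubbed out; only their names matter for this sketch.
definitions = {
    name: {"nodes": []}
    for name in ["CleanedEEG", "CrossSpectralDensity", "PowerSpectrum",
                 "SpectralEntropy", "BandPower"]
}
derivative_list = ["CleanedEEG", "CrossSpectralDensity", "PowerSpectrum", "BandPower"]
to_run = plan_run(definitions, derivative_list)
skipped = [name for name in definitions if name not in to_run]
```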
Complete Example¶
datasets: datasets.yml
mount_point: local
new_definitions: custom_nodes.py
DerivativeDefinitions:
  CleanedEEG:
    nodes:
      - id: 0
        derivative: SourceFile
      - id: 1
        node: basic_preprocessing
        args:
          mne_object: id.0
          resample: 256
          filter_args:
            l_freq: 0.5
            h_freq: 110
  CrossSpectralDensity:
    nodes:
      - id: 0
        derivative: CleanedEEG.fif
      - id: 1
        node: welch_csd
        args:
          data: id.0
          n_fft: 1024
  Coherence:
    nodes:
      - id: 0
        derivative: CrossSpectralDensity.nc
      - id: 1
        node: compute_coherence
        args:
          csd: id.0
          bands:
            alpha: [8, 12]
  PowerSpectrum:
    for_dataframe: True
    nodes:
      - id: 0
        derivative: CrossSpectralDensity.nc
      - id: 1
        node: extract_auto_spectra
        args:
          csd: id.0
  SpectralEntropy:
    overwrite: True
    nodes:
      - id: 0
        derivative: PowerSpectrum.nc
      - id: 1
        node: spectral_entropy
        args:
          psd: id.0
  BandPower:
    save: False
    nodes:
      - id: 0
        derivative: PowerSpectrum.nc
      - id: 1
        node: extract_bands
        args:
          psd: id.0
          bands:
            alpha: [8, 12]
            beta: [13, 30]
  AlphaNetworkCoupling:
    nodes:
      - id: 0
        derivative: BandPower.nc
      - id: 1
        derivative: Coherence.nc
      - id: 2
        node: correlate_power_connectivity
        args:
          bandpower: id.0
          coherence: id.1
          band: alpha
DerivativeList:
  - CleanedEEG
  - CrossSpectralDensity
  - PowerSpectrum
  # - SpectralEntropy
  - Coherence
  - BandPower
  - AlphaNetworkCoupling
This pipeline corresponds to the following DAG:
SourceFile
└─ CleanedEEG
   └─ CrossSpectralDensity
      ├─ PowerSpectrum (for_dataframe)
      │  ├─ SpectralEntropy (overwrite=True)
      │  └─ BandPower (save=False)
      │     └─ AlphaNetworkCoupling ←─ Coherence
      └─ Coherence
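The edges of this DAG can be recovered from the reuse steps alone. The sketch below extracts each derivative's parents from its `derivative:` entries and checks that a given DerivativeList runs every parent before its child. The definitions are trimmed to their reuse steps, and both helper names are hypothetical; whether the engine performs such a check itself is not stated here.

```python
def derivative_dependencies(definitions: dict) -> dict:
    """Map each derivative to the derivatives its reuse steps load."""
    deps = {}
    for name, definition in definitions.items():
        parents = []
        for step in definition["nodes"]:
            target = step.get("derivative")
            if target is not None:
                parents.append(target.partition(".")[0])  # drop the extension
        deps[name] = parents
    return deps


def order_violations(deps: dict, derivative_list: list) -> list:
    """List (child, parent) pairs where a listed parent runs after its child."""
    position = {name: i for i, name in enumerate(derivative_list)}
    return [
        (child, parent)
        for child in derivative_list
        for parent in deps[child]
        if parent in position and position[parent] > position[child]
    ]


# Reuse steps only, copied from the complete example.
definitions = {
    "CleanedEEG": {"nodes": [{"id": 0, "derivative": "SourceFile"}]},
    "CrossSpectralDensity": {"nodes": [{"id": 0, "derivative": "CleanedEEG.fif"}]},
    "Coherence": {"nodes": [{"id": 0, "derivative": "CrossSpectralDensity.nc"}]},
    "PowerSpectrum": {"nodes": [{"id": 0, "derivative": "CrossSpectralDensity.nc"}]},
    "BandPower": {"nodes": [{"id": 0, "derivative": "PowerSpectrum.nc"}]},
    "AlphaNetworkCoupling": {"nodes": [{"id": 0, "derivative": "BandPower.nc"},
                                       {"id": 1, "derivative": "Coherence.nc"}]},
}
derivative_list = ["CleanedEEG", "CrossSpectralDensity", "PowerSpectrum",
                   "Coherence", "BandPower", "AlphaNetworkCoupling"]
deps = derivative_dependencies(definitions)
violations = order_violations(deps, derivative_list)
```

For the list in the example, no parent runs after its child, so `violations` is empty.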