Extensibility
The Climate API is designed around a consistent plugin pattern: built-in behaviour lives in the package, and custom behaviour is layered on top through a plugins_dir directory and Python dotted paths — without forking or patching core code.
The same pattern applies at every extension point:
| Extension point | How to extend | Plugin location |
|---|---|---|
| Dataset templates | YAML files | plugins_dir/datasets/ |
| Ingestion functions | Python function, dotted path in YAML | any importable path |
| Transform functions | Python function, dotted path in YAML | any importable path |
| Processes | YAML file + Python function | plugins_dir/processes/ |
Dataset templates
Dataset templates are YAML files that describe a data source. Built-ins live in the package (climate_api/data/datasets/). Custom templates are loaded from plugins_dir/datasets/.
All *.yaml files in plugins_dir/datasets/ are merged with the built-ins. A custom template with the same id as a built-in overrides it — useful for adjusting lag times, display ranges, or availability settings on an existing dataset.
See Adding custom datasets for the full template field reference.
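The override rule described above can be sketched as a merge keyed on id. This is a minimal illustration, not the package's loader: it assumes templates have already been parsed into dicts, that an override replaces the whole built-in template rather than merging field by field, and the dataset ids and the lag_days field are hypothetical.

```python
from typing import Any


def merge_templates(
    builtins: list[dict[str, Any]],
    custom: list[dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Index templates by id; custom entries replace built-ins with the same id."""
    merged: dict[str, dict[str, Any]] = {}
    # Built-ins go in first so that custom templates win on an id clash.
    for template in [*builtins, *custom]:
        merged[template["id"]] = template
    return merged


# Hypothetical templates, already parsed from YAML into dicts.
builtin_templates = [
    {"id": "rainfall_daily", "lag_days": 5},
    {"id": "temperature_daily", "lag_days": 2},
]
custom_templates = [
    {"id": "rainfall_daily", "lag_days": 7},   # same id: overrides the built-in
    {"id": "my_station_data", "lag_days": 1},  # new id: added alongside
]

merged = merge_templates(builtin_templates, custom_templates)
```

With these inputs the merged catalogue holds three datasets, and rainfall_daily carries the custom lag.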
Ingestion functions
The ingestion.function field in a dataset template is a dotted Python path to the download function that fetches data for that dataset.
The function must follow the download function contract (see Adding custom datasets). It can live anywhere that is importable — either an installed package or a module placed directly under plugins_dir (which is automatically added to sys.path).
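To make the dotted-path mechanism concrete, here is a rough sketch of how such a path can be resolved with importlib once a plugin directory is on sys.path. The helper name resolve_dotted_path, the module my_downloads, and the placeholder download signature are all hypothetical; the real download function contract is described in Adding custom datasets and is not reproduced here.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Any


def resolve_dotted_path(path: str) -> Any:
    """Import the module part of 'pkg.module.attr' and return 'attr'."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attr', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Hypothetical plugins_dir holding a single module, my_downloads.py.
plugins_dir = Path(tempfile.mkdtemp())
(plugins_dir / "my_downloads.py").write_text(
    "def download(**kwargs):\n    return 'downloaded'\n"
)

# Stands in for the automatic sys.path handling described above.
sys.path.insert(0, str(plugins_dir))
importlib.invalidate_caches()

download = resolve_dotted_path("my_downloads.download")
```

A template would then reference the function as my_downloads.download in its ingestion.function field.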
Transform functions
Transforms are functions applied to a dataset after download and before the Zarr store is written. They are declared as a list of dotted paths in the dataset template.
Each transform receives the xr.Dataset and the dataset template dict, and returns a (possibly modified) xr.Dataset:
```python
from typing import Any

import xarray as xr


def clamp_negatives(ds: xr.Dataset, dataset: dict[str, Any]) -> xr.Dataset:
    # Clip the template's variable so that no value falls below zero.
    varname = dataset["variable"]
    return ds.assign({varname: ds[varname].clip(min=0)})
```
A transform entry can also be a dict that carries params. The params dict is forwarded as keyword arguments to the transform function. Custom transforms can live in any importable package or in a module under plugins_dir.
For the built-in transforms and a full description of the pipeline, see Transforms.
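The two entry forms (a bare dotted path, or a dict with params) can be pictured with the sketch below. It is illustrative only: the dict key function is an assumed name, a plain list stands in for xr.Dataset so the example runs without xarray, and a small registry dict replaces the real dotted-path import.

```python
from typing import Any, Callable, Union

TransformEntry = Union[str, dict[str, Any]]


def normalise_entry(entry: TransformEntry) -> tuple[str, dict[str, Any]]:
    """Return (dotted_path, params) for either entry form."""
    if isinstance(entry, str):
        return entry, {}
    # "function" is an assumed key name; "params" is the key named in the text.
    return entry["function"], dict(entry.get("params") or {})


def run_transforms(
    ds: Any,
    dataset: dict[str, Any],
    steps: list[tuple[Callable[..., Any], dict[str, Any]]],
) -> Any:
    """Apply each transform in order, forwarding params as keyword arguments."""
    for func, params in steps:
        ds = func(ds, dataset, **params)
    return ds


def scale(ds: list[float], dataset: dict[str, Any], factor: float = 1.0) -> list[float]:
    # Stand-in transform: a plain list plays the role of xr.Dataset.
    return [value * factor for value in ds]


entries: list[TransformEntry] = [
    "mypackage.transforms.scale",
    {"function": "mypackage.transforms.scale", "params": {"factor": 10.0}},
]
# Hypothetical registry used here in place of importing the dotted path.
registry = {"mypackage.transforms.scale": scale}
steps = [(registry[path], params) for path, params in map(normalise_entry, entries)]

result = run_transforms([1.0, 2.0], {"variable": "precip"}, steps)
```

The first entry runs scale with its default factor and the second with factor=10.0, so the chain passes the output of one transform to the next.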
Processes
Processes are named operations that produce derived datasets (e.g. temporal resampling). They are backed by YAML files and dispatched via POST /processes/{id}/execution.
Built-in processes live in climate_api/data/processes/. Custom processes are loaded from plugins_dir/processes/.
Process YAML
```yaml
- id: my_process
  title: My custom process
  description: Describe what this process does.
  version: "0.1.0"
  expose: true
  jobControlOptions:
    - sync-execute
  execution:
    function: mypackage.processes.my_process.execute
```
| Field | Required | Description |
|---|---|---|
| id | Yes | Unique process identifier. Used in POST /processes/{id}/execution |
| title | Yes | Human-readable title exposed through the public process catalogue |
| description | No | Longer description shown in API responses |
| version | No | Process version string exposed through the public process description |
| expose | No | Whether the process appears in the public /processes listing. Default: true |
| jobControlOptions | No | Supported execution modes exposed publicly. Default: ["sync-execute"] |
| execution.function | Yes | Dotted path to the Python function that runs the process |
A custom process with the same id as a built-in overrides it.
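As a quick way to check a process definition against the field table, the sketch below validates the required fields and fills in the documented defaults. It assumes the YAML has already been parsed into a dict; the helper name normalise_process and its error messages are hypothetical and do not describe how the package itself validates definitions.

```python
from typing import Any


def normalise_process(definition: dict[str, Any]) -> dict[str, Any]:
    """Check required fields and fill in the defaults listed in the table."""
    for field in ("id", "title"):
        if not definition.get(field):
            raise ValueError(f"process definition is missing required field {field!r}")
    function = (definition.get("execution") or {}).get("function")
    if not function:
        raise ValueError("process definition is missing 'execution.function'")
    return {
        **definition,
        # Defaults as documented: expose is true, sync execution only.
        "expose": definition.get("expose", True),
        "jobControlOptions": definition.get("jobControlOptions", ["sync-execute"]),
    }


process = normalise_process(
    {
        "id": "my_process",
        "title": "My custom process",
        "execution": {"function": "mypackage.processes.my_process.execute"},
    }
)
```

Here the minimal definition comes back with expose set to True and jobControlOptions set to ["sync-execute"].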
Execution function
The current built-in execution path passes the raw JSON request body to the function as keyword arguments and expects a JSON-serialisable dict in return:
```python
from typing import Any


def execute(*, source_dataset_id: str, factor: float, **_ignored: Any) -> dict[str, Any]:
    # Required inputs are keyword-only; unknown keys in the body land in _ignored.
    ...
    return {"status": "completed", "artifact_id": "..."}
```
A call that omits a required keyword argument, or that passes an unexpected one to a function without a **kwargs catch-all, raises TypeError. The route dispatcher catches the error and returns HTTP 400.
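The keyword-argument dispatch and the TypeError mapping can be approximated as follows. This is a framework-free sketch under stated assumptions: the dispatch helper, the (status, payload) return shape, the 200 success code, and the detail key are all hypothetical, and the real route dispatcher may differ.

```python
from typing import Any, Callable


def dispatch(
    func: Callable[..., dict[str, Any]],
    body: dict[str, Any],
) -> tuple[int, dict[str, Any]]:
    """Call func with the JSON body as keyword arguments; map TypeError to 400."""
    try:
        return 200, func(**body)
    except TypeError as exc:
        # Note: a TypeError raised inside func itself is caught here as well.
        return 400, {"detail": str(exc)}


def execute(*, source_dataset_id: str, factor: float, **_ignored: Any) -> dict[str, Any]:
    # Hypothetical process body that simply echoes its inputs.
    return {"status": "completed", "source": source_dataset_id, "factor": factor}


# Extra keys are absorbed by **_ignored, so this call succeeds.
ok = dispatch(execute, {"source_dataset_id": "rainfall_daily", "factor": 2.0, "extra": 1})

# source_dataset_id is missing, so Python raises TypeError before execute runs.
bad = dispatch(execute, {"factor": 2.0})
```

Because the required inputs are keyword-only and have no defaults, Python itself rejects an incomplete body, which keeps argument checking out of the process function.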
For the built-in resample process and usage examples, see Processes.
What is not pluggable
The sync.availability.latest_available_function field accepts a dotted path, but only built-in functions in climate_api.providers.availability resolve reliably. Plugin paths are not reliably supported because the path is resolved without plugins_dir on sys.path. Use one of the built-in availability functions instead, or open an issue if a new provider cadence is needed.
See issue #95 for the planned fix.