Extensibility

The Climate API is designed around a consistent plugin pattern: built-in behaviour lives in the package, and custom behaviour is layered on top through a plugins_dir directory and Python dotted paths — without forking or patching core code.

The same pattern applies at every extension point:

| Extension point | How to extend | Plugin location |
|---|---|---|
| Dataset templates | YAML files | plugins_dir/datasets/ |
| Ingestion functions | Python function, dotted path in YAML | any importable path |
| Transform functions | Python function, dotted path in YAML | any importable path |
| Processes | YAML file + Python function | plugins_dir/processes/ |

Dataset templates

Dataset templates are YAML files that describe a data source. Built-ins live in the package (climate_api/data/datasets/). Custom templates are loaded from plugins_dir/datasets/.

plugins/
└── datasets/
    └── enacts_rainfall.yaml

# climate-api.yaml
plugins_dir: ./plugins/

All *.yaml files in plugins_dir/datasets/ are merged with the built-ins. A custom template with the same id as a built-in overrides it — useful for adjusting lag times, display ranges, or availability settings on an existing dataset.

See Adding custom datasets for the full template field reference.
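The override rule can be pictured as a merge keyed on id. The following is a minimal sketch, not the package's loader: it assumes the YAML files have already been parsed into dicts and that a custom template replaces the built-in as a whole rather than field by field. The function name merge_templates, the dataset ids, and the lag_days field are hypothetical.

```python
from typing import Any


def merge_templates(
    builtins: list[dict[str, Any]],
    custom: list[dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Index templates by id; custom entries come last, so they win on a clash."""
    merged: dict[str, dict[str, Any]] = {}
    for template in [*builtins, *custom]:
        merged[template["id"]] = template
    return merged


# Hypothetical templates, standing in for parsed YAML files.
builtin_templates = [
    {"id": "chirps_rainfall", "lag_days": 5},
    {"id": "era5_temperature", "lag_days": 7},
]
custom_templates = [
    {"id": "chirps_rainfall", "lag_days": 10},  # same id: replaces the built-in
    {"id": "enacts_rainfall", "lag_days": 3},   # new id: added alongside
]

merged = merge_templates(builtin_templates, custom_templates)
```

Under these assumptions, merged holds three templates, and the chirps_rainfall entry is the custom one.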


Ingestion functions

The ingestion.function field in a dataset template is a dotted Python path to the download function that fetches data for that dataset.

ingestion:
  function: mypackage.sources.enacts.download

The function must follow the download function contract (see Adding custom datasets). It can live anywhere that is importable — either an installed package or a module placed directly under plugins_dir (which is automatically added to sys.path).
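Resolving a dotted path generally means importing the module part and looking up the final attribute. The sketch below shows one way this could work using only the standard library. It is an illustration, not the Climate API's actual resolver: the name resolve_dotted_path and its plugins_dir parameter are hypothetical.

```python
import importlib
import sys
from pathlib import Path
from typing import Any, Callable, Optional


def resolve_dotted_path(
    path: str, plugins_dir: Optional[str] = None
) -> Callable[..., Any]:
    """Import 'package.module.function' and return the callable it names."""
    if plugins_dir is not None:
        # Mimic the documented behaviour: make modules under plugins_dir importable.
        plugins_path = str(Path(plugins_dir).resolve())
        if plugins_path not in sys.path:
            sys.path.insert(0, plugins_path)

    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {path!r}")

    module = importlib.import_module(module_name)
    func = getattr(module, attr_name)
    if not callable(func):
        raise TypeError(f"{path!r} does not name a callable")
    return func


# Demonstrated with a standard-library function instead of a real download function.
dumps = resolve_dotted_path("json.dumps")
```

With a path such as mypackage.sources.enacts.download, the same call would import mypackage.sources.enacts and return its download attribute, provided that package is installed or sits under plugins_dir.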


Transform functions

Transforms are functions applied to a dataset after download and before the Zarr store is written. They are declared as a list of dotted paths in the dataset template:

transforms:
  - climate_api.transforms.kelvin_to_celsius
  - mypackage.transforms.clamp_negatives

Each transform receives the xr.Dataset and the dataset template dict, and returns a (possibly modified) xr.Dataset:

import xarray as xr
from typing import Any

def clamp_negatives(ds: xr.Dataset, dataset: dict[str, Any]) -> xr.Dataset:
    varname = dataset["variable"]
    return ds.assign({varname: ds[varname].clip(min=0)})

A transform entry can also be a dict with a function key and a params key:

transforms:
  - function: mypackage.transforms.scale
    params:
      factor: 0.01

The params dict is forwarded as keyword arguments to the transform function. Custom transforms can live in any importable package or in a module under plugins_dir.
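The two entry forms (plain dotted path, or dict with function and params) can be handled by normalising each entry before calling it. This is a rough sketch of such a loop, not the package's pipeline code. The names normalise_entry and apply_transforms, the resolve argument, and the demo.* paths are hypothetical, and a plain list stands in for the xr.Dataset so the example runs without xarray.

```python
from typing import Any, Callable, Union

TransformEntry = Union[str, dict[str, Any]]


def normalise_entry(entry: TransformEntry) -> tuple[str, dict[str, Any]]:
    """Return (dotted_path, params) for either entry form."""
    if isinstance(entry, str):
        return entry, {}
    return entry["function"], dict(entry.get("params") or {})


def apply_transforms(
    ds: Any,
    template: dict[str, Any],
    resolve: Callable[[str], Callable[..., Any]],
) -> Any:
    """Run each declared transform in order, forwarding params as keyword arguments."""
    for entry in template.get("transforms", []):
        path, params = normalise_entry(entry)
        ds = resolve(path)(ds, template, **params)
    return ds


# Stand-in transforms that operate on a list instead of an xr.Dataset.
def clamp_negatives(ds: list[float], dataset: dict[str, Any]) -> list[float]:
    return [max(value, 0) for value in ds]


def scale(ds: list[float], dataset: dict[str, Any], *, factor: float) -> list[float]:
    return [value * factor for value in ds]


# A dict lookup replaces real dotted-path importing for this demonstration.
registry = {"demo.clamp_negatives": clamp_negatives, "demo.scale": scale}
template = {
    "transforms": [
        "demo.clamp_negatives",
        {"function": "demo.scale", "params": {"factor": 0.5}},
    ]
}

result = apply_transforms([-2, 4, 10], template, registry.__getitem__)
```

Because params arrive as keyword arguments, a transform that declares them as keyword-only (as scale does here) fails with a clear TypeError if the YAML omits or misspells a parameter name.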

For the built-in transforms and a full description of the pipeline, see Transforms.


Processes

Processes are named operations that produce derived datasets (e.g. temporal resampling). They are backed by YAML files and dispatched via POST /processes/{id}/execution.

Built-in processes live in climate_api/data/processes/. Custom processes are loaded from plugins_dir/processes/.

plugins/
└── processes/
    └── my_process.yaml

Process YAML

- id: my_process
  title: My custom process
  description: Describe what this process does.
  version: "0.1.0"
  expose: true
  jobControlOptions:
    - sync-execute
  execution:
    function: mypackage.processes.my_process.execute

| Field | Required | Description |
|---|---|---|
| id | Yes | Unique process identifier. Used in POST /processes/{id}/execution |
| title | Yes | Human-readable title exposed through the public process catalogue |
| description | No | Longer description shown in API responses |
| version | No | Process version string exposed through the public process description |
| expose | No | Whether the process appears in the public /processes listing. Default: true |
| jobControlOptions | No | Supported execution modes exposed publicly. Default: ["sync-execute"] |
| execution.function | Yes | Dotted path to the Python function that runs the process |

A custom process with the same id as a built-in overrides it.
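The required fields and defaults listed for the process YAML can be expressed as a small normalisation step. This is a sketch under the assumption that each entry has already been parsed into a dict. The function normalise_process is hypothetical and is not part of the package.

```python
from typing import Any


def normalise_process(entry: dict[str, Any]) -> dict[str, Any]:
    """Check the required fields and fill in the documented defaults."""
    missing = [key for key in ("id", "title") if key not in entry]
    if "function" not in (entry.get("execution") or {}):
        missing.append("execution.function")
    if missing:
        raise ValueError(f"process definition is missing: {', '.join(missing)}")

    normalised = dict(entry)  # shallow copy so the caller's dict is untouched
    normalised.setdefault("expose", True)
    normalised.setdefault("jobControlOptions", ["sync-execute"])
    return normalised


process = normalise_process(
    {
        "id": "my_process",
        "title": "My custom process",
        "execution": {"function": "mypackage.processes.my_process.execute"},
    }
)
```

Here process gains expose and jobControlOptions with their default values, while an entry without a title or execution.function is rejected before it is registered.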

Execution function

The current built-in execution path passes the raw JSON request body to the execution function as keyword arguments. The function returns a JSON-serialisable dict:

from typing import Any

def execute(*, source_dataset_id: str, factor: float, **_ignored: Any) -> dict[str, Any]:
    ...
    return {"status": "completed", "artifact_id": "..."}

Invalid or missing arguments raise TypeError, which the route dispatcher catches and returns as HTTP 400.
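The mapping from TypeError to HTTP 400 can be illustrated without a web framework. The sketch below assumes a dispatcher that unpacks the request body into the execution function. The name dispatch, the (status, payload) return shape, and the artifact_id format are hypothetical and do not describe the actual route code.

```python
from typing import Any, Callable


def dispatch(
    execute: Callable[..., dict[str, Any]], body: dict[str, Any]
) -> tuple[int, dict[str, Any]]:
    """Call the execution function with the request body as keyword arguments."""
    try:
        result = execute(**body)
    except TypeError as exc:
        # Missing required keywords surface here and become a client error.
        return 400, {"detail": str(exc)}
    return 200, result


def execute(*, source_dataset_id: str, factor: float, **_ignored: Any) -> dict[str, Any]:
    # Placeholder body: a real process would read the source dataset and write an artifact.
    return {"status": "completed", "artifact_id": f"{source_dataset_id}-x{factor}"}


ok_status, ok_payload = dispatch(execute, {"source_dataset_id": "chirps", "factor": 2.0})
bad_status, bad_payload = dispatch(execute, {"source_dataset_id": "chirps"})
extra_status, _ = dispatch(
    execute, {"source_dataset_id": "chirps", "factor": 2.0, "unused": True}
)
```

Two details follow from plain Python semantics. The **_ignored catch-all means unknown keys in the body do not raise, and type annotations are not enforced at call time, so a value of the wrong type only fails if the function body itself rejects it.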

For the built-in resample process and usage examples, see Processes.


What is not pluggable

Availability functions (sync.availability.latest_available_function) accept a dotted path, but only the built-in functions in climate_api.providers.availability are resolved reliably. The path is resolved without plugins_dir on sys.path, so plugin paths are not reliably supported. Use one of the built-in availability functions instead, or open an issue if a new provider cadence is needed.

See issue #95 for the planned fix.