# Processes

Processes are named operations that produce derived datasets from existing ingested data. They are exposed through the native `/processes` endpoints, whose request and response shapes are being progressively aligned with the OGC API Processes standard.

## How processes work

A process takes parameters from the JSON request body, executes a computation, and returns a JSON response. The result is typically a new derived dataset artifact that can be opened via the Zarr endpoint or published to the OGC API catalog.

```http
POST /processes/{id}/execution
Content-Type: application/json

{ ...parameters... }
```
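The same request can be sent from Python using only the standard library. This is a minimal sketch, not part of the shipped client: `build_execution_request` and `execute_process` are hypothetical helper names, the base URL is the local server used in the examples on this page, and the 300-second timeout is an arbitrary assumption for long-running computations.

```python
import json
import urllib.request

# Local development server used throughout this page (assumption: adjust as needed).
BASE_URL = "http://127.0.0.1:8000"


def build_execution_request(process_id: str, parameters: dict,
                            base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /processes/{id}/execution request."""
    body = json.dumps(parameters).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/processes/{process_id}/execution",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_process(process_id: str, parameters: dict,
                    base_url: str = BASE_URL, timeout: float = 300.0) -> dict:
    """Send the request and decode the JSON response body (hypothetical helper)."""
    request = build_execution_request(process_id, parameters, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `execute_process("resample", parameters)` with the `resample` parameters described on this page would submit the same request as the `curl` examples.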

Available processes are listed at `GET /processes`.

## Built-in process: `resample`

The resample process aggregates a source dataset to a coarser temporal resolution. It is the primary way to produce daily totals from hourly data, weekly averages from daily data, and so on.
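Conceptually, the aggregation step resembles a pandas time-based resample followed by one of the four reductions. The sketch below illustrates that idea on synthetic hourly values. It is an assumption about the semantics, not the service's actual implementation, and `aggregate` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# The four aggregation methods accepted by the `method` parameter.
SUPPORTED_METHODS = {"mean", "sum", "min", "max"}


def aggregate(series: pd.Series, frequency: str, method: str) -> pd.Series:
    """Resample a time-indexed series to `frequency` and reduce each bin with `method`."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    return getattr(series.resample(frequency), method)()


# Two days of synthetic hourly values: 0.0, 1.0, ..., 47.0
times = pd.date_range("2024-01-01", periods=48, freq="h")
hourly = pd.Series(np.arange(48, dtype=float), index=times)

daily_mean = aggregate(hourly, "1D", "mean")
print(daily_mean)  # daily means: 11.5 on 2024-01-01, 35.5 on 2024-01-02
```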

### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `source_dataset_id` | string | Yes | ID of the source managed dataset to resample |
| `frequency` | string | Yes | Target temporal resolution as a pandas frequency alias (e.g. `1D`, `W-MON`, `MS`) |
| `method` | string | Yes | Aggregation method: `mean`, `sum`, `min`, or `max` |
| `start` | string | Yes | Start of the period range to resample (ISO 8601 date or datetime) |
| `end` | string | No | End of the period range. Defaults to today's UTC date. |
| `overwrite` | boolean | No | Re-materialize the derived dataset even if it already exists. Default: `false` |
| `publish` | boolean | No | Publish the result to the OGC API catalog after materializing. Default: `true` |
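The defaults in the table can be made explicit on the client side with a small normalization step. This is a hedged sketch: `with_defaults` is a hypothetical helper, and it only mirrors the documented defaults. The server presumably applies its own defaults whether or not a client does this.

```python
from datetime import datetime, timezone

REQUIRED_PARAMETERS = ("source_dataset_id", "frequency", "method", "start")


def with_defaults(parameters: dict) -> dict:
    """Return a copy of `parameters` with the documented defaults filled in."""
    missing = [name for name in REQUIRED_PARAMETERS if name not in parameters]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    # Explicit values in `parameters` override the defaults on the left.
    resolved = {"overwrite": False, "publish": True, **parameters}
    # `end` defaults to today's date in UTC, formatted as an ISO 8601 date.
    resolved.setdefault("end", datetime.now(timezone.utc).date().isoformat())
    return resolved
```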

Supported frequency aliases:

| Alias | Meaning |
|---|---|
| `1D`, `7D`, `10D`, … | Every N calendar days |
| `W-MON` | ISO weeks starting Monday |
| `MS` | Calendar months (month-start) |
| `QS` | Calendar quarters (quarter-start) |
| `YS` | Calendar years (year-start) |
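If you reproduce these aliases in pandas yourself, be aware that pandas closes and labels weekly bins on the right by default. A plain `resample("W-MON")` therefore groups Tuesday through Monday and labels each bin with its closing Monday. The sketch below passes `closed="left"` and `label="left"` to obtain Monday-to-Sunday weeks labelled by their starting Monday, which matches the table's description. Whether the service does exactly this internally is an assumption.

```python
import numpy as np
import pandas as pd

# Two full ISO weeks of daily values: Monday 2024-01-01 through Sunday 2024-01-14.
days = pd.date_range("2024-01-01", "2024-01-14", freq="D")
daily = pd.Series(np.ones(len(days)), index=days)

# Monday-to-Sunday weeks, each labelled by the Monday it starts on.
weekly = daily.resample("W-MON", closed="left", label="left").sum()

# pandas' default for "W-MON": (Tuesday, Monday] bins labelled by the closing Monday.
weekly_default = daily.resample("W-MON").sum()

# Month-start bins are left-closed and left-labelled by default.
monthly = daily.resample("MS").sum()
```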

### Derived dataset ID

The derived dataset is stored under an auto-generated ID:

```text
{source_dataset_id}_{frequency_slug}_{method}
```

where `frequency_slug` is the frequency alias lowercased, with non-alphanumeric characters replaced by `_` and leading and trailing underscores stripped. For example:

| Source dataset | Frequency | Method | Derived ID |
|---|---|---|---|
| `chirps3_precipitation_daily` | `W-MON` | `sum` | `chirps3_precipitation_daily_w_mon_sum` |
| `era5land_temperature_hourly` | `1D` | `mean` | `era5land_temperature_hourly_1d_mean` |
| `chirps3_precipitation_daily` | `MS` | `sum` | `chirps3_precipitation_daily_ms_sum` |
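The naming rule can be sketched in a few lines, for example to predict a derived ID before running the process. This is an illustration, not the server's code: `frequency_slug` and `derived_dataset_id` are hypothetical helper names. The sketch assumes "alphanumeric" means ASCII letters and digits and that each such character is replaced individually rather than collapsing runs.

```python
import re


def frequency_slug(frequency: str) -> str:
    """Lowercase the alias, replace each non-alphanumeric character with "_", strip edge underscores."""
    return re.sub(r"[^a-z0-9]", "_", frequency.lower()).strip("_")


def derived_dataset_id(source_dataset_id: str, frequency: str, method: str) -> str:
    """Compose {source_dataset_id}_{frequency_slug}_{method}."""
    return f"{source_dataset_id}_{frequency_slug(frequency)}_{method}"
```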

### Idempotency

If a derived dataset artifact already exists for the requested `source_dataset_id`, `frequency`, `method`, and time range, the process returns the existing artifact without re-materializing it. Pass `"overwrite": true` to force a rebuild.
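The reuse-or-rebuild behaviour amounts to a lookup keyed on the four request attributes. The sketch below models it with a hypothetical in-memory store. It assumes the key uses the *requested* `start` and `end` strings, and a counter stands in for the expensive materialization step. The real service persists artifacts and may key them differently.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryArtifactStore:
    """Hypothetical stand-in for the server's artifact storage."""
    artifacts: dict = field(default_factory=dict)
    builds: int = 0

    def resample(self, source_dataset_id: str, frequency: str, method: str,
                 start: str, end: str, overwrite: bool = False) -> dict:
        # Idempotency key: same source, frequency, method, and time range.
        key = (source_dataset_id, frequency, method, start, end)
        if key in self.artifacts and not overwrite:
            return self.artifacts[key]  # reuse the existing artifact
        self.builds += 1  # stands in for materializing the derived dataset
        artifact = {"artifact_id": f"artifact-{self.builds}", "key": key}
        self.artifacts[key] = artifact
        return artifact
```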

### Example: daily mean temperature from hourly ERA5-Land

```bash
curl -s -X POST http://127.0.0.1:8000/processes/resample/execution \
  -H "Content-Type: application/json" \
  -d '{
    "source_dataset_id": "era5land_temperature_hourly",
    "frequency": "1D",
    "method": "mean",
    "start": "2024-01-01",
    "end": "2024-01-31",
    "publish": true
  }' | jq
```

Response (abbreviated):

```json
{
  "artifact_id": "3f2a1b4c-8e7d-4f9a-b2c1-0d5e6f7a8b9c",
  "status": "completed",
  "dataset": {
    "dataset_id": "era5land_temperature_hourly_1d_mean",
    "dataset_name": "era5land_temperature_hourly_1d_mean",
    "variable": "t2m",
    "period_type": "daily",
    ...
  }
}
```
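The `dataset.dataset_id` field of the response is the ID to use with the Zarr endpoint. The following minimal sketch turns an execution response into a Zarr URL, assuming the `/zarr/{dataset_id}` path shown under "Opening the derived dataset". `zarr_url_from_response` is a hypothetical helper. Treating any status other than `completed` as an error is also an assumption, since other status values are not documented here.

```python
BASE_URL = "http://127.0.0.1:8000"


def zarr_url_from_response(response: dict, base_url: str = BASE_URL) -> str:
    """Build the Zarr endpoint URL for the dataset produced by a process execution."""
    # Assumption: only "completed" responses carry a usable dataset.
    if response.get("status") != "completed":
        raise RuntimeError(f"process did not complete: status={response.get('status')!r}")
    dataset_id = response["dataset"]["dataset_id"]
    return f"{base_url}/zarr/{dataset_id}"
```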

### Example: weekly precipitation totals from CHIRPS daily

```bash
curl -s -X POST http://127.0.0.1:8000/processes/resample/execution \
  -H "Content-Type: application/json" \
  -d '{
    "source_dataset_id": "chirps3_precipitation_daily",
    "frequency": "W-MON",
    "method": "sum",
    "start": "2024-01-01",
    "end": "2024-03-31",
    "publish": true
  }' | jq
```

### Incomplete edge periods

The resampler automatically drops leading and trailing periods that are not fully covered by the source data. For example, if a daily source dataset starts on a Wednesday and you resample to weekly (Monday to Sunday) periods, the first Monday-anchored week is dropped because it only has data from Wednesday onward.

This means the realized time range of the derived artifact may be shorter than the requested range if the source data does not fully cover the first or last target period.
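The trimming rule can be sketched with pandas. A target period is kept only if it starts no earlier than the first source timestamp and ends no later than the end of the last source step. This illustration makes two assumptions: the source has a regular step (here one day), and target periods are left-closed and left-labelled. It is not the service's implementation, and `resample_complete_periods` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset


def resample_complete_periods(series: pd.Series, frequency: str, method: str,
                              source_step: pd.Timedelta) -> pd.Series:
    """Aggregate to `frequency`, keeping only periods fully covered by `series`."""
    # Left-closed, left-labelled bins: each label is the start of its period.
    resampled = getattr(series.resample(frequency, closed="left", label="left"), method)()
    period_starts = resampled.index
    period_ends = period_starts + to_offset(frequency)  # exclusive end of each period
    coverage_start = series.index.min()
    coverage_end = series.index.max() + source_step  # exclusive end of the source data
    complete = (period_starts >= coverage_start) & (period_ends <= coverage_end)
    return resampled[complete]


# Daily source data from Wednesday 2024-01-03 to Sunday 2024-01-21.
days = pd.date_range("2024-01-03", "2024-01-21", freq="D")
daily = pd.Series(np.ones(len(days)), index=days)
weekly = resample_complete_periods(daily, "W-MON", "sum", pd.Timedelta(days=1))
# The partial week starting Monday 2024-01-01 is dropped; the weeks starting
# 2024-01-08 and 2024-01-15 are fully covered and kept.
```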

### Opening the derived dataset

Once materialized, the derived dataset can be opened like any other managed dataset:

```python
from climate_api.client import Client

api = Client("http://127.0.0.1:8000")
ds = api.open("era5land_temperature_hourly_1d_mean")
print(ds)
```

Or directly via the Zarr endpoint:

```python
# Open the derived dataset directly from the Zarr endpoint with xarray
import xarray as xr

ds = xr.open_zarr(
    "http://127.0.0.1:8000/zarr/era5land_temperature_hourly_1d_mean",
    consolidated=True,
)
```

## Custom processes

You can register additional processes from a `plugins_dir/processes/` directory. See the Processes section of the Extensibility guide for the YAML format and the execution function contract.