User Guide

This guide shows how to discover and access data from a running Climate API instance. It assumes the API is running locally at http://127.0.0.1:8000 and that at least one dataset has been ingested and published.

For configuring a new instance for your country, see setup_guide.md. For the full ingestion and sync API reference, see managed_data_api_guide.md.

Discovering datasets

The STAC catalog is the starting point for data discovery. It lists all published GeoZarr datasets as STAC Collections.

curl -s http://127.0.0.1:8000/stac/catalog.json | jq

Each entry in links with "rel": "child" points to one dataset collection. Use the href from the catalog to fetch it:

# Replace {dataset_id} with any id from the catalog above, e.g. chirps3_precipitation_daily
curl -s http://127.0.0.1:8000/stac/collections/{dataset_id} | jq
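
The same discovery step can be scripted. The sketch below filters a catalog document for its "rel": "child" links. The sample payload is hypothetical and trimmed: the catalog id and the second dataset are invented for illustration, and a live instance would return more fields from /stac/catalog.json.

```python
import json

# Hypothetical, trimmed catalog payload. A real one comes from
# GET http://127.0.0.1:8000/stac/catalog.json and carries more fields.
SAMPLE_CATALOG = json.loads("""
{
  "type": "Catalog",
  "id": "example-catalog",
  "links": [
    {"rel": "self", "href": "http://127.0.0.1:8000/stac/catalog.json"},
    {"rel": "child", "href": "http://127.0.0.1:8000/stac/collections/chirps3_precipitation_daily"},
    {"rel": "child", "href": "http://127.0.0.1:8000/stac/collections/example_dataset"}
  ]
}
""")


def child_hrefs(catalog: dict) -> list:
    """Return the href of every link whose rel is "child"."""
    return [
        link["href"]
        for link in catalog.get("links", [])
        if link.get("rel") == "child"
    ]


# Each printed URL is one dataset collection that can be fetched in turn.
for href in child_hrefs(SAMPLE_CATALOG):
    print(href)
```

Links with any other rel value, such as "self", are skipped, so the result contains only dataset collections.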

The assets.zarr field contains everything needed to open the dataset:

{
  "assets": {
    "zarr": {
      "href": "http://127.0.0.1:8000/zarr/chirps3_precipitation_daily",
      "xarray:open_kwargs": { "consolidated": true }
    }
  }
}
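
For readers who prefer not to use the client described in the next section, the following is a minimal sketch of how these two fields might be used with xarray directly. The helper names zarr_open_args and open_from_collection are hypothetical, and the sketch assumes the xarray:open_kwargs values are meant to be passed to xarray.open_zarr and that the environment has the packages xarray needs to read Zarr over HTTP.

```python
def zarr_open_args(collection: dict) -> tuple:
    """Extract the Zarr store URL and the suggested xarray keyword arguments."""
    asset = collection["assets"]["zarr"]
    # Copy the kwargs so callers can modify them without touching the document.
    return asset["href"], dict(asset.get("xarray:open_kwargs", {}))


def open_from_collection(collection: dict):
    """Open the dataset described by a collection document (hypothetical helper)."""
    import xarray as xr  # imported lazily so zarr_open_args works without xarray

    href, kwargs = zarr_open_args(collection)
    return xr.open_zarr(href, **kwargs)


# The collection fragment shown above, as a Python dict.
collection = {
    "assets": {
        "zarr": {
            "href": "http://127.0.0.1:8000/zarr/chirps3_precipitation_daily",
            "xarray:open_kwargs": {"consolidated": True},
        }
    }
}

href, kwargs = zarr_open_args(collection)
print(href, kwargs)
```

Calling open_from_collection(collection) against a running instance would then return the dataset.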

Opening a dataset with xarray

The climate_api.client module provides a Client class for discovering and opening datasets:

from climate_api.client import Client

api = Client("http://127.0.0.1:8000")

datasets = api.catalog()
for link in datasets:
    print(link["id"], "—", link["title"])

ds = api.open(datasets[0]["id"])  # open whichever dataset is published first
print(ds)

The base_url argument defaults to the CLIMATE_API_BASE_URL environment variable, falling back to http://127.0.0.1:8000, so the module-level functions can be called without passing a URL:

from climate_api.client import list_datasets, open_dataset  # reads CLIMATE_API_BASE_URL

dataset_id = list_datasets()[0]["id"]
ds = open_dataset(dataset_id)
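
The lookup order described above can be pictured with a small stand-alone sketch. This is an illustration of the documented behaviour, not the client's actual code: resolve_base_url and DEFAULT_BASE_URL are hypothetical names.

```python
import os
from typing import Optional

DEFAULT_BASE_URL = "http://127.0.0.1:8000"


def resolve_base_url(base_url: Optional[str] = None) -> str:
    """Prefer an explicit argument, then CLIMATE_API_BASE_URL, then the local default."""
    if base_url:
        return base_url
    return os.environ.get("CLIMATE_API_BASE_URL", DEFAULT_BASE_URL)


# Prints the environment variable's value if it is set, else the local default.
print(resolve_base_url())
```

Setting the variable once in the shell (for example in a notebook environment) is therefore enough to point every call at a remote instance.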

Each dataset has a time dimension, x and y spatial dimensions, and a data variable named after the quantity it holds (e.g. precip for CHIRPS, t2m for ERA5-Land temperature).

Select the first time step:

snapshot = ds.isel(time=0)
print(snapshot)

Select a spatial point by sampling the centre of the domain:

variable = list(ds.data_vars)[0]  # precip, t2m, tp, or pop_total depending on the dataset
centre_y = ds.y.mean().item()
centre_x = ds.x.mean().item()
point = ds.sel(y=centre_y, x=centre_x, method="nearest")
print(point[variable].values)
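
With method="nearest", the selection snaps each requested coordinate to the closest value that exists on the grid. The pure-Python sketch below illustrates the idea on one axis. It is not how xarray implements the lookup, and the coordinate values and the nearest_index helper are hypothetical.

```python
def nearest_index(coords: list, target: float) -> int:
    """Return the index of the coordinate value closest to target."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


# Hypothetical y coordinates of a small regular grid.
y = [10.0, 10.5, 11.0, 11.5, 12.0]

# The mean of the axis plays the role of ds.y.mean().item().
centre_y = sum(y) / len(y)
print(y[nearest_index(y, centre_y)])

# A value that is not on the grid snaps to its closest neighbour.
print(y[nearest_index(y, 10.6)])
```

Because the requested value need not match a grid coordinate exactly, the centre of the domain can be computed from the coordinate means without knowing the grid spacing.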

Compute the spatial mean over the first 10 time steps (slicing first avoids reading the full dataset over HTTP):

spatial_mean = ds[variable].isel(time=slice(10)).mean(dim=["y", "x"])
print(spatial_mean.to_dataframe())

What's next