DHIS2 Climate API

Climate Data & Earth Observation Integration Platform

Project Description · 2026

1. Background

Climate change and extreme weather events pose severe threats to low- and middle-income countries, but data fragmentation makes it difficult to analyze these threats and forecast future impacts in an accurate and timely way. Climate and earth observation data is distributed across dozens of providers — each with different APIs, data formats, and access mechanisms. Integrating this fragmented landscape into operational systems requires specialised expertise for every data source, a barrier that limits its use across sectors in low- and middle-income countries.

This project develops the DHIS2 Climate API — an open-source, standards-based, and decentralised integration platform that unifies this fragmented space behind a single, consistent interface. By abstracting data access across heterogeneous sources and harmonising outputs into a common format, the platform enables organisations to automatically ingest, process, and analyse global, national, and local climate and environmental data without requiring specialised expertise for each data provider. Although developed in close alignment with DHIS2 to support seamless integration with national health information systems, the platform is designed to operate independently of any specific software platform, and can serve as shared infrastructure for any sector that requires systematic access to harmonised climate and environmental data — including health, agriculture, land management, and forest monitoring.

A key design principle is data sovereignty: the platform is deployable on national or regional infrastructure without dependency on proprietary services, ensuring that countries retain full control over their data. Outputs are stored and served using open geospatial standards, making the platform an open foundation rather than a closed system. Local data providers and developers can connect their own data sources, build custom analytical workflows, and extend the platform to address specific needs. This openness and interoperability is a deliberate design choice — the goal is not a single monolithic tool, but shared infrastructure that countries and communities can adapt and innovate upon.

The platform is being developed in close collaboration with HISP groups in the countries themselves — the people who understand local data landscapes, institutional arrangements, and technical constraints. The initial dataset catalogue will include open data on climate, earth observation, and population. By making this data systematically accessible through open, sovereign infrastructure, the platform lowers the barrier for countries to conduct integrated analysis, run predictive models, and establish early warning systems.

2. Overview

The DHIS2 Climate API is a no-code data integration platform that enables earth observation (EO) and climate data from multiple upstream sources to be downloaded, processed, harmonised, and loaded into DHIS2 and the CHAP Modelling Platform.

The platform is built as a Python-based REST API (FastAPI) and exposes both native endpoints and OGC API-compliant endpoints (via pygeoapi). Data is stored in cloud-native GeoZarr format and can be consumed by the DHIS2 Climate App, the DHIS2 Maps App, DHIS2 Climate Tools, the CHAP Modelling Platform, and third-party tools such as QGIS.

The Climate API is envisioned as the shared data infrastructure layer for the DHIS2 climate and health ecosystem — a single, well-defined source of spatiotemporal raster data that any DHIS2 application or external tool can build on.

2.1 Relationship to existing DHIS2 climate work

The Climate API supplements and extends the existing DHIS2 climate data integration work documented at dhis2.org/climate/climate-data/. It builds on the same broader ecosystem — especially dhis2eo and related DHIS2 climate tooling. The platform wraps climate and earth observation workflows in a standardised API so that data access, processing, and publication can be configured and run without writing code.

2.2 Scope of this document

This document describes the project vision, design constraints, user stories, functional requirements, technical architecture, and data pipeline approach. It is intended for technical contributors, DHIS2 country implementers, and stakeholders evaluating the Climate API for deployment.

3. Vision and goals

The Climate API aims to:

Provide a unified API through which EO and climate data can be requested, downloaded, processed, and uploaded to DHIS2 — with all complexity handled behind the scenes.
Serve as a no-code alternative to DHIS2 Climate Tools for standard data integration workflows, built on the same underlying libraries.
Allow the DHIS2 Climate and Maps apps and CHAP to act as frontends that consume the Climate API.
Support custom orchestration — users can build pipelines with pre- and post-processing steps.
Work independently of a DHIS2 instance.
Follow the requirements for being a Digital Public Good (DPG) and adhere to the FAIR principles (Findable, Accessible, Interoperable, Reusable).

4. User stories

ID	Actor	Goal
US-A	Data manager	Import daily temperature and precipitation data into DHIS2 at a user-defined scheduled interval (e.g. nightly), automatically aggregated to org units.
US-B	Data manager	Import population data for the current year, automatically aggregated to org units.
US-C	Analyst	Visualise high-resolution population data on DHIS2 Maps with styles adapted to the population density of the country.
US-D	Analyst	Preview climate data for an org unit of interest before importing it to DHIS2.
US-E	Data manager	Add a custom pre- or post-processing step (e.g. calculate consecutive rainy days) before importing the result to DHIS2.

5. Design constraints

The following constraints apply to the first version of the Climate API. They represent deliberate architectural decisions and are open for discussion as the platform matures.

5.1 Single spatial extent

Each Climate API instance is configured with named extents, defined at setup time. Each extent has a required id and bbox, and an optional org_unit_id for linking to a DHIS2 org unit.

Extents are not expected to change after setup. The configuration can name more than one extent, but the first version supports only a single extent per instance. Larger countries may configure a sub-national extent (e.g. a district) to limit initial download volume. The extent_id is passed as a parameter to ingestion alongside the dataset_id, so ingestion is not tied to a DHIS2 instance.
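
An extent definition of this kind could be modelled roughly as follows. This is a minimal sketch: the `Extent` class name, the `(min_lon, min_lat, max_lon, max_lat)` ordering of `bbox`, and the validation rules are assumptions for illustration, not the project's actual configuration schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Extent:
    """Hypothetical model of a named extent from the instance configuration."""

    id: str
    # Assumed order: min_lon, min_lat, max_lon, max_lat (EPSG:4326 degrees).
    bbox: Tuple[float, float, float, float]
    org_unit_id: Optional[str] = None  # optional link to a DHIS2 org unit

    def __post_init__(self) -> None:
        if not self.id:
            raise ValueError("extent id is required")
        min_lon, min_lat, max_lon, max_lat = self.bbox
        if not (-180 <= min_lon < max_lon <= 180):
            raise ValueError("expected -180 <= min_lon < max_lon <= 180")
        if not (-90 <= min_lat < max_lat <= 90):
            raise ValueError("expected -90 <= min_lat < max_lat <= 90")


# Rough, illustrative bounding box for Sierra Leone (not an authoritative value).
sle = Extent(id="sle", bbox=(-13.4, 6.8, -10.2, 10.1))
```

Making the class frozen reflects the statement that extents are not expected to change after setup.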

5.2 No temporal gaps

Downloaded datasets must not contain temporal gaps, unless gaps exist in the original upstream data source. Each scheduled update starts from the last period that has data and imports every period from there up to the current date, which keeps the series continuous. The /sync endpoint validates temporal continuity before appending new time steps.
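
The continuity rule can be illustrated for a daily dataset with plain dates. This is a sketch under stated assumptions: the function names `find_gaps` and `sync_window` are hypothetical, and the real /sync validation may work differently (for example, directly on the Zarr time coordinate).

```python
from datetime import date, timedelta
from typing import List, Tuple


def find_gaps(days: List[date]) -> List[Tuple[date, date]]:
    """Return (first_missing, last_missing) ranges between stored daily steps."""
    gaps = []
    ordered = sorted(days)
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > timedelta(days=1):
            gaps.append((prev + timedelta(days=1), curr - timedelta(days=1)))
    return gaps


def sync_window(last_day: date, today: date) -> List[date]:
    """Days to request on the next sync: the day after last_day up to today."""
    n_days = (today - last_day).days
    return [last_day + timedelta(days=i) for i in range(1, n_days + 1)]


stored = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)]
gaps = find_gaps(stored)  # 3 and 4 January are missing
window = sync_window(stored[-1], date(2024, 1, 8))  # 6, 7 and 8 January
```

A gap reported here would still be acceptable if the upstream source itself has no data for those days; that check is outside the scope of the sketch.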

5.3 One period type per dataset

Each dataset has a single period type (daily, weekly, monthly, yearly, etc.). The period type is included in the dataset ID (e.g. chirps3_precipitation_daily_sle). It is possible to construct derived datasets with a different period type from an existing dataset (e.g. daily → weekly aggregation), which will result in a separate dataset ID.
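
A daily to weekly derivation might look like the sketch below. The helper names are hypothetical, ISO weeks are assumed as the weekly period, and summing is assumed because the example variable is precipitation (a variable such as temperature would more likely be averaged).

```python
from collections import defaultdict
from datetime import date
from typing import Dict


def daily_to_weekly_sum(daily: Dict[date, float]) -> Dict[str, float]:
    """Sum daily values into ISO weeks, keyed like '2024W1'."""
    weekly: Dict[str, float] = defaultdict(float)
    for day, value in daily.items():
        iso_year, iso_week, _ = day.isocalendar()
        weekly[f"{iso_year}W{iso_week}"] += value
    return dict(weekly)


def derived_dataset_id(dataset_id: str, old_period: str, new_period: str) -> str:
    """Swap the period-type token so the derived dataset gets its own ID."""
    old_token, new_token = f"_{old_period}_", f"_{new_period}_"
    if old_token not in dataset_id:
        raise ValueError(f"{dataset_id!r} has no period type {old_period!r}")
    return dataset_id.replace(old_token, new_token, 1)


daily = {date(2024, 1, 1): 2.0, date(2024, 1, 7): 3.0, date(2024, 1, 8): 1.5}
weekly = daily_to_weekly_sum(daily)
weekly_id = derived_dataset_id("chirps3_precipitation_daily_sle", "daily", "weekly")
```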

5.4 One artifact per dataset

Each dataset ID maps to exactly one output artifact in the form of a GeoZarr store. New time steps are appended to the existing store on sync rather than creating parallel stores.

5.5 DHIS2-independent operation

Core parts of the Climate API must function without a connected DHIS2 instance. Spatial extent is defined via instance configuration rather than a DHIS2 org unit query. Aggregation accepts GeoJSON features from any source and outputs CSV or JSON as well as DHIS2 data values.

5.6 Dataset templates and published datasets

Internally, the Climate API distinguishes between dataset templates and published datasets:

Dataset templates — YAML definitions describing a dataset type (source, variable, period type, processing steps). These are internal and align closely with the OGC API Collections specification. They act as blueprints for ingestion.
Published datasets — actualised, ingested datasets for a specific extent and time range, exposed under /datasets and /ogcapi/collections. These are what end users and client applications discover and consume.

This mirrors the approach used in the CHAP Modelling Platform, where generic model template YAMLs are distinguished from specific initialised instances.
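
The template and published-dataset distinction can be pictured with two small record types. This is a sketch only: the class names and fields are assumptions standing in for the YAML definitions, which are not reproduced in this document.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetTemplate:
    """Stand-in for a YAML dataset template (fields are illustrative)."""

    source: str
    variable: str
    period_type: str


@dataclass(frozen=True)
class PublishedDataset:
    """A template realised for one extent and one time range."""

    dataset_id: str
    extent_id: str
    start: date
    end: date


def publish(template: DatasetTemplate, extent_id: str, start: date, end: date) -> PublishedDataset:
    """Combine a template with an extent and time range into a published dataset."""
    dataset_id = f"{template.source}_{template.variable}_{template.period_type}_{extent_id}"
    return PublishedDataset(dataset_id, extent_id, start, end)


template = DatasetTemplate(source="chirps3", variable="precipitation", period_type="daily")
published = publish(template, "sle", date(2024, 1, 1), date(2024, 12, 31))
```

One template can yield many published datasets, one per extent, while each published dataset keeps a single ID.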

6. Functional requirements

6.1 Data pipeline

Allow EO data to be requested through a unified API where download, processing, and optional upload to DHIS2 happen behind the scenes.
Expose each pipeline step (data extraction, aggregation, and upload to DHIS2) as a separate API endpoint with clear input and output definitions.
Support scheduling — data can be downloaded, processed, and imported at fixed user-defined intervals.
Support orchestration — users can compose custom data pipelines, including pre- and post-processing steps.

6.2 Data storage and serving

Store all datasets as GeoZarr — cloud-native, chunked, multiscale, EPSG:4326.
Expose datasets via a /zarr/{dataset_id} endpoint using HTTP range requests, enabling chunk-level access by any compatible client.
Expose datasets through OGC API-compliant endpoints (Coverages, EDR, Processes, Tiles, Collections) via pygeoapi under /ogcapi.
Expose dataset discovery metadata via a /datasets endpoint and a STAC catalogue.

6.3 Visualisation

Support on-the-fly map tile rendering with custom styling.
Support image tile generation using TiTiler (following the OGC API — Tiles specification as closely as possible).
Support direct browser rendering of Zarr data via zarr-layer (MapLibre custom layer) with GPU reprojection from EPSG:4326 to Spherical Mercator and client-side dynamic colour classification.
Support point queries (single location time series) for preview before import.
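
For reference, the reprojection from EPSG:4326 to Spherical Mercator mentioned above follows the standard forward formula shown below. This CPU-side Python version is only an illustration of the arithmetic; zarr-layer performs the reprojection on the GPU, and its actual implementation is not shown here.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Spherical Mercator (EPSG:3857)
MAX_LAT = 85.0511287798  # latitude at which the projected extent becomes square


def lonlat_to_web_mercator(lon: float, lat: float) -> Tuple[float, float]:
    """Project longitude/latitude in degrees to Spherical Mercator metres."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))  # clamp: the projection diverges at the poles
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


origin = lonlat_to_web_mercator(0.0, 0.0)
east_edge = lonlat_to_web_mercator(180.0, 0.0)
```

Because the y coordinate is non-linear in latitude, rows of a regular EPSG:4326 grid do not map to evenly spaced rows on the map, which is why reprojection is needed at render time.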

6.4 Aggregation

Aggregate raster data to org unit polygons (or any GeoJSON feature collection).
Support async execution for long-running aggregation jobs (OGC API Processes pattern).
Support demographic disaggregation for WorldPop data (age/sex bands as additional Zarr dimensions).
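
The idea behind coverage-weighted aggregation can be shown with a deliberately simplified case. In this sketch both the raster cells and the zone are axis-aligned rectangles, and all names are hypothetical; a real zonal statistics library handles arbitrary polygons and is not reimplemented here.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # min_x, min_y, max_x, max_y


def overlap_fraction(cell: BBox, zone: BBox) -> float:
    """Fraction of the cell's area that lies inside the zone."""
    width = min(cell[2], zone[2]) - max(cell[0], zone[0])
    height = min(cell[3], zone[3]) - max(cell[1], zone[1])
    if width <= 0 or height <= 0:
        return 0.0
    cell_area = (cell[2] - cell[0]) * (cell[3] - cell[1])
    return (width * height) / cell_area


def weighted_stats(cells: List[Tuple[BBox, float]], zone: BBox) -> Tuple[float, float]:
    """Return (coverage-weighted sum, coverage-weighted mean) for the zone."""
    total = 0.0
    weight = 0.0
    for bbox, value in cells:
        fraction = overlap_fraction(bbox, zone)
        total += value * fraction
        weight += fraction
    mean = total / weight if weight else float("nan")
    return total, mean


# Two unit cells; the zone covers all of the first and half of the second.
cells = [((0.0, 0.0, 1.0, 1.0), 100.0), ((1.0, 0.0, 2.0, 1.0), 200.0)]
total, mean = weighted_stats(cells, zone=(0.0, 0.0, 1.5, 1.0))
```

The weighted sum suits count variables such as population, where a half-covered cell should contribute half its people, while the weighted mean suits intensive variables such as temperature.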

6.5 Integration

Upload aggregated data values directly to DHIS2 (optional — can be skipped for standalone use).
Accept GeoJSON features from external sources (not only DHIS2 org units).
Output results as DHIS2 data values, CSV, or JSON.
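
The three output formats could be produced from the same aggregated rows, as sketched below. The row layout, function names, and UIDs are placeholders chosen for illustration; the `dataValues` structure follows the general shape of a DHIS2 data value set payload, but the exact payload the Climate API emits is an assumption.

```python
import csv
import io
import json
from typing import Dict, List

Row = Dict[str, object]


def to_dhis2_data_values(rows: List[Row], data_element: str) -> Dict[str, list]:
    """Shape rows like a DHIS2 data value set payload."""
    return {
        "dataValues": [
            {
                "dataElement": data_element,
                "orgUnit": row["org_unit"],
                "period": row["period"],
                "value": str(row["value"]),
            }
            for row in rows
        ]
    }


def to_csv(rows: List[Row]) -> str:
    """Serialise rows as CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["org_unit", "period", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder UIDs; real DHIS2 UIDs come from the target instance.
rows = [{"org_unit": "orgUnitUid01", "period": "20240101", "value": 12.5}]
payload = to_dhis2_data_values(rows, data_element="dataElemUid1")
as_json = json.dumps(rows)
as_csv = to_csv(rows)
```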
Provide a client library / SDK for programmatic access by DHIS2 apps and third-party tools.

6.6 Non-functional requirements

Handle simultaneous and long-running requests without blocking.
Follow FAIR principles: Findable, Accessible, Interoperable and Reusable.
Build on existing open-source solutions — the team is small and sustainability matters.
Support deployment via Docker for local, cloud-hosted, and sovereign country environments.
Make the storage backend configurable via environment variables, so that no code changes are required to switch between the local filesystem, different cloud providers, AWS S3 (including Africa and Asia regions), and self-hosted Ceph/RGW for sovereign deployments.

7. Supported data sources

The Climate API ingests data from multiple upstream Earth Observation and climate sources. Current and planned sources include:

CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) — daily and pentadal precipitation.
ERA5 / Climate Data Store (Copernicus CDS) — temperature, humidity, wind, and other atmospheric variables at multiple temporal resolutions.
WorldPop — annual gridded population estimates, with optional age and sex disaggregation at 5-year intervals.

The dataset ID schema encodes the source, variable, period type, and spatial extent ID. The extent ID is an ISO country code or named sub-national identifier defined in the instance configuration. Examples:

chirps3_precipitation_daily_sle
era5_temperature_daily_sle
worldpop_population_yearly_sle

Sub-national extents use the same schema (e.g. chirps3_precipitation_daily_bo for the Bo district of Sierra Leone), allowing larger countries to configure a district-level extent to limit initial download volume.
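
A parser for this ID schema could look like the following sketch. It assumes that the source and extent tokens never contain underscores (the variable may), and the set of period types is an illustrative subset; neither assumption is confirmed by the schema description.

```python
from typing import NamedTuple

# Illustrative subset; the platform may define further period types.
PERIOD_TYPES = {"daily", "weekly", "monthly", "yearly"}


class DatasetId(NamedTuple):
    source: str
    variable: str
    period_type: str
    extent_id: str


def parse_dataset_id(dataset_id: str) -> DatasetId:
    """Split '<source>_<variable>_<period_type>_<extent_id>' into its parts."""
    parts = dataset_id.split("_")
    if len(parts) < 4:
        raise ValueError(f"expected at least four tokens in {dataset_id!r}")
    source, extent_id, period_type = parts[0], parts[-1], parts[-2]
    variable = "_".join(parts[1:-2])  # the variable may itself contain underscores
    if period_type not in PERIOD_TYPES:
        raise ValueError(f"unknown period type {period_type!r}")
    return DatasetId(source, variable, period_type, extent_id)


parsed = parse_dataset_id("chirps3_precipitation_daily_sle")
```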

8. Technical architecture

8.1 API layer

The API is built on FastAPI and exposes the following endpoint groups:

Endpoint	Description
`/ingestions`	Trigger data download from an upstream source for the configured extent and date range. Parameters: `dataset_id`, `start`, `end`, `extent_id`. Creates or updates the corresponding Zarr store.
`/sync`	Check for more recent data from the upstream source and append new time steps to the existing Zarr store. Validates temporal continuity before writing.
`/datasets`	List and describe available published datasets — metadata, period type, extent, last updated, and access links.
`/zarr/{dataset_id}`	Serve the GeoZarr store by returning a directory listing at the dataset path and file responses for Zarr contents beneath it.
`/ogcapi/...`	OGC API-compliant endpoints served by pygeoapi: Coverages, EDR, Processes, Collections.
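
A client call to `/ingestions` might be assembled as in the sketch below. Only the path and the four parameter names come from the table above; the HTTP method (POST), the JSON body encoding, the ISO date strings, and the base URL are assumptions made for illustration.

```python
import json
from urllib.request import Request


def build_ingestion_request(base_url: str, dataset_id: str, start: str, end: str, extent_id: str) -> Request:
    """Build, but do not send, a hypothetical ingestion request."""
    body = json.dumps(
        {"dataset_id": dataset_id, "start": start, "end": end, "extent_id": extent_id}
    ).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/ingestions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumed; the actual method is not stated in this document
    )


request = build_ingestion_request(
    "http://localhost:8000",  # hypothetical local instance
    dataset_id="chirps3_precipitation_daily_sle",
    start="2024-01-01",
    end="2024-01-31",
    extent_id="sle",
)
# urllib.request.urlopen(request) would send it; omitted because no server is assumed here.
```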

8.2 Storage layer

All datasets are stored as GeoZarr. Key properties:

EPSG:4326 coordinate reference system.
CF-compliant coordinate attributes and _ARRAY_DIMENSIONS metadata.
Multiscale pyramid overview levels declared under the multiscales key in .zattrs — required for efficient zoom-level-aware chunk fetching by zarr-layer.
Chunk shape tuned per dataset to balance three access patterns: time series queries, polygon aggregation, and browser tile rendering.
Blosc/Zstd compression.
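
The trade-off behind chunk shape tuning can be made concrete with some simple arithmetic. The candidate chunk shapes and request sizes below are hypothetical, the data type is assumed to be float32, and requests are assumed to start on a chunk boundary.

```python
import math
from typing import Tuple

Shape = Tuple[int, int, int]  # (time, lat, lon)


def chunk_megabytes(chunk: Shape, itemsize: int = 4) -> float:
    """Uncompressed size of one chunk in MB (itemsize 4 = float32)."""
    return chunk[0] * chunk[1] * chunk[2] * itemsize / 1e6


def chunks_touched(request: Shape, chunk: Shape) -> int:
    """Number of chunks read for a chunk-aligned request of the given size."""
    return math.prod(math.ceil(r / c) for r, c in zip(request, chunk))


year_at_one_pixel: Shape = (365, 1, 1)  # time series query
one_day_tile: Shape = (1, 512, 512)  # browser tile rendering

candidates = {
    "time-deep": (365, 32, 32),
    "map-friendly": (1, 512, 512),
    "balanced": (30, 128, 128),
}
summary = {
    name: (chunks_touched(year_at_one_pixel, chunk), chunks_touched(one_day_tile, chunk))
    for name, chunk in candidates.items()
}
```

In this example the time-deep layout reads one chunk for the time series but 256 for the tile, the map-friendly layout reads 365 and 1, and the balanced layout reads 13 and 16, which shows why no single chunk shape is ideal for every access pattern.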

The storage backend is abstracted via fsspec, enabling the following backends with environment-variable configuration only:

Backend	Notes
Local filesystem	Default for development. `STORAGE_BACKEND=file`
European S3-compatible	Hetzner, Scaleway, IONOS, OVHcloud. GDPR-native. `STORAGE_BACKEND=s3` + `endpoint_url`.
AWS S3 (af-south-1)	Cape Town region — lowest latency for Southern/Eastern Africa deployments.
AWS S3 (ap-southeast-1)	Singapore — lowest latency for Laos, Sri Lanka, and Southeast Asia deployments.
Ceph / RGW (self-hosted)	S3-compatible. For sovereign deployments requiring data to remain within national borders. Runs on university and research network infrastructure (AfricaConnect / GÉANT).
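
Backend selection from environment variables could be resolved along the lines of the sketch below. Only `STORAGE_BACKEND` and the idea of an endpoint URL come from the table; the other variable names (`STORAGE_ROOT`, `S3_BUCKET`, `S3_ENDPOINT_URL`), the default path, and the shape of the returned options are assumptions.

```python
from typing import Dict, Mapping, Tuple


def storage_from_env(env: Mapping[str, str]) -> Tuple[str, Dict[str, object]]:
    """Return (root URL, filesystem options) for the configured backend."""
    backend = env.get("STORAGE_BACKEND", "file")
    if backend == "file":
        return f"file://{env.get('STORAGE_ROOT', '/data/zarr')}", {}
    if backend == "s3":
        options: Dict[str, object] = {}
        endpoint = env.get("S3_ENDPOINT_URL")  # hypothetical variable name
        if endpoint:
            # A custom endpoint covers non-AWS S3-compatible stores, including Ceph/RGW.
            options["client_kwargs"] = {"endpoint_url": endpoint}
        return f"s3://{env['S3_BUCKET']}", options
    raise ValueError(f"unsupported STORAGE_BACKEND: {backend!r}")


local = storage_from_env({})
sovereign = storage_from_env(
    {
        "STORAGE_BACKEND": "s3",
        "S3_BUCKET": "climate",
        "S3_ENDPOINT_URL": "https://s3.example.org",
    }
)
```

In a running service the mapping passed in would be `os.environ`; taking it as a parameter keeps the function easy to test.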

8.3 Data pipeline model

The Climate API follows an ETL (Extract, Transform, Load) pattern — transformation occurs on the processing server before data is loaded into DHIS2. An ELT approach (transformation in a cloud data warehouse) may be supported in a future version.

The pipeline stages are:

Extract — download raw data from the upstream source for the configured extent and time range.
Transform — reproject, rechunk, apply temporal aggregation (if needed), compute derived variables, and write to GeoZarr.
Load — aggregate to org unit polygons and upload data values to DHIS2, or output as CSV/JSON for standalone use.

Each stage is independently accessible as an API endpoint, allowing custom pipelines to be constructed by combining steps in different sequences.
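
The composition idea can be sketched in-process with plain functions. This is not how the API chains its endpoints; the step names, the unit conversion, and the 1 mm rainy-day threshold are assumptions used to show a custom pre-processing and post-processing step around a daily series.

```python
from typing import Callable, List, Sequence

Step = Callable[[List[float]], List[float]]


def run_pipeline(values: Sequence[float], steps: Sequence[Step]) -> List[float]:
    """Apply each step in order to a daily series for one location."""
    result = list(values)
    for step in steps:
        result = step(result)
    return result


def metres_to_mm(values: List[float]) -> List[float]:
    """Pre-processing step: convert precipitation from metres to millimetres."""
    return [v * 1000.0 for v in values]


def consecutive_rainy_days(values: List[float], threshold_mm: float = 1.0) -> List[float]:
    """Post-processing step: running count of consecutive days at or above the threshold."""
    out: List[float] = []
    run = 0
    for v in values:
        run = run + 1 if v >= threshold_mm else 0
        out.append(float(run))
    return out


daily_precip_m = [0.0, 0.002, 0.005, 0.0, 0.003]
rainy_runs = run_pipeline(daily_precip_m, [metres_to_mm, consecutive_rainy_days])
```

Reordering or swapping the entries in the step list changes the pipeline without touching the steps themselves, which is the property the independently accessible stages are meant to provide.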

Long-running jobs (ingestion, sync, aggregation) are executed asynchronously. This ensures the API remains responsive under concurrent load. Dask is used for parallel computation within each job, processing Zarr chunks concurrently across CPU cores or threads.
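
The non-blocking submission pattern can be illustrated with `asyncio` alone. This sketch uses an in-memory job registry and a stand-in workload; the function names are hypothetical, the status strings follow the OGC API Processes job status values, and the parallel computation inside a job (Dask) is not modelled.

```python
import asyncio
import uuid
from typing import Awaitable, Callable, Dict

jobs: Dict[str, dict] = {}  # in-memory registry; a real service would persist this


async def _run(job_id: str, work: Callable[[], Awaitable[object]]) -> None:
    """Execute the job and record its outcome."""
    jobs[job_id]["status"] = "running"
    try:
        jobs[job_id]["result"] = await work()
        jobs[job_id]["status"] = "successful"
    except Exception as exc:
        jobs[job_id]["status"] = "failed"
        jobs[job_id]["error"] = str(exc)


def submit(work: Callable[[], Awaitable[object]]) -> str:
    """Register and schedule a job; must be called from a running event loop."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "accepted"}
    jobs[job_id]["task"] = asyncio.create_task(_run(job_id, work))
    return job_id


async def slow_aggregation() -> dict:
    await asyncio.sleep(0.01)  # stands in for a long-running aggregation
    return {"features": 3}


async def demo() -> dict:
    job_id = submit(slow_aggregation)
    at_submit = jobs[job_id]["status"]  # returned immediately, before the work starts
    await jobs[job_id]["task"]
    return {"at_submit": at_submit, "final": jobs[job_id]["status"], "result": jobs[job_id]["result"]}


outcome = asyncio.run(demo())
```

The caller gets a job ID straight away and polls for status later, so a slow ingestion or aggregation does not hold an HTTP request open.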

8.4 Technology stack

Technology	Role in the Climate API
FastAPI (Python)	Core REST API framework. Handles ingestion, sync, dataset, and OGC endpoints. Each pipeline step is exposed as a separate endpoint.
Xarray + Zarr	In-memory dataset model and cloud-native chunked storage format. GeoZarr conventions applied for geospatial metadata and multiscale pyramid support.
Dask	Parallel computation within jobs — processes Zarr chunks concurrently for aggregation, reprojection, and derived variable computation. Works natively with Xarray.
rioxarray	Raster operations on Xarray datasets — reprojection, clipping, resampling, and CRS management.
exactextract	Polygon aggregation (zonal statistics) to org unit features. Supports weighted partial-pixel coverage for accurate population aggregation.
xarray-multiscale	Generates multiscale pyramid overview levels at ingest time, required for zarr-layer zoom-level-aware chunk fetching.
rechunker	Rewrites an existing Zarr store into a new chunk layout with bounded memory use, without loading the full array into memory. Used for per-dataset chunk shape tuning.
cf-xarray	CF convention handling — maps standard dimension names and attributes across source datasets.
numba	JIT compilation for custom processing functions (e.g. consecutive rainy days, heat index) applied pixel-wise over large arrays.
pygeoapi	OGC API standards exposure (Coverages, EDR, Processes, Tiles, Collections). Mounted under `/ogcapi`.
TiTiler	On-the-fly raster tile server. Serves map tiles with dynamic styling, following OGC API - Tiles specification.
fsspec	Unified filesystem abstraction for storage backends (local, S3-compatible, Azure Blob, GCS, Ceph/RGW). The backend is selected through environment variables only.
zarr-layer (MapLibre)	TypeScript library for rendering Zarr directly as a native MapLibre Custom Layer in the browser. GPU reprojection from EPSG:4326 to Spherical Mercator; uses multiscale levels per zoom.
Docker	Containerised deployment. Supports local, cloud-hosted, and country sovereign deployments.
dhis2eo	Core climate/EO extraction library used by the Climate API for upstream dataset access and processing integration.
dhis2-python-client	Planned DHIS2 Web API integration library for future data value push and related DHIS2 write workflows.
STAC	Complementary discovery and metadata catalogue layer. Each dataset is exposed as a STAC Item with temporal, spatial, and access metadata.
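
A STAC Item for one published dataset might be assembled as follows. The required top-level fields follow the STAC Item specification; the asset key, the media type string, and the example URL are assumptions, and the Climate API's actual catalogue output may differ.

```python
import json
from typing import Dict, Tuple


def stac_item(dataset_id: str, bbox: Tuple[float, float, float, float], start: str, end: str, zarr_href: str) -> Dict:
    """Build a minimal STAC Item covering a bounding box and a time range."""
    min_lon, min_lat, max_lon, max_lat = bbox
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # a GeoJSON ring closes on its first point
    ]
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": dataset_id,
        "bbox": list(bbox),
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        # A null datetime requires start_datetime and end_datetime instead.
        "properties": {"datetime": None, "start_datetime": start, "end_datetime": end},
        "links": [],
        "assets": {
            # Asset key and media type are assumptions for this sketch.
            "zarr": {"href": zarr_href, "type": "application/vnd+zarr", "roles": ["data"]}
        },
    }


item = stac_item(
    "chirps3_precipitation_daily_sle",
    bbox=(-13.4, 6.8, -10.2, 10.1),  # rough illustrative box
    start="2024-01-01T00:00:00Z",
    end="2024-12-31T00:00:00Z",
    zarr_href="https://climate.example.org/zarr/chirps3_precipitation_daily_sle",
)
item_json = json.dumps(item)
```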

9. Standards compliance

The Climate API is designed to be standards-compliant and interoperable. Key standards:

OGC API — Coverages: raw grid access and subsetting.
OGC API — EDR (Environmental Data Retrieval): point and area time series queries.
OGC API — Processes: async zonal aggregation execution.
OGC API — Tiles and Maps: raster tile serving with dynamic styling.
OGC API — Collections: unified dataset discovery.
GeoZarr specification: geospatial metadata conventions for Zarr stores.
STAC (SpatioTemporal Asset Catalog): dataset discovery and asset linking.
CF Conventions: coordinate metadata for Xarray/Zarr datasets.
FAIR principles: datasets are Findable (STAC + /datasets), Accessible (open HTTP range requests), Interoperable (OGC APIs + standard formats), and Reusable (documented metadata and provenance).

10. Deployment and sovereignty

The Climate API is distributed as a Docker image and can be deployed in several configurations:

Hosted by HISP Centre — a centrally managed instance for demo purposes.
Country-hosted — deployed within a country's own infrastructure, with local storage or the nearest available regional cloud provider (AWS af-south-1 for Africa, AWS ap-southeast-1 for Southeast Asia).
Sovereign — deployed on local or research network infrastructure. Data never leaves national borders. Suitable for countries with data residency requirements.

The storage backend is configured entirely via environment variables — no code changes are required to switch between backends. This ensures the same Docker image can be deployed across all contexts.

Resource	Link / Description
Climate API GitHub	https://github.com/dhis2/climate-api
DHIS2 climate data	https://dhis2.org/climate/climate-data/
CHAP Modelling Platform	https://chap.dhis2.org/
dhis2eo	https://github.com/dhis2/dhis2eo
dhis2-python-client	https://github.com/dhis2/dhis2-python-client
GeoZarr roadmap	https://geozarr.org/roadmap.html
pygeoapi	https://pygeoapi.io/