
Aggregate climate data to DHIS2 organisation units

In this notebook we show how to load daily temperature data from a NetCDF file using earthkit and aggregate it to DHIS2 organisation units.

import earthkit.data
from earthkit.transforms import aggregate
from dhis2eo.integrations.pandas import dataframe_to_dhis2_json

Loading the data

Our sample NetCDF file contains daily temperature data for Sierra Leone in July 2025. Let’s load the file using earthkit:

file = "../data/era5-land-daily-mean-temperature-2m-july-2025-sierra-leone.nc"
data = earthkit.data.from_source("file", file)

See more examples of how you can load data with earthkit, or watch a video on how to get data with earthkit.


To more easily work with and display the contents of the dataset, we can convert it to an xarray Dataset. The output shows that the file includes three dimensions (latitude, longitude and valid_time) and one data variable, t2m (air temperature at 2 m above the surface). The data source is the European Centre for Medium-Range Weather Forecasts (ECMWF).

data_array = data.to_xarray()
data_array

Loading the organisation units

Earthkit can also be used to load the organisation units from DHIS2 that we saved as a GeoJSON file.

district_file = "../data/sierra-leone-districts.geojson"
features = earthkit.data.from_source("file", district_file)

The GeoJSON file contains the boundaries of 13 named organisation units in Sierra Leone. For the aggregation, we are particularly interested in the id and the geometry (polygon) of the org unit:
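Since the aggregation below refers to the features by their `id`, it is worth seeing where that id lives in the GeoJSON structure. A minimal, hand-made FeatureCollection standing in for the real district file (the coordinates and district name are made up for illustration; the org unit id is taken from the output further down):

```python
import json

# A tiny stand-in for the real districts file: one feature carrying
# its org unit id in the "id" member, plus a polygon geometry.
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "O6uvpzGd5pu",
            "properties": {"name": "Bo"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[-11.9, 7.9], [-11.5, 7.9],
                                 [-11.5, 8.3], [-11.9, 7.9]]],
            },
        }
    ],
})

collection = json.loads(geojson_text)
for feature in collection["features"]:
    print(feature["id"], feature["geometry"]["type"])
# → O6uvpzGd5pu Polygon
```

The real file simply contains thirteen such features, one per district.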

features

Aggregating the data to organisation units

To aggregate the data to the org unit features, we use the aggregate package of earthkit-transforms. We keep the daily period type and only aggregate the data spatially to the org unit features. The mask_dim parameter names the new dimension (the org unit id) that replaces the spatial dimensions (the longitude/latitude grid) after the reduction.
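Conceptually, the spatial reduction rasterises each polygon into a boolean mask over the grid and then collapses the masked cells into a single value, by default their mean. A toy numpy sketch of that idea for one org unit and one time step (the grid values and the mask are made up; earthkit-transforms does this for every polygon and every time step):

```python
import numpy as np

# A tiny 4x4 temperature grid (kelvin) standing in for one daily time step.
t2m = np.array([
    [296.0, 296.5, 297.0, 297.5],
    [296.2, 296.7, 297.2, 297.7],
    [296.4, 296.9, 297.4, 297.9],
    [296.6, 297.1, 297.6, 298.1],
])

# Boolean mask marking the grid cells inside one org unit polygon.
inside = np.array([
    [False, True,  True,  False],
    [False, True,  True,  False],
    [False, False, True,  False],
    [False, False, False, False],
])

# The reduction collapses the two spatial dimensions into one value
# per org unit -- here, the mean temperature of the masked cells.
mean_t2m = t2m[inside].mean()
print(round(float(mean_t2m), 2))
# → 296.96
```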

agg_data = aggregate.spatial.reduce(data, features, mask_dim="id")
agg_data

We see that the aggregated data is returned as an xarray Dataset with two dimensions (id, representing the org unit, and valid_time, the time period) and the same temperature variable.

To more easily work with tabular aggregated data, we convert the results to a pandas.DataFrame and inspect the results:

agg_df = agg_data.to_dataframe().reset_index()
agg_df

Post-processing

Next, we convert the temperatures from kelvin to degrees Celsius by subtracting 273.15 from the values.

agg_df['t2m'] -= 273.15
agg_df

Two decimal places are sufficient for our use, so we round all the temperature values:

agg_df['t2m'] = agg_df['t2m'].astype('float64').round(decimals=2)
agg_df

Converting to DHIS2 Format

Use the dhis2eo utility function dataframe_to_dhis2_json to translate the pandas.DataFrame into the JSON structure used by the DHIS2 Web API:

json_dict = dataframe_to_dhis2_json(
    df=agg_df,                     # aggregated pandas.DataFrame
    org_unit_col='id',             # column containing the org unit id
    period_col='valid_time',       # column containing the period
    value_col='t2m',               # column containing the value
    data_element_id='VJwwPOOvge6'  # id of the DHIS2 data element
)

We can display the first 3 items to see that we have one temperature value for each org unit and period combination.

json_dict['dataValues'][:3]
[{'orgUnit': 'O6uvpzGd5pu', 'period': '20250701', 'value': 23.68, 'dataElement': 'VJwwPOOvge6'}, {'orgUnit': 'fdc6uOvgoji', 'period': '20250701', 'value': 23.96, 'dataElement': 'VJwwPOOvge6'}, {'orgUnit': 'lc3eMKXaEfw', 'period': '20250701', 'value': 24.52, 'dataElement': 'VJwwPOOvge6'}]
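Under the hood this structure is a straightforward reshaping of the DataFrame: one dict per row, with the timestamp rendered as a DHIS2 daily period (YYYYMMDD). A hand-rolled sketch of the same transformation on a made-up two-row frame (this illustrates the shape of the output above, not the actual implementation of dataframe_to_dhis2_json):

```python
import pandas as pd

# A tiny stand-in for agg_df: one org unit, two daily values.
df = pd.DataFrame({
    "id": ["O6uvpzGd5pu", "O6uvpzGd5pu"],
    "valid_time": pd.to_datetime(["2025-07-01", "2025-07-02"]),
    "t2m": [23.68, 24.01],
})

data_values = [
    {
        "orgUnit": row["id"],
        "period": row["valid_time"].strftime("%Y%m%d"),  # DHIS2 daily period
        "value": row["t2m"],
        "dataElement": "VJwwPOOvge6",
    }
    for _, row in df.iterrows()
]

print(data_values[0])
```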

At this point we have successfully aggregated the temperature data into a JSON structure that DHIS2 can import. To learn how to import this JSON data into DHIS2, see our guide for uploading data values using the Python DHIS2 client.
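For orientation, this kind of payload is posted to the dataValueSets endpoint of the DHIS2 Web API. Below is a minimal sketch of how such a request could be assembled with the standard library; the base URL and the admin:district credentials are placeholders for the DHIS2 demo server, and the referenced guide uses a dedicated Python client instead of raw HTTP. The request is only built here, not sent:

```python
import base64
import json
import urllib.request

# Placeholder server and credentials -- substitute your own instance.
base_url = "https://play.dhis2.org/demo"
json_dict = {"dataValues": [{"orgUnit": "O6uvpzGd5pu", "period": "20250701",
                             "value": 23.68, "dataElement": "VJwwPOOvge6"}]}

token = base64.b64encode(b"admin:district").decode("ascii")
request = urllib.request.Request(
    url=f"{base_url}/api/dataValueSets",
    data=json.dumps(json_dict).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": f"Basic {token}"},
    method="POST",
)

# urllib.request.urlopen(request) would actually send it.
print(request.full_url, request.method)
```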