Add and use custom metadata in python_metadata
Here we show how to use the custom metadata reader classes to add additional variables to the python metadata stored with the ISMN time series.
Data setup
Here we use one of the test data samples provided in this package (stored in the test_data folder). This archive contains 2 sensors at 2 stations in the COSMOS network and 2 sensors at the fraye station of the FR_Aqui network. The goal is to assign additional metadata variables to the sensors at the ‘fraye’ station. The data is taken from the VODCA archive (https://zenodo.org/record/2575599) and describes vegetation density on Jan 1st 2010. We store the values in a csv file (vod.csv in the same directory as this notebook) structured like this (our example contains only one station, but normally you would add one line for each ISMN station you have values for):
network;station;vod_k;vod_x
FR_Aqui;fraye;0.64922965;0.39021793
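As a minimal sketch, a file with this structure could be written and checked with Python's standard csv module. The semicolon delimiter and the column names follow the sample above. The temporary output folder is only an assumption for illustration, since the notebook itself expects the file next to it as vod.csv.

```python
import csv
import os
import tempfile

# One row per ISMN station; the values are copied from the sample above.
rows = [
    {"network": "FR_Aqui", "station": "fraye",
     "vod_k": "0.64922965", "vod_x": "0.39021793"},
]
fieldnames = ["network", "station", "vod_k", "vod_x"]

# Assumption: written to a temporary folder for this demonstration only.
csv_path = os.path.join(tempfile.mkdtemp(), "vod.csv")

with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=";")
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to check that the structure is as expected.
with open(csv_path, newline="") as f:
    loaded = list(csv.DictReader(f, delimiter=";"))

print(loaded[0]["network"], loaded[0]["station"], float(loaded[0]["vod_k"]))
```

Adding further stations only requires appending more dictionaries to `rows`.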
Set metadata reader
Then we set up the metadata reader. Here we use one of the predefined readers, but you can (and usually have to) write your own reader. It must inherit from the abstract class ismn.custom.CustomMetaReader and implement a function read_metadata, which uses the previously loaded metadata of a station to find the matching entries in the provided data and returns either an ismn.meta.MetaData object or a dictionary of metadata variables and their corresponding values. For the matching you would normally use the station latitude and longitude, sometimes also the sensor depth, or even the station name. We also assign a fill value for one of the 2 VOD variables, which is used for stations / sensors for which no counterpart is found in the csv file.
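A self-written reader along these lines might look like the following sketch. The class name `StationLookupReader`, its constructor arguments and the in-memory lookup table are hypothetical. The sketch also assumes that the metadata passed to read_metadata can be accessed as `meta['network'].val`, and it defines a stand-in base class so that it still runs where ismn is not installed.

```python
import math
from types import SimpleNamespace

try:
    from ismn.custom import CustomMetaReader
except ImportError:
    # Stand-in so that the sketch also runs without ismn installed.
    from abc import ABC, abstractmethod

    class CustomMetaReader(ABC):
        @abstractmethod
        def read_metadata(self, meta):
            ...


class StationLookupReader(CustomMetaReader):
    """Hypothetical reader matching entries by (network, station) name."""

    def __init__(self, table, fill_values=None):
        # table: {(network, station): {variable_name: value}}
        self.table = table
        self.fill_values = fill_values or {}
        # All variable names that this reader can provide.
        self.names = sorted({n for vals in table.values() for n in vals})

    def read_metadata(self, meta):
        # Assumption: already loaded variables are available as meta[name].val
        key = (meta["network"].val, meta["station"].val)
        match = self.table.get(key, {})
        # Use the matched value, else the fill value, else NaN.
        return {n: match.get(n, self.fill_values.get(n, math.nan))
                for n in self.names}


reader = StationLookupReader(
    {("FR_Aqui", "fraye"): {"vod_k": 0.64922965, "vod_x": 0.39021793}},
    fill_values={"vod_k": -9999},
)

# Stand-in for the metadata object that ismn would pass in (demonstration only).
fake_meta = {"network": SimpleNamespace(val="FR_Aqui"),
             "station": SimpleNamespace(val="fraye")}
extra = reader.read_metadata(fake_meta)
print(extra)
```

Returning a plain dictionary keeps the reader small. Returning an ismn.meta.MetaData object instead is the other option mentioned above.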
[1]:
from ismn.interface import ISMN_Interface
import shutil
import tempfile
from ismn.custom import CustomStationMetadataCsv
[11]:
my_meta_reader = CustomStationMetadataCsv('vod.csv', fill_values={'vod_k': -9999})
This custom metadata reader is now passed to the ISMN_Interface (you can also pass more than one). While collecting metadata for all sensors, it compares the station and network names with the ones provided in the csv file and adds the new metadata variables to the python_metadata when a match is found. If the python_metadata folder already exists, it must be deleted before the collection can happen.
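Deleting an existing metadata folder could be done as in the following sketch. The helper `remove_python_metadata` is hypothetical, and it assumes that the python_metadata folder sits directly inside the data directory. The example below avoids the problem altogether by writing the metadata to a fresh temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def remove_python_metadata(data_root):
    """Delete <data_root>/python_metadata if present; return True if removed."""
    meta_dir = Path(data_root) / "python_metadata"
    if meta_dir.is_dir():
        shutil.rmtree(meta_dir)
        return True
    return False


# Demonstration on a throwaway directory instead of real ISMN data.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "python_metadata").mkdir()
    removed_first = remove_python_metadata(root)  # the folder existed
    removed_again = remove_python_metadata(root)  # nothing left to delete

print(removed_first, removed_again)
```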
[14]:
with tempfile.TemporaryDirectory() as meta_path:
    ds = ISMN_Interface('../../tests/test_data/Data_seperate_files_20170810_20180809', custom_meta_reader=(my_meta_reader,), meta_path=meta_path)
Processing metadata for all ismn stations into folder ../../tests/test_data/Data_seperate_files_20170810_20180809.
This may take a few minutes, but is only done once...
Hint: Use `parallel=True` to speed up metadata generation for large datasets
Files Processed: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 11.62it/s]
Metadata generation finished after 0 Seconds.
Metadata and Log stored in /tmp/tmp3zk16t_7
Found existing ismn metadata in /tmp/tmp3zk16t_7/Data_seperate_files_20170810_20180809.csv.
The newly added values are now found in the metadata for the ‘fraye’ station.
[17]:
ds['FR_Aqui']['fraye'].metadata[['vod_k', 'vod_x']]
[17]:
MetaData([
MetaVar([vod_k, 0.64922965, None]),
MetaVar([vod_x, 0.39021793, None])
])
Other stations have no entry in the csv file, so they do not receive these values.
[20]:
ds['COSMOS']['ARM-1'][0].metadata[['vod_k', 'vod_x']]
[20]:
MetaData([
MetaVar([vod_k, -9999.0, None]),
MetaVar([vod_x, nan, None])
])
The station-wide variables are also available for the sensors at the station (here we simply pick the first available sensor at the station, with index 0).
[21]:
ds['FR_Aqui']['fraye'][0].metadata[['vod_k', 'vod_x']]
[21]:
MetaData([
MetaVar([vod_k, 0.64922965, None]),
MetaVar([vod_x, 0.39021793, None])
])
For stations where no VOD was assigned, the fill value (or np.nan if no fill value is provided) is used (here we again pick the first available sensor at the station, with index 0).
[23]:
ds['COSMOS']['ARM-1'][0].metadata[['vod_k', 'vod_x']]
[23]:
MetaData([
MetaVar([vod_k, -9999.0, None]),
MetaVar([vod_x, nan, None])
])
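When working with these values later on, both the fill value and NaN usually have to be treated as missing. The following sketch shows one way to do that in plain Python. The helper `is_missing` and the `station_values` dictionary are hypothetical and only mirror the outputs shown above.

```python
import math

FILL_VALUE = -9999  # the fill value chosen for vod_k in this example


def is_missing(value, fill_value=FILL_VALUE):
    """Return True if the value is NaN or equal to the fill value."""
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == fill_value


# Values as shown in the outputs above.
station_values = {
    "fraye": {"vod_k": 0.64922965, "vod_x": 0.39021793},
    "ARM-1": {"vod_k": -9999.0, "vod_x": float("nan")},
}

# Keep only the variables that carry a real value for each station.
usable = {
    station: {name: v for name, v in values.items() if not is_missing(v)}
    for station, values in station_values.items()
}
print(usable)
```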
The new variables can now be used like any other metadata variable, e.g. to find the stations with a specific value.
[24]:
ids = ds.get_dataset_ids(variable='soil_moisture', filter_meta_dict={'vod_k': 0.64922965})
data, meta = ds.read(ids, return_meta=True)
meta
[24]:
variable | key | 2 |
---|---|---|
clay_fraction | val | 4.0 |
clay_fraction | depth_from | 0.0 |
clay_fraction | depth_to | 0.3 |
climate_KG | val | Cfb |
climate_insitu | val | unknown |
elevation | val | 52.42 |
instrument | val | ThetaProbe-ML2X |
instrument | depth_from | 0.05 |
instrument | depth_to | 0.05 |
latitude | val | 44.467 |
lc_2000 | val | 70 |
lc_2005 | val | 70 |
lc_2010 | val | 70 |
lc_insitu | val | unknown |
longitude | val | -0.7269 |
network | val | FR_Aqui |
organic_carbon | val | 2.18 |
organic_carbon | depth_from | 0.0 |
organic_carbon | depth_to | 0.3 |
sand_fraction | val | 87.0 |
sand_fraction | depth_from | 0.0 |
sand_fraction | depth_to | 0.3 |
saturation | val | 0.49 |
saturation | depth_from | 0.0 |
saturation | depth_to | 0.3 |
silt_fraction | val | 9.0 |
silt_fraction | depth_from | 0.0 |
silt_fraction | depth_to | 0.3 |
station | val | fraye |
timerange_from | val | 2013-08-13 10:00:00 |
timerange_to | val | 2020-01-01 00:00:00 |
variable | val | soil_moisture |
variable | depth_from | 0.05 |
variable | depth_to | 0.05 |
vod_k | val | 0.64923 |
vod_x | val | 0.390218 |