{ "cells": [ { "cell_type": "markdown", "id": "5f97f9c6-5263-4345-b155-398d4e51f873", "metadata": {}, "source": [ "# Add and use custom metadata in python_metadata" ] }, { "cell_type": "markdown", "id": "56959fc5-fa29-4ba5-b793-d32797a1a2c2", "metadata": {}, "source": [ "Here we show how to use the custom metadata reader class to add additional variables to the python metadata stored with the ISMN time series." ] }, { "cell_type": "markdown", "id": "d31974ce-35ce-4a2d-a5c7-1f78d63b48e7", "metadata": {}, "source": [ "## Data setup\n", "Here we use one of the test data samples provided in this package (stored in the `test_data` folder).\n", "This archive contains 2 sensors at 2 stations in the `COSMOS` network and 2 sensors at the `fraye` station of the `FR_Aqui` network.\n", "The goal is to assign additional metadata variables to the sensors at the `fraye` station. The data is taken from the VODCA archive (https://zenodo.org/record/2575599) and describes vegetation density on 1 January 2010. We store the values in a `csv` file (`vod.csv` in the same directory as this notebook) structured like this (in our example for one station only, but normally we would add a line for as many ISMN stations as possible):\n", "\n", "```\n", "network;station;vod_k;vod_x\n", "FR_Aqui;fraye;0.64922965;0.39021793\n", "```\n", "\n", "## Set metadata reader\n", "\n", "Then we set up the metadata reader. Here we use one of the predefined readers, but you can (and usually have to) write your own. A custom reader must inherit from the abstract class `ismn.custom.CustomMetaReader` and implement a method `read_metadata`, which uses the previously loaded metadata of a station to find the matching entries in the provided data and returns either an `ismn.meta.MetaData` object or a dictionary of metadata variables and the corresponding values. 
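
The following is a minimal sketch of what such a reader could look like, assuming a semicolon-separated table like `vod.csv` above. To stay runnable without the `ismn` package, `MetaReaderSketch` is a hypothetical stand-in for `ismn.custom.CustomMetaReader`, and the station metadata is passed as a plain dict rather than a `MetaData` object. `CsvStationReader` and its arguments are illustrative names as well, not part of the package.

```python
import csv
import io
from abc import ABC, abstractmethod

# Same layout as vod.csv above, inlined so the sketch is self-contained.
VOD_CSV = '''network;station;vod_k;vod_x
FR_Aqui;fraye;0.64922965;0.39021793
'''


class MetaReaderSketch(ABC):
    # Hypothetical stand-in for ismn.custom.CustomMetaReader (an assumption,
    # used only so that this example runs without ismn installed).
    @abstractmethod
    def read_metadata(self, meta):
        ...


class CsvStationReader(MetaReaderSketch):
    # Illustrative reader that matches table rows on (network, station).
    def __init__(self, csv_text, fill_values=None):
        reader = csv.DictReader(io.StringIO(csv_text), delimiter=';')
        self.table = {(row['network'], row['station']): row for row in reader}
        # Every column that is not part of the key is a new metadata variable.
        self.variables = [name for name in reader.fieldnames
                          if name not in ('network', 'station')]
        self.fill_values = fill_values or {}

    def read_metadata(self, meta):
        # meta is assumed to be a dict holding the already loaded
        # network and station names of one sensor.
        row = self.table.get((meta['network'], meta['station']))
        if row is not None:
            return {var: float(row[var]) for var in self.variables}
        # No counterpart found: use the fill value if given, otherwise NaN.
        return {var: self.fill_values.get(var, float('nan'))
                for var in self.variables}


reader = CsvStationReader(VOD_CSV, fill_values={'vod_k': -9999})
hit = reader.read_metadata({'network': 'FR_Aqui', 'station': 'fraye'})
miss = reader.read_metadata({'network': 'COSMOS', 'station': 'ARM-1'})
```

In this sketch `hit` holds the two VOD values for `fraye`, while `miss` holds the fill value for `vod_k` and NaN for `vod_x`. Returning a dictionary keeps the reader independent of how the metadata objects are built internally.
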
The match is normally made on the station latitude and longitude, sometimes also on the sensor depth, and possibly even on the station name.\n", "We also assign a fill value for one of the 2 VOD variables, which is used for stations / sensors for which no counterpart is found in the csv file." ] }, { "cell_type": "code", "execution_count": 1, "id": "b7a5944e-0632-4095-a0c0-90f44d2ae4e8", "metadata": {}, "outputs": [], "source": [ "from ismn.interface import ISMN_Interface\n", "import shutil\n", "import tempfile\n", "from ismn.custom import CustomStationMetadataCsv" ] }, { "cell_type": "code", "execution_count": 11, "id": "ddf2b498-8601-4f89-9d37-51eba7a12efc", "metadata": {}, "outputs": [], "source": [ "my_meta_reader = CustomStationMetadataCsv('vod.csv', fill_values={'vod_k': -9999})" ] }, { "cell_type": "markdown", "id": "06b0938f-231c-4048-83f9-b9f4a9dfabaa", "metadata": {}, "source": [ "This custom metadata reader is now passed to the ISMN Interface (you can also pass more than one). While collecting the metadata for all sensors, the interface compares the station and network names with the ones provided in the csv file and adds the new metadata variables to the `python_metadata` wherever a match is found. If the `python_metadata` folder already exists, it must be deleted first so that the metadata is collected again." 
] }, { "cell_type": "code", "execution_count": 14, "id": "eb5786c4-c0f9-49de-990a-5db18793f987", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Processing metadata for all ismn stations into folder ../../tests/test_data/Data_seperate_files_20170810_20180809.\n", "This may take a few minutes, but is only done once...\n", "Hint: Use `parallel=True` to speed up metadata generation for large datasets\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Files Processed: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 11.62it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Metadata generation finished after 0 Seconds.\n", "Metadata and Log stored in /tmp/tmp3zk16t_7\n", "Found existing ismn metadata in /tmp/tmp3zk16t_7/Data_seperate_files_20170810_20180809.csv.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "with tempfile.TemporaryDirectory() as meta_path:\n", "    ds = ISMN_Interface('../../tests/test_data/Data_seperate_files_20170810_20180809', custom_meta_reader=(my_meta_reader,), meta_path=meta_path)" ] }, { "cell_type": "markdown", "id": "4222cfed-69f1-4cae-8c06-10a7f876853b", "metadata": {}, "source": [ "The newly added values are now found in the metadata for the `fraye` station." ] }, { "cell_type": "code", "execution_count": 17, "id": "7f293998-a3f0-4ac2-a673-d55e03229f3f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MetaData([\n", " MetaVar([vod_k, 0.64922965, None]),\n", " MetaVar([vod_x, 0.39021793, None])\n", "])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds['FR_Aqui']['fraye'].metadata[['vod_k', 'vod_x']]\n" ] }, { "cell_type": "markdown", "id": "2a7bbf34-ffd4-42cd-832d-9c3fe473ae06", "metadata": {}, "source": [
"Other stations, which have no entry in the csv file, do not receive these values." ] }, { "cell_type": "code", "execution_count": 20, "id": "8af19018-884a-4aad-9984-a0a52227eae8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MetaData([\n", " MetaVar([vod_k, -9999.0, None]),\n", " MetaVar([vod_x, nan, None])\n", "])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds['COSMOS']['ARM-1'][0].metadata[['vod_k', 'vod_x']]" ] }, { "cell_type": "markdown", "id": "53565bc8-f74c-4a4d-aa05-842fb5778584", "metadata": {}, "source": [ "The station-wide variables are also available for each sensor at the station (here we simply pick the first available sensor, with index 0)." ] }, { "cell_type": "code", "execution_count": 21, "id": "d7630de7-bbb1-4d73-a664-9dc9eba8404f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MetaData([\n", " MetaVar([vod_k, 0.64922965, None]),\n", " MetaVar([vod_x, 0.39021793, None])\n", "])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds['FR_Aqui']['fraye'][0].metadata[['vod_k', 'vod_x']]" ] }, { "cell_type": "markdown", "id": "56ebd643-83c8-4ea2-b1b2-9a052fff5e99", "metadata": {}, "source": [ "For stations without a VOD entry, the fill value is used, or `np.nan` if no fill value was provided (again shown for the first available sensor, with index 0)." ] }, { "cell_type": "code", "execution_count": 23, "id": "e4ae3da5-3c82-48c4-9c60-e899a329bad3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MetaData([\n", " MetaVar([vod_k, -9999.0, None]),\n", " MetaVar([vod_x, nan, None])\n", "])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds['COSMOS']['ARM-1'][0].metadata[['vod_k', 'vod_x']]" ] }, { "cell_type": "markdown", "id": "2666f3f5-8545-4154-83bf-725de4c24465", "metadata": {}, "source": [ "We can now use them like any other metadata variable, e.g. to find the sensors with a specific value." 
] }, { "cell_type": "code", "execution_count": 24, "id": "4312549b-bcf4-4eca-b5cd-786abe7fe5bd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th></th><th>2</th></tr>\n",
"<tr><th>variable</th><th>key</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th rowspan=\"3\">clay_fraction</th><th>val</th><td>4.0</td></tr>\n",
"<tr><th>depth_from</th><td>0.0</td></tr>\n",
"<tr><th>depth_to</th><td>0.3</td></tr>\n",
"<tr><th>climate_KG</th><th>val</th><td>Cfb</td></tr>\n",
"<tr><th>climate_insitu</th><th>val</th><td>unknown</td></tr>\n",
"<tr><th>elevation</th><th>val</th><td>52.42</td></tr>\n",
"<tr><th rowspan=\"3\">instrument</th><th>val</th><td>ThetaProbe-ML2X</td></tr>\n",
"<tr><th>depth_from</th><td>0.05</td></tr>\n",
"<tr><th>depth_to</th><td>0.05</td></tr>\n",
"<tr><th>latitude</th><th>val</th><td>44.467</td></tr>\n",
"<tr><th>lc_2000</th><th>val</th><td>70</td></tr>\n",
"<tr><th>lc_2005</th><th>val</th><td>70</td></tr>\n",
"<tr><th>lc_2010</th><th>val</th><td>70</td></tr>\n",
"<tr><th>lc_insitu</th><th>val</th><td>unknown</td></tr>\n",
"<tr><th>longitude</th><th>val</th><td>-0.7269</td></tr>\n",
"<tr><th>network</th><th>val</th><td>FR_Aqui</td></tr>\n",
"<tr><th rowspan=\"3\">organic_carbon</th><th>val</th><td>2.18</td></tr>\n",
"<tr><th>depth_from</th><td>0.0</td></tr>\n",
"<tr><th>depth_to</th><td>0.3</td></tr>\n",
"<tr><th rowspan=\"3\">sand_fraction</th><th>val</th><td>87.0</td></tr>\n",
"<tr><th>depth_from</th><td>0.0</td></tr>\n",
"<tr><th>depth_to</th><td>0.3</td></tr>\n",
"<tr><th rowspan=\"3\">saturation</th><th>val</th><td>0.49</td></tr>\n",
"<tr><th>depth_from</th><td>0.0</td></tr>\n",
"<tr><th>depth_to</th><td>0.3</td></tr>\n",
"<tr><th rowspan=\"3\">silt_fraction</th><th>val</th><td>9.0</td></tr>\n",
"<tr><th>depth_from</th><td>0.0</td></tr>\n",
"<tr><th>depth_to</th><td>0.3</td></tr>\n",
"<tr><th>station</th><th>val</th><td>fraye</td></tr>\n",
"<tr><th>timerange_from</th><th>val</th><td>2013-08-13 10:00:00</td></tr>\n",
"<tr><th>timerange_to</th><th>val</th><td>2020-01-01 00:00:00</td></tr>\n",
"<tr><th rowspan=\"3\">variable</th><th>val</th><td>soil_moisture</td></tr>\n",
"<tr><th>depth_from</th><td>0.05</td></tr>\n",
"<tr><th>depth_to</th><td>0.05</td></tr>\n",
"<tr><th>vod_k</th><th>val</th><td>0.64923</td></tr>\n",
"<tr><th>vod_x</th><th>val</th><td>0.390218</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " 2\n", "variable key \n", "clay_fraction val 4.0\n", " depth_from 0.0\n", " depth_to 0.3\n", "climate_KG val Cfb\n", "climate_insitu val unknown\n", "elevation val 52.42\n", "instrument val ThetaProbe-ML2X\n", " depth_from 0.05\n", " depth_to 0.05\n", "latitude val 44.467\n", "lc_2000 val 70\n", "lc_2005 val 70\n", "lc_2010 val 70\n", "lc_insitu val unknown\n", "longitude val -0.7269\n", "network val FR_Aqui\n", "organic_carbon val 2.18\n", " depth_from 0.0\n", " depth_to 0.3\n", "sand_fraction val 87.0\n", " depth_from 0.0\n", " depth_to 0.3\n", "saturation val 0.49\n", " depth_from 0.0\n", " depth_to 0.3\n", "silt_fraction val 9.0\n", " depth_from 0.0\n", " depth_to 0.3\n", "station val fraye\n", "timerange_from val 2013-08-13 10:00:00\n", "timerange_to val 2020-01-01 00:00:00\n", "variable val soil_moisture\n", " depth_from 0.05\n", " depth_to 0.05\n", "vod_k val 0.64923\n", "vod_x val 0.390218" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ids = ds.get_dataset_ids(variable='soil_moisture', filter_meta_dict={'vod_k': 0.64922965})\n", "data, meta = ds.read(ids, return_meta=True)\n", "meta" ] }, { "cell_type": "code", "execution_count": null, "id": "4664e11a-7c3e-4574-9a92-d1ca0bccafae", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:ismn] *", "language": "python", "name": "conda-env-ismn-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.9" } }, "nbformat": 4, "nbformat_minor": 5 }