{ "cells": [ { "cell_type": "markdown", "id": "5f97f9c6-5263-4345-b155-398d4e51f873", "metadata": {}, "source": [ "# Add and use custom metadata in python_metadata" ] }, { "cell_type": "markdown", "id": "56959fc5-fa29-4ba5-b793-d32797a1a2c2", "metadata": {}, "source": [ "Here we show how the use the custom metadata reader class to add additional variables to the python metadata stored with the ISMN time series." ] }, { "cell_type": "markdown", "id": "d31974ce-35ce-4a2d-a5c7-1f78d63b48e7", "metadata": {}, "source": [ "## Data setup\n", "Here we use one of the testdata samples provided in this package (stored in the `test_data` folder).\n", "This archive contains 2 sensors at 2 stations in the `COSMOS` network and 2 sensors at the `fraye` station of the `FR_Aqui` network.\n", "The goal is to assign an additional metadata variable to the sensors at the 'fray' station. The data is taken from the VODCA archive (https://zenodo.org/record/2575599) and describes vegetation density on Jan 1st 2010. We store the value in a `csv` file (`vod.csv` in the same directory as this notebook) structured like this (in our example only for one station, but normally we would add a line for as many ISMN stations as possible):\n", "\n", "```\n", "network;station;vod_k;vod_x\n", "FR_Aqui;fraye;0.64922965;0.39021793\n", "```\n", "\n", "## Set metadata reader\n", "\n", "Then we set up the metadata reader. Here we use one of the predefined readers, but you can (and usually have to) also write your own reader as long as it inherits from the abstract class `ismn.custom.CustomMetaReader` and implements a function `read_metadata` which uses the information from previously loaded metadata for a station to find the matching entries in the provided data, and either returns a `ismn.meta.MetaData` object or a dictionary of metadata variables and the according values. Normally you use either the station latitude, longitude and sometimes also the sensor depth information; maybe even the station name.\n", "We also assign a fill value for one of the 2 VOD variables, which is used for stations / sensors for which no counterpart is found in the csv file." ] }, { "cell_type": "code", "execution_count": 1, "id": "b7a5944e-0632-4095-a0c0-90f44d2ae4e8", "metadata": {}, "outputs": [], "source": [ "from ismn.interface import ISMN_Interface\n", "import shutil\n", "import tempfile\n", "from ismn.custom import CustomStationMetadataCsv" ] }, { "cell_type": "code", "execution_count": 11, "id": "ddf2b498-8601-4f89-9d37-51eba7a12efc", "metadata": {}, "outputs": [], "source": [ "my_meta_reader = CustomStationMetadataCsv('vod.csv', fill_values={'vod_k': -9999})" ] }, { "cell_type": "markdown", "id": "06b0938f-231c-4048-83f9-b9f4a9dfabaa", "metadata": {}, "source": [ "This custom metadata reader is now passed to the ISMN Interface (you can also pass more than one). Upon collecting metadata for all sensors, it will compare the station and network name with the ones provided in the csv file, and add the new metadata variable to the `python_metadata` when a matching case is found. If the `python_metadata` folder already exists, it must be deleted before the collection can happen." ] }, { "cell_type": "code", "execution_count": 14, "id": "eb5786c4-c0f9-49de-990a-5db18793f987", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Processing metadata for all ismn stations into folder ../../tests/test_data/Data_seperate_files_20170810_20180809.\n", "This may take a few minutes, but is only done once...\n", "Hint: Use `parallel=True` to speed up metadata generation for large datasets\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Files Processed: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 11.62it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Metadata generation finished after 0 Seconds.\n", "Metadata and Log stored in /tmp/tmp3zk16t_7\n", "Found existing ismn metadata in /tmp/tmp3zk16t_7/Data_seperate_files_20170810_20180809.csv.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "with tempfile.TemporaryDirectory() as meta_path:\n", " ds = ISMN_Interface('../../tests/test_data/Data_seperate_files_20170810_20180809', custom_meta_reader=(my_meta_reader,), meta_path=meta_path)" ] }, { "cell_type": "markdown", "id": "4222cfed-69f1-4cae-8c06-10a7f876853b", "metadata": {}, "source": [ "The newly added values are now found in the metadata for the 'fraye' station." ] }, { "cell_type": "code", "execution_count": 17, "id": "7f293998-a3f0-4ac2-a673-d55e03229f3f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MetaData([\n", " MetaVar([vod_k, 0.64922965, None]),\n", " MetaVar([vod_x, 0.39021793, None])\n", "])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds['FR_Aqui']['fraye'].metadata[['vod_k', 'vod_x']]\n" ] }, { "cell_type": "markdown", "id": "2a7bbf34-ffd4-42cd-832d-9c3fe473ae06", "metadata": {}, "source": [ "But not for other stations. " ] }, { "cell_type": "code", "execution_count": 20, "id": "8af19018-884a-4aad-9984-a0a52227eae8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MetaData([\n", " MetaVar([vod_k, -9999.0, None]),\n", " MetaVar([vod_x, nan, None])\n", "])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds['COSMOS']['ARM-1'][0].metadata[['vod_k', 'vod_x']]" ] }, { "cell_type": "markdown", "id": "53565bc8-f74c-4a4d-aa05-842fb5778584", "metadata": {}, "source": [ "The station wide variable is also available for sensors at the station (here we simply pick the first available sensor at the station, with index 0)." ] }, { "cell_type": "code", "execution_count": 21, "id": "d7630de7-bbb1-4d73-a664-9dc9eba8404f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MetaData([\n", " MetaVar([vod_k, 0.64922965, None]),\n", " MetaVar([vod_x, 0.39021793, None])\n", "])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds['FR_Aqui']['fraye'][0].metadata[['vod_k', 'vod_x']]" ] }, { "cell_type": "markdown", "id": "56ebd643-83c8-4ea2-b1b2-9a052fff5e99", "metadata": {}, "source": [ "For stations where no VOD was assigned, the fill value (or np.NaN if no fill value is provided) is used (here we simply pick the first available sensor at the station, with index 0)." ] }, { "cell_type": "code", "execution_count": 23, "id": "e4ae3da5-3c82-48c4-9c60-e899a329bad3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MetaData([\n", " MetaVar([vod_k, -9999.0, None]),\n", " MetaVar([vod_x, nan, None])\n", "])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds['COSMOS']['ARM-1'][0].metadata[['vod_k', 'vod_x']]" ] }, { "cell_type": "markdown", "id": "2666f3f5-8545-4154-83bf-725de4c24465", "metadata": {}, "source": [ "We can now use them as any other metadata variable, e.g. to find the station with a specific value." ] }, { "cell_type": "code", "execution_count": 24, "id": "4312549b-bcf4-4eca-b5cd-786abe7fe5bd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| \n", " | \n", " | 2 | \n", "
|---|---|---|
| variable | \n", "key | \n", "\n", " |
| clay_fraction | \n", "val | \n", "4.0 | \n", "
| depth_from | \n", "0.0 | \n", "|
| depth_to | \n", "0.3 | \n", "|
| climate_KG | \n", "val | \n", "Cfb | \n", "
| climate_insitu | \n", "val | \n", "unknown | \n", "
| elevation | \n", "val | \n", "52.42 | \n", "
| instrument | \n", "val | \n", "ThetaProbe-ML2X | \n", "
| depth_from | \n", "0.05 | \n", "|
| depth_to | \n", "0.05 | \n", "|
| latitude | \n", "val | \n", "44.467 | \n", "
| lc_2000 | \n", "val | \n", "70 | \n", "
| lc_2005 | \n", "val | \n", "70 | \n", "
| lc_2010 | \n", "val | \n", "70 | \n", "
| lc_insitu | \n", "val | \n", "unknown | \n", "
| longitude | \n", "val | \n", "-0.7269 | \n", "
| network | \n", "val | \n", "FR_Aqui | \n", "
| organic_carbon | \n", "val | \n", "2.18 | \n", "
| depth_from | \n", "0.0 | \n", "|
| depth_to | \n", "0.3 | \n", "|
| sand_fraction | \n", "val | \n", "87.0 | \n", "
| depth_from | \n", "0.0 | \n", "|
| depth_to | \n", "0.3 | \n", "|
| saturation | \n", "val | \n", "0.49 | \n", "
| depth_from | \n", "0.0 | \n", "|
| depth_to | \n", "0.3 | \n", "|
| silt_fraction | \n", "val | \n", "9.0 | \n", "
| depth_from | \n", "0.0 | \n", "|
| depth_to | \n", "0.3 | \n", "|
| station | \n", "val | \n", "fraye | \n", "
| timerange_from | \n", "val | \n", "2013-08-13 10:00:00 | \n", "
| timerange_to | \n", "val | \n", "2020-01-01 00:00:00 | \n", "
| variable | \n", "val | \n", "soil_moisture | \n", "
| depth_from | \n", "0.05 | \n", "|
| depth_to | \n", "0.05 | \n", "|
| vod_k | \n", "val | \n", "0.64923 | \n", "
| vod_x | \n", "val | \n", "0.390218 | \n", "