High resolution water table modeling of the shallow groundwater using a knowledge-guided gradient boosting decision tree model

Research output: Contribution to journal, Article, Research, peer-review

17 Citations (Scopus)


Detailed knowledge of the uppermost water table, which represents the shallow groundwater system, is critical for addressing societal challenges related to climate change mitigation and adaptation and to enhancing climate resilience in general. Machine learning (ML) allows the water table depth to be modeled at high resolution, beyond the capabilities of conventional physically-based numerical hydrological models with respect to spatial resolution and overall accuracy. For this purpose, in-situ well observations and proxy observations are used as training data in combination with high-resolution covariates. The objective of this study is to model the depth of the uppermost water table for typical summer and winter conditions at 10 m spatial resolution across all of Denmark (43,000 km²). CatBoost, a state-of-the-art implementation of gradient boosting decision trees, is employed to model the water table depth and the associated uncertainties. The groundwater domain has not been a prominent field of application for recent hydrological ML advances because of the lack of big data. This study puts forward a novel knowledge-guided ML framework that overcomes this limitation by integrating simulation results from a physically-based groundwater flow model. The simulation data are used to (1) identify wells that represent the uppermost water table, (2) augment missing training data by accounting for simulated water level seasonality, and (3) expand the list of covariates. The curated training dataset contains around 13,000 wells, 19,000 groundwater proxy observations at lakes, streams, and the coastline, and 15 covariates. Cross-validation shows that the ML model generalizes well, with a mean absolute error (MAE) of around 115 cm when only well observations are considered and an MAE of <50 cm when the proxy observations are also taken into account. Quantile regression is applied to estimate confidence intervals; the estimated uncertainty is largest for moraine clay soils, which are characterized by distinct geological heterogeneity. This study highlights a novel research avenue of knowledge-guided ML for the groundwater domain, in which a physically-based hydrological model efficiently supports an ML model to predict the depth of the water table at unprecedented spatial detail and accuracy.
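The evaluation ideas named in the abstract (MAE from cross-validation, and intervals from quantile regression) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the sample depths, and the interval bounds are hypothetical, and the abstract does not state which quantile levels were used. The pinball loss shown here is the standard objective that quantile regression minimizes; gradient boosting libraries such as CatBoost offer a quantile loss of this kind.

```python
# Illustrative only: all values below are made-up water table depths in metres,
# not data from the study.

def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball (quantile) loss for a quantile level alpha in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
        total += alpha * diff if diff >= 0 else (alpha - 1.0) * diff
    return total / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between observations and predictions."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def interval_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)


# Hypothetical held-out observations and model outputs (assumed values).
observed = [0.5, 1.2, 2.0, 3.1, 0.9]
median = [0.6, 1.0, 2.4, 2.8, 1.0]   # prediction of the 0.5 quantile
lower = [0.1, 0.4, 1.0, 1.5, 0.3]    # prediction of a low quantile
upper = [1.2, 1.9, 3.5, 3.0, 1.8]    # prediction of a high quantile

mae = mean_absolute_error(observed, median)
coverage = interval_coverage(observed, lower, upper)
median_loss = pinball_loss(observed, median, 0.5)
```

At the quantile level 0.5 the pinball loss equals half the MAE, which is why the median prediction is the natural point estimate to pair with an MAE score. The coverage value is one simple check of whether the predicted interval is as wide as intended.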

Original language: English
Article number: 701726
Number of pages: 14
Journal: Frontiers in Water
Publication status: Published - 1 Sept 2021


  • CatBoost
  • high resolution
  • machine learning
  • quantile regression
  • water table depth
  • DK-model

Programme Area

  • Programme Area 2: Water Resources

