<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>I. Capralova);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Explainable Spatial Modeling of Groundwater Nitrate Concentrations in the Netherlands</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Iulia Capralova</string-name>
          <email>i.capralova@student.rug.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan Cardenas-Cartagena</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Groningen, Faculty of Science and Engineering</institution>
          ,
          <addr-line>Nijenborgh 9, 9747 AG Groningen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Nitrate leaching from soil is a major source of groundwater pollution that threatens both environmental and public health. In the Netherlands, intensive agricultural practices make the country vulnerable to nitrate contamination, requiring constant monitoring of groundwater quality. Since there are few measurement sites, there is a need for interpretable estimation methods via spatial modeling. This paper develops an explainable spatial regression model for estimating nitrate concentrations in groundwater in the Netherlands, using spatial and environmental factors. Three models were tested: Ridge Linear Regression, Random Forest, and Extreme Gradient Boosting (XGBoost). The ensemble of Random Forest and XGBoost explained 66% of the variance in nitrate levels, indicating competitive performance suitable for interpolation and map-based interpretation of spatial nitrate variability in the Netherlands. Key factors influencing nitrate leaching, including soil type, loam content in the soil, and groundwater depth, were identified using both model-specific and model-agnostic methods. Modeled spatial estimated maps show that between 2017 and 2023, nitrate concentrations decreased in the east and northeast, while they increased in the south and parts of the central Netherlands. This modeling framework can support targeted environmental policy decisions aimed at reducing groundwater pollution.</p>
      </abstract>
      <kwd-group>
        <kwd>Nitrate leaching</kwd>
        <kwd>groundwater monitoring</kwd>
        <kwd>geospatial analysis</kwd>
        <kwd>spatial regression</kwd>
        <kwd>explainable artificial intelligence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Motivation</title>
        <p>Nitrate (NO3 – ) is a form of inorganic nitrogen that is found in soil and groundwater. It acts as a
primary source of nitrogen for plants, supporting their growth and development [1]. Nitrate occurs
naturally in groundwater through processes such as rock weathering and biological activity, resulting
in a background level of approximately 2.5 mg/L [2]. However, in recent decades, nitrate levels in
groundwater have increased worldwide due to human activities such as heavy use of nitrogen-based
fertilizers and animal manure in areas with intensive farming. This causes nitrate to leak from the
soil into the groundwater, leading to pollution of water bodies. Today, nitrate leaching is a serious
environmental problem in many farming regions across Europe and beyond [3].</p>
        <p>The imbalance of nitrate concentration in groundwater has consequences for both nature and human
health. When nitrate enters surface waters such as lakes and rivers, it causes eutrophication (i.e., rapid
growth of algae) which reduces oxygen levels and disrupts aquatic life [4]. Over time, this contamination
afects the quality of drinking water, causing health issues such as methemoglobinemia and increasing
the risk of gastric cancer [5, 6, 7, 8].</p>
        <p>To protect water quality, the European Union (EU) created the Nitrates Directive (91/676/EEC), which
sets a safe limit of 50 mg/L of nitrate concentration in groundwater [9]. It requires each member country
to monitor nitrate levels, identify vulnerable areas, and take steps to reduce pollution. While this
program has helped to improve the overall situation in many EU countries, in some local areas, nitrate
contamination still remains an issue. The Netherlands, as one of the leading agricultural producers in
the world, regularly monitors and reports nitrate levels through the Dutch National Minerals Policy
Workshop on AI-driven Data Engineering and Reusability for Earth and Space Sciences (DARES’25), co-located with the 28th</p>
        <p>CEUR
Workshop
Proceedings</p>
        <p>ceur-ws.org</p>
        <p>ISSN1613-0073
Monitoring Program (LMM). Between 1992 and 2019, a 50% reduction in hotspot areas was achieved
[10]. Despite this progress, challenges remain. For example, in sandy agricultural regions in the South
of the Netherlands, shallow groundwater (5 - 15 meters) still exceeds the EU nitrate limit [11].</p>
        <p>These localized spikes in nitrate levels indicate that policies should be tailored to local conditions
rather than relying solely on national targets. To achieve this, authorities require an understanding
of nitrate distribution across the country. However, monitoring nitrate levels across the country at
high resolution and frequency remains a challenge. Taking nitrate samples is both expensive and
time-consuming. Thus, it is measured once or twice per year at limited locations. This leaves gaps in
our understanding and makes it dificult to identify high-risk areas.</p>
        <p>This problem motivates the development and application of data-driven spatial regression models to
explain how both natural conditions and human activities contribute to variations in these patterns
across regions. These models can help identify polluted areas where measurements are unavailable,
support groundwater protection plans, and inform more sustainable farming practices tailored to the
needs of specific regions.</p>
        <p>Furthermore, this study aligns with the United Nations Sustainable Development Goal 6, which aims
to protect and use water resources responsibly. A key target under this goal is to improve water quality
by reducing pollution, minimizing the release of hazardous chemicals, and halving the proportion of
untreated wastewater [12]</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. State-of-the-Art</title>
        <p>Nitrate leaching has been spatially estimated using geostatistical methods that use spatial and temporal
autocorrelation, relying on Tobler’s First Law of Geography, “everything is related to everything else, but
near things are more related than distant things” [13]. Methods such as co-kriging and disjunctive kriging
are based on this principle, as they estimate values at unsampled locations by leveraging the similarity
between nearby observations. Incorporating temporal information has been shown to reduce estimation
uncertainty further and improve spatial risk assessments [14, 15]. However, these methods require a
minimum number of observations to compute reliable variograms1, which limits their applicability in
regions with sparse monitoring data. Moreover, focusing primarily on geographic proximity can result
in omitting important environmental interactions and biogeochemical processes in the nitrogen cycle
(see Section 2).</p>
        <p>In recent years, machine learning (ML) has become a preferred method for environmental and soil
studies. It can handle large datasets and capture complex patterns found in groundwater quality data.
Among ML techniques, tree-based models are commonly used for spatial nitrate estimation due to their
inherent model-specific interpretability and relatively low computational cost [ 16].</p>
        <p>Paper written by Mahlknecht et al. [17] applied Extreme Gradient Boosting (XGBoost) to estimate
nitrate concentrations across Mexico, achieving an accuracy of 0.80 despite limited monitoring data.
They identified rainfall, elevation, and slope as key predictors and produced spatial risk maps that
exposed major nitrate hotspots. Similarly, Covatti et al. [18] used Random Forest (RF) to map nitrate
levels in Swiss groundwater, integrating SHAP values to interpret feature importance. The model
explained 58% of the nitrate variance. The model revealed that factors like seasonal precipitation and
soil organic carbon emerged as strong predictors. In the United States, Ransom et al. [19] developed
a 3D XGBoost model to estimate nitrate across two drinking water zones, achieving an  2 of 0.49
on hold-out data. Their analysis showed that well depth, soil and climate characteristics, and the
absence of developed land were key factors influencing nitrate levels at the national scale. These studies
demonstrate the ability of tree-based models to capture complex nitrate dynamics in the environment.</p>
        <p>In the Dutch context, Spijker et al. [20] have developed an RF model to map nitrate leaching from
agricultural soils using environmental predictors. The model explained 58% of the variance and
highlighted land use and elevation as being key variables. However, the study was limited by its narrow
temporal scope (restricted to 2017), reliance on a model-specific interpretability method only, and
1A tool in geostatistics used to analyze the spatial continuity of data by quantifying the degree of dissimilarity between data
points as a function of distance.
the absence of key factors influencing the nitrogen cycle, such as population pressure, precipitation,
temperature, and detailed soil properties, including chemical composition. These limitations are
addressed in our study through broader environmental inputs, multi-year analysis, and model-agnostic
interpretability methods.</p>
      </sec>
      <sec id="sec-1-3">
        <title>1.3. Contributions</title>
        <p>
          Building on the work of [20], our study aims to advance nitrate leaching estimation in the Netherlands
by (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) increasing the temporal scope of both model training and estimation, (2) incorporating a broader
range of environmental and agricultural predictors based on the nitrogen cycle, and (3) applying
modelagnostic and model-specific interpretability techniques. Therefore, our research has the following
objective: The development of an explainable spatial regression model for estimating nitrate concentrations
in groundwater in the Netherlands, using spatial and environmental data from agricultural soils.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. A Brief Introduction to the Nitrogen Cycle</title>
      <p>The nitrogen cycle describes the natural processes through which nitrogen moves between the
atmosphere, soil, water, and living organisms, which is summarized in Figure 1 [21]. Its understanding helps
in feature selection and data preprocessing steps.</p>
      <p>According to Fowler et al. [22], the nitrogen cycle begins when nitrogen enters the soil through
atmospheric fixation, where lightning converts N2 into reactive nitrogen compounds that are deposited
with rainfall. Moreover, in agricultural areas, natural inputs are enhanced by the application of synthetic
fertilizers, as well as organic sources such as livestock manure and urine. Additionally, urban areas
primarily contribute through wastewater discharge from sewage systems into soil and water bodies. The
efects are seen in densely populated regions, where higher concentrations of people are associated with
greater wastewater production, fertilizer use, and nitrogen emissions [23]. These emissions, particularly
ammonia from agriculture and nitrogen oxides from trafic and industry, settle onto the land as nitrogen
deposition in soil.</p>
      <p>Once in the soil, nitrogen undergoes a series of microbial transformations that convert it into forms
plants can absorb. These processes are shaped by environmental conditions: soil texture influences
oxygen and water flow, temperature afects microbial activity, and moisture levels determine whether
nitrification or denitrification is favored. Eventually, part of this nitrate is taken up by plants, with
uptake varying depending on the type of vegetation and the depth of the roots. For instance, according
to Nouri et al. [24], deep-rooted or fast-growing crops tend to absorb more nitrates.</p>
      <p>When the amount of nitrate in the soil exceeds the plant’s uptake capacity, nitrate leaches into
groundwater. Since nitrate (NO3 – ) is highly soluble in water and poorly retained by soil particles due
to its negative charge, it is easily washed out from the root zone under the influence of precipitation,
and groundwater table [22, 25]. Moreover, soil texture determines how water moves through the soil.
For example, sandy soils, due to large particles, let water with nitrate pass through quickly [26, 27].
Conversely, the fine texture and low permeability of clay soils restrict water flow, which limits nitrate
leaching [28]. Additionally, temperature regulates the activity of bacteria involved in nitrate production,
thereby influencing nitrification and denitrification rates [ 29]. Elevation further influences nitrogen
dynamics by shaping drainage, runof, and the accumulation of water and nutrients in low-lying areas
[30].</p>
      <p>Apart from leaching, nitrogen can also be removed from the soil system through denitrification, a
microbial process that transforms nitrate into gaseous nitrogen compounds (N2, N2O) released into the
atmosphere. This process closes the nitrogen cycle by returning nitrogen to the atmosphere. The overall
cycle ofers insights into relevant data and features for developing explainable ML-based estimators for
nitrate leaching.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>This study models nitrate leaching in agricultural soils across the Netherlands by training regression
models on georeferenced nitrate measurements and spatial-temporal features. Once trained, models
are applied across a spatial grid, where each geographic location contains corresponding values for
all predictors. The outcome is an estimated map, showing the spatial patterns and trends of nitrate
leaching across the country. The following sections describe the data exploration analysis, the modeling
framework, and the evaluation methods used to assess model performance.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Collection</title>
        <p>The target variable is the nitrate concentration in groundwater. Measurements were obtained from well
sensors logged in the DINO Loket platform [31]. The samples were collected according to the Dutch
protocol NTA 8017 [32], which ensures standardization in sampling procedures and data quality. The
sampling frequency for monitoring wells ranges from once to twice a year, which mostly falls in the
winter months. Samples taken within city boundaries were excluded from the analysis.</p>
        <p>Each well consists of multiple filters, which are screened intervals at diferent depths designed to
isolate hydrogeological zones. For consistency and comparability across wells, we use data from Filter
1, the shallowest filter, which directly reflects nitrate leaching from the root zone of agricultural soils.</p>
        <p>The predictor and target variables used in this study are summarized in Table 3 in Appendix A.2.
Their selection was based on the Nitrogen Cycle (see Section 2). These datasets are derived from a
range of sources and measurement techniques, which are described in Appendices A.4 and A.5. In total,
the dataset contained 7620 samples and 18 features.</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Study area</title>
          <p>For this study, the focus is on the entire territory of the Netherlands, which covers an area of
approximately 41,543 km2 [33]. It lies within the Rhine–Meuse river basin, which explains relatively shallow
groundwater levels [34]. The elevation ranges from 7 meters below sea level (in South Holland) to
above 322 meters (in Limburg). The country has polders, which are areas of land taken from the sea or
rivers and kept dry by dikes.</p>
          <p>Unconsolidated Quaternary sediments of fluviatile, marine, or glacigenic origin dominate the
subsurface of the Netherlands. The country is characterized by four main soil types: peat, clay, sand, and
loess soils (see Figure 2a). Peat areas are mostly found in low-lying regions, clay soils dominate the
coastal and riverine floodplains, sandy (50% of the area) soils are common in the eastern and southern
parts of the country [35], and loess soil is found only in the South Netherlands (see Figure 2a).</p>
          <p>Approximately 66% of the land is used for agriculture, while around 16% is covered by forests [36, 37].
The remaining area consists of urban settlements, water bodies, and infrastructure.</p>
          <p>The Netherlands has a temperate maritime climate, with mild winters (3–6 °C), cool summers (17–20
°C), and a yearly average temperature of around 10 °C. Rainfall is evenly distributed throughout the
year, averaging 800–900 mm annually, with most of the precipitation occurring in autumn and winter
[38]. These conditions create a predisposition for nitrate leaching, especially after dry summers when
nitrogen accumulates in the soil and is flushed out by winter rains [ 39].</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Data Inspection</title>
          <p>To understand general trends in nitrate leaching and prepare for further analysis, we first look at the
available data.</p>
          <p>The monitoring network covers a total of 874 unique groundwater locations sampled between 2008
and 2023. Table 2 and Figure 9 in Appendix A provide an overview of the monitoring network. The
table summarizes the number of unique sites sampled each year, with coverage peaking in 2015 (591
sites) and reaching its lowest in 2023 (274 sites), while the figure shows the combined spatial distribution
of all monitoring sites across the country.</p>
          <p>Figure 2b displays the distribution of nitrate concentrations across diferent soil regions: clay, loess,
peat, and sand. In clay and peat regions, nitrate levels are low, with median concentrations of 0.05 mg/L
and maximum values well below the EU legal limit of 50 mg/L. On the other hand, anaerobic conditions
in peat lands support the conversion of nitrate back into nitrogen gas through denitrification. The loess
region shows moderate nitrate levels, with a median of 8.37 mg/L, and some values reach the EU limit.
In contrast, sandy soils have the highest nitrate concentrations and variability, with a median value of
0.07 mg/L but maximum values exceeding 100 mg/L, more than twice the EU threshold.</p>
          <p>Appendix A.6 describes the preprocessing data pipeline designed to prepare training and test datasets
for model training and evaluation.</p>
          <p>(a) Map of soil regions in the Netherlands.</p>
          <p>(b) Distribution of nitrate concentrations across soil
regions.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model development</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Algorithm selection</title>
          <p>The goal of this study is to develop an explainable model of nitrate leaching across the Netherlands.
For this reason, models with clear interpretations were prioritized. Three diferent models were chosen:
• Ridge Linear Regression, used as the baseline model, assumes a linear relationship between
predictor and target variables. The resulting coeficients indicate both the direction (positive or
negative) and strength of each variable’s relationship with nitrate concentrations.
• Random Forest, proposed by Breiman [40], is an ensemble model that constructs a number of
decision trees, each trained on random subsets of data and features.</p>
          <p>• XGBoost, introduced by Chen and Guestrin [41], is a gradient boosting model that builds decision
trees sequentially, where each new tree aims to correct the errors made by previous trees.</p>
          <p>To combine the strengths of the best-performing models, an ensemble is created by taking an
inverse-variance weighted average of the models’ estimations [42].</p>
          <p>All models are implemented using the scikit-learn library [43]. For XGBoost, the
scikit-learncompatible XGBRegressor interface from the XGBoost package is used [41].</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Performance evaluation</title>
          <p>In order to gain diferent perspectives on the model performance, the following evaluation metrics were
used:
• Coeficient of determination (  2) measures the proportion of variance in the target variable that
is explained by the model. An  2 value closer to 1 indicates better estimation performance:
• Root Mean Square Error (RMSE) gives higher weight to large errors by squaring the residuals
before averaging and taking the square root:
 2( ,  )̂ = 1 −
∑

=1 (  −  ̂ )2
∑

=1 (  −  )̄ 2
RMSE( ,  )̂ =</p>
          <p>∑(  −  ̂ )2
MAE( ,  )̂ =
∑ |  −  ̂ |
1
√  =1


1
 =1
• Mean Absolute Error (MAE) calculates the average absolute diference between estimated and
actual values:</p>
          <p>
            In all three equations (
            <xref ref-type="bibr" rid="ref1">1</xref>
            ), (2), (3),   is the true value for sample  ,  ̂ is estimated value for sample  ,
and  is number of samples. Additionally, in (
            <xref ref-type="bibr" rid="ref1">1</xref>
            )  ̄ is the mean of the true values.
          </p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Hyperparameter optimization</title>
          <p>To control the complexity and regularization strength of the chosen models, a hyperparameter search
is performed. Specifically, the</p>
          <p>
            Randomized Search Cross-Validation implementation from scikit-learn
[43] is used to sample 60 unique hyperparameter combinations from the predefined search space. Each
configuration was evaluated using a 5-fold time series cross-validation scheme ( Time-Series Split) to
preserve temporal structure in the data and avoid information leakage between training and validation
folds. Model performance during hyperparameter tuning was evaluated using MAE in (3). And  2, in
(
            <xref ref-type="bibr" rid="ref1">1</xref>
            ), was used as the refitting criterion for the final model selection.
          </p>
          <p>Appendix B gives an overview of the hyperparameter search space along with the chosen ones for
Ridge Linear regression (see Table 4), Random Forest (see Table 5), and XGBoost (see Table 6).</p>
        </sec>
        <sec id="sec-3-2-4">
          <title>3.2.4. Model training</title>
          <p>
            Once the optimal configuration for each model was identified, the corresponding model was retrained
on the whole training set and evaluated on the held-out test set. The objective of the training procedure
was to minimize the MAE, in (3). During model selection,  2, in (
            <xref ref-type="bibr" rid="ref1">1</xref>
            ), was used as the refitting metric
to prioritize models that explain more variance in the data. At the same time, RMSE, in (2), was also
monitored as a secondary evaluation criterion.
          </p>
          <p>
            To understand how each model generalizes as the amount of training data increases, a learning curve
analysis was implemented based on repeated time series cross-validation. Once the best hyperparameters
1–25
(
            <xref ref-type="bibr" rid="ref1">1</xref>
            )
(2)
(3)
were found for each model, the best-performing estimator was cloned and retrained across multiple
increasing subsets of the training data. Specifically, for each training subset fraction, the corresponding
data was split using a 10-fold Time-Series Split, and the model was fitted in each fold. Validation and
training scores were computed using MAE and were stored across the folds to assess the learning
pattern.
          </p>
          <p>Resultant learning curves for Random Forest and XGBoost can be found in Appendix B on Figures
14a, and 14b respectively.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. Model results and estimated maps</title>
        <p>The performance of each model was evaluated on the held-out test set (882 samples), using the three
metrics described in Section 3.2.2.</p>
        <p>As shown in Table 1, the Ridge Linear Regression model achieved the lowest performance, with an
 2 of 0.23, an MAE of 3.79 mg/L, and an RMSE of 7.58 mg/L. It was outperformed by both Random
Forest and XGBoost with  2 values of 0.649 and 0.638, MAEs of 2.10 mg/L and 2.04 mg/L, and RMSEs
of 5.13 mg/L and 5.20 mg/L, respectively. Therefore, Random Forest and XGBoost were included in
the Ensemble model, with weights of 0.52 and 0.48, respectively. Resultant ensemble achieved the best
overall performance, with an  2 of 0.66, an MAE of 2.02 mg/L, and an RMSE of 5.05 mg/L.</p>
        <p>To further assess the quality of each model, Figure 3 presents scatter plots comparing the estimated
versus actual nitrate concentrations, and Figure 4 demonstrates the corresponding residuals.</p>
        <p>In Ridge Linear Regression (see Figures 3a and 4a), estimations align with the diagonal for values
below 10 mg/L. For higher concentrations (&gt;15 mg/L), estimated values cluster below the diagonal,
showing underestimation. Residuals increase with expected values, reaching up to 50 mg/L, and show a
downward trend across the x-axis.</p>
        <p>The estimation patterns of Random Forest (see Figures 3b and 4b) and XGBoost (see Figures 3c and
4c) are similar with minor diferences. XGBoost estimates high nitrate concentrations ( &gt;40 mg/L)
more accurately, while Random Forest performs better for mid-range values (20-30 mg/L). Residuals
in XGBoost, in the range of 20-30 mg/L, are less clustered and more dispersed compared to those in
Random Forest. Residual plots for both models show that residuals range approximately from -30 to 40
on the y-axis.</p>
        <p>In the Ensemble model (see Figures 3d and 4d), the estimation and residual patterns are similar to
those of Random Forest and XGBoost. For several points in the 10-30 mg/L and &gt;40 mg/L ranges,
residuals are smaller compared to those in the individual models. Some over- and underestimations
present in RF and XGBoost are reduced.</p>
        <p>Although the performance gains are small, the ensemble provides a more stable estimation. Thus, it
is used for nitrate spatial modeling. Its performance on the test set of the year 2021 can be seen in Figure
5. The Ensemble model captures the spatial pattern of nitrate concentrations across the Netherlands,
specifically lower levels in the north and coastal areas, and high concentrations in the southern region
of North Brabant (highlighted in the red square).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Spatial modeling</title>
        <p>Ensemble, as the best performing model, was used with the predictor variables to generate nitrate maps
at a resolution of 500 by 500 meters for the years 2010, 2017, 2021, and 2023 (see Figure 8).</p>
        <p>Most of the Netherlands shows low nitrate concentrations, with values below 5 mg/L. The spatial
pattern of increased nitrogen levels in region A (Noord-Brabant and Limburg) and region B (east
of Gelderland and Overijssel) remains stable through four maps. However, some changes in nitrate
distribution and intensity are still present over time.</p>
        <p>Figure 7 shows the diference in estimated nitrate concentrations for the years 2017, 2021, and 2023,
each compared to 2010. In 2017 (see Figure 7a), nitrate concentrations increased in areas of region A
(Noord-Brabant) and region B (Overijssel and Drenthe). In these areas, nitrate values are up to 10 mg/L
higher than in 2010.</p>
        <p>In 2021 (see Figure 7b), regions A and B still show an increase in nitrate levels. However, the pattern
is diferent compared with 2017 (see Figure 7a). In region A, the increased nitrate remains, but in region
B there is a significant decrease.</p>
        <p>The pattern in both regions from 2021 is further developed in 2023 (see Figure 7c), with a slight
decrease in nitrate levels in Region A and an even greater decline in Region B. However, a new hotspot
with elevated nitrate concentrations emerged in the northern part of Drenthe.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Importance and influence of predictor variables</title>
        <p>(a) Model-specific feature importance.
(b) Model-agnostic feature importance.</p>
        <p>To understand which predictors influence the nitrate leaching, feature importance was evaluated
using model-specific and model-agnostic interpretability methods.</p>
        <p>Figure 6a presents model-specific measures of feature relevance: Gini importance for both Random
Forest and XGBoost, and feature coeficients for Ridge Linear Regression. Scores from each model
were normalized to sum to 1, and only features shared across all three models are shown. Among
them, “Thick soil type” is highlighted as the most important predictor. The models also agreed on the
importance of “Loam content”, “Sand median”, and “Silt content”.</p>
        <p>Figure 6b presents the model-agnostic interpretability results based on LIME, Permutation Importance,
and SHAP values. Across all three, “Thick soil type” and “Groundwater table” are found to be the most
important features, although the importance values difer. For example, LIME gives “Thick soil type”
the highest score (around 0.3), while SHAP and Permutation give it slightly lower scores. Moreover,
all three methods further agreed on the importance of “Loam content”, “C:N ratio”, and “Elevation”,
however, with lower importance scores.</p>
        <p>Taken together, both model-specific and model-agnostic approaches identified “Thick soil type” and
“Loam content” as important predictors of nitrate leaching. Moreover, the strength of their importance
is consistent across methods, with “Thick soil type” receiving the highest scores, and “Loam content”
receiving the lowest ones.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <sec id="sec-5-1">
        <title>5.1. Interpretation of the results</title>
        <p>The main goal of this study was to develop an explainable spatial regression model for estimating nitrate
concentrations in groundwater in the Netherlands, using spatial and environmental data from soils.</p>
        <p>Among all tested models, the Ensemble demonstrated the best performance, explaining 66% of
the variance in nitrate levels with an RMSE of 5.05mg/L, which is considered strong for large-scale
applications. It outperforms comparable studies such as Ransom et al. [19] in the United States ( 2
= 0.49), Covatti et al. [18] in Switzerland ( 2 = 0.58, RMSE = 7.8 mg/L), and Spijker et al. [20] in the
Netherlands ( 2 = 0.58). As the main model framework was adapted from the approach by Spijker et al.
[20], our results imply that increasing the training dataset and introducing new predictor variables,
such as detailed soil properties, allowed the models to capture complex interactions more accurately.</p>
        <p>From a practical perspective, this level of estimation performance means that the Ensemble model
is suitable for spatial screening and groundwater risk assessment but should not be used as a precise
indicator, as it consistently underestimates high nitrate concentrations (&gt;30 mg/L). This is a common
limitation of regression models [44]. Therefore, policy decisions near regulatory thresholds should not
rely solely on model outputs but use them to guide targeted monitoring eforts.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Nitrate trend through years</title>
        <p>Analyzing the spatial nitrate concentration distribution through the years based on Figure 8, it is clear
that the lowest levels are found in peat, loess, and clay areas of the west and north of the country.
However, as mentioned in Section 4.2, region A (North Brabant and Limburg) and region B (east of
Gelderland and Overijssel) consistently show some of the highest modeled nitrate concentrations in the
Netherlands, reaching 50 mg/L. This trend can be explained by the presence of sandy soils, which are
known for their low capacity to retain water and nutrients [27]. These estimations coincide with [20]
nitrate estimations and online reports from the LMM [45].</p>
        <p>Based on the diference map of 2017 (see Figure 7a) the situation of high nitrate increased in Regions
A and B. This can be explained by the increase in nitrogen surplus after 2015 [ 46] mainly due to higher
emissions and greater use of artificial fertilizers, despite more nitrogen being removed with the crops.</p>
        <p>In the following years, diference maps in 2021 in Figure 7b and 2023 in Figure 7c show that the
increased nitrate concentrations persist in region A, while in region B the levels mostly return to the
level of year 2010. This can be explained by the diferences in sandy soils and land use practices. In
Sand South, there is a predominance of dry, well-drained sandy soils (16% dry soils) and the relatively
low proportion of reclaimed peat soils. Together they lead to faster nitrate leaching compared to wetter
regions [47]. In addition, the agricultural structure difers across the sand subregions: Sand South has
a lower share of grassland (dairy farms account for 46% of the area), which is associated with higher
nitrogen leaching than grassland-dominated areas [48].</p>
        <p>The decline in nitrate concentrations in Sand North and Sand Central after 2017 can be attributed to
both environmental and agricultural factors. In Sand North, dairy farming accounts for 49%, while in
Sand Central, dairy farming accounts for 67% [48, 45]. Both subregions have a relatively high share of
grassland on dairy farms, which retains more nitrogen and results in lower nitrate leaching compared
to arable land. These agricultural patterns, combined with a higher proportion of reclaimed peat soils
in Sand North (49%), promote denitrification. This helps explain why nitrate concentrations peaked in
2017 but subsequently decreased, with Sand North even dropping below the EU threshold of 50 mg/L in
recent years [49, 47].</p>
        <p>The sharp nitrate increase observed in northern Drenthe in the diference map of 2023 (see Figure 7c)
occurred in an area covered by coniferous and deciduous forests. This finding is unexpected, as forests
absorb nitrate in large quantities, resulting in low nitrate leaching to groundwater [50]. This anomaly
may be explained by the limitations of this study in Section 5.4.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Analysis of factors</title>
        <p>Both model-agnostic and model-specific feature importance analyses identified the main soil
classification, “Thick earth soil,” as the most important predictor of nitrate leaching in the Netherlands. Formed
over centuries of deep fertilization and tillage, these soils develop a sandy top layer called the screed
[51], which makes it easier for water and nutrients to move downward (see Section 2). This increases
leaching risk, especially with heavy fertilizer use and irrigation. Although thick soils can store more
nitrate, their structure also allows it to reach groundwater quickly. These soils are mainly found in the
higher cover sand regions of Brabant, Gelderland, and Overijssel, matching the areas with the highest
estimated nitrate levels (see Figure 8).</p>
        <p>Loam content also emerged as a key predictor. Loam, a balanced mix of sand, silt, and clay, holds
water and nutrients well while still draining, which slows nitrate leaching. In this study, lower loam
content was linked to higher nitrate levels, suggesting that sandier, more permeable soils raise leaching
risk, consistent with previous findings [ 27].</p>
        <p>Groundwater depth was also identified as important by the model-agnostic method. SHAP values
show that greater groundwater depth is associated with higher nitrate concentrations, which may seem
counterintuitive, since deeper groundwater could allow more nitrate to be filtered through the soil
[25]. However, in the Netherlands, deeper groundwater tables are typically found in elevated sandy
regions with intensive agriculture and high nitrate inputs, as also noted by Spijker et al. [20]. Thus,
groundwater depth here is likely an indicator of broader landscape characteristics.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Limitations</title>
        <p>The spatial mismatch between nitrate and groundwater measurements (see Appendix A.6) may introduce
uncertainty, as reflected by the moderate correlation between these variables (maximum correlation
coeficient of  = 0.6). This suggests that spatial alignment issues could weaken the precision of the
model’s relationships. Also, missing land use data for some years was filled using maps from nearby
years, which may have hidden short-term changes afecting nitrate leaching. And, while the ensemble
model explained 66% of the variance in nitrate concentrations, a substantial proportion of variability
remains unexplained. This residual variance likely arises from missing explanatory variables, such as
detailed fertilizer application data. While the EU’s Farm Accountancy Data Network (FADN) does collect
farm-level information on fertilizer usage, only aggregated data is publicly available [52]. This lack of
high-resolution fertilizer data restricts the ability to capture field-level variations in nitrate input. These
factors highlight the need for more detailed spatial and temporal data, and the development of more
comprehensive models to further improve the estimation of nitrate leaching in Dutch groundwater.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>This project developed an explainable machine learning models for estimating nitrate concentrations in
Dutch groundwater, utilizing a combination of spatial and environmental data. The best performing
model is the Ensemble model that could explain 66% of the variance in nitrate concentrations. By
integrating both model-specific and model-agnostic interpretability methods, the study identified thick
type of soils, loam content, and groundwater depth as the most important predictors of nitrate leaching
risk. The results highlight the vulnerability of elevated sandy regions. Moreover, the results show that
overall nitrate concentrations remain low across most of the country, with high values only in a few
regions, suggesting that the EU regulatory limit has been efective in protecting groundwater quality.</p>
      <p>There are two main directions for future research. First, to improve predictive performance and
better explain nitrate variability, future models should integrate additional predictors related to local
agricultural practices and fertilizer application. Second, in this study the validation methods do not
account for spatial patterns. While a time-based split was applied to prevent temporal leakage, the
training and test samples remain geographically close to one another. This spatial proximity may
introduce correlation, which may result in performance estimates that do not accurately reflect the
models’ ability to generalize to new locations. Therefore, a spatial k-fold cross-validation proposed by
Pohjankukka et al. [53] should be implemented.</p>
      <p>Building on the findings of this project, future development should prioritize decision-support tools
based on these machine learning models to guide local nitrate management and policy decisions. Such
a tool could help policymakers and citizens to identify areas most at risk and tailor interventions more
precisely.</p>
    </sec>
    <sec id="sec-7">
      <title>Reproducibility</title>
      <p>For information about the datasets utilized, refer to Section 3.1 and Table 3 in Appendix A.2. The code
and experiments of this project can be found in the repository:
github.com/IuliaCapralova/NitrateSpatial-Modeling-NL.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>While writing this paper, the authors used ChatGPT (OpenAI) and Grammarly for formatting assistance
and to improve the writing style. All suggestions from these tools were reviewed and edited as needed
by the authors, who take full responsibility for the final content.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments References</title>
      <p>The authors thank Dr. P. Pradhan and Dr. J. Spijker for their discussions on environmental challenges
in the Netherlands, Dr. H. de Weerd for his feedback on the analysis presented in this work, and the
University of Groningen for providing access to the Hábrók High Performance Computing cluster.</p>
    </sec>
    <sec id="sec-10">
      <title>Appendices</title>
    </sec>
    <sec id="sec-11">
      <title>A. Data insights</title>
      <p>This appendix further investigates the dataset by analyzing correlations between target and predictor
variables and examining time series and spatial covariates. It also includes an overview of data sources,
availability, key summary statistics, and highlights outliers through visualization.</p>
      <sec id="sec-11-1">
        <title>A.1. Spatial distribution of monitoring wells</title>
        <p>This section focuses on an overview of the spatial distribution of monitoring locations. Figure 9
shows the combined distribution of all locations on the map, while Table 2 lists the number of unique
monitoring locations per year. The wells are spread throughout the country, with higher densities in
provinces such as Utrecht, North-Brabant, and North Holland.
persons/km2 CBS [56]
Nitrogen deposition
kg/ha/year</p>
        <p>RIVM [57]</p>
        <sec id="sec-11-1-1">
          <title>Variable</title>
          <p>Nitrate (NO3–)
Elevation
Population
Groundwater table
Precipitation
Temperature
Organic matter
Land use
Soil region
Soil type
C:N ratio
Soil calcic content
  -idth
Soil loam content
Sand median diame-  m
ter
Sand silt content
Soil acidity
Bulk density</p>
        </sec>
      </sec>
      <sec id="sec-11-2">
        <title>A.2. Overview of Variables Used in Modeling</title>
      </sec>
      <sec id="sec-11-3">
        <title>A.3. Correlation map</title>
        <p>To explore the relationships between nitrate concentrations and the explanatory variables presented in
Table 3, a Pearson correlation analysis was conducted. Since Pearson correlation is only meaningful
for continuous variables, the analysis was limited to these features. Figure 10 shows the resulting
correlation matrix based on the absolute values of pairwise correlations, allowing comparison of both
positive and negative relationships in terms of strength.</p>
        <p>Unit
mg/L
m
m
WUR [54]
Rijkswaterstaat
[55]</p>
      </sec>
      <sec id="sec-11-4">
        <title>A.4. Exploration of timeseries covariates</title>
        <p>Groundwater table Groundwater table was included as a key explanatory variable as its level
may accelerate nitrate washing out from the soil [25]. The dataset consisted of groundwater depth
measurements collected from monitoring wells between 2008 and 2023. The measurements were
obtained from a network of permanent monitoring wells equipped with automated sensors (pressure
transducers) that recorded water levels at sub-daily to daily intervals. Although only a few wells had
uninterrupted data across all years, the analysis included any well with a continuous (“clean”) segment
of measurements within this period, provided the segment exceeded 60 days. Given the high density
of wells available, only the longest segment per well was used, and the rest were discarded to avoid
redundancy. The raw data from each well often featured variable sampling frequencies, sometimes with
intervals shorter than a day and with occasional missing periods. To standardize the data and address
missing values, the depth measurements were first resampled to a uniform daily frequency using mean
aggregation. Short gaps of up to five days were then filled by linear interpolation, making the time
series consistent. In case this gap is exceeded, rows were considered as missing. The distribution of the
ifnal groundwater table dataset is shown in Figure 12a.</p>
        <p>One challenge in this study was that the locations where groundwater depth was measured did not
always match the nitrate measurement sites. To solve this issue, each nitrate measurement was linked
to the closest groundwater well within 5 km. This radius was chosen based on the correlation between
the distance between wells and groundwater table (see Figure 11). While this method made it possible
to use more data, it also added some uncertainty. The correlation between groundwater depth and the
distance reaches a maximum value of 0.61, highlighting this limitation.</p>
        <p>Temperature and Precipitation Precipitation and temperature were included as explanatory
variables to capture the influence of weather conditions on soil processes and nitrate mobility. Daily
meteorological data were obtained from the Royal Netherlands Meteorological Institute [62], which
operates a network of automatic weather stations (AWSs) that provide synoptic, high-quality
measurements validated according to NEN-EN-ISO/IEC 17025 standards. The original data were recorded at an
hourly frequency and were resampled to daily averages prior to further processing. Data were available
for the period 2008–2023 and were sourced from 18 stations distributed across the area of interest. Each
nitrate sampling location to the nearest weather station within a 20 km radius. The distribution of both
precipitation and temperature across all stations is shown in Figure 12b and 12c, respectively.</p>
        <p>For each unique sampling point, aggregated precipitation and temperature values were computed
based on the nearest available station. Next, both temperature and precipitation variables were
aggregated using the autocorrelation function (ACF) analysis as described in Appendix A.6.</p>
      </sec>
      <sec id="sec-11-5">
        <title>A.5. Exploration of spatial covariates</title>
        <p>Land use The land use data were sourced from the LGN (Landelijk Grondgebruiksbestand Nederland)
dataset [60], which provides nationwide, high-resolution maps derived from a combination of satellite
imagery and aerial photography. For more recent years, Sentinel-2 satellite data and imagery from
the National Satellite Data Portal were utilized, while the 2021 update also included detailed aerial
photographs. The spatial resolution of the maps is 5 meters, enabling precise assignment of land use
classes to each sampling location. The land use map was available for the following years: 2008, 2012,
2018, 2019, 2020, 2021, 2022, and 2023. For years, lacking a land use map, the closest available year was
adopted:
Nitrogen (N) deposition Annual spatial maps of nitrogen deposition for the period 2008–2023 were
obtained from the national Data on Deposition in the Netherlands (GDN), developed by the Dutch
National Institute for Public Health and the Environment [57]. These maps provide estimates of both
wet and dry deposition, which are major sources such as ammonia NH3 and nitrogen oxides NOx.
Deposition values are modeled using the OPS atmospheric transport model, which integrates oficial
emission inventories, land-use information, and year-specific meteorological data. For accuracy, modeled
estimates are calibrated against ground-based measurements from the national monitoring network,
employing a co-kriging approach that addresses spatial uncertainty and variations in measurement
reliability. The resulting calibrated maps, with a spatial resolution of 250 x 250 meters, provide a spatial
representation of nitrogen inputs, distribution of which is available in Figure 12d.</p>
        <p>Population density Annual gridded population maps for the years 2008–2023 were sourced from
Statistics Netherlands (CBS) [33], with a spatial resolution of 500 by 500 meters. The dataset is based on
the Dutch Personal Records Database (BRP), which provides register-based demographic information
and has replaced traditional census approaches. In accordance with privacy regulations, grid squares
containing fewer than five residents are not published. For the purposes of this study, such missing
values were imputed with zeros, reflecting uninhabited areas such as forests, water bodies, or agricultural
ifelds. The distribution of population density across nitrate sampling locations is shown in Figure 12e.
Elevation Elevation data were sourced from the Actueel Hoogtebestand Nederland (AHN) [55], a
national digital elevation model produced using laser altimetry from aircraft. The accuracy of the map
is provided by GPS corrections and merging overlapping data swaths. The resultant map has 0.5 by 0.5
meter resolution. The elevation distribution of nitrate well locations can be seen in Figure 12f.
Soil region The classification of soil regions was based on the Landelijk Meetnet efecten Mestbeleid
(LMM) dataset, a national monitoring network established to evaluate the environmental efects of the
Dutch manure policy. The LMM dataset established soil regions across the Netherlands according to
factors such as dominant soil type, hydrology, and land use. The resultant soil regions can be seen in
Figure 2a and nitrate distribution across soil region in Figure 2b.</p>
        <p>Soil properties For this study, such soil-related attributes were included as bulk density (see Figure
12g), soil acidity (see Figure 12h), C:N molar ratio (see Figure 12i), calcic content (see Figure 12j), Fe-dith
content (see Figure 12k), loam content (see Figure 12l), sand particle diameter (see Figure 12m), and silt
content (see Figure 12n).</p>
        <p>The properties were derived from the Soil Map of the Netherlands (Bodemkaart van Nederland)
[54], which is based on the BIS-4D digital soil dataset [63], both of which are developed under the
Basisregistratie Ondergrond (BRO) framework and released in 2024. These datasets integrate field
sampling (systematic drillings, laboratory analyses, and expert classification) with geospatial modeling
and machine learning, resulting in a 25-meter spatial resolution map. For each variable, values were
assigned based on the top 0–8 cm soil layer.</p>
        <p>Given the 16-year time frame of this study, the static map was used to represent nitrate levels
throughout the entire period. However, it is suficient because most soil properties change over
multidecadal timescales (20–50 years or more) [64]. More dynamic properties like organic matter content,
pH, and C:N ratio typically vary slowly over 10–20 years under stable land use and management [65].
(a) Groundwater table
(b) Precipitation
(c) Temperature
(d) Nitrogen deposition
(e) Population density
(f) Elevation
(g) Bulk density
(h) Soil acidity.
(i) C:N molar ratio
(j) Calcic content
(k) Fe-dith content
(l) Loam content in the soil
(m) Sand particle diameter
(n) Silt content in the soil</p>
      </sec>
      <sec id="sec-11-6">
        <title>A.6. Data preprocessing</title>
        <p>This appendix covers the steps involved in data preparation for modeling. It covers such steps as
handling missing values, feature engineering, spatial and temporal alignment, train-test split, and
scaling.</p>
        <p>First, missing values were handled. For the target variable, all rows with missing values were removed,
as they appeared to be missing at random. Nitrate levels are irregularly sampled throughout the year,
making substantial gaps between measurements (usually several months). However, for the predictor
variables, the issue of missing values was resolved using a combination of resampling, imputation, or
removal, depending on the data source and measurement frequency. For details, refer to Appendices</p>
        <p>A.4 and A.5.</p>
        <p>Second, feature engineering steps were performed. It has been shown that variables such as
temperature, precipitation, and groundwater table have a long-term impact on the nitrate leaching process
[66, 67]. Therefore, these variables were aggregated by a fixed window size, which was uniquely defined
for each variable through an autocorrelation function (ACF). It is a statistical tool that measures the
strength of the relationship between current values in a time series and past values at various time lags.
This way, the patterns in data over time can be identified. The following equation gives the function:
  (  ) =</p>
        <p>−
∑=1 (  −  ) (̄ + −  ) ̄</p>
        <p>∑=1 (  −  ) ̄ 2
where,   is the autocorrelation at lag  ,   is the value of the time series at time  ,  ̄ is the mean of
the time series, and  is the total number of observations. This results in a value between -1 and 1 that
indicates the strength and direction of correlation at lag  .
(a) Autocorrelation of daily temperature. A positive (b) Autocorrelation of daily precipitation. A positive
correlation is observed up to 73 lags.</p>
        <p>correlation is observed up to 4 lags.
(c) Autocorrelation of daily groundwater. A positive</p>
        <p>correlation is observed up to 55 lags.
series values. The dashed line represents 95% confidence bounds.</p>
        <p>
          Based on ACF plots in Figure 13, the following window sizes were determined: (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) Temperature: 73
days; (2) Precipitation: 4 days; and (3) Groundwater table: 55 days.
        </p>
        <p>Third, data alignment was performed both temporally and spatially. It should be noted that a spatial
mismatch happened between the nitrate and groundwater table datasets, meaning that the measurement
locations for nitrate and groundwater depth did not always coincide. To resolve this, each nitrate
observation was paired with groundwater depth data from the nearest well within a 5 km radius. This
choice was based on observed correlations between groundwater levels and distance (see Appendix
A.4 Figure 11). A similar strategy was applied for weather data: temperature and precipitation. Values
for these variables were taken from the closest weather station within a radius of 20 km of the nitrate
monitoring site.</p>
        <p>Fourth, while the majority of groundwater samples have low nitrate levels, there are several outliers
with high concentrations. The small number of such extreme cases may limit model generalization, as it
will have fewer examples to learn from when estimating these rare cases [68]. Therefore, samples with
nitrate levels higher than 58 mg/L were removed from the dataset, resulting in 20 outlier data points.</p>
        <p>Fifth, the dataset was split chronologically into training and testing sets. The training set consists of
4,439 samples spanning 13 years (2008–2020), while the testing set contains 882 samples from the last 3
years (2021–2023), corresponding to a split of approximately 83/17.</p>
        <p>Sixth, data transformations were applied. Categorical variables were one-hot encoded to avoid
ranking or ordinal assumptions between them. Continuous variables were normalized using Min-Max</p>
        <p>Scaler, bringing them to a [0,1] range using the following equation:</p>
        <p>−  min
 max −  min
,
where  represents an individual raw value from a given feature in the dataset.  min and  max denote
the minimum and maximum values observed in that feature in the training set, respectively.</p>
        <p>Once the entire dataset has undergone all the preprocessing steps, it is ready for use in model training
and evaluation, which are described in the following sections.</p>
        <p>To generate a map of estimated nitrate levels across years, a 500 x 500 m grid covering the country
is created, excluding urban areas. Each grid cell was matched with input features consistent with the
training data.</p>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>B. Training insights</title>
      <sec id="sec-12-1">
        <title>B.1. Hyperparamter values</title>
        <p>Hyperparameter
Search space</p>
        <p>Selected value
Selector estimator alpha ( ) {0.1, 1.0, 10, 20, 30}
Max features for selector {10, 20, 30, 40}
Ridge regression alpha ( ) {0.1, 1.0, 10.0, 15.0}
250
None
2
1
sqrt</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>1. 2008 data was used for 2009 2. 2012 data was used for</article-title>
          <year>2010</year>
          ,
          <year>2011</year>
          ,
          <year>2013</year>
          ,
          <article-title>and 2014 3. 2018 data was used for</article-title>
          <year>2015</year>
          ,
          <year>2016</year>
          , and
          <issue>2017</issue>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>