<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pavlo Kundenko</string-name>
          <email>pavel.kundenko@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktoria Hnatushenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladyslav Tsaryk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Dmytriieva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ukrainian State University of Science and Technologies</institution>
          ,
          <addr-line>2 Lazariana St, 49005, Dnipro</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The study examines how different activation functions influence the performance of a U-Net model applied to binary water-body segmentation in Sentinel-2 imagery. Using an identical training setup for each experiment, six nonlinearities (ReLU, Leaky ReLU, ELU, PReLU, Swish, and RReLU) are individually substituted into the network while all other parameters remain fixed. Comparative evaluation on a held-out validation set reveals that Leaky ReLU provides the most balanced trade-off between precision and recall, making it the preferred choice for accurate water-mask generation. PReLU offers similar but slightly lower performance, whereas ELU excels at capturing additional water pixels at the cost of more false positives. The findings highlight the importance of activation-function selection in remote-sensing segmentation tasks and suggest further exploration of advanced nonlinearities and larger, more diverse datasets to enhance generalization.</p>
      </abstract>
      <kwd-group>
        <kwd>Water-body segmentation</kwd>
        <kwd>Sentinel-2</kwd>
        <kwd>U-Net</kwd>
        <kwd>activation functions</kwd>
        <kwd>remote sensing</kwd>
        <kwd>deep learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Monitoring water resources, including rivers, lakes, and coastal areas, plays a crucial role in modern
research related to sustainable environmental management, agrotechnology, and ecology. Thanks to
an extensive satellite network, particularly the Sentinel-2 program, it is now possible to obtain
high-resolution multispectral imagery for the regular assessment of water bodies. However, the task of
automatically and accurately separating water from land (segmentation) remains challenging due to
factors such as water turbidity, seasonal variability, cloud coverage, and the spectral similarity of
various landscape elements.</p>
      <p>
        Traditional algorithms based on indices such as the Normalized Difference Water Index (NDWI)
offer fast solutions, but they are often vulnerable to complex environmental conditions. With advances
in deep learning within the field of computer vision, there is a growing trend towards the use of deep
convolutional neural networks (CNNs), which enable more accurate identification of image features
[
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ]. One of the most widely used models for segmentation tasks is the U-Net, proposed by
Ronneberger et al. Its encoder-decoder architecture with skip connections enables the fusion of deep
semantic information with high spatial resolution.
      </p>
      <p>Nonetheless, the effectiveness of CNN training — including that of U-Net — depends not only on
architectural design but also on the choice of activation functions, which define how neurons respond
to incoming signals. The ReLU (Rectified Linear Unit) family remains the most widely used due to its
simplicity and immunity to the vanishing gradient problem in the positive domain. However,
numerous modifications of ReLU (e.g., Leaky ReLU, PReLU, RReLU) as well as alternative functions
(ELU, Swish, Mish) have been proposed to improve convergence and address the limitations of
standard ReLU.</p>
      <p>This study presents a comparative analysis of six activation functions (ReLU, Leaky ReLU, ELU,
PReLU, Swish, RReLU) in the context of binary segmentation of water bodies using Sentinel-2 satellite
imagery. To ensure objective evaluation, all training parameters (number of epochs, dataset) were fixed
so that the only changing variable was the activation function. The evaluation criteria included F1
score, Intersection over Union (IoU), precision, recall, and convergence rate. The results provide
insights into which activation function contributes most effectively to accurate and reliable water
segmentation under diverse landscapes and imaging conditions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <sec id="sec-2-1">
        <title>U-Net Architecture</title>
        <p>This study employs a modified U-Net architecture specifically designed for the binary segmentation of
water bodies based on Sentinel-2 satellite data. The choice of U-Net is justified by its ability to
integrate high-level (global) features with fine-grained local details—an essential quality for detecting
small water bodies and complex shoreline structures. To enhance performance, the input data
undergoes preprocessing, including the generation of training patches from satellite scenes and the
creation of corresponding target masks. This enables the model to operate effectively on multispectral
imagery by adjusting the number of input channels and scale according to the available spectral bands.</p>
        <p>During model development, the unique characteristics of Sentinel-2 imagery are considered, such as
the varying spatial resolution of individual bands and uneven surface illumination [23]. Each image is
normalized and divided into fixed-size patches, which simplifies the training process and reduces the
need to store large full-resolution intermediate results. Each patch is input into the U-Net as a tensor,
typically with 3 or 4 channels (RGB or RGB plus near-infrared). In this project, 512×512 sized patches
are used, striking a balance between computational cost and spatial detail preservation.</p>
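<p>As an illustration of the patch-generation step described above, the following sketch (function names are hypothetical, not taken from the original pipeline) splits a normalized multispectral scene into non-overlapping 512×512 tiles:</p>

```python
import numpy as np

def extract_patches(scene, patch_size=512):
    """Split a (H, W, C) multispectral scene into non-overlapping
    patch_size x patch_size tiles; edge remainders are discarded."""
    h, w, c = scene.shape
    rows, cols = h // patch_size, w // patch_size
    patches = []
    for i in range(rows):
        for j in range(cols):
            patches.append(scene[i * patch_size:(i + 1) * patch_size,
                                 j * patch_size:(j + 1) * patch_size, :])
    return np.stack(patches)

# e.g. a 1024 x 1536 four-band scene yields 2 * 3 = 6 patches of 512 x 512 x 4
demo = np.zeros((1024, 1536, 4), dtype=np.float32)
print(extract_patches(demo).shape)  # (6, 512, 512, 4)
```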
        <p>
          The architecture retains the classic encoder-decoder structure with convolutional and transposed
convolutional blocks. The encoder progressively reduces spatial resolution while extracting
increasingly abstract features, whereas the decoder reconstructs the original image dimensions and
focuses on the accurate localization of segmented objects. Skip connections between corresponding
levels of the encoder and decoder help retain crucial fine-grained information that would otherwise be
lost—essential for delineating the boundaries of water bodies [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The final output layer generates a
binary map, where each pixel is assigned a probability of belonging to the water class. For full-size
imagery, the model processes each patch individually and subsequently reassembles the outputs into a
single map using mosaicking. Post-processing smoothing techniques are applied to reduce potential
artifacts or misclassifications.
        </p>
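<p>The mosaicking step can be sketched in the same spirit; this hypothetical helper reassembles per-patch predictions into a single full-size map (the post-processing smoothing mentioned above is omitted):</p>

```python
import numpy as np

def mosaic_patches(patches, rows, cols, patch_size=512):
    """Reassemble per-patch binary maps (N, P, P), ordered row-major,
    into one (rows*P, cols*P) segmentation map."""
    full = np.zeros((rows * patch_size, cols * patch_size), dtype=patches.dtype)
    for idx in range(rows * cols):
        i, j = divmod(idx, cols)
        full[i * patch_size:(i + 1) * patch_size,
             j * patch_size:(j + 1) * patch_size] = patches[idx]
    return full

# six constant-valued tiles reassembled into a 2 x 3 grid
tiles = np.arange(6).reshape(6, 1, 1) * np.ones((6, 512, 512))
print(mosaic_patches(tiles, rows=2, cols=3).shape)  # (1024, 1536)
```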
        <p>
          The advantages of U-Net in this project are further supported by multiple studies demonstrating its
effectiveness for remote sensing data [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The model's scalable filter size and layer dimensions allow it
to perform robustly under various imaging conditions while preserving the ability to identify narrow
linear structures. In this work, the encoder comprises layers with 64, 128, 256, and 512 filters, while the
bottleneck block reaches 1024 filters—a configuration widely used for high-resolution image
segmentation tasks [
          <xref ref-type="bibr" rid="ref3">9, 10, 3</xref>
          ]. To improve the model’s sensitivity to water, a near-infrared channel
(Sentinel-2 B8) may be incorporated alongside standard RGB, as this band provides better water-land
contrast due to differences in spectral reflectance [11]. Ultimately, each 512×512 patch is processed
independently, and the results are aggregated into a continuous segmentation map, optimizing
computation while accounting for spatial variability across the image.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Model Learning and Evaluation</title>
        <p>The research begins with the selection of a Sentinel-2 satellite image that contains the relevant spectral bands for distinguishing between water and land. The image is normalized to minimize differences in illumination and acquisition conditions.</p>
        <p>A fragment (5376×5376 pixels) of the original satellite image was selected for the training dataset. The image is then divided into fixed-size patches of 512×512 pixels to simplify the training process and optimize computational efficiency. The total number of patches used for training is 110.</p>
        <p>Next, binary masks are generated to indicate the presence of water at the pixel level. Some of these
masks are refined manually, while others are derived from spectral indices and later validated for
labeling errors. This approach enables the creation of a robust training and validation dataset with a
balanced representation of water and non-water regions. A baseline U-Net model is used for
segmentation, with the number of input channels tailored to the selected spectral bands. Batch size,
learning rate, and other hyperparameters remain constant across all experiments to ensure fair
comparison among activation functions.</p>
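<p>A minimal sketch of how an index-based draft mask could be derived before manual validation, assuming the McFeeters NDWI with Sentinel-2 bands B3 (green) and B8 (NIR) and a zero threshold (the exact procedure used here is not specified in the text):</p>

```python
import numpy as np

def ndwi_mask(green, nir, threshold=0.0):
    """McFeeters NDWI: (Green - NIR) / (Green + NIR); water where NDWI exceeds threshold."""
    ndwi = (green - nir) / (green + nir + 1e-8)  # epsilon guards against division by zero
    return (ndwi > threshold).astype(np.uint8)

green = np.array([[0.30, 0.05], [0.25, 0.04]])  # B3 reflectance (illustrative values)
nir   = np.array([[0.05, 0.30], [0.06, 0.35]])  # B8 reflectance (illustrative values)
print(ndwi_mask(green, nir))  # water pixels marked 1, land 0
```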
        <p>The encoder-decoder structure, along with max-pooling and transposed convolution operations,
allows the network to preserve local details while reconstructing spatial features of the input image.</p>
        <p>For each activation function under consideration (ReLU, Leaky ReLU, ELU, PReLU, Swish, RReLU),
a separate variant of the model is implemented in which only the activation layers are modified. Aside
from the activation function, all other components—including the dataset and training duration—
remain unchanged.</p>
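<p>The activation-swap protocol can be sketched as follows. This is an illustrative PyTorch fragment (the paper does not give implementation details), in which only the activation factory changes between the six model variants while the convolutional structure stays fixed:</p>

```python
import torch.nn as nn

# Hypothetical factory: each experiment swaps only the activation layers.
ACTIVATIONS = {
    "relu":       lambda: nn.ReLU(inplace=True),
    "leaky_relu": lambda: nn.LeakyReLU(negative_slope=0.01),
    "elu":        lambda: nn.ELU(alpha=1.0),
    "prelu":      lambda: nn.PReLU(),
    "swish":      lambda: nn.SiLU(),                 # Swish with beta = 1
    "rrelu":      lambda: nn.RReLU(lower=0.125, upper=0.333),
}

def conv_block(in_ch, out_ch, act_name):
    """Two 3x3 convolutions, as in a U-Net level, with a configurable nonlinearity."""
    act = ACTIVATIONS[act_name]
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), act(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), act(),
    )

block = conv_block(4, 64, "leaky_relu")  # RGB + NIR input, first encoder level
```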
        <p>Upon completion of training, the models are evaluated on a validation set using F1, IoU, precision,
recall, and convergence metrics.</p>
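<p>The evaluation metrics follow directly from pixel-level confusion counts; a minimal sketch (function name hypothetical):</p>

```python
import numpy as np

def segmentation_metrics(pred, target):
    """Pixel-level precision, recall, F1 and IoU for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}

pred   = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [1, 0]])
print(segmentation_metrics(pred, target))  # tp=1, fp=1, fn=1: P=R=F1=0.5, IoU=1/3
```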
        <p>The predicted outputs are stitched into complete segmentation maps, enabling both visual and
quantitative assessment of water body detection quality. Finally, a comparative analysis of all six
activation functions is performed to identify the most effective one for binary water segmentation
from Sentinel-2 imagery.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <sec id="sec-3-1">
        <title>ReLU, Leaky ReLU, and ELU Activation Functions</title>
        <p>
          The Rectified Linear Unit (ReLU) is one of the most widely used activation functions in modern deep convolutional networks [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. It operates by zeroing all negative input values while retaining a linear relationship for positive inputs. It is defined as:
f(x) = max(0, x)
(1)
        </p>
        <p>
          ReLU was introduced to mitigate the vanishing gradient problem often encountered with sigmoid
or tanh activations [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Unlike these nonlinearities, ReLU provides a constant gradient for positive
inputs and avoids costly exponential computations, resulting in faster training.
        </p>
        <p>Its main advantages include computational simplicity and the ability to maintain non-zero gradients
when x &gt; 0, which facilitates effective optimization in deep architectures [12]. Additionally, the lack of
saturation for positive inputs allows neurons to output arbitrarily large values, given suitable data and weights. However, ReLU has a significant drawback in the form of "dead neurons": units that
output zero across all inputs if they remain in the negative region during training [8]. Despite this,
ReLU continues to demonstrate reliable performance in high-resolution image segmentation, including
satellite imagery [13].</p>
        <p>
          In this study, ReLU serves as the baseline activation function. The U-Net model with ReLU is used as a reference to evaluate the performance gains achieved by its alternatives (Leaky ReLU, ELU, etc.). It remains a widely adopted standard due to its proven efficacy in segmentation, classification, and various deep learning tasks [
          <xref ref-type="bibr" rid="ref4 ref7">4, 7, 12</xref>
          ].
        </p>
        <p>Leaky ReLU is a variant of ReLU designed to mitigate the issue of "dead neurons" by allowing a small, non-zero gradient for negative input values. While ReLU completely discards negative signals, Leaky ReLU applies a small slope α to retain some gradient information [8]. It is defined as:
f(x) = max(αx, x)
(2)
where α is a small positive coefficient (e.g., 0.01). This modification reduces the risk of permanent neuron inactivity during training [14].</p>
        <p>The performance of Leaky ReLU is sensitive to the choice of α. A very small value makes it behave
like ReLU, whereas a large value may weaken its ability to discriminate between signal polarities [15].
In practice, α is often chosen between 0.01 and 0.1 to balance learning speed and neuron activity,
especially in vision tasks and satellite image segmentation [16].</p>
        <p>In water segmentation tasks, Leaky ReLU may enhance model adaptability in regions with high spectral variability, such as vegetated shorelines or partially flooded zones. This study examines whether Leaky ReLU enables the network to retain more informative neurons and achieve superior segmentation performance compared to baseline ReLU [10].</p>
        <p>The Exponential Linear Unit (ELU) was introduced to accelerate convergence and reduce bias shift during training. It is defined as:</p>
        <p>
f(x) = x,  x ≥ 0
f(x) = α(eˣ − 1),  x &lt; 0
(3)
where α is a positive parameter, typically set to 1 [17]. Unlike ReLU, ELU produces smooth negative outputs rather than hard zeros, preserving gradients in the negative region [14]. For x ≥ 0, it behaves similarly to ReLU, ensuring simple optimization and avoiding saturation [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>ELU also helps to center activation values around zero, which can facilitate learning and reduce
reliance on normalization techniques [18]. However, it incurs higher computational costs due to the
exponential term and may generate large negative outputs that destabilize training in some cases [19].</p>
        <p>In this study, ELU is evaluated in contexts where nuanced control over negative inputs is
beneficial—for instance, near noisy land-water transitions. The goal is to assess whether ELU can
accelerate learning and improve segmentation metrics compared to ReLU and Leaky ReLU.</p>
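<p>Equations (1)-(3) can be checked numerically; a small sketch assuming α = 0.01 for Leaky ReLU and α = 1 for ELU (the values cited in the text as typical defaults):</p>

```python
import math

# Equations (1)-(3) applied element-wise.
def relu(x):        return max(0.0, x)
def leaky_relu(x):  return max(0.01 * x, x)
def elu(x):         return x if x >= 0 else 1.0 * (math.exp(x) - 1.0)

for x in (-2.0, 0.0, 3.0):
    print(relu(x), leaky_relu(x), round(elu(x), 4))
# at x = -2: ReLU zeroes the input, Leaky ReLU keeps -0.02, ELU gives a smooth -0.8647;
# positive inputs pass through unchanged in all three
```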
      </sec>
      <sec id="sec-3-2">
        <title>PReLU, Swish, and RReLU</title>
        <p>Parametric ReLU (PReLU) generalizes Leaky ReLU by learning the coefficient α during training rather
than fixing it manually [26]. It is defined as:</p>
        <p>f(x) = x,  x &gt; 0
f(x) = αx,  x ≤ 0
(4)</p>
        <p>Here, α is initialized to a small positive value and optimized alongside other network parameters
[8]. This adaptability allows the model to fine-tune the "leakiness" for each channel or neuron [14].</p>
        <p>The main advantage of PReLU is its ability to dynamically adjust the negative slope to the data distribution, potentially improving accuracy [12]. However, it increases the number of parameters, necessitating stronger regularization. In water segmentation, PReLU may prove useful in cases where land-water boundaries are highly variable and require distinct sensitivity across channels [10].</p>
        <p>Swish is considered a smoother alternative to ReLU that can enhance gradient flow in deep networks [20]. It is defined as:
f(x) = x · σ(x)
(5)
where σ denotes the sigmoid function, so each input is scaled smoothly depending on the input magnitude [21]. This prevents neuron inactivation and may allow for better feature representation.</p>
        <p>
          Swish avoids abrupt changes around x = 0, resulting in smoother gradients [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. It has been shown to outperform ReLU in large-scale classification benchmarks such as ImageNet and COCO [22]. However, it is computationally more expensive due to the exponential calculations involved.
        </p>
        <p>In this study, Swish is considered a promising alternative for scenarios where water-land boundaries are fuzzy or ill-defined. Its effectiveness, however, remains dependent on dataset size and training conditions [11].</p>
        <p>Randomized ReLU (RReLU) extends Leaky ReLU by sampling the negative slope from a specified range during training [14]. Formally:
f(x) = x,  x ≥ 0
f(x) = αx,  x &lt; 0,  α ∈ [l, u]
(6)
where α is a random value. This randomness can act as a regularizer, helping the model avoid overfitting or over-reliance on specific activation patterns [8]. However, it may also slow convergence if the variation range is too broad [16].</p>
        <p>RReLU is potentially beneficial for heterogeneous datasets with varied conditions (e.g., seasonal
differences, diverse lighting). This study explores whether its built-in variability leads to more
generalized segmentation performance when compared to deterministic counterparts like Leaky ReLU
and PReLU.</p>
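<p>Equations (4)-(6) can likewise be sketched numerically. The parameter choices below (PReLU α initialized to 0.25, RReLU slope drawn from [1/8, 1/3]) are common defaults, not values stated in this paper:</p>

```python
import math, random

def prelu(x, alpha=0.25):   return x if x > 0 else alpha * x
def swish(x):               return x / (1.0 + math.exp(-x))   # x * sigmoid(x)
def rrelu(x, lower=0.125, upper=1.0 / 3.0):
    # training-mode RReLU: a fresh random slope per negative input
    return x if x >= 0 else random.uniform(lower, upper) * x

print(prelu(-2.0))            # -0.5
print(round(swish(-2.0), 4))  # -0.2384
print(rrelu(3.0))             # positive inputs pass through: 3.0
```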
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>The comparative performance of all six activation functions was assessed using four key evaluation metrics: F1 score, Precision, Recall, and Intersection over Union (IoU). Among the tested
functions, Leaky ReLU achieved the highest F1 score (0.7386), along with the best precision (0.8395)
and overall IoU (0.5856). PReLU ranked second in terms of F1 score (0.7253), showing a balanced
performance with Precision of 0.7712 and Recall of 0.6845.</p>
      <p>While ELU reached the highest Recall (0.7286), it suffered from low precision (0.5293), which led to the
lowest overall F1 (0.6132) and IoU (0.4421) scores. ReLU and Swish produced similar mid-range
results, with ReLU slightly outperforming Swish in Recall (0.7067 vs. 0.7028).</p>
      <p>RReLU demonstrated relatively high precision (0.8261), exceeding that of PReLU, but its lower
recall (0.6273) placed its F1 score (0.7131) and IoU (0.5541) between those of baseline ReLU and the
top-performing Leaky ReLU. Thus, if the primary goal is to maximize F1 or IoU, Leaky ReLU is the best choice. If recall is prioritized (for instance, to reduce false negatives in water detection), ELU may be
considered, albeit at the cost of lower precision.</p>
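<p>Since F1 (Dice) and IoU are computed from the same pixel-level confusion counts, they are linked by the identity IoU = F1 / (2 - F1); the reported score pairs agree with this identity up to rounding, which can be verified directly:</p>

```python
def iou_from_f1(f1):
    """For binary masks, F1 = 2*TP / (2*TP + FP + FN) and IoU = TP / (TP + FP + FN)
    derive from the same counts, hence IoU = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)

# Reported (F1, IoU) pairs from the results section:
for name, f1, iou in [("Leaky ReLU", 0.7386, 0.5856),
                      ("ELU", 0.6132, 0.4421),
                      ("RReLU", 0.7131, 0.5541)]:
    print(name, round(iou_from_f1(f1), 4), "reported:", iou)
```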
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The results of this comparative experiment confirm that the choice of activation function has a
substantial impact on the performance of the U-Net model for binary segmentation of water bodies in
Sentinel-2 satellite imagery. Leaky ReLU demonstrated the best overall results, achieving the highest
values for F1 score and IoU while maintaining the strongest precision. PReLU followed closely, offering
a balanced trade-off between precision and recall, though it still trailed Leaky ReLU in all major metrics.
ELU stood out by achieving the highest recall, but this came at the expense of significant precision
loss, resulting in the lowest F1 and IoU scores. The standard ReLU and Swish functions delivered
average performance with no significant advantages over the more adaptive alternatives. RReLU
offered high precision but somewhat reduced recall, placing its overall results between those of ReLU
and Leaky ReLU.</p>
      <p>In conclusion, for the segmentation of water surfaces from Sentinel-2 imagery, Leaky ReLU is the
most effective activation function, offering the best balance between accuracy, completeness, and
spatial consistency. In scenarios where maximizing recall is critical—such as minimizing omission of
water pixels—ELU may be considered, albeit with a higher risk of false positives. To further enhance
segmentation quality, future work should include expanding the training dataset with diverse
geographic regions, optimizing activation-related hyperparameters, and exploring newer functions
such as Mish or SELU, particularly in the context of multispectral Sentinel-2 data.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4o in order to: Grammar and spelling
check. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><surname>Hnatushenko</surname> <given-names>V.</given-names></string-name>, <string-name><surname>Honcharov</surname> <given-names>O.</given-names></string-name> <article-title>Land Cover Mapping with Sentinel-2 Imagery Using Deep Learning Semantic Segmentation Models</article-title>. <source>CEUR Workshop Proceedings</source>, vol. <volume>3909</volume>: Proc. of the XI International Scientific Conference "Information Technology and Implementation" (IT&amp;I 2024), Kyiv, Ukraine, November 20-21, <year>2024</year>, pp. <fpage>1</fpage>-<lpage>18</lpage>. https://ceur-ws.org/Vol-3909/</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><surname>Soldatenko</surname> <given-names>D.</given-names></string-name>, <string-name><surname>Hnatushenko</surname> <given-names>Vik.</given-names></string-name> <article-title>Improving Satellite Imagery Recognition Performance with Initial Dataset Limitation by Augmenting Training Data</article-title>. In: Babichev S., Lytvynenko V. (eds) Lecture Notes in Data Engineering, Computational Intelligence, and Decision-Making, Volume 2. <source>ISDMCI 2024. Lecture Notes on Data Engineering and Communications Technologies</source>, vol. <volume>244</volume>. Springer, Cham. https://doi.org/10.1007/978-3-031-88483-2_10</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><surname>Hnatushenko</surname> <given-names>Vik.</given-names></string-name>, <string-name><surname>Hnatushenko</surname> <given-names>V.</given-names></string-name>, <string-name><surname>Soldatenko</surname> <given-names>D.</given-names></string-name>, and <string-name><surname>Heipke</surname> <given-names>C.</given-names></string-name> <article-title>Enhancing the quality of CNN-based burned area detection in satellite imagery through data augmentation</article-title>. <source>Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLVIII-1/W2-2023</source>, <fpage>1749</fpage>-<lpage>1755</lpage>. https://doi.org/10.5194/isprs-archives-XLVIII-1-W2-2023-1749-2023</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><surname>Goodfellow</surname> <given-names>I.</given-names></string-name>, <string-name><surname>Bengio</surname> <given-names>Y.</given-names></string-name>, <string-name><surname>Courville</surname> <given-names>A.</given-names></string-name> <source>Deep Learning</source>. Cambridge, MA: MIT Press, <year>2016</year>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><surname>Isikdogan</surname> <given-names>F.</given-names></string-name>, <string-name><surname>Bovik</surname> <given-names>A.</given-names></string-name>, <string-name><surname>Passalacqua</surname> <given-names>P.</given-names></string-name> <article-title>Surface Water Mapping by Deep Learning</article-title>. <source>IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing</source>, <year>2017</year>, pp. <fpage>1</fpage>-<lpage>10</lpage>. https://doi.org/10.1109/JSTARS.2017.2735443</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><surname>Ronneberger</surname> <given-names>O.</given-names></string-name>, <string-name><surname>Fischer</surname> <given-names>P.</given-names></string-name>, <string-name><surname>Brox</surname> <given-names>T.</given-names></string-name> <article-title>U-Net: Convolutional Networks for Biomedical Image Segmentation</article-title>. <source>LNCS</source>, vol. <volume>9351</volume>, <year>2015</year>, pp. <fpage>234</fpage>-<lpage>241</lpage>. https://doi.org/10.1007/978-3-319-24574-4_28</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] <string-name><surname>Glorot</surname> <given-names>X.</given-names></string-name>, <string-name><surname>Bordes</surname> <given-names>A.</given-names></string-name>, <string-name><surname>Bengio</surname> <given-names>Y.</given-names></string-name> <article-title>Deep Sparse Rectifier Neural Networks</article-title>. <source>Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics</source>, PMLR <volume>15</volume>:<fpage>315</fpage>-<lpage>323</lpage>, <year>2011</year>. https://proceedings.mlr.press/v15/glorot11a.html</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>