<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>July</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Analyzing the Similarity Learned by a Siamese Network with Contrastive Loss on the MNIST Dataset</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonio A. Sánchez-Ruiz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Software Engineering and Artificial Intelligence, Instituto de Tecnologías del Conocimiento, Universidad Complutense de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>3</volume>
      <issue>2025</issue>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>The concept of similarity plays a fundamental role in many artificial intelligence techniques. The ability to automatically learn when two instances are similar or different is especially valuable when working with unstructured datasets such as images, audio, or text. Siamese neural networks enable the automatic learning of a distance function between pairs of instances by projecting them into a latent embedding space. In this work, we analyze the embeddings generated and the distance learned by a Siamese network trained using contrastive loss to solve various tasks on the MNIST dataset. We also identify some potential problems and propose ideas, to be explored in future work, for improving the distance learned by the network.</p>
      </abstract>
      <kwd-group>
        <kwd>Similarity</kwd>
        <kwd>Siamese Networks</kwd>
        <kwd>Contrastive Loss</kwd>
        <kwd>MNIST</kwd>
        <kwd>Embeddings</kwd>
        <kwd>Nearest Neighbors</kwd>
        <kwd>Clusters</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The concept of similarity constitutes a fundamental pillar in the development of intelligent systems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Many learning techniques, both supervised and unsupervised, rely on the ability to estimate how similar
two elements are, whether to classify, cluster, recommend, or detect unusual patterns. In particular, in
Case-based Reasoning (CBR) systems, similarity plays a key role both in retrieving past experiences
relevant to the problem at hand and in adapting those experiences to the current context [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>In tabular datasets where information is represented symbolically, as is common in many classical
machine learning applications, similarity between instances is typically computed by aggregating
the individual similarities between variables. These variable-level similarities are, in turn, computed
according to each variable’s nature (categorical or continuous), its scale of representation, and its role
in the specific context.</p>
      <p>Computing similarity between instances becomes significantly more challenging when dealing with
unstructured data, such as images, audio, or text. In such cases, the semantic properties are implicitly
represented in the data, making it difficult to establish a connection between the similarity of individual
features (e.g., the color of a pixel in an image) and the similarity of instances (e.g., whether two images
represent the same concept).</p>
      <p>
        It is common to address this using neural networks that, through complex nonlinear transformations,
learn to project the instances into a lower-dimensional latent vector space, generating what are known
as embeddings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. These embeddings can be obtained through different approaches. For instance,
autoencoder architectures adopt an unsupervised learning paradigm where the goal is to generate
embeddings that capture patterns sufficient to reconstruct the original instance. In contrast, classifiers
aim to produce embeddings that encode discriminative features relevant for assigning the correct label
to each instance.
      </p>
      <p>
        Siamese Neural Networks (SNNs) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] consist of a pair of identical networks that share the same
weights and biases, followed by a distance measurement layer. These twin networks process two
inputs in parallel and generate embeddings that are then compared by computing a distance between
them. When trained with a contrastive loss function, SNNs aim to produce embeddings in which
instances of the same class are grouped closely together while being separated from those of other
classes. Unlike traditional classification approaches, SNNs can generalize to unseen classes during
training, making them particularly suitable for tasks such as few-shot learning, identity verification,
and duplicate detection.
      </p>
      <p>In this work, we explore the use of Convolutional Siamese Networks to automatically learn a distance
function between images in the context of a classification problem. We then analyze the intra-class and
inter-class similarity, identify prototypical examples for each class, and classify new images efficiently
based on those prototypes. We also investigate to what extent the similarity computed by the network
aligns with our intuition of similarity between images.</p>
      <p>The rest of the paper is organized as follows. Section 2 reviews the most relevant works in which this
type of network has been used in case-based reasoning systems. Section 3 presents the dataset used
in the experiments. Section 4 describes the proposed network architecture and the training process.
Section 5 analyzes the clusters created by the network in the latent embedding space and examines the
relationship between the computed distance and the misclassified images. Section 6 evaluates a classifier
based on the nearest centroid. Finally, Section 7 summarizes the main conclusions of the work and
outlines directions for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>The literature on Siamese Neural Networks is extensive; in this section, we focus specifically on recent
works that apply SNNs in the context of Case-Based Reasoning systems. These works include both
applications in specific domains and proposals for new architectural approaches.</p>
      <p>
        One of the earliest uses of Siamese networks in a CBR system can be found in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which addresses
the task of recognizing physical activities from time series data generated by accelerometers worn by
participants. The results show that the performance of the SNN is comparable to that of a standard
convolutional neural network.
      </p>
      <p>
        In the domain of textual CBR systems, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] explores the combination of SNNs and autoencoders
with word embeddings, showing how this approach can enhance textual CBR systems by establishing
stronger relationships across cases and measuring similarities with minimal input from domain experts.
      </p>
      <p>
        In the context of Process-Oriented Case-Based Reasoning, where cases are typically represented
using semantic graphs to model workflows, [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposes the use of Siamese Graph Neural Networks to
approximate the similarity between semantic graphs.
      </p>
      <p>
        For the domain of fault detection and prediction in industrial environments, where numerous sensors
and actuators are involved, [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposes an SNN architecture that combines 2D convolutions with graph
convolutions to extract both temporal and spatial features. They also demonstrate that incorporating
expert knowledge can significantly reduce the number of learnable parameters required.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a more theoretical study is presented, where the authors propose a framework to classify
different similarity learning approaches based on whether the feature extraction and similarity
computation are modeled or learned. They show that using a classifier as the basis for a similarity measure
can achieve results comparable to state-of-the-art methods, and further improvements are possible by
learning the similarity function between embeddings.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the authors present a novel approach in which they train different class-to-class Siamese
networks to learn the patterns of both similarity and difference between pairs of classes. They then use
these patterns for classification, explanation, and prototypical case identification. Although the results
are promising, the proposed approach requires training a quadratic number of networks with respect to
the number of classes.
      </p>
      <p>
        Finally, [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] introduces a novel architecture that incorporates self-attention mechanisms into the
traditional Siamese network structure. By weighting the features of the embeddings before computing
the distance, the proposed method improves classification accuracy across several datasets from the
UCI repository.
      </p>
      <p>As we can observe, Siamese networks have been successfully used in various CBR systems to retrieve
similar cases or propose prototypical cases across different domains. However, we believe that a deeper
investigation is needed into the type of distance these networks learn and its characteristics, in order to
better understand how they can be effectively used in this context. In particular, in this work we analyze
the clusters formed by the cases, how prototypical cases can be used to efficiently retrieve similar ones,
and how the distance computed by the network aligns with our intuitive notion of similarity between
cases.</p>
    </sec>
    <sec id="sec-3">
      <title>3. MNIST dataset</title>
      <p>
        The MNIST dataset [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is a very popular benchmark used in machine learning to evaluate image
classification algorithms. It consists of 70,000 grayscale images of handwritten digits ranging from 0
to 9, each normalized and centered in a 28×28 pixel grid. The dataset is divided into 60,000 training
samples and 10,000 test samples, and each image is labeled with the corresponding digit class. Figure 2
shows some example images.
      </p>
      <p>Despite its apparent simplicity, achieving near-perfect accuracy on MNIST was once considered a
significant milestone, and the dataset continues to be a valuable resource for educational and
benchmarking purposes. Because the digits were written by different people, the dataset exhibits considerable
intra-class variability in terms of handwriting style, stroke thickness, orientation, and alignment. This
variability makes the dataset particularly useful for evaluating a model’s generalization capacity and
robustness to variations in input appearance.</p>
      <p>Another reason for choosing this dataset is that it is relatively easy for a human to judge whether the
handwriting of two digits looks more or less similar, and we aim to compare this notion of similarity
to the distance learned by a Siamese neural network. Finally, although the dataset contains many
examples, they are small enough to allow for rapid experimentation.</p>
    </sec>
    <sec id="sec-4">
      <title>4. The Convolutional Siamese Network</title>
      <p>
        A typical Siamese neural network [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] consists of two or more identical subnetworks that share the same
weights and map the input instances to fixed-length vectors (embeddings) in a latent space. Then,
the last part of the network compares those vectors using a distance function, such as the Euclidean
distance or the cosine similarity. The normalized distance between the instances is the output of the
network, which can also be fed into additional layers for classification or regression tasks.
      </p>
      <p>
        Figure 1 shows the network architecture used in this work, which is inspired by the network proposed
in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The network receives two input images of size 28×28 pixels with a single color channel, denoted
as x1 and x2, and processes them through a shared encoder f(·). This encoder module produces two
embeddings, f(x1) and f(x2), each represented as a 10-dimensional vector. The network then computes
the Euclidean distance between the vectors and normalizes it using a sigmoid function:
d(x1, x2) = ‖f(x1) − f(x2)‖2
σ(z) = 1 / (1 + e^(−z))
D(x1, x2) = σ(d(x1, x2))
      </p>
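      <p>As a concrete sketch (an illustration only, not the network code used in the experiments), the distance head described above can be written in a few lines of NumPy; note that, with this formulation, the sigmoid maps the non-negative Euclidean distance into the interval [0.5, 1):</p>

```python
import numpy as np

def siamese_distance(e1, e2):
    """Normalized distance between two embeddings: Euclidean distance
    squashed by a sigmoid. Since the distance is non-negative, the
    output lies in the interval [0.5, 1)."""
    d = np.linalg.norm(e1 - e2)
    return 1.0 / (1.0 + np.exp(-d))
```

      <p>Identical embeddings yield 0.5, while widely separated embeddings approach 1.</p>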
      <p>
        The network is trained on pairs of instances labeled as similar (same digit) or dissimilar (different
digits), with the objective of minimizing the embedding distance for similar pairs and maximizing it (up
to a margin) for dissimilar ones. We use the contrastive loss [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], defined as:
      </p>
      <p>L = (1 − y) · d² + y · max(0, m − d)²
where y indicates similarity (0 for similar, 1 for dissimilar), d is the distance between embeddings,
and m is a margin that enforces a minimum separation between dissimilar pairs (m = 1).</p>
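      <p>This loss can be sketched directly in NumPy (a minimal illustration, not the training code used in the experiments):</p>

```python
import numpy as np

def contrastive_loss(y, d, margin=1.0):
    """Contrastive loss over a batch of pairs.
    y: 0 for similar pairs, 1 for dissimilar pairs (the paper's convention).
    d: embedding distance of each pair."""
    similar_term = (1.0 - y) * d ** 2
    dissimilar_term = y * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(similar_term + dissimilar_term))
```

      <p>A similar pair at distance 0 and a dissimilar pair beyond the margin both contribute nothing; a dissimilar pair inside the margin is penalized quadratically.</p>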
      <p>This way, the Siamese network learns a mapping from inputs to an embedding space, where similar
instances are close together and dissimilar ones are far apart according to a chosen distance metric.</p>
      <p>The network was created using Keras over JAX and, although the architecture has several layers,
most of them are convolutions, so the resulting network is very small and only has 7,470 trainable
parameters. It was trained on 200,000 random pairs of images using the RMSprop optimizer for 5 epochs
and 10,000 random pairs of images for validation. The complete training process took less than 3
minutes on an NVIDIA RTX 4070 with 12 GB. We obtained an accuracy score of 0.9876 on the test dataset,
which means it can determine whether two images represent the same digit 98.76% of the time.</p>
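      <p>The paper does not detail how the random pairs were sampled; one plausible, hypothetical sketch draws same-digit and different-digit pairs in roughly equal proportion from the training labels:</p>

```python
import numpy as np

def make_pairs(labels, n_pairs, seed=0):
    """Sample random training pairs as index arrays (a hypothetical
    sketch; the actual sampling scheme is not specified in the paper).
    Returns (left, right, y) with y = 0 for same-digit pairs and
    y = 1 for different-digit pairs, roughly balanced."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    by_class = {c: np.flatnonzero(labels == c) for c in classes}
    left, right, y = [], [], []
    for k in range(n_pairs):
        c1 = rng.choice(classes)
        if k % 2 == 0:  # similar pair: two images of the same digit
            i, j = rng.choice(by_class[c1], size=2, replace=False)
            y.append(0)
        else:           # dissimilar pair: images of two different digits
            c2 = rng.choice(classes[classes != c1])
            i = rng.choice(by_class[c1])
            j = rng.choice(by_class[c2])
            y.append(1)
        left.append(i)
        right.append(j)
    return np.array(left), np.array(right), np.array(y)
```

      <p>Each returned label is consistent by construction: y = 0 exactly when the two indexed images share a digit class.</p>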
    </sec>
    <sec id="sec-5">
      <title>5. Clusters and similarity</title>
      <p>Once the network has been trained, we can study the clustering of images by category performed
by the network’s encoder module. For example, we can compute the embedding corresponding to a
prototypical image of each class by calculating the average embedding of the images belonging to each
class in the training set:
μ_c = (1 / N_c) Σ_{x ∈ class c} f(x)
where N_c is the number of images belonging to class c.</p>
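      <p>Computing these prototypical embeddings is straightforward; a minimal NumPy sketch, assuming the embeddings have already been produced by the encoder:</p>

```python
import numpy as np

def class_centroids(embeddings, labels):
    """Prototype of each class: the mean embedding of all
    training images belonging to that class."""
    classes = np.unique(labels)
    centroids = np.stack(
        [embeddings[labels == c].mean(axis=0) for c in classes]
    )
    return classes, centroids
```
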
      <p>It is interesting to note that these prototypical embeddings do not correspond to any image from the
training set, and we cannot reconstruct their associated images because the network does not have a
decoder module. However, we can compute the closest images to each of them, which are shown in
the first column of Figure 2. We can observe that they correspond to clearly distinguishable digits,
although they do not always represent the most standard handwriting of each digit.</p>
      <p>Although we cannot exactly visualize the images associated with these centroids, we can use them
to compute distances using the network’s distance module. In this way, Figure 3 shows the distances
between the centroids of each cluster. The minimum distance between two centroids from different
classes is 0.998, which indicates that the network has successfully separated the different classes.
However, the distance between centroids does not seem to be related to the notion of similarity between
digits. For example, intuitively, we would say that the handwriting of a 1 is closer to a 7 than to a 5, but
this is not reflected in the learned distance.</p>
      <p>On the other hand, Table 1 shows the average distances of each image to the centroid of its class. We
can observe that the digit 3 exhibits the highest variability, and that the values in both the training and
test sets are similar. Based on this, we conclude that the clusters are fairly compact and well separated
from each other. We can confirm our intuition in Figure 4, which shows a projection of the embeddings
onto the plane using the t-SNE algorithm.</p>
      <p>Finally, Figure 2 also shows different training images from each cluster, ordered by their distance
to the centroid. The first column displays the image closest to the centroid (the most prototypical
image of the class), and the following five columns show the images in each of the quintiles. In general,
the distance to the centroid does not appear to have a clear relationship with the notion of difference
between handwriting variations within the same digit.</p>
      <p>We can conclude that the internal representation of the images learned by the network effectively
clusters the digits. However, the resulting image-to-image distance does not appear to support an
intuitive ordering based on the visual similarity of the handwritten digit shapes, neither within the
same class nor across different classes.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Classification based on the nearest centroid</title>
      <p>One of the common uses of Siamese networks is to retrieve the nearest neighbors to classify new
instances. This approach has a computational cost that is linear with respect to the number of instances
in the training set. However, since the clusters generated by the network are compact and well separated,
we can also compute distances to their centroids, thus reducing the classification complexity to be linear
with respect to the number of classes (which can be considered constant in practice).</p>
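      <p>The nearest-centroid classification described above reduces to an argmin over as many distances as there are classes; a minimal NumPy sketch:</p>

```python
import numpy as np

def nearest_centroid_predict(embeddings, centroids, classes):
    """Label each embedding with the class of its closest centroid,
    so classification cost is linear in the number of classes rather
    than in the size of the training set."""
    diffs = embeddings[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # shape (n_samples, n_classes)
    return classes[np.argmin(dists, axis=1)]
```
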
      <p>Table 2 shows the results of this approach broken down by class. The classifier’s accuracy is 98.20%,
which is quite good considering that we are using a very small neural network and only computing
distances to 10 points each time. Precision and recall scores are very high for all digits, although
classifying fives, sevens, and nines appears to be slightly more challenging.</p>
      <p>Figure 5 shows the distribution of distances from each test image to its nearest centroid. We observe
that nearly all correct classifications occur when the distance to the centroid is below 0.1. In contrast,
incorrect classifications are more uniformly distributed across the full range of distances, which is a
somewhat surprising finding.</p>
      <p>We can delve a bit deeper by visualizing some of the misclassified images. Figure 6 shows the
incorrectly classified images that are farthest from any of the centroids. These images correspond to
embeddings with no close neighbors, and therefore the classifier does not know how to label them. We
can observe that the images on the left correspond to poorly defined digits, and it is possible that there
are not many similar examples in the training set. More surprising are the images on the right, where
the digits are perfectly recognizable, yet the distance computed by the network suggests they are not
close to the prototypical digits. In any case, these situations are not too problematic, as the system
could choose to indicate uncertainty rather than provide an incorrect answer.</p>
      <p>Figure 7 illustrates the opposite situation, where the neural network indicates that the images are
very similar to one of the prototypical digits, which is, however, not the correct one. Some of these
images may cause confusion, as their handwriting appears to be somewhere between multiple digits.
Nevertheless, a person would still be able to correctly recognize the digits in most cases, and some of
them are in fact quite recognizable.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions and Future Work</title>
      <p>Siamese networks appear to be highly effective at determining whether two images are similar (i.e.,
belong to the same class) or not (i.e., belong to different classes) in the MNIST dataset. Training through
the optimization of the contrastive loss function results in very compact embedding clusters that are
well separated from each other. This makes Siamese networks a promising approach for classification
problems that rely on nearest-neighbor retrieval. In fact, we can achieve fairly good results by simply
computing the distance to the cluster centroids, which significantly reduces the computational cost of
such algorithms.</p>
      <p>However, the distance computed by the network does not seem to exhibit other desirable properties
of a similarity metric. In particular, it does not appear to effectively reflect the degree of similarity
between two instances, at least in the problem we have addressed in this work. Images from the same
class are all very similar to one another, and visually there does not seem to be a clear relationship
between handwriting style and the distance computed by the network. A similar situation occurs with
images from different classes, where the network simply indicates that they are very different, without
effectively capturing that some digits are visually more alike than others.</p>
      <p>Although the distance learned by the network can be useful in tasks where it is only necessary
to determine whether two entities are similar or different, such as binary classification or identity
verification, it may not be suitable for problems that require generating a similarity-based ranking.</p>
      <p>For instance, in CBR systems, it is common to retrieve the most similar cases and then assess their
contribution to the proposed solution based on their distance to the query. It is also standard practice
to apply a threshold beyond which retrieved cases are no longer considered relevant. Both practices
become more difficult when the learned distances are either too homogeneous (i.e., all cases are very
similar or very dissimilar) or do not follow a smooth, interpretable distribution.</p>
      <p>Furthermore, in recommender systems, where a ranked list of relevant items is typically provided,
the ordering generated using this type of learned similarity may lack meaningful differentiation. The
same challenge arises in distance-based explainability methods, where examples and counterexamples
may not appear intuitive to users simply because the distance learned by the network does not reflect
the types of features or relationships users expect to define similarity.</p>
      <p>How could we encourage the network to generate better embeddings, where the relative distance
captures more than just class membership? One possible approach is to incorporate into the loss function
a component that does not rely on labels, but rather on the intrinsic characteristics of the images. In
this regard, a decoder module could be added to attempt to reconstruct the original image, thereby
forcing the network to encode the most significant visual patterns into the generated embeddings.</p>
      <p>On the other hand, the network seems to make classification errors fairly uniformly across the entire
range of distances, which is somewhat surprising. This could be due to the test set containing images
that are very different from those in the training set, or it may suggest that the network is not able
to adequately capture the patterns that distinguish different digits. By selecting training pairs more
carefully, we might overrepresent the harder-to-classify images, thus encouraging the network to better
learn their differences from other classes.</p>
      <p>As part of future work, we plan to address these issues and analyze the behavior of these networks
across different datasets. The ability to automatically learn a similarity measure between images is very
promising, but we must develop mechanisms that align the learned similarity with our human notion
of similarity.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work has been partially funded by the Ministry of Science, Innovation and Universities
(PID2021123368OB-I00) and the Complutense University of Madrid (Group 921330).</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>The author did not use any generative AI during the preparation of this work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><given-names>E. L.</given-names> <surname>Rissland</surname></string-name>, <article-title>AI and similarity</article-title>, <source>IEEE Intell. Syst.</source> <volume>21</volume> (<year>2006</year>) <fpage>39</fpage>-<lpage>49</lpage>. URL: https://doi.org/10.1109/MIS.2006.38. doi:10.1109/MIS.2006.38.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><given-names>R. L.</given-names> <surname>de Mántaras</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Plaza</surname></string-name>, <article-title>Case-based reasoning: An overview</article-title>, <source>AI Commun.</source> <volume>10</volume> (<year>1997</year>) <fpage>21</fpage>-<lpage>29</lpage>. URL: http://content.iospress.com/articles/ai-communications/aic106.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><given-names>I.</given-names> <surname>Goodfellow</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Bengio</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Courville</surname></string-name>, <source>Deep Learning</source>, MIT Press, <year>2016</year>. http://www.deeplearningbook.org.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><given-names>D.</given-names> <surname>Chicco</surname></string-name>, <article-title>Siamese neural networks: An overview</article-title>, in: H. M. Cartwright (Ed.), <source>Artificial Neural Networks - Third Edition</source>, volume <volume>2190</volume> of Methods in Molecular Biology, Springer, <year>2021</year>, pp. <fpage>73</fpage>-<lpage>94</lpage>. URL: https://doi.org/10.1007/978-1-0716-0826-5_3. doi:10.1007/978-1-0716-0826-5_3.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><given-names>K.</given-names> <surname>Martin</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Wiratunga</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Sani</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Massie</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Clos</surname></string-name>, <article-title>A convolutional siamese network for developing similarity knowledge in the selfback dataset</article-title>, in: A. A. Sánchez-Ruiz, A. Kofod-Petersen (Eds.), <source>Proceedings of ICCBR 2017 Workshops (CAW, CBRDL, PO-CBR), Doctoral Consortium, and Competitions co-located with the 25th International Conference on Case-Based Reasoning (ICCBR 2017)</source>, Trondheim, Norway, June 26-28, 2017, volume <volume>2028</volume> of CEUR Workshop Proceedings, CEUR-WS.org, <year>2017</year>, pp. <fpage>85</fpage>-<lpage>94</lpage>. URL: https://ceur-ws.org/Vol-2028/paper8.pdf.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><given-names>K.</given-names> <surname>Amin</surname></string-name>, <article-title>Cases without borders: Automating knowledge acquisition approach using deep autoencoders and siamese networks in case-based reasoning</article-title>, in: <source>31st IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2019</source>, Portland, OR, USA, November 4-6, 2019, IEEE, <year>2019</year>, pp. <fpage>133</fpage>-<lpage>140</lpage>. URL: https://doi.org/10.1109/ICTAI.2019.00027. doi:10.1109/ICTAI.2019.00027.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] <string-name><given-names>M.</given-names> <surname>Hofmann</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Malburg</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Klein</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Bergmann</surname></string-name>, <article-title>Using siamese graph neural networks for similarity-based retrieval in process-oriented case-based reasoning</article-title>, in: I. Watson, R. O. Weber (Eds.), <source>Case-Based Reasoning Research and Development - 28th International Conference, ICCBR 2020</source>, Salamanca, Spain, June 8-12, 2020, Proceedings, volume <volume>12311</volume> of Lecture Notes in Computer Science, Springer, <year>2020</year>, pp. <fpage>229</fpage>-<lpage>244</lpage>. URL: https://doi.org/10.1007/978-3-030-58342-2_15. doi:10.1007/978-3-030-58342-2_15.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Weingarz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <article-title>Using expert knowledge for masking irrelevant data streams in siamese networks for the detection and prediction of faults</article-title>
          , in:
          <source>International Joint Conference on Neural Networks, IJCNN 2021, Shenzhen, China, July 18-22, 2021</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . URL: https://doi.org/10.1109/IJCNN52387.2021.9533544. doi:10.1109/IJCNN52387.2021.9533544.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Mathisen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aamodt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Langseth</surname>
          </string-name>
          ,
          <article-title>Learning similarity measures from data</article-title>
          ,
          <source>Prog. Artif. Intell.</source>
          <volume>9</volume>
          (
          <year>2020</year>
          )
          <fpage>129</fpage>
          -
          <lpage>143</lpage>
          . URL: https://doi.org/10.1007/s13748-019-00201-2. doi:10.1007/s13748-019-00201-2.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Leake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Huibregtse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Dalkilic</surname>
          </string-name>
          ,
          <article-title>Applying class-to-class siamese networks to explain classifications with supportive and contrastive cases</article-title>
          , in: I. Watson,
          <string-name>
            <given-names>R. O.</given-names>
            <surname>Weber</surname>
          </string-name>
          (Eds.),
          <source>Case-Based Reasoning Research and Development - 28th International Conference, ICCBR 2020, Salamanca, Spain, June 8-12, 2020, Proceedings</source>
          , volume
          <volume>12311</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2020</year>
          , pp.
          <fpage>245</fpage>
          -
          <lpage>260</lpage>
          . URL: https://doi.org/10.1007/978-3-030-58342-2_16. doi:10.1007/978-3-030-58342-2_16.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>A case weighted similarity deep measurement method based on a self-attention siamese neural network</article-title>
          ,
          <source>Ind. Artif. Intell.</source>
          <volume>1</volume>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.1007/s44244-022-00002-y. doi:10.1007/s44244-022-00002-y.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <article-title>The MNIST database of handwritten digit images for machine learning research</article-title>
          ,
          <source>IEEE Signal Processing Magazine</source>
          <volume>29</volume>
          (
          <year>2012</year>
          )
          <fpage>141</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Mehdi</surname>
          </string-name>
          ,
          <article-title>Image similarity estimation using a siamese network with a contrastive loss</article-title>
          , https://keras.io/examples/vision/siamese_contrastive/,
          <year>2021</year>
          . Accessed: 2025-04-21.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chopra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hadsell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <article-title>Learning a similarity metric discriminatively, with application to face verification</article-title>
          , in:
          <source>2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 20-26 June 2005, San Diego, CA, USA</source>
          , IEEE Computer Society,
          <year>2005</year>
          , pp.
          <fpage>539</fpage>
          -
          <lpage>546</lpage>
          . URL: https://doi.org/10.1109/CVPR.2005.202. doi:10.1109/CVPR.2005.202.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>