<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Some Perspectives on Similarity Learning for Case-Based Reasoning and Analogical Transfer</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fadi Badra</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marie-Jeanne Lesot</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Esteban Marquer</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel Couceiro</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sorbonne Université</institution>
          ,
          <addr-line>CNRS, LIP6, F-75005 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, LIMICS, Sorbonne Université, INSERM</institution>
          ,
          <addr-line>F-93000, Bobigny</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Lorraine</institution>
          ,
          <addr-line>CNRS, Loria, Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>16</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>In this paper we investigate interactions between recent advances in the modeling of analogical transfer and similarity learning. Indeed, a unifying principle of case-based prediction methods was recently established, according to which the plausible inference principle of analogical transfer can be interpreted as a transfer of similarity knowledge from a situation space to an outcome space. Following this principle, the task of analogical transfer can be addressed using a global indicator of the compatibility between two similarity measures. Such an indicator can also be used to assess the quality of the situation space similarity measure with respect to the case-based prediction task. We discuss several perspectives opened by such an interpretation of the task of analogical transfer as the optimisation of the compatibility criterion: we explore interactions with similarity learning, as well as with energy function optimisation.</p>
      </abstract>
      <kwd-group>
<kwd>Case-Based Reasoning</kwd>
        <kwd>Analogical transfer</kwd>
        <kwd>Similarity learning</kwd>
        <kwd>Quality measure</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
Analogical transfer is a cognitive process that makes it possible to derive new information about a
target situation by applying a plausible inference principle, according to which if two situations
are similar with respect to some criteria, then it is plausible that they are also similar with
respect to other criteria [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Case-based reasoning (CBR) systems implement analogical transfer
in order to infer some information about a new situation directly by comparing it to a set of
past experiences (called cases) stored in memory [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In that process, similarity knowledge is
a critical component and is dependent on the task and data considered. For instance, several
approaches have been proposed to measure similarities between data represented as Boolean
vectors and between sequences in the context of analogical reasoning, as described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Recent work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] showed that a common principle underlying case-based prediction methods
is that they interpret the plausible inference principle of analogical transfer as a transfer of
similarity knowledge from a situation space to an outcome space. Modeling analogical transfer
as a transfer of similarity knowledge is a powerful idea that can have many implications. One
of them is that learning a similarity measure can be framed as the problem of optimizing the
compatibility between two similarity measures on a data set.
      </p>
      <p>
        In this paper, we discuss some perspectives and directions that could be given to this line of
research. A global indicator of the compatibility between two similarity measures has already
been proposed in the CoAT method [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and preliminary experiments showed that such an
indicator can be used as an intrinsic indicator of the quality of the similarity measure with
respect to the case-based prediction task [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. A natural perspective to this research is to apply
these results to similarity learning, and to design a similarity learning method that would
optimise such an indicator on the data set. To this aim, we explore in this paper the connections
between the CoAT method and existing work in the domain of similarity learning. We then
show that interpreting CoAT in an energy-based model is quite straightforward, so that the
similarity learning task can be stated as the task of learning an energy function.
      </p>
      <p>The paper is organized as follows. In Section 2 we recall previous work on the CoAT
method. We then briefly survey in Section 3 some approaches to learning (dis)similarities
that seem relevant to CoAT, and discuss how to leverage CoAT to obtain suitable similarity
measures. We explore techniques based on the optimisation of an energy function in Section 4,
and discuss further perspectives in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The CoAT Method</title>
      <p>
        In the CoAT method [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ], the analogical transfer inference is made by minimizing a global
indicator of compatibility between two similarity measures. Such an indicator can also be used
as an intrinsic indicator of the quality of the similarity measure w.r.t. the transfer task.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Definition of the Indicator</title>
        <p>Let $\mathcal{S}$ denote an input space, and $\mathcal{R}$ an output space. An element of $\mathcal{S}$ is called a situation,
and an element of $\mathcal{R}$ is called an outcome, or a result. A set $CB = \{(s_1, r_1), \ldots, (s_n, r_n)\}$ of
elements of $\mathcal{S} \times \mathcal{R}$ is called a case base. An element $c = (s, r) \in CB$ is called a source case.
In addition, the spaces $\mathcal{S}$ and $\mathcal{R}$ are respectively equipped with two similarity measures $\sigma_{\mathcal{S}}$
and $\sigma_{\mathcal{R}}$, that respectively denote the similarity measure on situations and on outcomes.</p>
        <p>The compatibility of $\sigma_{\mathcal{R}}$ with $\sigma_{\mathcal{S}}$ is measured globally on the case base $CB$, by introducing a
global indicator $\Gamma(\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB)$. This indicator measures the compatibility of $\sigma_{\mathcal{R}}$ with $\sigma_{\mathcal{S}}$ from
an ordinal point of view on the whole case base $CB$, by checking whether the order induced by $\sigma_{\mathcal{R}}$ is
the same as the one induced by $\sigma_{\mathcal{S}}$. The following continuity constraint is tested on each triple
of cases $(c_0, c_i, c_j)$, with $c_0 = (s_0, r_0)$, $c_i = (s_i, r_i)$, and $c_j = (s_j, r_j)$:
$$\text{if } \sigma_{\mathcal{S}}(s_0, s_i) \geq \sigma_{\mathcal{S}}(s_0, s_j), \text{ then } \sigma_{\mathcal{R}}(r_0, r_i) \geq \sigma_{\mathcal{R}}(r_0, r_j). \quad (C)$$</p>
        <p>Constraint $(C)$ expresses that anytime a situation $s_i$ is at least as similar to a situation $s_0$ as
situation $s_j$ is, this order should be preserved on outcomes. A triple $(c_0, c_i, c_j)$ does not satisfy $(C)$ if the
situations verify $\sigma_{\mathcal{S}}(s_0, s_i) \geq \sigma_{\mathcal{S}}(s_0, s_j)$ but the outcomes are less similar, i.e., $\sigma_{\mathcal{R}}(r_0, r_i) &lt; \sigma_{\mathcal{R}}(r_0, r_j)$.
Such a violation of the constraint is called an inversion of similarity. The indicator $\Gamma(\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB)$
counts the total number of inversions of similarity observed on a case base $CB$:
$$\Gamma(\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB) = |\{((s_0, r_0), (s_i, r_i), (s_j, r_j)) \in CB \times CB \times CB \text{ such that } \sigma_{\mathcal{S}}(s_0, s_i) \geq \sigma_{\mathcal{S}}(s_0, s_j) \text{ and } \sigma_{\mathcal{R}}(r_0, r_i) &lt; \sigma_{\mathcal{R}}(r_0, r_j)\}|.$$</p>
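        <p>As an illustrative sketch (hypothetical Python code, not the authors' implementation), the indicator can be computed by brute-force enumeration of all case triples, at a cost cubic in the size of the case base:</p>
        <preformat>
from itertools import product

def gamma(sigma_S, sigma_R, case_base):
    """Count the similarity inversions over all triples of a case base.

    case_base: list of (situation, outcome) pairs;
    sigma_S, sigma_R: similarity functions on situations and on outcomes.
    """
    count = 0
    for (s0, r0), (si, ri), (sj, rj) in product(case_base, repeat=3):
        # Inversion: situations ordered one way, outcomes the other way.
        if sigma_S(s0, si) >= sigma_S(s0, sj) and sigma_R(r0, rj) > sigma_R(r0, ri):
            count += 1
    return count
        </preformat>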
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Inference</title>
        <p>When the case base is fully known, except for the outcome $r_t$ of one case $c_t = (s_t, r_t)$, the
transfer inference consists in finding the outcome $r_t$ that minimizes the value of the indicator:
$$r_t = \arg\min_{r \in \mathcal{R}} \Gamma(\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB \cup \{(s_t, r)\}).$$</p>
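        <p>Concretely, this inference can be carried out by enumerating the candidate outcomes and reusing the brute-force indicator sketched above (again hypothetical code, assuming a finite set of candidate outcomes):</p>
        <preformat>
def coat_predict(sigma_S, sigma_R, case_base, s_t, candidate_outcomes):
    """Predict the outcome of situation s_t by minimizing the indicator:
    each candidate outcome r is tentatively added to the case base, and
    the one yielding the fewest inversions is returned."""
    return min(candidate_outcomes,
               key=lambda r: gamma(sigma_S, sigma_R, case_base + [(s_t, r)]))
        </preformat>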
      </sec>
      <sec id="sec-2-3">
        <title>2.3. An Intrinsic Indicator of the Quality of a Similarity Measure</title>
        <p>
          The indicator $\Gamma(\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB)$ can be used to assess the quality of the situation space similarity
measure $\sigma_{\mathcal{S}}$ with respect to the transfer task, independently of the algorithm used for the
inference. We report here some first experiments made in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] that show a strong correlation between the value of the $\Gamma(\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB)$ indicator obtained
for a chosen similarity measure $\sigma_{\mathcal{S}}$ and the corresponding performance of the CoAT
prediction algorithm.
        </p>
        <p>
          Experimental Protocol. The experiment is conducted on 200 instances extracted from the
Balance Scale data set. As the instances of this data set are described only by $d$ numeric
features, each situation can be represented by a vector of $\mathbb{R}^d$. Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ be two such
vectors. These data induce a classification task: the outcomes are categorical classes and the
outcome similarity measure $\sigma_{\mathcal{R}}$ is the class membership, i.e., $\sigma_{\mathcal{R}}(r, r') = 1$ if $r = r'$, and $0$
otherwise. The performance of the CoAT algorithm is measured by generating 100 different
classification tasks $\{(\sigma_i, \sigma_{\mathcal{R}}, CB)\}_{1 \leq i \leq 100}$, each of which is obtained by choosing for $\sigma_i$ a
decreasing function of a randomly weighted Euclidean distance. More precisely, a set of random
linear maps $\{M_i : \mathbb{R}^d \longrightarrow \mathbb{R}^d\}_{1 \leq i \leq 100}$ is generated, and for each map $M_i$, $\sigma_i$ is defined as a
decreasing function of the Euclidean distance computed in $M_i$'s embedding space:
$$\sigma_i(\mathbf{x}, \mathbf{y}) = e^{-d_i(\mathbf{x}, \mathbf{y})} \quad \text{with} \quad d_i(\mathbf{x}, \mathbf{y}) = \lVert M_i\mathbf{x} - M_i\mathbf{y} \rVert_2 = \sqrt{(\mathbf{x} - \mathbf{y})^\top M_i^\top M_i (\mathbf{x} - \mathbf{y})}.$$
The performance is also measured on the task $(\sigma_E, \sigma_{\mathcal{R}}, CB)$, in which $\sigma_E(\mathbf{x}, \mathbf{y}) = e^{-\lVert \mathbf{x} - \mathbf{y} \rVert_2}$
is a decreasing function of the Euclidean distance, which amounts to taking as linear map the
identity matrix. For each task, the performance is measured by the prediction accuracy, with
10-fold cross validation.
        </p>
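        <p>As an illustrative sketch (hypothetical Python code, not the implementation used in the experiments), a randomly weighted similarity measure of this form can be generated as follows:</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(0)
d = 4  # e.g., the four numeric features of the Balance Scale data set

def make_random_similarity(d, rng):
    """Similarity that decreases with a randomly weighted Euclidean distance."""
    M = rng.normal(size=(d, d))  # random linear map M_i
    def sigma(x, y):
        return float(np.exp(-np.linalg.norm(M @ (np.asarray(x) - np.asarray(y)))))
    return sigma

sigma_i = make_random_similarity(d, rng)
        </preformat>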
        <p>Results. Fig. 1 shows for each classification task the average accuracy and standard deviation
of the CoAT algorithm according to the value of the $\Gamma$ indicator ("Dataset complexity" axis
on the figure). The blue points correspond to the randomly generated $\sigma_i$ similarity measures.
The red point gives the results for the $\sigma_E$ similarity measure based on the standard Euclidean
distance. The green line shows the result of a linear regression on the data. The Pearson's
coefficient is $-0.97$. The results clearly show a correlation between the value of the indicator
and the performance of the CoAT algorithm.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Perspectives on Learning (Dis)similarity Measures</title>
      <p>While it is possible to use CoAT to quantify the suitability of a similarity measure for a CBR
task, we argue that it should be possible to adapt CoAT to learn suitable similarity measures.
Below we describe some existing methods for learning similarity (or dissimilarity) measures that
appear relevant to adapting CoAT, before discussing how optimizing the CoAT indicator relates
to these (dis)similarity measure learning methodologies.</p>
      <p>In what follows, we will not make a distinction between similarity and dissimilarity measures,
since each is the counterpart of the other. It is possible to define one from the other: for
instance, given a dissimilarity $d(x, y)$ taking values in $\mathbb{R}^+$, we can define a similarity $\sigma(x, y)$ on $[0, 1]$
with the inverse $\sigma(x, y) = \frac{1}{1 + d(x, y)}$ or the exponential $\sigma(x, y) = e^{-d(x, y)}$.</p>
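      <p>As a minimal sketch (hypothetical code), these two conversions can be written as:</p>
      <preformat>
import math

def inverse_similarity(d):
    """Map a dissimilarity value in R+ to a similarity in (0, 1]."""
    return 1.0 / (1.0 + d)

def exponential_similarity(d):
    """Map a dissimilarity value in R+ to a similarity in (0, 1]."""
    return math.exp(-d)
      </preformat>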
      <sec id="sec-3-1">
        <title>3.1. Related Works on (Dis)similarity Measure Learning</title>
        <p>
          Constructing a similarity measure for a given task is difficult and time-consuming, especially
if domain knowledge is to be taken into account in the process. It is possible to use data to
support and facilitate this process, either to guide the design of the measure [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] or to learn
suitable parameters for a similarity measure.
        </p>
        <p>
          Designing or learning (dis)similarity measures from data has long been studied [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Here, we
briefly discuss three approaches, namely, by combining local similarities, by unsupervised
approaches based on clustering techniques, and by supervised or semi-supervised metric learning
approaches. Note that there is a particular focus in CBR on the explainability of the similarity
measures as well as on using complex data (i.e., heterogeneous or structured), which constrains
the learning of (dis)similarity measures.
        </p>
        <p>
          Combining local (dis)similarities. Computing (dis)similarities in heterogeneous data can be
performed by transforming the input dataset into a homogeneous one. An interesting approach
is to consider the overall similarity measure as a weighted sum of ad-hoc measures. For instance,
the k-Prototypes algorithm [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] computes a dissimilarity $d(x, y)$ between two instances $x$ and $y$
as
$$d(x, y) = d_E(x, y) + \gamma \, d_C(x, y), \quad (1)$$
where $d_E(x, y)$ is the Euclidean distance over the subset of continuous attributes, $d_C(x, y)$ the
number of mismatched categorical attributes, and $\gamma$ a weighting parameter. Gower's
similarity [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is a popular measure that works in a similar fashion.
        </p>
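        <p>A minimal sketch of such a weighted combination (hypothetical code, following Eq. (1)) is:</p>
        <preformat>
import numpy as np

def k_prototypes_dissimilarity(x_num, y_num, x_cat, y_cat, gamma_weight=1.0):
    """Weighted sum of a numeric and a categorical dissimilarity, as in Eq. (1)."""
    d_e = np.linalg.norm(np.asarray(x_num) - np.asarray(y_num))  # Euclidean part
    d_c = sum(a != b for a, b in zip(x_cat, y_cat))  # categorical mismatch count
    return d_e + gamma_weight * d_c
        </preformat>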
        <p>
          More generally, it is possible to rely on existing similarity measures for each aspect of the
data, and combine them to obtain a global similarity. For instance, [
          <xref ref-type="bibr" rid="ref12 ref13 ref8">8, 12, 13</xref>
          ] learn the weights
of linear combinations of local similarity functions for CBR tasks. Another example is [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], in
which a set of local similarities estimated by artificial neural networks are aggregated. Note
that the above-mentioned weights can be thought of as the importance of each local measure,
and can thus be used for explanation and fairness purposes [
          <xref ref-type="bibr" rid="ref14 ref15 ref16 ref17 ref18 ref19">14, 15, 16, 17, 18, 19</xref>
          ].
        </p>
        <p>The main drawback of combining local dissimilarities is that it requires additional
preprocessing and learning as well as supervision.</p>
        <p>
          Unsupervised learning of (dis)similarities. Shi and Horvath [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] proposed a method to
compute dissimilarities between instances in unsupervised settings using Random Forest (RF).
RF [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] is a popular algorithm for supervised learning tasks, and is widely used in many applied
fields, e.g., in biology [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] and in image recognition [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. Essentially, it is an ensemble method
that combines decision trees in order to obtain better classification results in supervised learning
on high-dimensional data.
        </p>
        <p>
          The algorithm begins by creating several new training sets, each one being a bootstrap sample
of elements from the initial data set. A decision tree is built on each training set, using
a random sample of the features at each split. The prediction task is then performed by a
majority vote or by averaging the results of the decision trees, according to the problem at hand
(classification or regression). This approach leads to better accuracy and generalization capacity
of the model compared to single decision trees, while reducing the variance [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. However, this
ensemble approach requires labelled data.
        </p>
        <p>
          The adaptation of RF to unsupervised settings was made possible by the generation of
synthetic instances, which enables a binary classification between the latter and the observed
(unlabelled) instances. The use of Unsupervised Random Forest (URF) for measuring (dis)similarity
presents several advantages. For instance, instances described by mixed types of variables, as
well as missing values, can be handled. In fact, this method has been successfully used in many
applications [
          <xref ref-type="bibr" rid="ref25 ref26 ref27 ref28">25, 26, 27, 28</xref>
          ].
        </p>
        <p>Despite its appealing character, the method suffers from two main drawbacks. Firstly, the
generation step is not computationally efficient: since the obtained trees highly depend on the
generated instances, it is necessary to construct many forests with different synthetic instances
and average their results, leading to a computational burden. Secondly, the synthetic instances
may bias the model being constructed to discriminate instances on specific features.</p>
        <p>
          More recently, Ting et al. [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] proposed a similar approach to compute a mass-based
dissimilarity between instances, based on isolation forests [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. While their approach is similar, it
difers on some key points, such as the fact that self-similarities are not constant in mass-based
dissimilarity, since it they depend on the distribution of the data. This property is interesting
and may lead to good results in cases where clusters are of varying density. However, this
method does not apply to heterogeneous data.
        </p>
        <p>
          Following in the tracks of [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] and [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ], [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] proposed a method, called Unsupervised Extremely
Randomised Trees (UET), to compute similarities on unlabelled data. The main idea is to
randomly split the data in an iterative fashion until a stopping criterion is met, and to compute
a similarity based on the co-occurrence of instances in the leaves of each generated tree.
It was shown to provide tailor-made multidimensional similarity measures for complex and
heterogeneous data [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] and to be easily adaptable to structured data such as labelled graphs [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ].
The empirical study of UET showed that it outperforms existing methods (such as URF) in terms
of computational time, while giving better clustering results and, consequently, more relevant
similarities. Moreover, it has interesting invariance properties, such as invariance under
monotonic transformations of variables and robustness to correlated variables and noise, which
drastically reduce preprocessing.
        </p>
        <p>Despite producing tailor-made measures for the data at hand, the main drawback of UET is
that it computes similarities on each space (the situation space and the outcome space) without
establishing links between the two.</p>
        <p>
          Metric learning. Learning dissimilarity measures from data has been tackled in the field
of metric learning (for an extended introduction, see [
          <xref ref-type="bibr" rid="ref35 ref36">35, 36</xref>
          ]) by learning the parameters of
parametric distance functions $d_\theta$, following either relative (ordinal) constraints or link/cannot-link
(similarity/dissimilarity) constraints. Metric learning techniques have been used for
representation learning: combining a parametric representation model with a simple non-parametric
distance function (typically the Euclidean distance) makes it possible to learn a representation model
suitable to preserve the relative or link/cannot-link constraints. These constraints are usually
implemented by minimizing the triplet loss or the contrastive loss, as follows.
        </p>
        <p>
          On the one hand, contrastive loss [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ] is used to enforce link/cannot-link constraints on
training pairs $x_i, x_j$ associated with labels $y_i, y_j$. If $(x_i, x_j)$ is a pair
of similar elements ($y_i \approx y_j$), then we want to minimize $d_\theta(x_i, x_j)$, and we want to maximize
the latter if the pair is not similar ($y_i \neq y_j$). The contrastive loss is defined as
$$\ell(x_i, x_j) = \sigma_{\mathcal{R}}(y_i, y_j) \, d_\theta(x_i, x_j) - (1 - \sigma_{\mathcal{R}}(y_i, y_j)) \, d_\theta(x_i, x_j).$$
        </p>
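        <p>A minimal sketch of this pairwise loss (hypothetical code, following the simplified formula above) is:</p>
        <preformat>
def contrastive_loss(d_theta, sigma_R, x_i, x_j, y_i, y_j):
    """Pull similar pairs together and push dissimilar pairs apart."""
    s = sigma_R(y_i, y_j)  # 1 if the labels match, 0 otherwise
    return s * d_theta(x_i, x_j) - (1 - s) * d_theta(x_i, x_j)
        </preformat>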
        <p>
          On the other hand, triplet loss [
          <xref ref-type="bibr" rid="ref38 ref39">38, 39</xref>
          ] methods use training triplets $x_0, x_i, x_j$ associated with
labels $y_0, y_i, y_j$, selected such that $x_0$ (called the anchor, as in CoAT) is closer to $x_i$ than
to $x_j$. For such triplets, it is desired that $d_\theta(x_0, x_i) &lt; d_\theta(x_0, x_j)$, which translates into the triplet
loss
$$\ell(x_0, x_i, x_j) = \max(d_\theta(x_0, x_i) - d_\theta(x_0, x_j) + \alpha, 0),$$
where the margin $\alpha$ is used to enforce a gap between the clusters of situations.</p>
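        <p>A minimal sketch of this loss (hypothetical code; the distance function and margin value are illustrative) is:</p>
        <preformat>
import numpy as np

def triplet_loss(d_theta, x0, xi, xj, alpha=0.2):
    """Penalize triplets where the anchor x0 is not closer to the positive xi
    than to the negative xj by at least the margin alpha."""
    return max(d_theta(x0, xi) - d_theta(x0, xj) + alpha, 0.0)

# Usage with a plain Euclidean distance playing the role of d_theta:
euclidean = lambda x, y: float(np.linalg.norm(np.asarray(x) - np.asarray(y)))
loss = triplet_loss(euclidean, [0.0, 0.0], [0.1, 0.0], [1.0, 1.0])
        </preformat>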
        <p>To implement relative constraints with the triplet loss, it is enough to have $y_0, y_i, y_j$ verify an
ordinal relation of the form $y_0 \leq y_i &lt; y_j$ or $y_0 \geq y_i > y_j$. In a classification setting, the labels
are classes on which no order is necessarily defined, so the link/cannot-link constraint
$y_0 = y_i \neq y_j$ is used instead. This latter constraint corresponds to $\sigma_{\mathcal{R}}(y_0, y_j) &lt; \sigma_{\mathcal{R}}(y_0, y_i)$,
where $\sigma_{\mathcal{R}}$ is the class membership similarity measure mentioned above.</p>
        <p>
          Note that while metric learning was initially designed to use class labels, making it a
supervised methodology, semi-supervised and unsupervised variants have also been proposed [
          <xref ref-type="bibr" rid="ref40 ref41">40, 41</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Links Between CoAT and Metric Learning Approaches</title>
        <p>The $\Gamma$ indicator defined in Section 2.1 measures how suitable a similarity measure is for a
particular CBR task. As such, it could be used to identify or, following the metric learning
methodology, to learn a similarity measure or a suitable representation space. To help make
such a parallel, we propose to leverage striking similarities between CoAT and the triplet loss.</p>
        <p>Indeed, as in triplet loss methods, the CoAT method considers similarity judgements that are
data triplets of the form $\{(c_0, c_i, c_j) \mid \sigma_{\mathcal{R}}(r_0, r_i) &lt; \sigma_{\mathcal{R}}(r_0, r_j)\}$,
but then counts the number of triplets violating the constraint $(C)$, i.e., such that
$\sigma_{\mathcal{S}}(s_0, s_i) \geq \sigma_{\mathcal{S}}(s_0, s_j)$. In triplet loss terminology, this
corresponds to counting the number of hard negatives among all possible triplets formed with
instances of the data set. Semi-hard negatives (i.e., triplets such that $\sigma_{\mathcal{S}}(s_0, s_i) + \alpha \geq \sigma_{\mathcal{S}}(s_0, s_j)$
for some margin $\alpha$) are excluded from this procedure. Therefore, when applied to classification
settings, the contribution of a triplet to the CoAT indicator $\Gamma$ can be seen as a simplified version
of the loss $\ell(x_0, x_i, x_j)$ used in triplet loss methods, that would take value 1 if the triplet is a
hard negative, and 0 otherwise.</p>
        <p>
          However, the idea of the CoAT method is to sum these contributions over all possible triples
of a case base. Although in our first experiments the case base consisted of the whole data set,
a more case-based approach would require crafting a (preferably small but informative) case
base for the task before attempting to learn a similarity measure. Moreover, one contribution
of the work done on the CoAT method has been to show that the prediction for a new case
depends only on the new similarity relations that result from the addition of the new case to
the case base [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This suggests that learning should be done by carefully selecting a case base $CB$
from the whole data set, and training for a test case $(s, r)$ by minimizing
$$\Delta\Gamma(s, r, \sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB) = \Gamma(\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB \cup \{(s, r)\}) - \Gamma(\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB).$$
        </p>
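        <p>A naive sketch of this training criterion (hypothetical code, reusing the brute-force gamma function above; a careful implementation would only inspect the triples involving the new case) is:</p>
        <preformat>
def delta_gamma(s, r, sigma_S, sigma_R, case_base):
    """Number of additional inversions caused by adding the case (s, r)."""
    return (gamma(sigma_S, sigma_R, case_base + [(s, r)])
            - gamma(sigma_S, sigma_R, case_base))
        </preformat>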
        <p>This could provide additional theoretical justification for triplet loss methods and give new
insights on how to solve the sampling issue (i.e., which training triplets to select).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Perspectives on Learning an Energy Function</title>
      <p>
        This section discusses another perspective opened by the interpretation of analogical inference
as the optimisation of the proposed $\Gamma$ indicator, as established in Section 2.2. Indeed, this
view makes it possible to exploit the formalism of energy-based models proposed for machine learning
tasks by [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ], recalled below. As detailed in the following, the interpretation of CoAT in an
energy-based model is quite straightforward: the global indicator $\Gamma$ of the CoAT approach can
be seen as an energy function that measures the compatibility between two similarity measures
on a case base. In this perspective, CoAT's transfer strategy is an energy-based inference, which
consists in completing the description of the case base in order to minimize its energy, and
learning the energy function (and hence, the similarity measure) could be achieved by
optimizing a contrastive loss function.
      </p>
      <p>
        Energy-Based Models. Inspired from statistical physics, energy-based models specify a
probability distribution
$$p(x; \theta) = \frac{e^{-E_\theta(x)/T}}{\int e^{-E_\theta(x)/T}\,dx}$$
directly via a parameterized scalar-valued function $E_\theta(x)$ called an energy function. In machine
learning, energy-based models are trained to be optimized on the data manifold: the energy
function is learned to give low values to training data, and higher values to data points that are
far from the data manifold [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ]. In its conditional version, the definition of an energy function
$E_\theta : \mathcal{X} \times \mathcal{Y} \longrightarrow \mathbb{R}$ assumes the existence of an input space $\mathcal{X}$, an output space $\mathcal{Y}$, and a set of
parameters $\theta$. The energy function $E_\theta$ associates to each pair $(x, y) \in \mathcal{X} \times \mathcal{Y}$ a scalar value
$E_\theta(x, y)$ that represents the compatibility between the input $x$ and the output $y$ under the set
of parameters $\theta$. The energy function $E_\theta$ takes low values when $x$ is compatible with $y$, and
higher values when $x$ and $y$ are less compatible. The goal of energy-based inference is to find,
among a set of outputs $\mathcal{Y}$, the output $y^* \in \mathcal{Y}$ that minimizes the value of the energy function:
$$y^* = \arg\min_{y \in \mathcal{Y}} E_\theta(x, y).$$</p>
      <p>
        Given a family of energy functions $E_\theta(x, y)$ indexed by a set of parameters $\theta$, the goal of learning
is to optimize the parameters $\theta$ in order to "push down" (i.e., assign lower energy values to) the
points on the energy surface that are around the training samples, and to "pull up" all other
points. Contrastive divergence [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ] is a common learning strategy that consists in optimizing a
contrastive loss function such as the hinge loss, which is defined, for a training sample $(x_i, y_i)$
and a generated out-of-distribution sample $(x_i, \hat{y})$, by:
$$\ell(\theta, x_i, y_i) = \max(0, m + E_\theta(x_i, y_i) - E_\theta(x_i, \hat{y})).$$</p>
      <p>The hinge loss associates a loss value to a training sample $(x_i, y_i)$ whenever its energy is not
lower by at least a margin $m$ than the energy of the incorrect sample $(x_i, \hat{y})$.</p>
      <p>
        An Energy-Based Model of Analogical Transfer. The input space $\mathcal{X}$ (from which similarity
knowledge is transferred) is the situation space $\mathcal{S}$. The output space $\mathcal{Y}$ (to which similarity
knowledge is transferred) is the outcome space $\mathcal{R}$. The situation space $\mathcal{S}$ is equipped with a
similarity measure $\sigma_{\mathcal{S}}$, and the outcome space is equipped with a similarity measure $\sigma_{\mathcal{R}}$. The
energy function $E_\theta : \mathcal{S} \times \mathcal{R} \longrightarrow \mathbb{R}$ measures the compatibility of the outcome similarities
with the added situation similarities when a potential new case $\hat{c} = (s, r)$ is added to the case
base. The energy function $E_\theta$ is parameterized by a hyperparameter $\theta = (\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB)$, which
includes the case base $CB$. Indeed, assuming that $\sigma_{\mathcal{S}}$ and $\sigma_{\mathcal{R}}$ are defined on different sets of
attributes, the compatibility between two similarity measures cannot be evaluated per se, but
only relatively to a given set of case pairs. For a new situation $s$, the goal of the energy-based
inference is to find, among a set of potential outcomes $r \in \mathcal{R}$, the outcome $r_t$ that minimizes
the value of the energy function:
$$r_t = \arg\min_{r \in \mathcal{R}} E_\theta(s, r).$$</p>
      <p>
        Among the three parameters of $\theta = (\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB)$, the case base $CB$ and the outcome similarity
measure $\sigma_{\mathcal{R}}$ are usually fixed, so that learning $\theta$ amounts to learning the situation similarity
measure $\sigma_{\mathcal{S}}$ for the task at hand. This can be done by contrastive divergence using the hinge
loss defined as follows: for a training sample $(s_i, r_i) \in \mathcal{S} \times \mathcal{R}$ and a chosen outcome $\hat{r} \in \mathcal{R}$,
$$\ell(\theta, s_i, r_i) = \max(0, m + E_\theta(s_i, r_i) - E_\theta(s_i, \hat{r})).$$</p>
      <p>The CoAT case-based prediction method directly implements this energy-based model by
taking as energy function the global indicator $\Gamma$:
$$E_{\text{CoAT}}(s, r) = \Gamma(\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB \cup \{(s, r)\}).$$</p>
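      <p>As an illustrative sketch (hypothetical code, reusing the brute-force gamma function above), this energy function and the corresponding hinge loss can be written as:</p>
      <preformat>
def e_coat(s, r, sigma_S, sigma_R, case_base):
    """Energy of a candidate case (s, r): the Gamma indicator after adding it."""
    return gamma(sigma_S, sigma_R, case_base + [(s, r)])

def hinge_loss(s_i, r_i, r_hat, sigma_S, sigma_R, case_base, margin=1.0):
    """Positive whenever the true outcome r_i is not lower in energy than the
    incorrect outcome r_hat by at least the margin."""
    return max(0.0, margin
               + e_coat(s_i, r_i, sigma_S, sigma_R, case_base)
               - e_coat(s_i, r_hat, sigma_S, sigma_R, case_base))
      </preformat>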
      <sec id="sec-2-3">
        <title>Illustration on Some Synthetic Data Sets.</title>
        <p>Fig. 2 gives some examples of energy maps
that are obtained for different synthetic data sets on a binary classification task. On each
figure, the dataset size is the same ($|CB| = 100$), but the instances are spread differently over the
2D description space. The instances are equally split into two classes (orange and blue). The
similarity measure on situations $\sigma_{\mathcal{S}}$ is a decreasing function of the Euclidean distance as in
Sec. 2.3 (i.e., $\sigma_{\mathcal{S}} = \sigma_E$), except for Fig. 2 d, where $\sigma_{\mathcal{S}}$ is constructed from a linear transformation
of the Euclidean distance, by choosing, from a set of 100 randomly generated transformations,
the one that minimizes the energy of the case base. Let us denote by $\sigma^*$ the resulting similarity
measure. The similarity measure on outcomes $\sigma_{\mathcal{R}}$ represents class membership, as previously. On
the figures, the colors indicate for each point of the space the class that would be predicted by the
CoAT algorithm: green for the blue class, and orange for the orange class. The color saturation
is proportional to the difference between the energy of the predicted class and the energy of
the other class.</p>
        <p>Results. In Fig. 2 a, the two classes are well separated, and no instance is more similar to an
instance of a different class than it is to an instance of the same class, hence $E(\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB) = 0$.
In Fig. 2 b, the two classes are closer, and even overlap, and some inter-class similarities
happen to be higher than some intra-class similarities, leading to the non-zero data set energy
$E(\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB) = 86{,}786$. Fig. 2 c and d show a data set with two linearly separable classes. In
Fig. 2 c, $\sigma_{\mathcal{S}}$ is set to the (inverse of the) Euclidean distance, which leads to sub-optimal prediction
performance: the prediction frontier does not correspond to the real class frontier, and some
instances are misclassified. The energy of the case base is $E(\sigma_{\mathcal{S}}, \sigma_{\mathcal{R}}, CB) = 43{,}264$. In Fig. 2 d,
the similarity measure $\sigma_{\mathcal{S}}$ is optimized by choosing a similarity measure $\sigma^*$ that minimizes
the energy of the case base. The resulting prediction performance is improved: the prediction
frontier corresponds to the real class frontier, and no instance of the case base is misclassified.
The energy of the case base is $E(\sigma^*, \sigma_{\mathcal{R}}, CB) = 17{,}146$.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Conclusion</title>
      <p>In this paper we investigated interactions between analogical transfer and similarity learning,
in the framework of CoAT. In particular, we identified similarities between the $\Gamma$ indicator and
the triplet loss of metric learning, which may be used to obtain suitable similarities for analogical
transfer. We also proposed an interpretation of the CoAT method in the formalism of
energy-based models, so that the similarity learning task can be expressed as the task of learning an
energy function.</p>
      <p>The established connections make it possible to envision other applications. For instance, the
indicator could be used for case base construction and maintenance. Indeed, if we consider the
indicator as an energy function, the competence of a case should relate to its ability, when it is
added to the case base, to lower the energy of other cases. Reasoning with a small but competent
case base would address one of the current limitations of the CoAT method, which is the quadratic computational
complexity of the inference procedure.</p>
      <p>An additional direction for future work concerns the integration of expert knowledge,
to promote interaction with domain experts when processing a case base. We envision this
integration at two levels: the design of the similarity measure and the choice of suitable cases.
We envision a semi-automatic approach to reach a suitable compromise between available data,
expert input, and selection of competent cases.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Davies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <article-title>A logical approach to reasoning by analogy</article-title>
          ,
          <source>in: IJCAI</source>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Aamodt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <source>Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches</source>
          ,
          <source>AI</source>
          Communications
          <volume>7</volume>
          (
          <year>1994</year>
          )
          <fpage>39</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Miclet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bayoudh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Delhay</surname>
          </string-name>
          , Analogical Dissimilarity:
          <article-title>Definition, Algorithms and Two Experiments in Machine Learning</article-title>
          , JAIR
          <volume>32</volume>
          (
          <year>2008</year>
          )
          <fpage>793</fpage>
          -
          <lpage>824</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Badra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-J.</given-names>
            <surname>Lesot</surname>
          </string-name>
          ,
          <article-title>Case-Based Prediction - A Survey</article-title>
          ,
          <source>IJAR</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Badra</surname>
          </string-name>
          ,
          <article-title>A Dataset Complexity Measure for Analogical Transfer</article-title>
          , in: IJCAI,
          <year>2020</year>
          , pp.
          <fpage>1601</fpage>
          -
          <lpage>1607</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Badra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-J.</given-names>
            <surname>Lesot</surname>
          </string-name>
          ,
          <article-title>Theoretical and Experimental Study of a Complexity Measure for Analogical Transfer</article-title>
          , in: ICCBR,
          <year>2022</year>
          , pp.
          <fpage>175</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Badra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-J.</given-names>
            <surname>Lesot</surname>
          </string-name>
          , CoAT-APC:
          <article-title>When Analogical Proportion-based Classification Meets Case-Based Prediction</article-title>
          , in: ATA@ICCBR,
          <source>CEUR-WS</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Mork</surname>
          </string-name>
          ,
          <article-title>Similarity Measure Development for Case-Based Reasoning-A Data-Driven Approach</article-title>
          , in: NAIS, volume
          <volume>1056</volume>
          , Springer, Cham,
          <year>2019</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Deza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Deza</surname>
          </string-name>
          ,
          <source>Encyclopedia of Distances</source>
          , Springer,
          <year>2009</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>583</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>2</volume>
          (
          <year>1998</year>
          )
          <fpage>283</fpage>
          -
          <lpage>304</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gower</surname>
          </string-name>
          ,
          <article-title>A general coefficient of similarity and some of its properties</article-title>
          ,
          <source>Biometrics</source>
          (
          <year>1971</year>
          )
          <fpage>857</fpage>
          -
          <lpage>871</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] W. Cheng, E. Hüllermeier,
          <article-title>Learning Similarity Functions from Qualitative Feedback</article-title>
          , in: ECCBR, volume
          <volume>5239</volume>
          , Springer,
          <year>2008</year>
          , pp.
          <fpage>120</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <article-title>A Data-Driven Approach for Determining Weights in Global Similarity Functions</article-title>
          , in: ICCBR, volume
          <volume>11680</volume>
          , Springer International Publishing,
          <year>2019</year>
          , pp.
          <fpage>125</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gabel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Godehardt</surname>
          </string-name>
          ,
          <article-title>Top-Down Induction of Similarity Measures Using Similarity Clouds</article-title>
          , in: ICCBR, volume
          <volume>9343</volume>
          , Springer, Cham,
          <year>2015</year>
          , pp.
          <fpage>149</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>"why should I trust you?": Explaining the predictions of any classifier</article-title>
          ,
          <source>in: 22nd SIGKDD, ACM</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>in: NIPS</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>4765</fpage>
          -
          <lpage>4774</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Alves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Amblard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bernier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Couceiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Reducing unintended bias of ML models on tabular and textual data</article-title>
          ,
          <source>in: 8th DSAA</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Lamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Guezennec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bouaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Séroussi</surname>
          </string-name>
          ,
          <article-title>Explainable artificial intelligence for breast cancer: A visual case-based reasoning approach</article-title>
          ,
          <source>Artificial Intelligence in Medicine</source>
          <volume>94</volume>
          (
          <year>2019</year>
          )
          <fpage>42</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Mork</surname>
          </string-name>
          ,
          <article-title>On the Explanation of Similarity for Developing and Deploying CBR Systems</article-title>
          , in: AAAI,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Horvath</surname>
          </string-name>
          ,
          <article-title>Unsupervised learning with random forest predictors</article-title>
          ,
          <source>Journal of Computational and Graphical Statistics</source>
          <volume>15</volume>
          (
          <year>2006</year>
          )
          <fpage>118</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          ,
          <article-title>Random forests</article-title>
          ,
          <source>Machine Learning</source>
          <volume>45</volume>
          (
          <year>2001</year>
          )
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>B.</given-names>
            <surname>Percha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Garten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Altman</surname>
          </string-name>
          ,
          <article-title>Discovery and explanation of drug-drug interactions via text mining</article-title>
          ,
          <source>in: PSB</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>410</fpage>
          -
          <lpage>421</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Random forest classifier for remote sensing classification</article-title>
          ,
          <source>Int. J. Remote Sensing</source>
          <volume>26</volume>
          (
          <year>2005</year>
          )
          <fpage>217</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <article-title>The elements of statistical learning</article-title>
          , volume
          <volume>1</volume>
          , Springer series in statistics New York,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Seligson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Janzen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Belldegrun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Horvath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Figlin</surname>
          </string-name>
          ,
          <article-title>Using tumor markers to predict the survival of patients with metastatic renal cell carcinoma</article-title>
          ,
          <source>The Journal of urology 173</source>
          (
          <year>2005</year>
          )
          <fpage>1496</fpage>
          -
          <lpage>1501</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hawkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Drake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nunez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gaddis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Horvath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sahin</surname>
          </string-name>
          , et al.,
          <article-title>Breast cancer molecular signatures as determined by sage: correlation with lymph node status</article-title>
          ,
          <source>Molecular Cancer Research</source>
          <volume>5</volume>
          (
          <year>2007</year>
          )
          <fpage>881</fpage>
          -
          <lpage>890</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rennard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Locantore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Delafont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tal-Singer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Silverman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vestbo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bakke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Celli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Calverley</surname>
          </string-name>
          , et al.,
          <article-title>Identification of five chronic obstructive pulmonary disease subgroups with different prognoses in the eclipse cohort using cluster analysis</article-title>
          ,
          <source>Annals of the American Thoracic Society</source>
          <volume>12</volume>
          (
          <year>2015</year>
          )
          <fpage>303</fpage>
          -
          <lpage>312</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>K.</given-names>
            <surname>Peerbhay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Mutanga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ismail</surname>
          </string-name>
          ,
          <article-title>Random forests unsupervised classification: The detection and mapping of solanum mauritianum infestations in plantation forestry using hyperspectral data</article-title>
          ,
          <source>IEEE J. Sel. Top. Appl. Earth Obs Remote Sens</source>
          <volume>8</volume>
          (
          <year>2015</year>
          )
          <fpage>3107</fpage>
          -
          <lpage>3122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Carman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Washio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Lowest probability mass neighbour algorithms: Relaxing the metric constraint in distance-based neighbourhood algorithms</article-title>
          ,
          <source>Machine Learning</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Isolation forest</article-title>
          ,
          <source>in: 8th ICDM, IEEE</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>413</fpage>
          -
          <lpage>422</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>P.</given-names>
            <surname>Geurts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ernst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wehenkel</surname>
          </string-name>
          ,
          <article-title>Extremely randomized trees</article-title>
          ,
          <source>Machine Learning</source>
          <volume>63</volume>
          (
          <year>2006</year>
          )
          <fpage>3</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>K.</given-names>
            <surname>Dalleau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Couceiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Smaïl-Tabbone</surname>
          </string-name>
          ,
          <article-title>Unsupervised extremely randomized trees</article-title>
          ,
          <source>in: 22nd PAKDD</source>
          , volume
          <volume>10939</volume>
          of LNCS, Springer,
          <year>2018</year>
          , pp.
          <fpage>478</fpage>
          -
          <lpage>489</lpage>
          . URL: https://doi.org/10.1007/978-3-319-93040-4_38. doi:10.1007/978-3-319-93040-4_38.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>K.</given-names>
            <surname>Dalleau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Couceiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Smaïl-Tabbone</surname>
          </string-name>
          ,
          <article-title>Unsupervised extra trees: a stochastic approach to compute similarities in heterogeneous data</article-title>
          ,
          <source>Int. J. Data Sci. Anal</source>
          .
          <volume>9</volume>
          (
          <year>2020</year>
          )
          <fpage>447</fpage>
          -
          <lpage>459</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>K.</given-names>
            <surname>Dalleau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Couceiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Smaïl-Tabbone</surname>
          </string-name>
          ,
          <article-title>Computing vertex-vertex dissimilarities using random trees: Application to clustering in graphs</article-title>
          ,
          <source>in: 18th IDA</source>
          , volume
          <volume>12080</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2020</year>
          , pp.
          <fpage>132</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bellet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Habrard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sebban</surname>
          </string-name>
          ,
          <article-title>A Survey on Metric Learning for Feature Vectors and Structured Data</article-title>
          (
          <year>2014</year>
          ). arXiv:1306.6709.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bellet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Habrard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sebban</surname>
          </string-name>
          ,
          <source>Metric Learning, AIM, Springer</source>
          ,
          <year>2015</year>
          . URL: https://hal.science/hal-01121733. doi:10.2200/S00626ED1V01Y201501AIM030.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chopra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hadsell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <article-title>Learning a similarity metric discriminatively, with application to face verification</article-title>
          ,
          <source>in: CVPR</source>
          , volume
          <volume>1</volume>
          ,
          <year>2005</year>
          , pp.
          <fpage>539</fpage>
          -
          <lpage>546</lpage>
          . doi:10.1109/CVPR.2005.202.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>F.</given-names>
            <surname>Schroff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kalenichenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Philbin</surname>
          </string-name>
          ,
          <article-title>FaceNet: A Unified Embedding for Face Recognition and Clustering</article-title>
          ,
          <source>in: CVPR</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>815</fpage>
          -
          <lpage>823</lpage>
          . arXiv:1503.03832.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>V.</given-names>
            <surname>Balntas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Riba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ponsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mikolajczyk</surname>
          </string-name>
          ,
          <article-title>Learning local feature descriptors with triplets and shallow convolutional neural networks</article-title>
          ,
          <source>in: Proceedings of the British Machine Vision Conference (BMVC)</source>
          , BMVA Press,
          <year>2016</year>
          , pp.
          <fpage>119.1</fpage>
          -
          <lpage>119.11</lpage>
          . URL: https://dx.doi.org/10.5244/C.30.119. doi:10.5244/C.30.119.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Semi-supervised sparse metric learning using alternating linearization optimization</article-title>
          ,
          <source>in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '10</source>
          ,
          ACM
          ,
          <year>2010</year>
          , pp.
          <fpage>1139</fpage>
          -
          <lpage>1148</lpage>
          . URL: https://doi.org/10.1145/1835804.1835947. doi:10.1145/1835804.1835947.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kwak</surname>
          </string-name>
          ,
          <article-title>Self-taught metric learning without labels</article-title>
          ,
          <source>in: CVPR, IEEE</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>7421</fpage>
          -
          <lpage>7431</lpage>
          . URL: https://doi.org/10.1109/CVPR52688.2022.00728. doi:10.1109/CVPR52688.2022.00728.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chopra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hadsell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ranzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>A Tutorial on Energy-Based Learning</article-title>
          ,
          <source>in: Predicting Structured Data</source>
          ,
          <year>2006</year>
          , p.
          <fpage>59</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Osindero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-W.</given-names>
            <surname>Teh</surname>
          </string-name>
          ,
          <article-title>Unsupervised Discovery of Nonlinear Structure Using Contrastive Backpropagation</article-title>
          ,
          <source>Cognitive Science</source>
          <volume>30</volume>
          (
          <year>2006</year>
          )
          <fpage>725</fpage>
          -
          <lpage>731</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>