<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Some Preliminary Results on Analogies Between Sentences Using Contextual and Non-Contextual Embeddings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Barbero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stergos Afantenos</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IRIT, University of Toulouse</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>34</fpage>
      <lpage>45</lpage>
      <abstract>
<p>Analogies have been characterized as fundamental to abstraction, concept formation, and perception, and are traditionally expressed as quadruplets in the form of proportional analogies a : b :: c : d, read “a is to b as c is to d”. While Natural Language Processing (NLP) has primarily focused on word analogies and SAT problems, recent research has started exploring analogies between sentences and even documents. In this paper we explore the potential of identifying analogies between pairs of sentences via the identification of common latent relations between them. We exploit three different datasets, generating pairs of sentences which can either share the same latent relation (thus forming an analogy) or not. We encode phrases into a higher dimensional vector space using embeddings from GloVe, BERT, and RoBERTa, which we then feed to both a Multi-Layer Perceptron (MLP) and a Convolutional Neural Network (CNN). Results show that architectures using contextual embeddings as inputs outperform those based on static embeddings.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Analogies have preoccupied humanity at least since antiquity [1]. In recent years they have
been characterized as being at “the core of cognition” [
        <xref ref-type="bibr" rid="ref1">2</xref>
        ] and have even been considered
the fundamental mechanism via which abstraction, concept formation and perception are
achieved [
        <xref ref-type="bibr" rid="ref2 ref3">3, 4</xref>
        ].
      </p>
      <p>Traditionally analogies have been expressed as quadruplets a : b :: c : d, read “a is to b as
c is to d”. Such quadruplets form valid analogies if the pairs (a, b) and (c, d) share the same
underlying relation, forming thus a proportional analogy. The underlying relation has been
viewed as the symbolic counterpart of arithmetic or geometric proportions: a − b = c − d and
a/b = c/d respectively [
        <xref ref-type="bibr" rid="ref4">5</xref>
        ].</p>
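      <p>The numeric counterparts of proportional analogies can be checked directly; the following is an illustrative sketch (the helper names are ours, chosen for this example):

```python
# Illustrative check of the numeric counterparts of proportional analogies:
# an arithmetic proportion satisfies a - b = c - d, a geometric one a / b = c / d.

def is_arithmetic_proportion(a, b, c, d):
    """True when a : b :: c : d holds as an arithmetic proportion."""
    return a - b == c - d

def is_geometric_proportion(a, b, c, d):
    """True when a : b :: c : d holds as a geometric proportion."""
    return a * d == b * c  # cross-multiplied form of a / b == c / d

print(is_arithmetic_proportion(2, 4, 7, 9))  # 2 - 4 == 7 - 9, prints True
print(is_geometric_proportion(2, 4, 3, 6))   # 2 / 4 == 3 / 6, prints True
```

For example (2, 4, 7, 9) is a valid arithmetic proportion since 2 − 4 = 7 − 9 = −2, while (1, 2, 3, 5) is neither arithmetic nor geometric.</p>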
      <p>
        In Natural Language Processing (NLP) various approaches adopt the framework of quadruplets
focusing mostly on word analogies, such as man is to woman as king is to queen [
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8">6, 7, 8, 9</xref>
        ],
morphology [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ] or on SAT problems [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ]. More recently several researchers have focused on
the problem of identifying analogies between sentences [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">12, 13, 14</xref>
        ] or even documents [
        <xref ref-type="bibr" rid="ref14">15</xref>
        ].
* This work was performed while the first author was working at IRIT, University of Toulouse, France. The first
author is the corresponding author.
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
      </p>
      <p>
        In this paper we are interested in further exploring the potential of analogies between
sentences via the identification of common latent relations between them. We exploit three
different datasets, namely the Microsoft Research Paraphrase Corpus (MSRP) [
        <xref ref-type="bibr" rid="ref15">16</xref>
        ], the Penn
Discourse TreeBank (PDTB) [
        <xref ref-type="bibr" rid="ref16">17</xref>
        ] as well as the Stanford Natural Language Inference (SNLI)
corpus [
        <xref ref-type="bibr" rid="ref17">18</xref>
        ] and we use GloVe [
        <xref ref-type="bibr" rid="ref18">19</xref>
        ], or transformer-based architectures such as BERT [
        <xref ref-type="bibr" rid="ref19">20</xref>
        ] and
RoBERTa [
        <xref ref-type="bibr" rid="ref20">21</xref>
        ] for the encoding of phrases into a higher dimensional vector space. We show
that architectures that are based on contextual embeddings outperform ones that are based on
static embeddings.
      </p>
      <p>The rest of the paper is structured as follows. In Section 2 we present the related work. In
Section 3 we present the datasets that we have used in order to perform our experiments. The
methodology for these experiments is described in Section 4 while the results are presented in
Section 5. We conclude in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Initial work on analogies in NLP was performed by [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ] who introduced Latent Relational
Analysis (LRA) in order to identify analogies in the context of the Scholastic Aptitude Test (SAT),
testing this approach on 20 scientific and metaphorical examples.
      </p>
      <p>
        More recently Mikolov et al. [
        <xref ref-type="bibr" rid="ref21 ref22">22, 23</xref>
        ] have used analogies as a means to test the quality of static
vectors representing word embeddings produced with word2vec for use in neural architectures.
The authors showed that such embeddings could preserve the parallelogram rule that is found
in analogies, evaluating thus the intrinsic qualities of such embeddings. Later work though
has shown that this is not sufficient, since most models appear to take shortcuts; no evidence
exists of abstraction and analogical mapping, as one would expect from such claims. More
precisely, [
        <xref ref-type="bibr" rid="ref23">24</xref>
        ] show that the Google analogy test set used by [
        <xref ref-type="bibr" rid="ref21 ref22">22, 23</xref>
        ] is not well balanced
and thus does not allow us to draw any safe conclusions concerning the underlying embeddings.
They show that the vector offset approach is not enough to claim that the proposed method
captures analogies. The authors thus introduce the Bigger Analogy Test Set (BATS). With this
more sophisticated dataset the authors show that derivational and lexicographic relations remain
a challenge. Similar conclusions are drawn by [
        <xref ref-type="bibr" rid="ref24">25</xref>
] both for the vector offset approach as well
as the 3CosAdd method [
        <xref ref-type="bibr" rid="ref25">26</xref>
        ]. They argue that such analogy datasets cannot be used to evaluate the intrinsic
qualities of the underlying embeddings.
      </p>
      <p>
        In terms of word analogy classification [
        <xref ref-type="bibr" rid="ref26">27</xref>
        ] used the Google dataset [
        <xref ref-type="bibr" rid="ref5">6</xref>
        ] which they extended
using permutation properties of analogies, presented in the same article. They then apply a
Convolutional Neural Network using as input GloVe embeddings representing each word. A
similar approach was also adopted by [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ] in the context of detecting morphological analogies.
We also adopt this approach in this paper.
      </p>
      <p>
        Recently, several researchers have explored sentential analogies. [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ] explore analogies
between sentences in order to identify d from a predefined set of possible candidates, given
(a, b) and c such that a : b :: c : d is a valid analogy. They use syntactic and semantic
datasets and test various embedding methods. In a similar vein [
        <xref ref-type="bibr" rid="ref27">28</xref>
        ] perform a similar task but
generate d instead. Both approaches show that syntactic analogies obtain
better results than semantic ones. [
        <xref ref-type="bibr" rid="ref12 ref13">13, 14</xref>
        ] explore sentential analogies based purely on semantic
information.
      </p>
      <p>
        In another approach, [
        <xref ref-type="bibr" rid="ref14">15</xref>
        ] view analogies via the prism of the Structure Mapping Theory [
        <xref ref-type="bibr" rid="ref28">29</xref>
        ].
Their goal is to identify analogies in procedural texts focusing on the structural similarities
between the texts. The underlying texts describe procedures in two different domains. The authors
extract entities and their relationships, the latter being sets of ordered verbs, which they extract
based on question-answer pairs. The similarity measures that they propose reflect the degree to
which the two sets share relations. BERT vectors representing the questions via which entities were
extracted are used to measure cosine similarity and thus identify potential mappings.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Data used</title>
<p>In order to perform our experiments we used three well known datasets. In what follows we
provide a detailed description of the corpora used as well as the procedure which led us to the
creation of the analogical quadruplets that were later used in our experiments. We should mention
that we used the input datasets as they were released; no further additions or modifications
were performed by us.</p>
      <sec id="sec-3-1">
        <title>3.1. Paraphrases</title>
        <p>
          The first corpus that we used was the Microsoft Research Paraphrases Corpus (MSRP) [
          <xref ref-type="bibr" rid="ref15">16</xref>
          ]
which is composed of 5801 pairs of sentences labeled as paraphrases or not. The pairs are
distilled from a database containing more than 13 million sentence pairs, itself extracted from a
corpus of more than 9 million sentences [
          <xref ref-type="bibr" rid="ref15">16</xref>
          ]. The 9M-sentence corpus is composed of sentences
extracted from more than 32k news clusters on the internet. This corpus was then largely reduced
to contain only sentences with a credited author, leaving 49375 individual sentence pairs. The
corpus is thus composed of naturally occurring, non-handcrafted sentence pairs. Sentence
pairs with minimal variations, such as typographical errors, were removed as they could have
constituted “low quality” paraphrases.
        </p>
<p>A Support Vector Machine-Classifier (SVM-Classifier) was then used to identify a set of possible
paraphrases from the 49375 sentence pairs; this set was later validated by human annotators.
The SVM-Classifier was trained on a training set of 10000 sentence pairs annotated by two human
judges, with a third judge adjudicating in case of disagreement. The distribution
of this training set is 2968 positive examples and 7032 negative ones. The classifier considered
multiple features: string similarity, morphological variants, synonym mappings from WordNet
Lexical Mapping and the Encarta Thesaurus, and finally composite features. The SVM-Classifier
extracted 20574 sentence pairs as possible paraphrases from the 49375 previously
considered. This number is high because the classifier’s role was to select candidate sentence
pairs for human judgment rather than to discriminate all non-paraphrase pairs, so
the classifier tended to label inputs as positive rather than negative, at the assumed cost of
more false positives.</p>
        <p>
          Human judgment was applied to a subset of 5801 of the 20574 previously extracted sentence
pairs. Two judges annotated each sentence pair and a third one was used in case of disagreement.
Each judge was asked whether the pair’s sentences were semantically equivalent. About 3900 (67%) of
the sentence pairs were labeled as semantically equivalent.
        </p>
      </sec>
      <sec id="sec-3-2-pdtb">
        <title>3.2. PDTB</title>
        <p>
          The second corpus that we used was the Penn Discourse TreeBank (PDTB) [
          <xref ref-type="bibr" rid="ref16">17</xref>
          ], which
contains discourse annotations between clauses and sentences extracted from the Wall Street Journal
Corpus of over 1 million words. The corpus describes a total of 36592 relations [
          <xref ref-type="bibr" rid="ref16">17</xref>
          ].
Discourse annotations can be triggered by an explicit or an implicit discourse connective. The
former are extracted from syntactically defined classes and are divided into three grammatical
classes: subordinating conjunctions, coordinating conjunctions and discourse adverbs. Explicit
connectives can be connected to more than one clause or sentence, but the minimality principle
is applied, which requires the minimum information needed to complete the interpretation. In the case of
an implicit connection between the two clauses, the annotators were instructed to insert
an explicit connective. Three other labels were available in order to correctly annotate the
cases that prevented the annotators from inserting a coherent explicit connective: AltLex,
indicating that the relation is already made explicit by a non-connective expression, so that inserting
a connective would lead to a redundancy; EntRel, indicating the existence of an
entity-based coherence relation between the two clauses, but no other relation; and finally
NoRel, in case of no relation between the two clauses. PDTB relations are ordered hierarchically
into class, type and subtype. For our experiments we used the first level of the hierarchy.
        </p>
        <p>
          The inter-annotator agreement was high: 90.2% for explicit relations and 85.1% for implicit
ones when the exact match metric was considered, and 94.5% and 92.6% respectively when the partial
match metric was considered. Class-level disagreements were resolved by a team of three experts;
disagreements at lower levels were resolved by assigning the tag of the next higher level.
Agreement reached 94% for the class level, 84% for the type level and 80% for the subtype level.
        </p>
      </sec>
      <sec id="sec-3-3-snli">
        <title>3.3. SNLI</title>
        <p>
          The Stanford Natural Language Inference (SNLI) corpus [
          <xref ref-type="bibr" rid="ref17">18</xref>
          ] labels pairs of sentences as
Contradiction, Entailment or semantic neutrality [
          <xref ref-type="bibr" rid="ref17">18</xref>
          ]. It contains 570k human-written pairs of sentences.
Construction of the corpus was done using Mechanical Turk workers, who were presented with a
premise in the form of a sentence and were asked to provide three hypotheses, in sentential
form, for contradiction, entailment and semantic neutrality. 10% of the corpus was validated by
trusted Mechanical Turk workers. Overall a Fleiss κ of 0.70 was achieved.
        </p>
<p>The indeterminacies of event and entity co-reference are two well known issues in the
labeling of NLI data, degrading the quality of the annotated corpus. They represent respectively
a possible confusion between an Entailment and a Neutral relation, and between a Contradiction
and a Neutral relation. This confusion comes from the fact that an assumption may or may not
have been made.</p>
<p>In order to address this problem, the annotation process took place in a grounded scenario aiming
to reduce assumptions. Annotators were thus able to generate sentences in the same scenario
in order to illustrate the relations, instead of relying on automatic data augmentation techniques.
About 2500 workers contributed to the data collection phase. When presented with an
image caption, without the matching image, the annotators had to write three sentences, one
for each relation (the exact instructions are described in the SNLI paper). The image captions
came from the Flickr30k corpus, containing 160k captions for 30k individual images. The
validation phase was carried out on 10% of the 570k pairs of sentences by a set of 30 trusted workers.
They were presented with pairs of sentences and had to label them, each pair being shown to four
annotators, so that there are five judgments counting the label from the data collection phase. A
gold label was assigned to the pairs with at least a three-annotator consensus, representing
98% of the data. The corpus is then separated into three individual files: test and dev (10k pairs
each) and train (the rest of the pairs).</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Generation of analogical quadruplets</title>
<p>In order to create our analogical quadruplets we proceeded as follows. For each of the
aforementioned datasets we randomly selected two pairs of sentences, each pair linked by a relation.
Since our input datasets do not contain relations that have the same sentences as arguments,
we never have analogies of the form a : b :: a : b. In case the relation linking the two pairs
is the same we have a positive instance of an analogy, otherwise a negative instance. For the
SNLI corpus we considered neutral as not being a relation. For each input dataset we create
balanced training, test and development sets containing the same number of positive and
negative instances. The training set consists of 400K instances, while the test and development
sets contain 40K instances each.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>
        Our problem can be formalized as follows. Given a set of quadruplets of sentences a : b :: c : d
which can either form an analogy (the pairs a : b and c : d share the same latent relation) or not, we
need to estimate a function that predicts whether a new instance of four sentences is an analogy
or not. Each sentence s ∈ {a, b, c, d} of a quadruplet is represented by its input tokens s = {t1, . . . , t|s|},
with |s| representing the length of the sentence. With each quadruplet we
associate a label y ∈ {0, 1} which represents whether the quadruplet is an analogy or not. For each
quadruplet we obtain embeddings using GloVe [
        <xref ref-type="bibr" rid="ref18">19</xref>
        ], BERT [
        <xref ref-type="bibr" rid="ref19">20</xref>
        ] and RoBERTa [
        <xref ref-type="bibr" rid="ref20">21</xref>
        ] which we
then pass to two different architectures, a Multi-Layer Perceptron (MLP) and a Convolutional
Neural Network (CNN).
      </p>
      <sec id="sec-4-1">
        <title>4.1. Embeddings</title>
<p>In order to perform classification we need to provide embeddings for each sentence. In the
case of GloVe (we used https://huggingface.co/sentence-transformers/average_word_embeddings_glove.6B.300d)
static embeddings are provided for each word, while in the case of BERT (https://huggingface.co/bert-base-uncased)
and RoBERTa (https://huggingface.co/roberta-base) embeddings are contextual. In order to obtain embeddings that
represent sentences from the ones representing words, a common approach [20, for example] is to take the mean of
the embeddings representing each word. This is the approach that we have used as well. For
each sentence s ∈ {a, b, c, d} we obtain an embedding s_emb,
with emb ∈ {glove, bert, roberta}. Thus four different embeddings a, b, c and d are
obtained for the sentences a, b, c and d. In the case of BERT we have also examined the use
of the representation obtained for the final hidden state of the special symbol [CLS]. Embedding
dimensions for each method are shown in Table 1. No further fine-tuning was performed on
BERT or RoBERTa. Our code is available at https://github.com/ThomasBARBERO/EXPLO_ANALOGIE.</p>
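      <p>Mean pooling over token vectors can be sketched as follows (a minimal NumPy sketch; `token_embeddings` stands for the per-token vectors returned by any of the three encoders):

```python
import numpy as np

def sentence_embedding(token_embeddings):
    """Mean-pool a (sentence_length, dim) array of token vectors into one sentence vector."""
    token_embeddings = np.asarray(token_embeddings, dtype=float)
    return token_embeddings.mean(axis=0)

# Toy example with dim = 3: the sentence vector is the per-dimension mean.
tokens = [[1.0, 0.0, 2.0],
          [3.0, 2.0, 0.0]]
print(sentence_embedding(tokens))  # [2. 1. 1.]
```

The same pooling is applied independently to each of the four sentences of a quadruplet, yielding the vectors a, b, c and d.</p>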
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Classifiers</title>
        <p>Multi-layer perceptron (MLP) The first classifier that we use is a multi-layer perceptron.
The MLP takes as input the concatenation of the representations for the four sentences a, b, c
and d as a vector [a; b; c; d] and has two hidden layers, the first has a dimension of 100 and the
second of 50.</p>
        <p>z1 = W1 [a; b; c; d] + b1
z2 = W2 z1 + b2
ŷ = 1 / (1 + exp(−z2))</p>
        <p>with W1, b1, W2, b2 learnable parameters. The output layer is of dimension 1 and we use a
sigmoid function providing a score for the final prediction.
Convolutional Neural Network (CNN) The second classifier architecture that we have
used is the Convolutional Neural Network, widely used for image and audio processing
but useful for Natural Language Processing tasks as well, including analogies [27, 10,
inter alia]. CNNs aim to recognize patterns, extracting features from the initial tensor given
as input. The core of CNNs is their convolutional layers, which apply filters called kernels over the
whole input. The kernels’ weights and biases are learnt parameters; the convolution between a kernel
and a tensor extracts a learnt feature from the tensor. Parameters define how the
kernels are applied: the kernels’ size, the stride, which indicates the spatial distance
between two kernel applications, and the padding, which indicates the number of pixels to add on
the borders of the considered tensor. Our CNN implementation is as follows, illustrated in Fig. 1:
1. The input goes through a first convolutional layer with 2×1 kernels and a 2×1 stride,
which first produces feature maps for the pairs (a, b) and (c, d). At the end of this process, (a, b)
and (c, d) are reduced to one dimension with regard to the width, while the other dimension
represents the embedding size. The size of the output is 2 × embedding_size.
2. We feed the output to a second convolutional layer with 2 × 2 kernels and a 2 × 2 stride, so
that the feature maps of (a, b) and (c, d) are now united in one single dimension across
the width, and the embedding size is divided by 2 as well.
3. We then apply dropout and feed the output to a single linear layer; we use a sigmoid
activation function to compute a confidence score for the two sentence pairs being in an
analogical proportion relation.</p>
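      <p>The three steps above can be sketched in PyTorch as follows. This is a minimal sketch, not the paper’s exact implementation: the channel count, the ReLU activations and the embedding size (768, as for BERT-base) are our assumptions, since the text does not specify them.

```python
import torch
import torch.nn as nn

class AnalogyCNN(nn.Module):
    """Sketch of the two-layer CNN described in Section 4.2 (assumed hyperparameters)."""

    def __init__(self, emb_size, channels=8, dropout=0.2):
        super().__init__()
        # Step 1: 2x1 kernels, 2x1 stride -> feature maps for (a, b) and (c, d),
        # i.e. height 2 (one row per pair), width emb_size.
        self.conv1 = nn.Conv2d(1, channels, kernel_size=(2, 1), stride=(2, 1))
        # Step 2: 2x2 kernels, 2x2 stride -> the two pairs united into height 1,
        # width emb_size / 2.
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=(2, 2), stride=(2, 2))
        self.dropout = nn.Dropout(dropout)
        # Step 3: a single linear layer followed by a sigmoid.
        self.linear = nn.Linear(channels * (emb_size // 2), 1)

    def forward(self, quad):
        # quad: (batch, 4, emb_size) -- the stacked sentence embeddings a, b, c, d.
        x = quad.unsqueeze(1)              # (batch, 1, 4, emb_size)
        x = torch.relu(self.conv1(x))      # (batch, channels, 2, emb_size)
        x = torch.relu(self.conv2(x))      # (batch, channels, 1, emb_size // 2)
        x = self.dropout(x.flatten(1))
        return torch.sigmoid(self.linear(x)).squeeze(-1)  # confidence score in (0, 1)

model = AnalogyCNN(emb_size=768)
scores = model(torch.randn(5, 4, 768))  # scores for 5 candidate quadruplets
```

The stride equal to the kernel size is what makes the first layer operate on the (a, b) and (c, d) rows separately before the second layer combines them.</p>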
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and Results</title>
<p>For both architectures we varied the learning rate between 10^-5 and 10^-4, and the dropout from 0.1
to 0.3. We used the Adam optimizer with default PyTorch settings and the Binary Cross Entropy loss.
Results for both architectures and combinations of embeddings are shown in Table 2.</p>
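      <p>The training configuration can be written as follows (a sketch: the stand-in classifier mirrors the MLP dimensions from Section 4.2, while the input dimension 4 × 768 and the batch are our assumptions for illustration):

```python
import torch
import torch.nn as nn

# Sketch of the training setup: Adam with default PyTorch settings,
# binary cross-entropy loss, and a learning rate from the searched range.
model = nn.Sequential(nn.Linear(4 * 768, 100), nn.ReLU(),
                      nn.Linear(100, 50), nn.ReLU(),
                      nn.Linear(50, 1), nn.Sigmoid())  # stand-in MLP classifier
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

inputs = torch.randn(32, 4 * 768)            # a batch of concatenated quadruplet embeddings
labels = torch.randint(0, 2, (32,)).float()  # 1 = analogy, 0 = not

optimizer.zero_grad()
loss = criterion(model(inputs).squeeze(-1), labels)
loss.backward()
optimizer.step()
```

One such step is repeated over the 400K training instances for each combination of embedding and classifier.</p>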
      <sec id="sec-5-1">
        <title>5.1. Transformer-based Language Models vs GloVe</title>
<p>Transformer-based Language Models outperform GloVe almost consistently in terms of accuracy
and F1-score (the ability to recognize valid analogical proportions). While the scores are not
significantly higher, we can still conclude that contextual embeddings provide a better handling
of latent relations and analogies between sentences in comparison to static embeddings. Let us
also note that representing a sentence by the mean of its contextual word vectors outperforms
the CLS sentence representation.</p>
        <p>[Table 2: Precision, Recall, F1 and Accuracy for GloVe-mean, BERT-base-mean, BERT-base-CLS, RoBERTa-base-mean and RoBERTa-base-CLS on each corpus; the numeric values were not recoverable from the extraction.]</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Performance across corpora</title>
<p>Overall, scores for the SNLI dataset are the highest, with accuracy ranging from 62.708 to 68.01
and F1-score peaking at 68.2 across MLP and CNN. Scores for MRPC are a bit lower, with
accuracy ranging from 57.537 to 63.992 and F1-score peaking at 62.662, considering the CNN only
as it consistently outperforms the MLP. The classifiers had a harder time grasping analogies on
the PDTB corpus, with accuracy ranging from 53.773 to 56.47 and F1-score peaking at 57.564,
the F1-score being below 50 for roBERTa-base-CLS/CNN and GloVe-mean/MLP. This can be
explained by the fact that the number of latent relations that we had to handle in PDTB is much
higher (5 latent relations) than in the MRPC or the SNLI corpora. We assume that providing
more data would yield better overall results.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. BERT vs roBERTa</title>
<p>One main difference between BERT and RoBERTa is, respectively, the presence and absence of
the Next Sentence Prediction training task. While the BERT authors considered this task beneficial
for the learning of long-range dependencies, the RoBERTa authors considered it counter-productive.
Although RoBERTa performs slightly better than BERT (considering the mean-pooling sentence
representation method), we cannot draw a definitive conclusion about the utility of the Next
Sentence Prediction training task for the learning of analogical properties. A bigger training set
might have reinforced this tendency. RoBERTa outperformed BERT for SNLI and MRPC, the two corpora
for which the sentences of the sentence pairs do not follow each other in a natural context.
The Next Sentence Prediction task may be detrimental in this case.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. CNN vs MLP</title>
        <p>Both the MLP and the CNN are relevant for the classification task we performed, as they obtain
broadly similar results, with the CNNs usually, but not always, outperforming the MLPs. However the
MRPC results show a large difference in performance between the two classifiers, the CNNs
performing significantly better than the MLPs, suggesting that meaningful features were extracted
from the sentence representations. As described in Section 4, features are first extracted from the
(a, b) pair and the (c, d) pair in tandem. This could probably be attributed to the fact that
paraphrases use semantically similar words, which are probably closer in the vector space, a
property better captured by CNNs than by MLPs, although further analysis is needed for this
claim to be verified.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
<p>In this paper we have focused on the problem of identifying analogies between pairs of sentences
based on common latent relations that may or may not exist between the pairs. We have used both
contextual embeddings (BERT and RoBERTa) as well as static embeddings (GloVe). Both BERT
and RoBERTa outperformed GloVe on the binary classification task we performed. We believe
an error analysis or a different classification task might shed more light on those results. In
conclusion, this work scratches the surface of Transformer-based Language Models’ ability to
encode analogical properties. Our experiments show that embeddings issued from Transformer-based
architectures can better capture analogies via the identification of common latent relations,
in comparison to static embedding approaches. Nonetheless it is premature to conclude that
such architectures can indeed capture more broadly the mechanism of analogy making.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The authors would like to thank the anonymous reviewers for their thoughtful and constructive
comments. This work has been partially funded by the ANR AT2TA project, grant number
ANR-22-CE23-0023.
[1] Aristotle, Poetics, 384-322 BCE.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Hofstadter</surname>
          </string-name>
          ,
          <article-title>Analogy as the Core of Cognition</article-title>
          , in: D.
          <string-name>
            <surname>Gentner</surname>
            ,
            <given-names>K. J.</given-names>
          </string-name>
          <string-name>
            <surname>Holyoak</surname>
            ,
            <given-names>B. N.</given-names>
          </string-name>
          <string-name>
            <surname>Kokinov</surname>
          </string-name>
          (Eds.),
          <source>The Analogical Mind: Perspectives from Cognitive Science</source>
          , The MIT Press, Cambridge, Massachusetts,
          <year>2001</year>
          , pp.
          <fpage>499</fpage>
          -
          <lpage>538</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hofstadter</surname>
          </string-name>
          , E. Sander,
          <article-title>Surfaces and Essences: Analogy as the Fuel and Fire of Thinking</article-title>
          , Basic Books,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chollet</surname>
          </string-name>
          ,
          <source>On the measure of intelligence</source>
          ,
          <year>2019</year>
          . arXiv:1911.01547.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Barbot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Miclet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Prade</surname>
          </string-name>
          ,
          <article-title>Analogy between concepts</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>275</volume>
          (
          <year>2019</year>
          )
          <fpage>487</fpage>
          -
          <lpage>539</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          , in: C. J. C. Burges et al. (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          <volume>26</volume>
          , Curran Associates Inc.,
          <year>2013</year>
          , pp.
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , W.-t. Yih, G. Zweig,
          <article-title>Linguistic regularities in continuous space word representations</article-title>
          ,
          <source>in: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Association for Computational Linguistics
          , Atlanta, Georgia,
          <year>2013</year>
          , pp.
          <fpage>746</fpage>
          -
          <lpage>751</lpage>
          . URL: https://aclanthology.org/N13-1090.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Prade</surname>
          </string-name>
          , G. Richard,
          <article-title>Classifying and completing word analogies by machine learning</article-title>
          ,
          <source>International Journal of Approximate Reasoning</source>
          <volume>132</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0888613X21000141. doi:10.1016/j.ijar.2021.02.002.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Prade</surname>
          </string-name>
          , G. Richard,
          <article-title>Solving word analogies: A machine learning perspective</article-title>
          ,
          <source>in: European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty</source>
          , volume
          <volume>11726</volume>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>238</fpage>
          -
          <lpage>250</lpage>
          . URL: https://doi.org/10.1007/978-3-030-29765-7_20. doi:10.1007/978-3-030-29765-7_20.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Alsaidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lay</surname>
          </string-name>
          , E. Marquer,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Murena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Couceiro</surname>
          </string-name>
          ,
          <article-title>A neural approach for detecting morphological analogies</article-title>
          ,
          <source>in: 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . doi:10.1109/DSAA53316.2021.9564186.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Turney</surname>
          </string-name>
          ,
          <article-title>The Latent Relation Mapping Engine: Algorithm and Experiments</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>33</volume>
          (
          <year>2008</year>
          )
          <fpage>615</fpage>
          -
          <lpage>655</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , G. de Melo,
          <article-title>Sentence analogies: Linguistic regularities in sentence embeddings</article-title>
          ,
          <source>in: Proceedings of the 28th International Conference on Computational Linguistics</source>
          ,
          International Committee on Computational Linguistics, Barcelona, Spain (Online)
          ,
          <year>2020</year>
          , pp.
          <fpage>3389</fpage>
          -
          <lpage>3400</lpage>
          . URL: https://aclanthology.org/2020.coling-main.300. doi:10.18653/v1/2020.coling-main.300.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Afantenos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kunze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Prade</surname>
          </string-name>
          , G. Richard,
          <article-title>Analogies between sentences: Theoretical aspects - preliminary experiments</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Vejnarová</surname>
          </string-name>
          , N. Wilson (Eds.),
          <source>Symbolic and Quantitative Approaches to Reasoning with Uncertainty - 16th European Conference, ECSQARU 2021, Prague, Czech Republic, September 21-24, 2021</source>
          , Proceedings, volume
          <volume>12897</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2021</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>18</lpage>
          . URL: https://doi.org/10.1007/978-3-030-86772-0_1. doi:10.1007/978-3-030-86772-0_1.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Afantenos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Prade</surname>
          </string-name>
          , G. Richard,
          <article-title>Theoretical study and empirical investigation of sentence analogies</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Couceiro</surname>
          </string-name>
          , P. Murena (Eds.),
          <source>Proceedings of the Workshop on the Interactions between Analogical Reasoning and Machine Learning (International Joint Conference on Artificial Intelligence - European Conference on Artificial Intelligence (IJCAI-ECAI 2022)), Vienna, Austria, July 23, 2022</source>
          , volume
          <volume>3174</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2022</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>28</lpage>
          . URL: http://ceur-ws.org/Vol-3174/paper2.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>O.</given-names>
            <surname>Sultan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shahaf</surname>
          </string-name>
          ,
          <article-title>Life is a circus and we are the clowns: Automatically finding analogies between situations and processes</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Abu Dhabi, United Arab Emirates,
          <year>2022</year>
          , pp.
          <fpage>3547</fpage>
          -
          <lpage>3562</lpage>
          . URL: https://aclanthology.org/2022.emnlp-main.232.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Dolan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Brockett</surname>
          </string-name>
          ,
          <article-title>Automatically constructing a corpus of sentential paraphrases</article-title>
          ,
          <source>in: Proceedings of the Third International Workshop on Paraphrasing (IWP2005)</source>
          ,
          <year>2005</year>
          . URL: https://aclanthology.org/I05-5002.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Prasad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dinesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Miltsakaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Robaldo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Webber</surname>
          </string-name>
          ,
          <article-title>The Penn Discourse TreeBank 2.0</article-title>
          ,
          <source>in: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)</source>
          ,
          <source>European Language Resources Association (ELRA)</source>
          , Marrakech, Morocco,
          <year>2008</year>
          . URL: http://www.lrec-conf.org/proceedings/lrec2008/pdf/754_paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          , G. Angeli,
          <string-name>
            <given-names>C.</given-names>
            <surname>Potts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>A large annotated corpus for learning natural language inference</article-title>
          ,
          <source>in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <source>Association for Computational Linguistics</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>GloVe: Global vectors for word representation</article-title>
          ,
          <source>in: Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          . URL: http://www.aclweb.org/anthology/D14-1162.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          ,
          <year>2019</year>
          . URL: http://arxiv.org/abs/1907.11692. arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>26</volume>
          , Curran Associates Inc.,
          <year>2013</year>
          , pp.
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          . URL: https://proceedings.neurips.cc/paper/2013/hash/ 9aa42b31882ec039965f3c4923ce901b-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Corrado,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Efficient estimation of word representations in vector space</article-title>
          ,
          <source>in: International Conference on Learning Representations, Workshop</source>
          ,
          <year>2013</year>
          . URL: https://arxiv.org/abs/1301.3781. doi:10.48550/ARXIV.1301.3781.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gladkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Drozd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Matsuoka</surname>
          </string-name>
          ,
          <article-title>Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn't</article-title>
          ,
          <source>in: North American Chapter of the Association for Computational Linguistics, Student Research Workshop</source>
          , Association for Computational Linguistics, San Diego, California,
          <year>2016</year>
          , pp.
          <fpage>8</fpage>
          -
          <lpage>15</lpage>
          . URL: https://aclanthology.org/N16-2002. doi:10.18653/v1/N16-2002.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rogers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Drozd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>The (too many) problems of analogical reasoning with word vectors</article-title>
          ,
          <source>in: Joint Conference on Lexical and Computational Semantics</source>
          , Association for Computational Linguistics, Vancouver, Canada,
          <year>2017</year>
          , pp.
          <fpage>135</fpage>
          -
          <lpage>148</lpage>
          . URL: https://aclanthology.org/S17-1017. doi:10.18653/v1/S17-1017.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <article-title>Linguistic regularities in sparse and explicit word representations</article-title>
          ,
          <source>in: Proceedings of the Eighteenth Conference on Computational Natural Language Learning</source>
          , Association for Computational Linguistics, Ann Arbor, Michigan,
          <year>2014</year>
          , pp.
          <fpage>171</fpage>
          -
          <lpage>180</lpage>
          . URL: https://aclanthology.org/W14-1618. doi:10.3115/v1/W14-1618.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Prade</surname>
          </string-name>
          , G. Richard,
          <article-title>Solving word analogies: A machine learning perspective</article-title>
          ,
          <source>in: Proc. 15th Europ. Conf. Symb. &amp; Quantit. Appr. to Reas. with Uncert. (ECSQARU), LNCS 11726</source>
          ,
          <fpage>238</fpage>
          -
          <lpage>250</lpage>
          , Springer,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lepage</surname>
          </string-name>
          ,
          <article-title>Vector-to-sequence models for sentence analogies</article-title>
          ,
          <source>in: International Conference on Advanced Computer Science and Information Systems</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>441</fpage>
          -
          <lpage>446</lpage>
          . doi:10.1109/ICACSIS51025.2020.9263191.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gentner</surname>
          </string-name>
          ,
          <article-title>Structure Mapping: A Theoretical Framework for Analogy</article-title>
          ,
          <source>Cognitive Science</source>
          <volume>7</volume>
          (
          <year>1983</year>
          )
          <fpage>155</fpage>
          -
          <lpage>170</lpage>
          . URL: https://doi.org/10.1207/s15516709cog0702_3. doi:10.1207/s15516709cog0702_3.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>