<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring Analogical Inference in Healthcare</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Safa Alsaidi</string-name>
          <email>safa.alsaidi@inria.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel Couceiro</string-name>
          <email>miguel.couceiro@loria.fr</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sophie Quennelle</string-name>
          <email>sophie.quennelle@inria.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anita Burgun</string-name>
          <email>anita.burgun@aphp.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Garcelon</string-name>
          <email>nicolas.garcelon@institutimagine.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrien Coulet</string-name>
          <email>adrien.coulet@inria.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre de Recherche des Cordeliers, Inserm, Université Paris Cité, Sorbonne Université</institution>
          ,
          <addr-line>F-75006 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Imagine Institute</institution>
          ,
          <addr-line>F-75015 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Inria Paris</institution>
          ,
          <addr-line>F-75012 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>LORIA, CNRS, Université de Lorraine</institution>
          ,
          <addr-line>F-54000</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Service d'Informatique Biomédicale, Hôpital Necker-Enfants Malades, Assistance Publique - Hôpitaux de Paris</institution>
          ,
          <addr-line>F-75015 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>40</fpage>
      <lpage>50</lpage>
      <abstract>
        <p>Analogical proportions are statements of the form <inline-formula><tex-math>A : B :: C : D</tex-math></inline-formula> that are used to map similar relationships between two pairs of objects, <inline-formula><tex-math>(A, B)</tex-math></inline-formula> and <inline-formula><tex-math>(C, D)</tex-math></inline-formula>. Analogies have long been a subject of research in the Natural Language Processing (NLP) community, where they have been applied to a variety of reasoning and classification tasks. Lately, machine and representation learning have been shown to be useful for analogical reasoning. In this paper, we discuss the possibility of adapting the analogical framework to healthcare applications, in particular to medical decision support. We particularly hypothesize that, as language representations help in analogical reasoning in NLP, patient representations learned from Electronic Health Records (EHRs) may help in healthcare. We define three different analogy-based settings adapted to EHR data that we see as first steps toward the development of analogical applications in this domain. We provide statistics on the first sets of analogies that we built from a publicly available dataset of EHRs, and report preliminary but promising results on detecting patient-stay analogies in our very first experimental setting.</p>
      </abstract>
      <kwd-group>
        <kwd>analogy classification</kwd>
        <kwd>electronic health records</kwd>
        <kwd>patient representation learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and motivation</title>
      <p>
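        An analogical proportion, or analogy, is a relation between four objects <inline-formula><tex-math>A</tex-math></inline-formula>, <inline-formula><tex-math>B</tex-math></inline-formula>, <inline-formula><tex-math>C</tex-math></inline-formula>, and <inline-formula><tex-math>D</tex-math></inline-formula> that
is expressed as “<inline-formula><tex-math>A</tex-math></inline-formula> is to <inline-formula><tex-math>B</tex-math></inline-formula> as <inline-formula><tex-math>C</tex-math></inline-formula> is to <inline-formula><tex-math>D</tex-math></inline-formula>” and formally denoted as <inline-formula><tex-math>A : B :: C : D</tex-math></inline-formula>. There are
two main tasks associated with analogical proportions: analogy detection and analogy solving.
Analogy detection corresponds to the task of deciding whether a quadruple <inline-formula><tex-math>\langle A, B, C, D \rangle</tex-math></inline-formula> is a
valid analogy. Analogy solving corresponds to finding a fourth element <inline-formula><tex-math>D</tex-math></inline-formula> so that <inline-formula><tex-math>A : B :: C : D</tex-math></inline-formula>
is a valid analogy. This can be done either by retrieving <inline-formula><tex-math>D</tex-math></inline-formula> from a pool of candidates or by
generating <inline-formula><tex-math>D</tex-math></inline-formula>. Analogies have been extensively studied and applied to various Natural Language
Processing (NLP) tasks [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>
        ]. Object representations called embeddings are low-dimensional
representations of high-dimensional vectors, and they have been used to improve deep learning
methodologies. Some of these embeddings learn precise representations and are able to detect
differences between objects. As a result, they can discriminate between valid and invalid
analogical proportions and solve analogical equations.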
      </p>
      <p>In this paper, we explore the possibility of leveraging the analogy framework to solve tasks
relevant to the healthcare domain. We particularly consider using Electronic Health Records
(EHRs) to learn patient representations, i.e., patient embeddings. We initiate the construction of
sets of patient-based analogies using relationships existing between patient hospital stays from
a publicly available set of EHRs. These health records consist of clinical and administrative
data collected during patient hospital stays. Generally they are composed of structured (e.g.,
diagnostic codes, lab tests) and unstructured data (e.g., clinical notes, nursing reports, discharge
summaries), either static (e.g., patient demographics) or temporal (e.g., vital signs).</p>
      <p>
        EHRs have been put to secondary use to conduct epidemiological and observational studies. They
have also been used as real-world data to train predictive models [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In particular, deep learning
methods have become increasingly popular in medical informatics for general tasks such as
predicting mortality, in-hospital readmission, diagnoses, etc. A key element for such tasks
is to effectively convert patient data from the raw EHR format to embeddings that can be
further processed [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Representation learning thus consists of learning low-dimensional feature
representations from raw data. As EHR data are heterogeneous and complex, studies have shown
that deep learning models are suited to encoding complex EHR data to learn patient representations
and that various architectures are suited to different biomedical tasks [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref7 ref8 ref9">7, 8, 9, 10, 11, 12</xref>
        ]. For
instance, Madhumita et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] used a stacked denoising autoencoder and a paragraph vector
model to learn generalized patient representations directly from clinical notes. Si and Roberts
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] utilized a three-level hierarchical attention-based recurrent neural network (HAN) with
greedy segmentation to learn patient representation from clinical notes. Zhang et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
proposed two multimodal neural network architectures to enhance patient representation learning
by combining sequential unstructured notes with structured data.
      </p>
      <p>
        Analogies have only been sporadically applied to healthcare. Nonetheless, analogical
reasoning has been applied in clinical practice by physicians for diagnosis and prognosis, as a way
of linking visible signs and symptoms to possible causes. Indeed, medical reasoning relies on
observations of previous patients with similar signs and symptoms, who happened to have a
certain disease. Several studies have investigated analogies in healthcare by applying various
machine learning methods. For instance, Rather et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] used analogical proportions to
identify hidden or unknown biomedical knowledge from free text resources. In their work, they
defined analogies of the form “acetaminophen is a type of drug as diabetes is a type of disease.”
Dynomant et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] used analogical proportions to compare embedding methods trained on
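a corpus of French health-related documents. Each analogical proportion aimed to verify whether
<inline-formula><tex-math>(\overrightarrow{\mathrm{Term}_1} - \overrightarrow{\mathrm{Term}_2}) + \overrightarrow{\mathrm{Term}_3} \approx \overrightarrow{\mathrm{Term}_4}</tex-math></inline-formula>, which checks whether the relation between the first
two terms is similar to the one between Term 3 and Term 4.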
      </p>
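      <p>The vector-offset check described above can be illustrated with a minimal sketch. It assumes that each term is already embedded as a list of floats; the function names, the toy vectors, and the 0.9 decision threshold are illustrative assumptions and are not taken from the cited works.</p>
      <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def offset_score(term1, term2, term3, term4):
    """Cosine similarity between (term1 - term2) + term3 and term4."""
    predicted = [a - b + c for a, b, c in zip(term1, term2, term3)]
    return cosine(predicted, term4)


# Toy 2-dimensional vectors (made-up values, not taken from any corpus).
term1, term2, term3, term4 = [0.9, 0.8], [0.9, 0.1], [0.1, 0.1], [0.1, 0.8]
score = offset_score(term1, term2, term3, term4)
is_analogy = score > 0.9  # hypothetical decision threshold
```
]]></preformat>
      <p>A score close to 1 indicates that the offset between the first two terms, applied to the third term, lands near the fourth term.</p>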
      <p>
        In this paper, we describe ongoing work on analogical inference in healthcare. We
introduce three analogy-based settings, where each setting aims to investigate specific biomedical
tasks, namely identity, predictive, and generative tasks. In comparison with previous studies
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref14 ref7 ref8 ref9">14, 7, 8, 9, 10, 11, 12</xref>
        ], we aim to build analogies based on patient-stay representations. One of
the main contributions of our work is a framework to build sets of proportions for analogical
inference in healthcare.
      </p>
      <p>This paper is organized as follows. Section 2 provides a description of the MIMIC-III dataset.
Section 3 defines our analogical settings and associated biomedical tasks, and justifies our
task choices. Section 4 presents preliminary statistics of the analogical proportions built from
MIMIC-III. Section 5 initiates a discussion addressing some analogical postulates that could
be useful when generating our analogies. Section 6 illustrates the feasibility of our approach
by providing preliminary results using one of our experimental settings. Section 7 discusses
perspectives for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data description</title>
      <p>
        We propose to use EHRs as a source of patient medical history data and aim to consider both their
structured and unstructured data to define our analogies. In particular, we experiment with a
publicly available dataset of EHRs called MIMIC-III (Medical Information Mart for Intensive
Care-III) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. MIMIC-III is a critical care database, developed by the Massachusetts Institute of
Technology (MIT)’s Laboratory for Computational Physiology and distributed by PhysioNet
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. It contains integrated, de-identified health-related data, in accordance with the Health Insurance
Portability and Accountability Act (HIPAA), associated with all patients admitted
to the ICU (Intensive Care Unit) of Beth Israel Deaconess Medical Center between 2001 and 2012.
The data include patient demographics, vital signs, lab test results, medications,
hospital length of stay, survival, clinical notes, imaging reports and more, structured into 26
tables. Each patient-stay is associated with the diagnosis codes motivating the stay and with the procedures
performed during the stay. MIMIC-III encompasses data of more than 40,000 ICU patients and more
than 60,000 ICU stays. Table 1 shows statistics for the subgroup of adult patients (aged 18 and
above) with at least two stays, which is the subset that we consider in the rest of the article.
      </p>
      <p>The database contains a combination of structured and unstructured data and is accessible to
researchers under a data use agreement, which requires users to complete a HIPAA training
course required by the National Institutes of Health (NIH).</p>
      <table-wrap>
        <label>Table 1</label>
        <caption>
          <p>Statistics for the adult patients (aged 18 and above) of MIMIC-III with at least two stays.</p>
        </caption>
        <table>
          <thead>
            <tr><th /><th>Statistics</th></tr>
          </thead>
          <tbody>
            <tr><td>Patients (total)</td><td>8,526</td></tr>
            <tr><td>Gender, male (total)</td><td>4,818</td></tr>
            <tr><td>Age (median, in years)</td><td>66.24</td></tr>
            <tr><td>ICU stays (total)</td><td>23,345</td></tr>
            <tr><td>Hospital stays (total)</td><td>19,709</td></tr>
            <tr><td>ICU length of stay (median, in days)</td><td>2.33</td></tr>
            <tr><td>Hospital length of stay (median, in days)</td><td>9.74</td></tr>
            <tr><td>Clinical notes per stay (median)</td><td>18.0</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental settings and biomedical tasks</title>
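      <p>As defined previously, an analogy is a 4-ary relation and is usually written as <inline-formula><tex-math>A : B :: C : D</tex-math></inline-formula>.
In this paper, we define three analogy-based settings and associated tasks that we are interested
in investigating with EHR data. We name our three settings as follows: (i) Identity; (ii) Identity
+ Sequent; (iii) Identity + Directly Sequent. For these settings, we do not want to learn “full”
patient representations, but patient-stay representations (i.e., numeric vector representations
of the EHR data that belong to a single hospital stay), which we hope to be simpler.</p>
      <p><bold>Identity.</bold> In the first setting, we propose to build analogies of the form:
        <disp-formula><tex-math>S^{p_1}_{t_1} : S^{p_1}_{t_2} :: S^{p_2}_{t_3} : S^{p_2}_{t_4}</tex-math></disp-formula>
where <inline-formula><tex-math>S^{p}_{t}</tex-math></inline-formula> refers to the stay <inline-formula><tex-math>t</tex-math></inline-formula> of patient <inline-formula><tex-math>p</tex-math></inline-formula>. Here, pairs of the analogy quadruples are made of two
random stays belonging to the same patient. Since there is no constraint on the order of stays,
<inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> can happen before <inline-formula><tex-math>S^{p_1}_{t_2}</tex-math></inline-formula> or the inverse. Note that <inline-formula><tex-math>p_1</tex-math></inline-formula> and <inline-formula><tex-math>p_2</tex-math></inline-formula> could be the same patient, and that
<inline-formula><tex-math>t_1</tex-math></inline-formula> and <inline-formula><tex-math>t_2</tex-math></inline-formula>, or <inline-formula><tex-math>t_3</tex-math></inline-formula> and <inline-formula><tex-math>t_4</tex-math></inline-formula>, could represent the same time stamp. Furthermore, <inline-formula><tex-math>t_1</tex-math></inline-formula> and <inline-formula><tex-math>t_3</tex-math></inline-formula>, or <inline-formula><tex-math>t_2</tex-math></inline-formula> and <inline-formula><tex-math>t_4</tex-math></inline-formula>,
could be the same when <inline-formula><tex-math>p_1 = p_2</tex-math></inline-formula> (but not when <inline-formula><tex-math>p_1 \neq p_2</tex-math></inline-formula>). In this setting, we aim at investigating
identity tasks, i.e., associating an unassigned sample of data with the patient it belongs to. Note that
this setting fits several data cleaning and data privacy related applications.</p>
      <p><bold>Identity + Sequent.</bold> For this setting, we add a temporal constraint to analogies, as we force
        <disp-formula><tex-math>S^{p_1}_{t_1} \ll S^{p_1}_{t_2} \quad \text{and} \quad S^{p_2}_{t_3} \ll S^{p_2}_{t_4}</tex-math></disp-formula>
where <inline-formula><tex-math>\ll</tex-math></inline-formula> denotes temporal sequentiality between stays of the same patient, i.e., <inline-formula><tex-math>S^{p_1}_{t_2}</tex-math></inline-formula> takes place
after <inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> and <inline-formula><tex-math>S^{p_2}_{t_4}</tex-math></inline-formula> takes place after <inline-formula><tex-math>S^{p_2}_{t_3}</tex-math></inline-formula>, but not necessarily directly after. We consider cases
where <inline-formula><tex-math>p_1 = p_2</tex-math></inline-formula>. In this setting, we also define a relation named diagnosis, which forces <inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> and
<inline-formula><tex-math>S^{p_2}_{t_3}</tex-math></inline-formula> to have the same diagnosis. This relation provides more meaning to our analogies and gives
us more medical insight into the relationship between our patients. For example, based on the
different stays associated with a single patient, we hope to see how a certain disease develops
(similarly or differently) between two distinct patients.</p>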
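      <p><bold>Identity + Directly Sequent.</bold> In this third setting, we make the temporal constraint
stricter, as we force the two stays of the same patient to be directly sequent (no other stay can
exist in between). We denote this constraint by
        <disp-formula><tex-math>S^{p_1}_{t_1} \prec S^{p_1}_{t_2} \quad \text{and} \quad S^{p_2}_{t_3} \prec S^{p_2}_{t_4}</tex-math></disp-formula>
The diagnosis relation is kept between <inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> and <inline-formula><tex-math>S^{p_2}_{t_3}</tex-math></inline-formula>, and cases where <inline-formula><tex-math>p_1 = p_2</tex-math></inline-formula> are also
considered.</p>
      <p>With these three settings, we aim at investigating the applicability of two tasks:
analogy detection and analogy inference. For instance, given an analogy of the form <inline-formula><tex-math>A : B :: C : D</tex-math></inline-formula>,
we can either propose potential values for an unknown stay <inline-formula><tex-math>D</tex-math></inline-formula> (i.e., predictive task) or
generate stays which would enrich our dataset with synthetic stays (i.e., generative task).</p>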
      <p>
        For the last two settings, we define additional settings by considering three levels of relaxation
of the diagnosis constraint. The constraint is satisfied either if both stays are associated with the very same
primary diagnostic code (level 4) or, in more relaxed settings, if both codes belong to the
same level-3 or level-2 branch of the hierarchy of the ICD-9-CM (International Classification of
Diseases, Ninth Revision, Clinical Modification) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Statistical details on the influence of the
constraints are discussed in the next section.
      </p>
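      <p>The three settings can be sketched as follows. This is a minimal illustration under stated assumptions: each patient is represented by a list of stay identifiers sorted by admission time, pairs made of the same stay twice are left out, and the names <monospace>stay_pairs</monospace>, <monospace>build_analogies</monospace>, and the toy data are hypothetical rather than part of our implementation.</p>
      <preformat><![CDATA[
```python
from itertools import combinations, permutations


def stay_pairs(stays, setting):
    """Ordered pairs of stays of one patient; `stays` is sorted by admission time."""
    if setting == "identity":
        # No order constraint: every ordered pair of distinct stays.
        return list(permutations(stays, 2))
    if setting == "sequent":
        # First stay before the second; other stays may lie in between.
        return list(combinations(stays, 2))
    if setting == "directly_sequent":
        # The second stay immediately follows the first.
        return list(zip(stays, stays[1:]))
    raise ValueError(f"unknown setting: {setting}")


def build_analogies(patients, setting, same_diagnosis=None):
    """Combine pairs (A, B) and (C, D) into quadruples A : B :: C : D.

    `same_diagnosis(a, c)` is an optional predicate applied to the first
    stay of each pair, mirroring the diagnosis relation.
    """
    pairs = [pair for stays in patients.values() for pair in stay_pairs(stays, setting)]
    return [
        (a, b, c, d)
        for a, b in pairs
        for c, d in pairs
        if same_diagnosis is None or same_diagnosis(a, c)
    ]


# Toy data (made-up identifiers and codes).
patients = {"p1": ["p1_s1", "p1_s2", "p1_s3"], "p2": ["p2_s1", "p2_s2"]}
primary_code = {"p1_s1": "767.4", "p1_s2": "428.0", "p1_s3": "767.4",
                "p2_s1": "767.4", "p2_s2": "401.9"}


def same_code(a, c):
    return primary_code[a] == primary_code[c]


identity = build_analogies(patients, "identity")
sequent = build_analogies(patients, "sequent", same_code)
directly = build_analogies(patients, "directly_sequent", same_code)
```
]]></preformat>
      <p>On this toy input, the number of quadruples shrinks as the order constraint becomes stricter, which is the effect the settings are designed to produce.</p>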
    </sec>
    <sec id="sec-4">
      <title>4. Preliminary statistics on MIMIC-III</title>
      <p>We computed preliminary statistics on the MIMIC-III dataset to check how many analogical
proportions can be formed for each of the three analogical settings and for each of the three levels
of the diagnosis constraint, as shown in Table 3. To form the analogies, we built tuples of
two stays that belong to a single patient <inline-formula><tex-math>p</tex-math></inline-formula>. We kept only adult patients (aged 18 and above) that
have at least two hospital stays. For our first setting, we define a valid analogy as a quadruple
of four stays <inline-formula><tex-math>(S^{p_1}_{t_1}, S^{p_1}_{t_2}, S^{p_2}_{t_3}, S^{p_2}_{t_4})</tex-math></inline-formula>, where each pair of two stays belongs to a single patient <inline-formula><tex-math>p_i</tex-math></inline-formula>. Since
we do not restrict the order of the stays for each of the pairs, our analogies were made of all the
permutations of all the stays belonging to a patient.</p>
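      <p>For our second and third settings, we define a valid analogy to be a quadruple made of four
stays <inline-formula><tex-math>(S^{p_1}_{t_1}, S^{p_1}_{t_2}, S^{p_2}_{t_3}, S^{p_2}_{t_4})</tex-math></inline-formula>, where <inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> and
<inline-formula><tex-math>S^{p_2}_{t_3}</tex-math></inline-formula> have the same diagnostic code. As an order constraint is introduced for these two settings,
we had to make sure that <inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> takes place before <inline-formula><tex-math>S^{p_1}_{t_2}</tex-math></inline-formula> and <inline-formula><tex-math>S^{p_2}_{t_3}</tex-math></inline-formula> takes place before <inline-formula><tex-math>S^{p_2}_{t_4}</tex-math></inline-formula>. For the second
setting, the stays do not necessarily follow each other directly, i.e., other stays can exist in
between. As for the third setting, one stay immediately follows the other, i.e., there is no other
stay in between them.</p>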
      <p>For the diagnosis constraint, we referred to the ICD-9-CM, which is the standard nomenclature
for assigning diagnosis codes to each hospital stay. Indeed, each stay has a unique primary
diagnosis code and a set of secondary codes. Diagnostic codes are organized hierarchically as
follows: (1) chapter, (2) block, (3) 3-digit category, and (4) full code. As an example, the diagnosis
code 767.4 and its hierarchy are presented in Table 2.</p>
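      <p>The four-level hierarchy can be sketched as a function mapping a code to the key used to compare two stays at a given level. This is a minimal sketch: the range tables below only cover the example code 767.4 and are illustrative excerpts, the function name <monospace>diagnosis_key</monospace> is hypothetical, and non-numeric codes (such as V and E codes) are not handled.</p>
      <preformat><![CDATA[
```python
# Illustrative excerpts of the hierarchy, limited to the ranges of the 767.4 example.
CHAPTERS = [(760, 779)]
BLOCKS = [(764, 779)]


def diagnosis_key(code, level):
    """Return the ancestor of a numeric ICD-9-CM code at the given level (1 to 4)."""
    if level not in (1, 2, 3, 4):
        raise ValueError("level must be 1, 2, 3 or 4")
    if level == 4:
        return code  # full code
    category = int(code.split(".")[0])
    if level == 3:
        return str(category)  # 3-digit category
    ranges = BLOCKS if level == 2 else CHAPTERS
    for low, high in ranges:
        if low <= category <= high:
            return f"{low}-{high}"
    raise KeyError(f"no range known for code {code}")


def same_diagnosis(code_a, code_b, level):
    """Diagnosis constraint at a given level of relaxation."""
    return diagnosis_key(code_a, level) == diagnosis_key(code_b, level)
```
]]></preformat>
      <p>For example, codes 767.4 and 770.1 differ at levels 4 and 3 but fall in the same range at level 2 under the excerpt above, so relaxing the constraint lets more pairs of stays match.</p>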
      <table-wrap>
        <label>Table 2</label>
        <caption>
          <p>Hierarchy of the ICD-9-CM diagnosis code 767.4.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Level</th><th>Level name</th><th>Example of code range</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>Chapter</td><td>760–779</td></tr>
            <tr><td>2</td><td>Block</td><td>764–779</td></tr>
            <tr><td>3</td><td>3-digit category code</td><td>767</td></tr>
            <tr><td>4</td><td>Full code</td><td>767.4</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>For the second and third analogical settings, we preprocessed our diagnosis codes in the following manner.
We kept only the primary diagnostic code associated with each patient-stay. We filtered out ICD-9
codes that appeared only once. For the second-level and third-level diagnosis settings, we
performed the same preprocessing, except that we first had to convert the ICD-9 codes into their
corresponding block and category formats.</p>
      <p>Table 3 provides the number of valid analogies that can be formed with the defined settings.
As shown, the number of analogies is the highest for the Identity setting, which can be
explained by the absence of constraints on both the order of stays and the diagnosis. The
stricter the order constraint, the fewer valid analogies can be formed.
The diagnosis constraint also influences the number of analogies that can be generated. The
number of analogies is the lowest for the full code (level 4) settings, where fewer patients share the
very same primary diagnosis code. In comparison, more patients can share a single diagnosis
code in the category (level 3) and block (level 2) settings, and we observe the highest number
of analogies in the block setting for both the second and third analogical settings.</p>
      <table-wrap>
        <label>Table 3</label>
        <caption>
          <p>Number of valid analogies per setting (Identity, Identity + Sequent, Identity + Directly Sequent) and ICD level (N/A for the Identity setting).</p>
        </caption>
      </table-wrap>
    </sec>
    <sec id="sec-5">
      <title>5. Properties of analogies for data augmentation</title>
      <p>
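        As we are interested in exploring different deep learning models, we would need large amounts
of data to train them. To enlarge the training datasets, one may use analogy properties to
generate more analogies in a process called data augmentation. Training our model on different
equivalent forms of the same analogy could help reduce overfitting. Previous works [
        <xref ref-type="bibr" rid="ref2 ref21 ref22">21, 2, 22</xref>
        ]
have defined postulates that proportional analogies should obey, some of which include the
following:
        <list list-type="bullet">
          <list-item><p>reflexivity: <inline-formula><tex-math>A : B :: A : B</tex-math></inline-formula></p></list-item>
          <list-item><p>inner reflexivity: <inline-formula><tex-math>A : A :: B : B</tex-math></inline-formula></p></list-item>
          <list-item><p>determinism: <inline-formula><tex-math>A : A :: B : x \rightarrow x = B</tex-math></inline-formula></p></list-item>
          <list-item><p>symmetry: <inline-formula><tex-math>A : B :: C : D \rightarrow C : D :: A : B</tex-math></inline-formula></p></list-item>
          <list-item><p>inner symmetry: <inline-formula><tex-math>A : B :: C : D \rightarrow B : A :: D : C</tex-math></inline-formula></p></list-item>
          <list-item><p>central permutation: <inline-formula><tex-math>A : B :: C : D \rightarrow A : C :: B : D</tex-math></inline-formula></p></list-item>
        </list>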
      </p>
      <p>However, not all these postulates hold for all our settings. Based on the current definitions of
our analogical settings, we can apply reflexivity for all three settings. Inner reflexivity can
only be applied for the Identity setting. Adding this postulate for the second and third settings
would require relaxing our order constraint, which is inconsistent with the temporal aspect of
predictive modeling. Determinism holds for all the settings. We include this postulate even if
it produces trivial analogies. Central permutation can be applied to our analogies for the first
setting only, and in the very particular case when <inline-formula><tex-math>p_1 = p_2</tex-math></inline-formula>. When <inline-formula><tex-math>p_1 \neq p_2</tex-math></inline-formula>, central permutation
cannot be applied to increase our dataset, as it would associate stays of distinct patients,
which is inconsistent with the aim of the Identity setting. Concerning the second and third
analogy settings, central permutation cannot be applied, as it violates the order constraint in
most cases. Note that central permutation can be applied for these two settings in cases when
<inline-formula><tex-math>p_1 = p_2</tex-math></inline-formula>, <inline-formula><tex-math>t_2 \leq t_1</tex-math></inline-formula>, <inline-formula><tex-math>t_3 \leq t_4</tex-math></inline-formula>, and the same diagnosis is associated with <inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> and <inline-formula><tex-math>S^{p_2}_{t_3}</tex-math></inline-formula>. Inner symmetry
can be applied for the Identity setting, but it violates the order constraints for the other two
settings. For all three analogical settings, applying symmetry to a valid analogy
increases the number of valid analogies, as it does not violate any of the constraints.
In addition to valid forms, we can also consider invalid forms (i.e., forms that contradict some of the
setting constraints or that cannot be inferred from the base cases using the allowed postulates)
for classification purposes.</p>
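      <p>The permutation-based postulates discussed in this section can be sketched as functions on quadruples. This is a minimal sketch under simplifying assumptions: it only covers symmetry, inner symmetry, and central permutation, it ignores the reflexivity-based forms and the special cases in which central permutation is compatible with the order constraint, and the setting names and the <monospace>augment</monospace> function are hypothetical.</p>
      <preformat><![CDATA[
```python
def symmetry(quad):
    a, b, c, d = quad
    return (c, d, a, b)


def inner_symmetry(quad):
    a, b, c, d = quad
    return (b, a, d, c)


def central_permutation(quad):
    a, b, c, d = quad
    return (a, c, b, d)


def augment(quad, setting, same_patient=False):
    """Set of valid forms reachable from `quad` with the postulates allowed for `setting`."""
    allowed = [symmetry]  # symmetry respects every constraint
    if setting == "identity":
        allowed.append(inner_symmetry)  # would break the order constraint elsewhere
        if same_patient:
            allowed.append(central_permutation)  # only when all stays share one patient
    forms = {quad}
    frontier = [quad]
    while frontier:
        current = frontier.pop()
        for postulate in allowed:
            new_form = postulate(current)
            if new_form not in forms:
                forms.add(new_form)
                frontier.append(new_form)
    return forms


base = ("A", "B", "C", "D")
sequent_forms = augment(base, "sequent")
identity_forms = augment(base, "identity")
same_patient_forms = augment(base, "identity", same_patient=True)
```
]]></preformat>
      <p>With four distinct stays, the sketch yields 2 forms for the order-constrained settings, 4 for the Identity setting with distinct patients, and 8 when central permutation is allowed. These counts differ from the ones we use in our experiments, which also rely on the reflexivity-based postulates.</p>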
    </sec>
    <sec id="sec-6">
      <title>6. Preliminary experiments: error analyses in the Identity setting</title>
      <p>
        We set up a preliminary experiment on the analogy detection task, addressing our Identity
setting. Inspired by [
        <xref ref-type="bibr" rid="ref23 ref3">3, 23</xref>
        ], we consider a CNN classifier adapted to patient-stays to determine
whether a given quadruple <inline-formula><tex-math>(A, B, C, D)</tex-math></inline-formula> constitutes a valid analogy. For the embedding model, we consider
the Fusion CNN model developed by [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], which combines both structured and unstructured
data to obtain patient-stay representations. For this very first experiment, we only consider
structured data limited to demographics and admission-related information. For unstructured
data, we group the clinical notes associated with a hospital stay. We learn clinical note embeddings
and concatenate them with static information following [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] to obtain our final patient-stay
representations.
      </p>
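      <p>We considered the hospital stays of 200 patients extracted randomly from MIMIC-III. We define
a valid analogy as a quadruple of four stays <inline-formula><tex-math>(S^{p_1}_{t_1}, S^{p_1}_{t_2}, S^{p_2}_{t_3}, S^{p_2}_{t_4})</tex-math></inline-formula>, where each pair of two stays
belongs to a single patient <inline-formula><tex-math>p_i</tex-math></inline-formula>. We do not define any order constraint for our stays; therefore,
<inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> can happen before <inline-formula><tex-math>S^{p_1}_{t_2}</tex-math></inline-formula>. Quadruples where <inline-formula><tex-math>p_1 = p_2</tex-math></inline-formula> are also included in the dataset. To
generate other valid analogies, we make use of all postulates in Section 5, except for central
permutation, which is only applied in the case when <inline-formula><tex-math>p_1 = p_2</tex-math></inline-formula>. As reflexivity forces <inline-formula><tex-math>p_1 = p_2</tex-math></inline-formula>, it cannot
be applied in the cases where <inline-formula><tex-math>p_1 \neq p_2</tex-math></inline-formula>. Accordingly, given a valid analogy <inline-formula><tex-math>A : B :: C : D</tex-math></inline-formula>, we
generate 8 additional valid analogies using these postulates, and 2 invalid ones. When <inline-formula><tex-math>p_1 = p_2</tex-math></inline-formula>, we generate one more valid analogy
of the form <inline-formula><tex-math>A : C :: B : D</tex-math></inline-formula>, and we consider the invalid analogies as valid.</p>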
      <p>For training and evaluation, we split our dataset into a 70% training set and a 30% testing set,
representing 939,638 analogies for training and 402,703 for testing. We randomly draw 50,000
analogies from each set (i.e., training and testing) when loading the data. Following the data
augmentation procedure introduced above, given a valid analogy, we generate 9 valid analogies
(i.e., positive examples) and 2 invalid analogies (i.e., negative examples) for cases when <inline-formula><tex-math>p_1 \neq p_2</tex-math></inline-formula>.
In contrast, we generate 12 valid analogies and no invalid analogies for cases when <inline-formula><tex-math>p_1 = p_2</tex-math></inline-formula>.
As a result, we generate more valid analogies than invalid ones. We trained
our model for 10 epochs, with 3 random initializations, to observe how the model behaves and
how much it is able to learn. We only computed the accuracy and obtained 96.85 ± 1.75 for
valid analogies and 70.31 ± 1.94 for invalid analogies. Our model performs best on positive
examples, which can be explained by the imbalance between valid and invalid examples
in the training data. Nonetheless, these preliminary results suggest that the model learns,
to some extent, patient-stay identity relationships.</p>
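      <p>Reporting accuracy separately for valid and invalid analogies, averaged over several initializations, can be sketched as follows. The label encoding (1 for valid, 0 for invalid), the function names, and the use of the sample standard deviation for the ± term are assumptions made for illustration; the toy labels and predictions are made up.</p>
      <preformat><![CDATA[
```python
from statistics import mean, stdev


def per_class_accuracy(labels, predictions):
    """Accuracy computed separately on invalid (0) and valid (1) analogies.

    Assumes that both classes occur at least once in `labels`.
    """
    result = {}
    for cls in (0, 1):
        outcomes = [pred == gold for gold, pred in zip(labels, predictions) if gold == cls]
        result[cls] = sum(outcomes) / len(outcomes)
    return result


def summarize(values):
    """Mean and sample standard deviation over runs (one value per initialization)."""
    return mean(values), stdev(values)


# Toy example: four valid and two invalid analogies.
labels = [1, 1, 1, 1, 0, 0]
predictions = [1, 1, 1, 0, 0, 1]
accuracy = per_class_accuracy(labels, predictions)
```
]]></preformat>
      <p>Separating the two classes makes the effect of class imbalance visible, since an overall accuracy would be dominated by the more frequent valid examples.</p>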
      <p>To gain a deeper insight on how our classification model works, we present four examples:
one true positive, one false negative, one true negative, and one false positive. Patient ids in
the examples below have been changed and dates have been shifted. We provide elements of
interpretation to explain why our model correctly classifies some analogies and why in other
cases it does not.</p>
      <p><bold>Analysis of a true positive example.</bold> We consider the stay <inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> of patient 1249, a
female aged 83 who suffers from Measles keratitis, was admitted twice before, and has 12 Radiology
reports documenting this stay. The second stay <inline-formula><tex-math>S^{p_1}_{t_2}</tex-math></inline-formula> belongs to the same patient 1249, then aged 81, who
suffers from Pancreat cyst/pseudocyst, was admitted only once before, and has 5 Radiology reports
and 7 Nursing/Other reports. The stay <inline-formula><tex-math>S^{p_2}_{t_3}</tex-math></inline-formula> belongs to patient 4695, a female aged 21 with
Acute venous embolism and thrombosis of superficial veins of upper extremity, admitted 5
times before, and with 5 Radiology reports and 2 Nursing/Other reports. The stay <inline-formula><tex-math>S^{p_2}_{t_4}</tex-math></inline-formula> belongs to the
same patient 4695, then aged 22, who suffers from Hypertensive chronic kidney disease, was admitted 8
times before, and has 5 Physician reports and 7 Nursing reports documenting this stay.</p>
      <p>This example has been correctly classified as valid for all 9 valid forms. As we do
not introduce any order constraint for this setting, we can notice that for some of these forms,
<inline-formula><tex-math>S^{p_1}_{t_2}</tex-math></inline-formula> takes place before <inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> in time. The model correctly
classifies these analogy forms as valid.</p>
      <p><bold>Analysis of a false negative example.</bold> All the stays in this example belong to
the same patient 1109, a female. The stay <inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> corresponds to patient 1109 at age 25, suffering from
Malignant essential hypertension, admitted 7 times before, and with 1 Radiology report and
2 Nursing/Other reports documenting this stay. The second stay <inline-formula><tex-math>S^{p_1}_{t_2}</tex-math></inline-formula> corresponds to patient 1109 at age
26, with Hypertensive chronic kidney disease, admitted 4 times before, and with 8 Physician
reports and 4 Nursing reports. The third stay <inline-formula><tex-math>S^{p_2}_{t_3}</tex-math></inline-formula> corresponds to patient 1109 at age 26, admitted once
again for Hypertensive chronic kidney disease, admitted 3 times before, and with 3 Physician
reports and 8 Nursing reports. The fourth stay <inline-formula><tex-math>S^{p_2}_{t_4}</tex-math></inline-formula> corresponds to patient 1109 at age 27, suffering from
Vascular complications of medical care, admitted 5 times before, and with 3 Physician reports
and 9 Nursing reports.</p>
      <p>As mentioned above, for cases where <inline-formula><tex-math>p_1 = p_2</tex-math></inline-formula>, applying central permutation also gives
valid analogies. In this example, our model incorrectly classified one of the valid forms
as invalid. As fewer analogies made of four stays belonging to the same patient were
included in our dataset, we noticed that our model is more likely to misclassify these
analogies, particularly for invalid forms.</p>
      <p><bold>Analysis of a true negative example.</bold> We consider the stay <inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> of patient 553, a male
aged 23 who suffers from Hypertensive chronic kidney disease, was admitted 4 times before, and has
4 Physician reports and 8 Nursing reports documenting this stay. The stay <inline-formula><tex-math>S^{p_1}_{t_2}</tex-math></inline-formula> belongs to the
same patient 553, then aged 24, admitted once again for Hypertensive chronic kidney disease,
admitted 9 times before, and with 8 Radiology reports and 4 Nursing/Other reports. The stay
<inline-formula><tex-math>S^{p_2}_{t_3}</tex-math></inline-formula> belongs to patient 2387, a male aged 52 with Unspecified disease of pericardium,
admitted 7 times before, and with 2 Radiology reports and 2 Nursing/Other reports. The stay
<inline-formula><tex-math>S^{p_2}_{t_4}</tex-math></inline-formula> belongs to the same patient 2387, then aged 53, with Unspecified pleural effusion.</p>
      <p>This analogy has been correctly classified as invalid for both invalid forms.</p>
      <p><bold>Analysis of a false positive example.</bold> We consider the stay <inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> of patient 2771, a
male aged 71 who suffers from Subendocardial infarction, was admitted 16 times before, and has 9
Nursing/Other reports. The second stay <inline-formula><tex-math>S^{p_1}_{t_2}</tex-math></inline-formula> belongs to the same patient 2771, then aged 70, who
suffers from Other pulmonary embolism and infarction, was admitted 5 times before, and has 1 Radiology
report and 3 Nursing/Other reports. The stay <inline-formula><tex-math>S^{p_2}_{t_3}</tex-math></inline-formula> belongs to patient 2222, a female aged 69
with Diverticulosis of colon with hemorrhage, admitted 9 times before, and with 2 Physician
reports and 10 Nursing reports. The stay <inline-formula><tex-math>S^{p_2}_{t_4}</tex-math></inline-formula> belongs to the same patient 2222, then aged 67,
with Arterial embolism and thrombosis of lower extremity, admitted 5 times before, and with 3
Nursing/Other reports.</p>
      <p>This analogy has been incorrectly classified as valid for an invalid form. We
noticed that when two hospital stays have clinical notes of similar categories and
include only a few clinical notes, our model seems to struggle to
distinguish between them. Here, one stay of patient 2771 and one stay of patient 2222 have the
same number of Nursing/Other reports (3 each). These reports may therefore not contain enough
information to help our model differentiate the two stays.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this paper, we discussed an exploratory approach to investigating analogical inference in
healthcare. We started by briefly surveying related work that addresses applications of
analogical reasoning in different domains. We defined three analogical settings for different
healthcare tasks, and discussed the motivation behind these settings. We also presented
preliminary statistics of the sets of analogical proportions that we built from MIMIC-III. The main
contribution of our work is the formalization of settings that are meaningful in healthcare, and
that guide the process of building sets of analogies in healthcare. We also discussed the pertinence of
certain widely used postulates in this healthcare context. Lastly, we illustrated the Identity
setting, on which we ran a preliminary experiment on the analogy detection task. These
first results pave the way for further experiments on the other two settings, and for an
in-depth analysis of the potential of coupling representation learning and analogical reasoning
in healthcare.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We thank the IARML reviewers for their constructive and positive feedback. Experiments presented
in this paper were carried out using computational clusters equipped with GPUs from the
Grid’5000 testbed (see https://www.grid5000.fr).</p>
      <p>The research work of the second named author is partially supported by TAILOR, an EU Horizon
2020 project (GA No 952215), and the Inria Project Lab “Hybrid Approaches for Interpretable
AI” (HyAIAI).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Murena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Al-Ghossein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dessalles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cornuéjols</surname>
          </string-name>
          ,
          <article-title>Solving analogies on words based on minimal complexity transformation</article-title>
          ,
          <source>in: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1848</fpage>
          -
          <lpage>1854</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lepage</surname>
          </string-name>
          ,
          <article-title>De l'analogie rendant compte de la commutation en linguistique</article-title>
          , Habilitation à diriger des recherches, Université Joseph-Fourier - Grenoble I
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Alsaidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Marquer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Murena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Couceiro</surname>
          </string-name>
          ,
          <article-title>A neural approach for detecting morphological analogies</article-title>
          ,
          <source>in: Proceedings of the 8th IEEE International Conference on Data Science and Advanced Analytics (DSAA)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hertzmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Oliver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Curless</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Salesin</surname>
          </string-name>
          ,
          <article-title>Image analogies</article-title>
          ,
          <source>in: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH)</source>
          ,
          <year>2001</year>
          , pp.
          <fpage>327</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P. B.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brunak</surname>
          </string-name>
          ,
          <article-title>Mining electronic health records: towards better research applications and clinical care</article-title>
          ,
          <source>Nature Reviews Genetics</source>
          <volume>13</volume>
          (
          <year>2012</year>
          )
          <fpage>395</fpage>
          -
          <lpage>405</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <article-title>Deep representation learning of patient data from electronic health records (EHR): A systematic review</article-title>
          ,
          <source>Journal of biomedical informatics</source>
          (
          <year>2020</year>
          )
          <fpage>103671</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R. A.</given-names>
            <surname>Solares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hassaine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Canoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Salimi-Khorshidi</surname>
          </string-name>
          ,
          <article-title>BEHRT: Transformer for electronic health records</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Landi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Glicksberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cherng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Landi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Danieletto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Furlanello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Miotto</surname>
          </string-name>
          ,
          <article-title>Deep representation learning of electronic health records to unlock patient stratification at scale</article-title>
          ,
          <source>npj Digital Medicine</source>
          <volume>3</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Miotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Kidd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Dudley</surname>
          </string-name>
          ,
          <article-title>Deep patient: an unsupervised representation to predict the future of patients from the electronic health records</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>6</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Fei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Patient representation from structured electronic medical records based on embedding technique: Development and validation study</article-title>
          ,
          <source>JMIR Medical Informatics</source>
          <volume>9</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ruan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Representation learning for clinical time series prediction tasks in electronic health records</article-title>
          ,
          <source>BMC Medical Informatics and Decision Making</source>
          <volume>19-S</volume>
          (
          <year>2019</year>
          )
          <fpage>259</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kowsari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Harrison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Lobo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <article-title>Patient2vec: A personalized interpretable deep representation of the longitudinal electronic health record</article-title>
          ,
          <source>IEEE Access</source>
          <volume>6</volume>
          (
          <year>2018</year>
          )
          <fpage>65333</fpage>
          -
          <lpage>65346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sushil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Šuster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Luyckx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Daelemans</surname>
          </string-name>
          ,
          <article-title>Patient representation learning and interpretable evaluation using clinical notes</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>84</volume>
          (
          <year>2018</year>
          )
          <fpage>103</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <article-title>Patient representation transfer learning from clinical notes based on hierarchical attention network</article-title>
          ,
          <source>AMIA Summits on Translational Science Proceedings</source>
          <volume>2020</volume>
          (
          <year>2020</year>
          )
          <fpage>597</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Combining structured and unstructured data for predictive models: a deep learning approach</article-title>
          ,
          <source>BMC Medical Informatics and Decision Making</source>
          <volume>20</volume>
          (
          <year>2020</year>
          )
          <fpage>280</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N. N.</given-names>
            <surname>Rather</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <article-title>Using deep learning towards biomedical knowledge discovery</article-title>
          ,
          <source>International Journal of Mathematical Sciences and Computing (IJMSC)</source>
          <volume>3</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Dynomant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lelong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dahamna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Massonnaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kerdelhué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grosjean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Canu</surname>
          </string-name>
          ,
          <string-name>
            <surname>Darmoni</surname>
          </string-name>
          ,
          <article-title>Word embedding for the French natural language in health care: comparative study</article-title>
          ,
          <source>JMIR Medical Informatics</source>
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>118</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A. E. W.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Pollard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-W. H.</given-names>
            <surname>Lehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Ghassemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Moody</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szolovits</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Celi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Mark</surname>
          </string-name>
          ,
          <article-title>MIMIC-III, a freely accessible critical care database</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Goldberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A. N.</given-names>
            <surname>Amaral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Glass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hausdorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Ivanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Mark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Mietus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Moody</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-K.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. E.</given-names>
            <surname>Stanley</surname>
          </string-name>
          ,
          <article-title>PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals</article-title>
          ,
          <source>Circulation</source>
          <volume>101</volume>
          <issue>23</issue>
          (
          <year>2000</year>
          )
          <fpage>E215</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          Centers for Disease Control and Prevention,
          <article-title>International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM)</article-title>
          , https://www.cdc.gov/nchs/icd/icd9cm.htm,
          <year>2015</year>
          . Accessed: 2022-05-01.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L.</given-names>
            <surname>Miclet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bayoudh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Delhay</surname>
          </string-name>
          ,
          <article-title>Analogical dissimilarity: Definition, algorithms and two experiments in machine learning</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>32</volume>
          (
          <year>2008</year>
          )
          <fpage>793</fpage>
          -
          <lpage>824</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>C.</given-names>
            <surname>Antic</surname>
          </string-name>
          ,
          <article-title>Analogical proportions</article-title>
          ,
          <source>arXiv abs/2006.02854</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Prade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Richard</surname>
          </string-name>
          ,
          <article-title>Solving word analogies: A machine learning perspective</article-title>
          ,
          <source>in: Proceedings of the Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU)</source>
          , volume
          <volume>11726</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>238</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>