<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An analogy based framework for patient-stay identification in healthcare</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Safa Alsaidi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel Couceiro</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Esteban Marquer</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sophie Quennelle</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anita Burgun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Garcelon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrien Coulet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre de Recherche des Cordeliers, Inserm, Université Paris Cité, Sorbonne Université</institution>
          ,
          <addr-line>F-75006 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Imagine Institute</institution>
          ,
          <addr-line>F-75015 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Inria Paris</institution>
          ,
          <addr-line>F-75012 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>LORIA, CNRS, Université de Lorraine</institution>
          ,
          <addr-line>F-54000</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Service d'Informatique Biomédicale, Hôpital Necker-Enfants Malades, Assistance Publique - Hôpitaux de Paris</institution>
          ,
          <addr-line>F-75015 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Analogical proportions are statements of the form “A is to B as C is to D”. Analogies have been used in various reasoning and classification tasks across different domains. Representation learning, which addresses the challenge of obtaining a vector representation of complex data, has enabled progress in various analogical reasoning applications. In the biomedical domain, representation learning has been adapted to patient data to solve various tasks such as predicting readmission, diagnosis, and length of stay. In this paper, we focus on the particular task of patient-stay identification, i.e., does a hospital stay belong to a patient or not? This constitutes a building block for addressing key biomedical tasks such as patient matching and privacy preservation. We propose a prototypical architecture that combines patient-stay representation learning and the analogical reasoning framework. For evaluation, we constitute sets of analogies from real-world Electronic Health Records, where objects are patient-stay representations learned from the data. We enrich our analogies using analogical properties and use them to train a neural model to detect whether an analogy is valid. We define three initial experimental setups to address our task, present our empirical results, and discuss further perspectives.</p>
      </abstract>
      <kwd-group>
        <kwd>analogy classification</kwd>
        <kwd>patient matching</kwd>
        <kwd>electronic health records</kwd>
        <kwd>patient representation learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        An analogical proportion, or simply analogy, is a quaternary relation involving four objects A, B,
C, and D that draws a parallel between the relation between A and B and the relation between
C and D, and that supports analogical reasoning. There are two common tasks associated with
analogies, namely, analogy detection and analogy solving. Analogy detection aims at deciding
whether a quadruple ⟨A, B, C, D⟩ constitutes a valid analogy. Analogy solving aims at finding
an x that makes A : B :: C : x a valid analogy. Analogical reasoning has been applied to different
tasks, from Natural Language Processing (NLP) tasks such as mining paradigm tables in linguistics to
image generation [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>
        Representation learning consists of learning low-dimensional feature representations (i.e.,
embeddings) from data. These embeddings, or vector representations, of objects (e.g., words,
images, or characters) underpin much of modern machine learning and have demonstrated
impressive performance on various downstream NLP tasks. For instance, Lim et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed
a deep learning model to tackle analogies using semantic embeddings. Their architecture
integrates the characteristics of analogies by design and relies heavily on pretrained GloVe
embeddings [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These embeddings were not trained explicitly to find analogies; yet they
were able to detect differences between objects. Hertzmann et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed an analogical
framework to learn “image filters” between a pair of images to create an “analogous” filtered
result on a third image. The generated image D should relate to C in the same way as B relates
to A. Alsaidi et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] developed a neural approach and used character-based embeddings to
detect morphological analogies between words.
      </p>
      <p>
        Analogies have not been sufficiently exploited in healthcare, which motivates our work.
However, practitioners unconsciously use analogical reasoning (i.e., medical reasoning) in their
daily clinical practice to understand the possible causes for a disease diagnosis and prognosis
by linking visible signs and symptoms that have been observed among different patients. In
addition, several machine learning methods have been applied to investigate analogies in healthcare.
For instance, Casteleiro et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] utilized analogies to infer disease treatments from statements
extracted from text. In their work, they try to extract biomedical facts by analogical reasoning
from embeddings. Dynomant et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] used analogical proportions to compare embedding
methods trained on a corpus of French health-related documents (i.e., discharge summaries,
procedure reports, and prescriptions). Analogical proportions were applied to the embeddings
of medical documents to verify if (a⃗ − b⃗) + c⃗ ≈ d⃗, thus checking whether the
relation between a and b is similar to the one between d and c. An example of an analogical
proportion they obtain is “(cardiology - heart) + lung ≈ pneumology.” Rather et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] used
analogical proportions to identify hidden or unknown biomedical knowledge from free text
resources. They proposed analogical proportions of the form “acetaminophen is a type of drug
as diabetes is a type of disease.”
      </p>
      <p>
        In this paper, we aim to explore how the analogy framework can help in solving tasks relevant
to the healthcare domain. We propose two models that learn patient-stay representations (i.e.,
learn a vector representation of all the patient EHR data collected during a single stay) to
detect analogies in healthcare. To do so, we define two crucial steps: (1) the learning of
embeddings adapted to patient data, and (2) the definition of a neural network dedicated to learning
the formal properties of analogy. As for the network, we use the same model that was proposed
by Lim et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for word semantics, and later adapted by Alsaidi et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] by incorporating
character-based embeddings for morphological analogies. We argue that the framework itself
has the potential to be applied in a wide range of domains, and we propose to use it here for
healthcare applications, namely, for the patient identification task we introduce below.
      </p>
      <p>
        Electronic Health Records (EHRs) are real world healthcare data that have been used to
train predictive models (including neural network models) for different biomedical tasks, e.g.,
predicting patient mortality, hospital readmission, length of stay, etc. These EHRs consist of
clinical and administrative data collected during patient hospital stays in the form of both
structured and unstructured data. Structured data generally includes diagnostic codes, lab
tests, demographics, admission-related information, etc. It can be either static, e.g., patient
demographics, or temporal, e.g., vital signs. Unstructured data includes various documents in
natural language such as clinical notes, nursing reports, discharge summaries, lab reports, etc.
For this work, we consider EHRs from the MIMIC-III (Medical Information Mart for Intensive
Care, version 3) database [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to learn patient representations (i.e., patient embeddings) by
converting patient data from the raw EHRs to embeddings that can be further processed.
MIMIC-III is a freely and publicly available hospital database containing de-identified patient health data. This
database has been widely used by researchers conducting data mining and machine learning
studies applied to healthcare.
      </p>
      <p>
        Several neural network architectures have been developed to represent biomedical data. For
instance, Si et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] adapted a multi-level CNN to learn patient representations from clinical
notes through a multi-task learning framework to predict patient mortality and length of stay.
Zhang et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] used a GRU-based RNN to capture relationships between clinical events and
employed an attention mechanism to learn a personalized representation to predict a patient’s future
hospitalization using EHR data. Madhumita et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] used a stacked denoising autoencoder
and a paragraph vector model to learn generalized patient representations directly from clinical
notes to predict patient mortality, primary diagnostic, procedural category, and patient gender.
Zhang et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] proposed two neural network architectures that enhance patient representation
learning by combining sequential unstructured notes with structured data and evaluated these
representations on 3 risk evaluation tasks (i.e., in-hospital mortality, 30-day hospital readmission,
and length of stay prediction). In our paper, we learn patient-stay representations and consider
the task of patient-stay identification. We think that the tools that address this task will serve
as building blocks for more complex and key biomedical tasks, such as patient matching and
privacy preservation checking [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ].
      </p>
      <p>In this paper, we particularly propose to tackle this task by relying on the detection of
analogies in healthcare. In Section 2, we define the setting of analogy that we work on. The
models we propose to detect analogies are described in Section 3, along with the procedures
we use for data augmentation, training, and evaluation. In Section 4, we provide a description
of the MIMIC-III dataset and detail how we build our experimental dataset. We present our
experiments and report our results in Section 5. In Section 6, we discuss perspectives for future
research.</p>
      <p>The main contributions of this paper are the following:
• we propose an analogy-based setting using patient-stay representations;
• we propose an embedding model to learn patient-stay representations;
• we report the performance of our classification model in detecting analogies on patient-stay
data.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Defining the task</title>
      <p>As defined previously, an analogy is a 4-ary relation written as A : B :: C : D and read
as “A is to B as C is to D”. In this paper, we work on patient-stay analogies, i.e., on analogies
involving hospital stays. In our setting, A, B, C, and D are patient-stay representations.
We define an analogy-based setting on patient-stay data that we refer to as the Identity setting.
For that, we consider patient-stay representations, which are vector representations of the EHR
data that belong to a single hospital stay. Depending on the type of EHR data that we decide to
include, a patient-stay representation can be made of a representation of either structured or
unstructured data, or of the concatenation of both. More details
are provided in Section 5. For this setting, we propose to build analogies of the form:
<disp-formula><tex-math>S^{p_1}_{t_1} : S^{p_1}_{t_2} :: S^{p_2}_{t_3} : S^{p_2}_{t_4}</tex-math></disp-formula>
where <inline-formula><tex-math>S^{p}_{t}</tex-math></inline-formula> refers to the stay t of patient p. Here, each pair of the analogy quadruple is made of two
random stays belonging to the same patient. Since there is no constraint on the order of stays,
<inline-formula><tex-math>S^{p_1}_{t_1}</tex-math></inline-formula> can happen before <inline-formula><tex-math>S^{p_1}_{t_2}</tex-math></inline-formula> or the inverse. Note that p<sub>1</sub> and p<sub>2</sub> can be the same patient, and that
t<sub>1</sub> and t<sub>2</sub>, or t<sub>3</sub> and t<sub>4</sub>, can represent the same time stamp. Furthermore, t<sub>1</sub> and t<sub>3</sub>, or t<sub>2</sub> and t<sub>4</sub>,
can be the same when p<sub>1</sub> = p<sub>2</sub> (but not when p<sub>1</sub> ≠ p<sub>2</sub>). The Identity setting finds applications in
several tasks relevant to biomedical informatics, including:
• data cleaning,
• data privacy related applications,
• patient matching.</p>
      <p>
        Data cleaning applications in the health domain involve repairing or removing patient health
data that is inaccurate, incorrectly structured, duplicated, or incomplete. In data cleaning
applications, we can associate an erroneously assigned sample of data with the patient it belongs to.
Privacy related applications include verifying if patient data is de-identified, and whether it
can be re-identified using different systems. Patient matching is defined as the identification
and linking of one patient’s data within and across health databases in order to obtain a
comprehensive view of that patient’s health care record [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In patient matching, we try to
match patient-related information, either a single piece of patient data (e.g., a document) or full EHR
data, that can coexist in one or several databases.
      </p>
      <p>In this paper, we try to match patient-stay representations to the patient they belong to. We
focus on the task of patient-stay identification, where we aim to determine if a particular hospital
stay belongs to a certain patient. We address this task by learning a model to classify
quadruples of stays into valid and invalid analogies. In this sense, we implement the task of analogy
detection, which aims to determine if a quadruple is a valid analogy. For our Identity setting, we
define a valid analogy as a quadruple of four stays <inline-formula><tex-math>(S^{p_1}_{t_1}, S^{p_1}_{t_2}, S^{p_2}_{t_3}, S^{p_2}_{t_4})</tex-math></inline-formula>, where each pair of
stays belongs to a single patient; all other quadruples are considered invalid.</p>
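      <p>The validity criterion of the Identity setting can be illustrated with a minimal sketch. It assumes that each stay carries the identifier of its patient; the class name, field names, and function name below are hypothetical and do not come from the released code.</p>
      <preformat><![CDATA[
```python
from typing import NamedTuple


class Stay(NamedTuple):
    """A hospital stay (hypothetical structure, for illustration only)."""
    patient_id: int
    stay_id: int


def is_valid_identity_analogy(a: Stay, b: Stay, c: Stay, d: Stay) -> bool:
    """Return True when A : B :: C : D is valid in the Identity setting.

    Valid means that A and B belong to one patient and that C and D belong
    to one patient; the two patients may or may not be the same.
    """
    return a.patient_id == b.patient_id and c.patient_id == d.patient_id


# Two stays of patient 1 paired with two stays of patient 2: valid.
valid = is_valid_identity_analogy(Stay(1, 0), Stay(1, 1), Stay(2, 0), Stay(2, 3))
# The first pair mixes patients 1 and 2: invalid.
invalid = is_valid_identity_analogy(Stay(1, 0), Stay(2, 0), Stay(2, 1), Stay(2, 3))
```
]]></preformat>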
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Approach</title>
      <p>Our model is made of two components: an embedding model and a classification model. The
second takes as input patient-stay representations computed by the first (see Section 3.1).
Our embedding model is trained along with the classification model. We also detail the data
augmentation procedure in Section 3.2, and describe the training and evaluation protocols that
we followed in Section 3.3.</p>
      <sec id="sec-3-1">
        <title>3.1. Embedding and Classification Models</title>
        <p>The models described in this subsection are schematized in Figure 1.</p>
        <p>
          Embedding Model. As our embedding model, we adapt the Fusion CNN model that was
developed by Zhang et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] to obtain patient-stay representations. They proposed a neural
network architecture that combines structured and unstructured data to obtain patient
representations. The model consists of five parts: a static information encoder, temporal signals
embeddings, a sequential notes representation, a patient representation, and an output layer that is
used for three different clinical prediction tasks.
        </p>
        <p>
          In this work, we restricted structured data to static information, i.e., demographics (z<sub>demo</sub>)
and admission-related information (z<sub>adm</sub>), deliberately omitting vitals in this first attempt.
Our model is thus made of the static information encoder and the sequential notes representation,
which are used to obtain patient-stay representations as illustrated by the left frame of Figure 1.
The static categorical features are encoded as one-hot vectors through the static information
encoder. The output of the encoder is z<sub>static</sub> = [z<sub>demo</sub>; z<sub>adm</sub>], where [z<sub>demo</sub>; z<sub>adm</sub>] is the
concatenation of z<sub>demo</sub> and z<sub>adm</sub>. z<sub>static</sub> forms one part of the full patient-stay representation.
As shown in Figure 1, the clinical notes representation part is made of a document embedding
model, 2 convolutional layers, a max-pooling layer, and a flatten layer. To learn the document
embeddings of the clinical notes, we use paragraph vectors (i.e., Doc2Vec) [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. The document
embeddings are passed to the convolutional layers and the max-pooling layer. The output of the
max-pooling layer is then flattened into z<sub>notes</sub>, the latent representation of the clinical notes.
Depending on the type of data that we consider in an experiment, the final patient-stay
representation can be made of the representation of only static information (i.e., z<sub>static</sub>), the
representation of only clinical notes (i.e., z<sub>notes</sub>), or the concatenation of the representation of
clinical notes and static information (i.e., z<sub>patient-stay</sub> = [z<sub>notes</sub>; z<sub>static</sub>]). The final patient-stay
representation is then passed to our classification model to detect analogies.
        </p>
        <p>
          Classification Model. As in Alsaidi et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], we adapt the neural architecture in Lim et al.
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] to our patient-stay setting. Our classification model determines if an analogy A : B :: C : D
is valid by verifying if A and B differ in the same way as C and D. The architecture of
the classification model is a Convolutional Neural Network (CNN), which takes as input the
embeddings of size n of the four elements A, B, C, D. We stack them to get a matrix of size
n × 4. The CNN is made of three layers as depicted in the right frame of Figure 1. The first
convolutional layer with 128 filters of 1 by 2 is applied on the embeddings, such that it analyses
each pair separately without overlaps and measures how A and B, and how C and D, differ
for each component. The second convolutional layer with 64 filters of 2 by 2 is applied on the
resulting matrix, after which the result is flattened into a 64 × (n − 1) unidimensional vector
and used as input of a fully connected dense layer that produces a single output. The second
layer aims at checking if the difference between A and B is the same as the one between C and
D. If A and B are different in the same way as C and D, then A : B :: C : D is a valid analogy.
The last layer aggregates this information using a sigmoid activation to get a result (i.e., output
of the classification model) between 0 (for invalid analogies) and 1 (for valid analogies). All
layers, except the last one, use the Rectified Linear Unit (ReLU) as activation function.
        </p>
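        <p>The layer sizes described above can be checked with a short shape calculation. This is a minimal sketch under two assumptions: the first layer uses a stride of 2 along the axis of the four stacked embeddings (our reading of “without overlaps”), and no padding is used. The function names and the value n = 200 are illustrative only.</p>
        <preformat><![CDATA[
```python
def conv_output_shape(height, width, kernel, stride):
    """Output (height, width) of an unpadded 2D convolution."""
    kh, kw = kernel
    sh, sw = stride
    return (height - kh) // sh + 1, (width - kw) // sw + 1


def classifier_shapes(n):
    """Trace tensor shapes through the classifier for embeddings of size n."""
    # Input: four embeddings stacked as an n x 4 matrix (one channel).
    h, w = n, 4
    # Layer 1: 128 filters of size 1 x 2, assumed stride 2 along the width,
    # so that (A, B) and (C, D) are analysed separately, without overlap.
    h, w = conv_output_shape(h, w, kernel=(1, 2), stride=(1, 2))
    layer1 = (128, h, w)
    # Layer 2: 64 filters of size 2 x 2, stride 1.
    h, w = conv_output_shape(h, w, kernel=(2, 2), stride=(1, 1))
    layer2 = (64, h, w)
    # Flattened size fed to the dense layer with a single sigmoid output.
    flat = 64 * h * w
    return layer1, layer2, flat


layer1, layer2, flat = classifier_shapes(200)
```
]]></preformat>
        <p>With these assumptions, the flattened vector has 64 × (n − 1) components, which agrees with the size given in the text.</p>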
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data Augmentation</title>
        <p>
          Deep neural network approaches require large amounts of data. Therefore we took advantage
of properties of analogies to produce additional proportions based on our dataset in a process
called data augmentation. Previous works [
          <xref ref-type="bibr" rid="ref19 ref20 ref21">19, 20, 21</xref>
          ] have proposed postulates that analogies
should obey. For this study, we consider the following:
• A : B :: A : B (reflexivity);
• A : A :: B : B (inner reflexivity);
• A : B :: C : D → C : D :: A : B (symmetry);
• A : B :: C : D → B : A :: D : C (inner symmetry);
• A : B :: C : D → A : C :: B : D (central permutation).
        </p>
        <p>Based on the definition of our analogical setting, we can apply all the above-mentioned postulates
to generate valid analogical proportions except for central permutation, which can only be
applied in the very particular case when p<sub>1</sub> = p<sub>2</sub>. When p<sub>1</sub> ≠ p<sub>2</sub>, central permutation cannot
be applied to increase our dataset, as it would pair stays of distinct patients,
which is inconsistent with the aim of the Identity setting. Note that from reflexivity and central
permutation we can deduce inner reflexivity. As reflexivity forces p<sub>1</sub> = p<sub>2</sub>, applying it in cases
where p<sub>1</sub> ≠ p<sub>2</sub> would result in a case where p<sub>1</sub> = p<sub>2</sub>.</p>
        <p>For the cases where p<sub>1</sub> ≠ p<sub>2</sub>, given a valid analogy we can generate eight additional valid
analogical proportions, namely
•  :  ::  : ,
•  :  ::  : ,
•  :  ::  : ,
•  :  ::  : ,
•  :  ::  : ,
•  :  ::  : ,
•  :  ::  : ,
•  :  ::  : ;
and two invalid analogical proportions, namely
•  :  ::  :  and
•  :  ::  : .</p>
        <p>For cases where p<sub>1</sub> = p<sub>2</sub>, we apply reflexivity to generate one more valid analogical proportion,
namely  :  ::  : . Note that for cases where p<sub>1</sub> = p<sub>2</sub>, the proportions that are invalid
when p<sub>1</sub> ≠ p<sub>2</sub> are considered valid.</p>
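        <p>As an illustration of how postulates generate new proportions, the sketch below closes a valid quadruple under symmetry and inner symmetry only. It is a simplified example, not the full augmentation procedure: it leaves out central permutation (as in the case of two distinct patients) and the reflexive forms, so it produces fewer proportions than the eight mentioned above. All function names are hypothetical.</p>
        <preformat><![CDATA[
```python
def symmetry(q):
    """A : B :: C : D  ->  C : D :: A : B."""
    a, b, c, d = q
    return (c, d, a, b)


def inner_symmetry(q):
    """A : B :: C : D  ->  B : A :: D : C."""
    a, b, c, d = q
    return (b, a, d, c)


def augment(q):
    """Close a valid quadruple under symmetry and inner symmetry."""
    seen = {q}
    frontier = [q]
    while frontier:
        current = frontier.pop()
        for rule in (symmetry, inner_symmetry):
            new = rule(current)
            if new not in seen:
                seen.add(new)
                frontier.append(new)
    return seen


forms = augment(("A", "B", "C", "D"))
```
]]></preformat>
        <p>Both rules keep each pair within a single patient, so every generated form remains valid in the Identity setting.</p>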
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Training and Evaluation</title>
        <p>As mentioned, we define a valid analogy as a quadruple of four stays <inline-formula><tex-math>(S^{p_1}_{t_1}, S^{p_1}_{t_2}, S^{p_2}_{t_3}, S^{p_2}_{t_4})</tex-math></inline-formula>, where
each pair of stays belongs to a single patient. For each analogy in the dataset, we start by
embedding the four stays. We augment the embeddings using the postulates recalled
in Section 3.2. As a result, we generate 9 valid analogical proportions (i.e., positive examples:
the original analogy and its eight variants) and 2 invalid analogical proportions (i.e., negative
examples) for cases where p<sub>1</sub> ≠ p<sub>2</sub>. For cases where p<sub>1</sub> = p<sub>2</sub>, we
obtain 10 + 2 = 12 valid analogical proportions (the 10 valid forms plus the 2 forms that are invalid
when p<sub>1</sub> ≠ p<sub>2</sub>) and no invalid analogical proportions. For
optimization, we use the Binary Cross-Entropy (BCE) loss. To evaluate the classification model,
we use the same data augmentation process as for training, and we compute the accuracy and
F1 score.</p>
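        <p>For reference, the loss and the two metrics can be written out in a few lines. This is a plain-Python sketch rather than the PyTorch code used in the experiments; the decision threshold of 0.5 and the toy values are assumptions made for the example.</p>
        <preformat><![CDATA[
```python
import math


def bce_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def accuracy_and_f1(probs, labels, threshold=0.5):
    """Accuracy and F1 score of the positive class (assumed threshold: 0.5)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    accuracy = correct / len(labels)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy classifier outputs for three valid analogies and one invalid analogy.
probs = [0.9, 0.8, 0.3, 0.6]
labels = [1, 1, 1, 0]
loss = bce_loss(probs, labels)
accuracy, f1 = accuracy_and_f1(probs, labels)
```
]]></preformat>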
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Dataset description</title>
      <p>
        For our experiments, we used EHRs from MIMIC-III [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] as a source of patient medical
history data. MIMIC-III is a critical care database developed by the Massachusetts Institute of
Technology (MIT)’s Laboratory for Computational Physiology and distributed by PhysioNet [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
The database is publicly available: researchers can access it after completing a HIPAA
training course required by the National Institutes of Health (NIH). The database contains
health-related information associated with all patients admitted to the ICU (Intensive Care Unit)
of Beth Israel Deaconess Medical Center between the years 2001 and 2012. It encompasses data
of more than 40,000 ICU patients with more than 60,000 ICU stays. All patient data has been
de-identified in accordance with the Health Insurance Portability and Accountability Act (HIPAA).
The dataset contains various types of data such as patient demographics, vital signs, lab test
results, medications, hospital length of stay, procedures, clinical notes, diagnosis codes (ICD-9),
imaging reports, etc.
      </p>
      <p>To build our dataset, we keep only adult patients (i.e., patients aged 18 and above) with at
least two admissions. As we do not define any order constraint, we obtain all the permutations
of all the stays belonging to a patient. We organize our dataset in such a way that each pair of stays
is associated with the patient it belongs to, i.e., as a triple made of the first stay of the pair,
the second stay of the pair, and the identifier of the patient both stays belong to. We obtain a
dataset made of 46,986 triples, where for each two pairs of stays we produce an analogy. For
our experiments, we use all hospital stays associated with 200 randomly selected patients. We
use the data augmentation process to generate positive and negative examples. For training
and evaluation, we perform a random split (using a fixed random seed) into a training set of 70%
of the extracted analogies, the remaining 30% serving as the test set. We end up with 939,638
analogies for training and 402,703 for testing. To maintain reasonable training and evaluation
times, we randomly selected 50,000 analogies from the training set and 50,000 analogies from
the testing set.</p>
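      <p>The construction of stay pairs and the seeded split can be sketched as follows. The sketch assumes that a pair is an ordered couple of two distinct stays of one patient; whether a stay may also be paired with itself is not modelled here. The function names, the seed value, and the toy identifiers are hypothetical.</p>
      <preformat><![CDATA[
```python
import random
from itertools import permutations


def stay_pairs(stays_by_patient):
    """Build (stay_1, stay_2, patient_id) triples from each patient's stays."""
    triples = []
    for patient_id, stays in sorted(stays_by_patient.items()):
        # Ordered pairs: both (s1, s2) and (s2, s1) are kept, since no
        # order constraint is imposed on the stays.
        for s1, s2 in permutations(stays, 2):
            triples.append((s1, s2, patient_id))
    return triples


def split(items, train_percent=70, seed=0):
    """Shuffle with a fixed seed, then split into training and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


triples = stay_pairs({"p1": ["a", "b", "c"], "p2": ["d", "e"]})
train, test = split(range(10), seed=42)
```
]]></preformat>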
    </sec>
    <sec id="sec-5">
      <title>5. Experiment Setup</title>
      <p>We now present the three experiments that we conducted in the Identity setting. In Section 5.1,
we describe the patient-stay features that we consider and the data preprocessing that we
performed for structured and unstructured data. We describe the implementation details in
Section 5.2. The results of our experiments are reported in Section 5.3 and discussed further in
this section. The code used for our experiments is written in Python 3.9 and PyTorch and is
available in the repository https://github.com/Safa-98/patient-stay-analogy.</p>
      <sec id="sec-5-1">
        <title>5.1. Stay Features and Data Preprocessing</title>
        <p>We consider both structured (i.e., demographics and admission-related information) and
unstructured data (i.e., clinical notes) to define our analogies. In this subsection, we describe the
patient-stay features that are utilized by our model and some data preprocessing details.
Static information. In our experiments, our static information consists of demographic
information and admission-related information. For demographic information, we extract patient’s
age, gender, marital status, ethnicity, and insurance information. We keep only adult patients
(i.e., patients aged 18 and above). We split the age into 5 groups [18, 25[, [25, 45[, [45, 65[, [65, 89[
and [89, +∞[. For admission-related information, we include the admission type as a feature.
Clinical notes. Nursing, Nursing/Other, Physician, and Radiology notes make up the majority
of clinical notes in the MIMIC-III database. For each hospital stay, we only kept notes that belong
to these 4 categories. We excluded notes that have an error tag and notes that lack a hospital
admission id.</p>
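        <p>The age grouping and the one-hot encoding of a static feature can be sketched as below. The interval bounds come from the text; the function names and the label strings are hypothetical.</p>
        <preformat><![CDATA[
```python
from bisect import bisect_right

# Lower bounds of the five age groups; each group is closed on the left
# and open on the right, and the last one is unbounded.
AGE_BOUNDS = [18, 25, 45, 65, 89]
AGE_LABELS = ["[18, 25[", "[25, 45[", "[45, 65[", "[65, 89[", "[89, +inf["]


def age_group(age):
    """Map an adult patient's age to the label of its age group."""
    if age < AGE_BOUNDS[0]:
        raise ValueError("only adult patients (aged 18 and above) are kept")
    return AGE_LABELS[bisect_right(AGE_BOUNDS, age) - 1]


def one_hot(label, vocabulary):
    """Encode a categorical value as a one-hot vector over a vocabulary."""
    return [1 if label == item else 0 for item in vocabulary]


encoded = one_hot(age_group(70), AGE_LABELS)
```
]]></preformat>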
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Implementation Details</title>
        <p>To build our corresponding cohorts, we performed the preprocessing described in the previous
section to obtain our patient-stay features. Patients without any records of clinical notes or
with notes that do not belong to the 4 categories defined above were removed. We computed
the median number of notes per hospital admission to determine how many clinical notes to extract
per hospital admission. Accordingly, we kept the first 12 notes and used padding (i.e., completion
with zeros) for hospital admissions with fewer than 12 notes.</p>
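        <p>The truncation and padding step can be sketched as follows. The sketch assumes that each note is already represented by a vector of fixed dimension and that zero vectors are appended after the real notes; the function name and the toy dimension are hypothetical.</p>
        <preformat><![CDATA[
```python
MAX_NOTES = 12  # number of notes kept per hospital admission


def pad_or_truncate(note_vectors, dim, max_notes=MAX_NOTES):
    """Keep the first max_notes vectors and pad with zero vectors if needed."""
    kept = [list(v) for v in note_vectors[:max_notes]]
    padding = [[0.0] * dim for _ in range(max_notes - len(kept))]
    return kept + padding


# An admission with 3 notes is padded; one with 15 notes is truncated.
short = pad_or_truncate([[1.0, 2.0]] * 3, dim=2)
long = pad_or_truncate([[float(i), 0.5] for i in range(15)], dim=2)
```
]]></preformat>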
        <p>
          For the unsupervised Doc2Vec model [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], we finetune it on the training set to obtain the
document-level embeddings using the Gensim toolkit [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. For the training algorithm, we use
PV-DBOW (Paragraph Vector, Distributed Bag of Words). We set the number of training epochs
to 30, the initial learning rate to 0.025, the learning rate decay to 0.0002, and the dimension of
the vectors to 200. The Fusion CNN model is trained with the Adam optimizer with a learning
rate of 0.0001 and ReLU as the activation function. The chosen batch size is 64.
        </p>
        <p>In this paper we perform three experiments. In the first, we consider both structured and
unstructured data. Therefore, we obtain our patient-stay representation by concatenating the
representations of clinical notes along with static information. In this experiment, we verify if a
particular hospital stay belongs to a patient by looking at both the structured and unstructured
data associated with each stay. In the second, we only consider unstructured data, which means
that our patient-stay representations are based solely on the representations of clinical notes.
Therefore, by looking at clinical notes associated with a single hospital stay, we check if a
particular hospital stay belongs to a patient. In the third, we only consider structured data,
which means that our patient-stay representations are based solely on the representations of
static information (i.e., demographics and admission-related information). Therefore, we verify
if a particular hospital stay belongs to a patient by looking at the static information that is
associated with a hospital stay.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Results and Discussion</title>
        <p>As mentioned previously, we conducted three experiments that mainly differ in the type of
data used to obtain our patient-stay representations. For all the experiments, we used 50,000
analogies for training and 50,000 for evaluation, and applied the same procedure for data augmentation.
We report the accuracy and F1 score for each experiment. The F1 score gives a better measure
of the incorrectly classified cases than the accuracy metric.</p>
        <p>For the first experiment, we fed our embedding model with both structured (i.e., demographics
and admission-related information) and unstructured data (i.e., clinical notes). Our patient-stay
representations are thus made of the concatenation of static information and clinical notes. We
chose the numbers of epochs at which the training loss reaches a local minimum, and thus trained our model for 10,
20, and 40 epochs, with 3 different random initializations in each case. Our results are detailed
in Table 1. Our model performs best on positive examples. It gives the best result for valid
analogies at 40 epochs and for invalid analogies at 20 epochs.</p>
        <p>For our second experiment, we used only unstructured data, i.e., the clinical-notes part of
the embedding, for the patient-stay representations. Our patient-stay representations thus
consisted of only the representation of clinical notes. The training loss was at a local minimum
for 15, 20, and 40 epochs. Therefore, we trained our model for 15, 20, and 40 epochs, with
3 different random initializations in each case. As shown in Table 2, our model performs best on
positive examples when we train for 20 epochs.</p>
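        <p>Selecting candidate epoch counts at local minima of the training loss can be sketched as follows. The loss curve is made up for illustration, and the strict comparison with both neighbouring epochs is an assumed definition of a local minimum, not necessarily the exact criterion we applied.</p>
        <preformat><![CDATA[
```python
from typing import List


def local_minima_epochs(losses: List[float]) -> List[int]:
    """Return the 1-based epochs whose loss is strictly lower than both neighbours.

    The first and last epochs are never returned because they have
    only one neighbour.
    """
    epochs = []
    for i in range(1, len(losses) - 1):
        if losses[i] < losses[i - 1] and losses[i] < losses[i + 1]:
            epochs.append(i + 1)
    return epochs


# Hypothetical per-epoch training losses.
training_loss = [0.9, 0.7, 0.8, 0.6, 0.5, 0.55]
print(local_minima_epochs(training_loss))  # [2, 5]
```
]]></preformat>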
        <p>For our third experiment, we used only structured data, i.e., static information, to build our
patient-stay representations. The training loss was at a local minimum for 15, 20, and 40 epochs.
Therefore, we trained our model for 15, 20, and 40 epochs, with 3 different random initializations
in each case. We report our results in Table 3. The accuracy for positive examples is high in all
cases, whereas it drops for negative examples.</p>
        <p>
          In all our experiments, our model performs best on positive examples
regardless of whether we use the concatenation of both representations, only clinical notes, or
only static information for the patient-stay
representations. This can be explained by the imbalance between positive and
negative examples in the training data. Balancing the data would be the next step, as it proved
to be a good solution in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to get similar results for positive and negative examples. The
accuracy for valid analogies is the highest when our embedding model is fed with only static
information. Between the first and the second experiment, the accuracy for valid
analogies is higher when the patient-stay representations are made of the concatenation than
when they are made of only clinical notes. This indicates that
using static information when learning patient-stay representations, as in the first
and third experiments, improves the performance of our model: it allows the model to
better distinguish the stays and to match them to the patient they belong to. We also notice
that the accuracy for invalid analogies is the highest when the embedding model is fed with
only clinical notes. For all performed experiments, the F1 score is high, which indicates that
our model is able to correctly assign analogies to the class they belong to (i.e., valid or invalid).
        </p>
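        <p>One possible way to balance valid and invalid analogies is random undersampling of the majority class, sketched below. This is only an assumption about how balancing could be done: the function name, the (analogy, label) pair format, and the toy data are hypothetical, and other strategies (e.g., generating more examples of the minority class) are equally plausible.</p>
        <preformat><![CDATA[
```python
import random


def balance_by_undersampling(examples, seed=0):
    """Randomly drop examples of the larger class until both classes match.

    examples: list of (analogy, label) pairs, label 1 = valid, 0 = invalid.
    """
    rng = random.Random(seed)
    valid = [e for e in examples if e[1] == 1]
    invalid = [e for e in examples if e[1] == 0]
    n = min(len(valid), len(invalid))
    balanced = rng.sample(valid, n) + rng.sample(invalid, n)
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy dataset: six valid analogies and two invalid ones.
data = [("valid-%d" % i, 1) for i in range(6)] + [("invalid-%d" % i, 0) for i in range(2)]
balanced = balance_by_undersampling(data)
print(len(balanced), sum(label for _, label in balanced))
```
]]></preformat>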
        <p>To gain more insight into how our models perform, we conducted an error analysis, which
showed that most misclassifications fall into two cases.</p>
        <p>1. Cases where 1 = 2.</p>
        <p>To recall, we do not generate invalid analogies for cases where 1 = 2; therefore, invalid
analogy forms ( :  ::  :  and  :  ::  : ) should be considered valid in these
cases. In our error analysis, we noticed that when the four stays belong to the same
patient, our model classifies the above-mentioned invalid analogy forms as invalid instead
of valid. We believe that our model was not trained enough to distinguish these forms
of analogies, as fewer analogies with four stays belonging to the same patient
were generated in our dataset.</p>
        <p>2. Cases where representations are made of only clinical notes.</p>
        <p>To recall, in our second experiment we only used the representations of clinical notes to
obtain patient-stay representations. We noticed that when the category of the clinical
notes is similar between two hospital stays or when two hospital stays have fewer than five
clinical notes, our model struggles to distinguish between the two hospital stays. This
indicates that in some cases using only clinical notes to learn patient-stay representations
might not be sufficient, as these notes might not contain enough information to help our
model differentiate between two similar stays that belong to two distinct patients. As a
result, the model would incorrectly match these two similar stays to the same patient.</p>
        <p>In these experiments, we did not include temporal data; we only used demographics
and admission-related information as structured data. It would be interesting to also include
temporal signals (e.g., vital signs) along with demographics and admission-related information
as structured data. Our patient-stay representations would then be made of the concatenation
of the representations of static information and temporal signals as structured data and the
representation of clinical notes as unstructured data.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Perspectives</title>
      <p>
        We adapted the approach in [
        <xref ref-type="bibr" rid="ref3 ref6">3, 6</xref>
        ] from semantic and morphological analogies to patient-stay
analogies. Our prototypical architecture has some limitations, but seems promising for the task
of patient identification. Our classification model is flexible in terms of the analogies that it
classifies. Changing the way the data is augmented will change the way the model behaves.
Our model can be adapted to different healthcare applications through dedicated embedding
models [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Inspired by [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], we implemented a model to build patient-stay representations.
As mentioned in Section 5.3, there are multiple plausible improvements to our approach, in
terms of balancing valid and invalid analogies as well as including other types of data to build
our patient-stay representations. As we limited ourselves to analogy detection, future work
could address analogy solving in the same setting, which would allow the generation of
synthetic patient-stays.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>Experiments presented in this paper were carried out using computational clusters equipped
with GPUs from the Grid’5000 testbed (see https://www.grid5000.fr).</p>
      <p>The research work of the second and third named authors is partially supported by TAILOR,
a project funded by EU Horizon 2020 research and innovation program under GA No 952215,
and the Inria Project Lab “Hybrid Approaches for Interpretable AI” (HyAIAI).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lepage</surname>
          </string-name>
          ,
          <article-title>Morphological predictability of unseen words using computational analogy</article-title>
          ,
          <source>in: Workshops Proceedings for the Twenty-fourth International Conference on Case-Based Reasoning (ICCBR)</source>
          , volume
          <volume>1815</volume>
          ,
          <year>2016</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Reed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Deep visual analogy-making</article-title>
          ,
          <source>in: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1252</fpage>
          -
          <lpage>1260</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Prade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Richard</surname>
          </string-name>
          ,
          <article-title>Solving word analogies: A machine learning perspective</article-title>
          ,
          <source>in: Proceedings of the Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU)</source>
          , volume
          <volume>11726</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>238</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>GloVe: Global vectors for word representation</article-title>
          ,
          <source>in: Proceedings of the Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hertzmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Oliver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Curless</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Salesin</surname>
          </string-name>
          ,
          <article-title>Image analogies</article-title>
          ,
          <source>in: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH)</source>
          ,
          <year>2001</year>
          , pp.
          <fpage>327</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Alsaidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Marquer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Murena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Couceiro</surname>
          </string-name>
          ,
          <article-title>A neural approach for detecting morphological analogies</article-title>
          ,
          <source>in: Proceedings of the 8th IEEE International Conference on Data Science and Advanced Analytics (DSAA)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casteleiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Diz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Maroto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J. F.</given-names>
            <surname>Prieto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wroe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Torrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <article-title>Semantic deep learning: Prior knowledge and a type of four-term embedding analogy to acquire treatments for well-known diseases</article-title>
          ,
          <source>JMIR Medical Informatics</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Dynomant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lelong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dahamna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Massonnaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kerdelhué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grosjean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Canu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Darmoni</surname>
          </string-name>
          , et al.,
          <article-title>Word embedding for the French natural language in health care: comparative study</article-title>
          ,
          <source>JMIR Medical Informatics</source>
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>118</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N. N.</given-names>
            <surname>Rather</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <article-title>Using deep learning towards biomedical knowledge discovery</article-title>
          ,
          <source>International Journal of Mathematical Sciences and Computing (IJMSC)</source>
          <volume>3</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A. E. W.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Pollard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-w. H.</given-names>
            <surname>Lehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Ghassemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Moody</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szolovits</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Celi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Mark</surname>
          </string-name>
          ,
          <article-title>MIMIC-III, a freely accessible critical care database</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Jim</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <article-title>Deep representation learning of patient data from electronic health records (EHR): A systematic review</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>115</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kowsari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Harrison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Lobo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <article-title>Patient2vec: A personalized interpretable deep representation of the longitudinal electronic health record</article-title>
          ,
          <source>IEEE Access</source>
          <volume>6</volume>
          (
          <year>2018</year>
          )
          <fpage>65333</fpage>
          -
          <lpage>65346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sushil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Šuster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Luyckx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Daelemans</surname>
          </string-name>
          ,
          <article-title>Patient representation learning and interpretable evaluation using clinical notes</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>84</volume>
          (
          <year>2018</year>
          )
          <fpage>103</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Combining structured and unstructured data for predictive models: a deep learning approach</article-title>
          ,
          <source>BMC Medical Informatics and Decision Making</source>
          <volume>20</volume>
          (
          <year>2020</year>
          )
          <fpage>280</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Waruhari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Babic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nderu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Were</surname>
          </string-name>
          ,
          <article-title>A review of current patient matching techniques</article-title>
          ,
          <source>in: Informatics Empowers Healthcare Transformation (ICIMTH)</source>
          , volume
          <volume>238</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>205</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F. N.</given-names>
            <surname>Wirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Meurers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Johns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Prasser</surname>
          </string-name>
          ,
          <article-title>Privacy-preserving data sharing infrastructures for medical research: systematization and comparison</article-title>
          ,
          <source>BMC Medical Informatics and Decision Making</source>
          <volume>21</volume>
          (
          <year>2021</year>
          )
          <fpage>242</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B. H.</given-names>
            <surname>Just</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. T.</given-names>
            <surname>Marc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Munns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Sandefer</surname>
          </string-name>
          ,
          <article-title>Why patient matching is a challenge: Research on master patient index (MPI) data discrepancies in key identifying fields</article-title>
          ,
          <source>Perspectives in Health Information Management</source>
          <volume>13</volume>
          (
          <year>2016</year>
          )
          <fpage>1e</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <article-title>Distributed representations of sentences and documents</article-title>
          ,
          <source>in: Proceedings of the 31st International Conference on Machine Learning (ICML)</source>
          , volume
          <volume>32</volume>
          ,
          <year>2014</year>
          , pp.
          <fpage>1188</fpage>
          -
          <lpage>1196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Miclet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bayoudh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Delhay</surname>
          </string-name>
          ,
          <article-title>Analogical dissimilarity: Definition, algorithms and two experiments in machine learning</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>32</volume>
          (
          <year>2008</year>
          )
          <fpage>793</fpage>
          -
          <lpage>824</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lepage</surname>
          </string-name>
          ,
          <article-title>De l'analogie rendant compte de la commutation en linguistique</article-title>
          , Habilitation à diriger des recherches, Université Joseph-Fourier - Grenoble I
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>C.</given-names>
            <surname>Antic</surname>
          </string-name>
          ,
          <article-title>Analogical proportions</article-title>
          ,
          <source>arXiv abs/2006.02854</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Goldberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A. N.</given-names>
            <surname>Amaral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Glass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hausdorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Ivanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Mark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Mietus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Moody</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-K.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. E.</given-names>
            <surname>Stanley</surname>
          </string-name>
          ,
          <article-title>PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals</article-title>
          ,
          <source>Circulation</source>
          <volume>101</volume>
          <issue>23</issue>
          (
          <year>2000</year>
          )
          <fpage>E215</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R.</given-names>
            <surname>Řehůřek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sojka</surname>
          </string-name>
          ,
          <article-title>Software Framework for Topic Modelling with Large Corpora</article-title>
          ,
          <source>in: Proceedings of the LREC Workshop on New Challenges for NLP Frameworks</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Alsaidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Couceiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burgun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Garcelon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Coulet</surname>
          </string-name>
          ,
          <article-title>Exploring analogical inference in healthcare</article-title>
          ,
          <source>in: Workshop on Interactions between Analogical Reasoning and Machine Learning (IARML)</source>
          ,
          <year>2022</year>
          (to appear).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>