<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Embedding-based Metrics to expedite patients recruitment process for clinical trials</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Houssein Dhayne</string-name>
          <email>houssein.dhayne@net.usj.edu.lb</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rima Kilany</string-name>
          <email>rima.kilany@usj.edu.lb</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Engineering, ESIB, Saint Joseph University</institution>
          ,
          <addr-line>Beirut</addr-line>
          ,
          <country country="LB">Lebanon</country>
        </aff>
      </contrib-group>
      <fpage>23</fpage>
      <lpage>29</lpage>
      <abstract>
        <p>Despite the unprecedented volumes of Electronic Medical Records (EMRs) generated daily across healthcare facilities, the potential of these data to support patient participation in clinical trials remains largely unfulfilled. The reason is that matching patient information to the eligibility criteria of clinical trials is a manual, effort-consuming process. Automating this process is therefore an essential step toward increasing the number of patients participating in clinical research. To address this issue, we propose a novel framework for automated patient-to-trial matching. The matching process is based on measuring the similarity score between phrases extracted from patient medical records and the eligibility criteria of a trial. Our solution combines NLP techniques with modern deep learning-based NLP models. In this context, we follow pre-training and transfer learning approaches to help the model learn task-specific reasoning skills. Additionally, we perform supervised fine-tuning on the large Medical Natural Language Inference (MedNLI) and Semantic Textual Similarity (STS-B) datasets. The matching is performed at the level of semantic phrases by converting patient information and trial criteria into vector representations. We then use a scoring function that combines cosine similarity and scaling normalization to identify potential patient-trial matches. The experimental results show that our framework is highly effective in ranking patients by their similarity scores.</p>
        <p>Index Terms: NLP, NLI, EMR, Automated clinical trial eligibility screening, BioBERT, Sentence similarity</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        The widespread adoption and use of electronic medical
records (EMRs), together with the development of advanced
artificial intelligence models, offer remarkable opportunities
for improving the clinical research sector [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Furthermore,
EMRs offer a wide range of potential uses in clinical trials
such as facilitating the clinical trial feasibility assessment and
patient recruitment, as well as obtaining main patient health
information and medical history prior to their screening visit.
The latter is a critical step in reducing the costs and duration
of clinical trials [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Additionally, linking EMRs with clinical
trials has been shown to increase patient recruitment rate [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
However, there are many barriers to overcome in order to use
EMRs for clinical trials.
      </p>
      <p>
        Even though EMRs were designed to record information in
a structured format, such as procedure information, diagnosis
codes, drug prescriptions, and lab results, free text remains
the most flexible way for physicians to express case nuances
and clinical reasoning [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These free texts usually contain
important facts about patients, but they are rarely available
for formal queries [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>On the other hand, the eligibility criteria for a clinical trial
describe the characteristics of patients who are qualified to
participate in the trial. Each criterion is usually expressed as
descriptive text and specified as an inclusion or
exclusion criterion. As a result, free-text criteria cannot always
be transformed into structured data representations.</p>
      <p>
        Authors in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] confirmed that using only structured data
from the EMR is insufficient in resolving eligibility criteria
for patient recruitment in clinical trials, and that unstructured
data is essential to resolve 59% to 77% of the trial criteria.
      </p>
      <p>However, matching clinical notes with eligibility criteria is
still a manually performed task, which makes it an expensive
process in terms of time and effort. This slows down clinical
trials and may delay new drugs from benefiting patients. As a
consequence, it might entail the loss of human lives that
otherwise would have been able to benefit from new medication.
For these reasons, automated matching of clinical notes with
eligibility criteria in the eligibility screening workflow would
help overcome the bottlenecks of pre-screening practices in a
trial setting.</p>
      <p>To tackle the above challenge efficiently, we need to execute
a matching process at a semantic sentence level, rather than
just checking for the presence or absence of a lexical criterion.
Our investigation of modern deep learning-based NLP (Natural
Language Processing) models led us to
propose a framework that automates the evaluation of
patients' eligibility for a relevant clinical
trial. First, the framework splits the patient clinical
report and the clinical trial sentences into comparatively basic
phrase units. Second, it classifies the phrases into
clinical categories (diagnosis, drug, procedure, observation).
Third, it converts candidate phrases into
vector representations using an appropriate deep learning-based
NLP model. Finally, it calculates a semantic matching score
between patients and a clinical trial by combining
cosine similarity with a scaling normalization method.</p>
      <p>This paper is organized as follows. In section II, we present
the problem definition and review related work. In
section III, we describe our framework and the
challenges it addresses. The evaluation of the results is
discussed in section IV. Finally, we conclude the paper in
section V.</p>
      <p>II. BACKGROUND</p>
      <sec id="sec-1-1">
        <title>A. Problem definition</title>
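        <p>According to our approach, the problem of
patient-trial matching can be defined as follows:</p>
        <p>Finding clinical trial participants is the task of matching a
Patient <inline-formula><tex-math>P_i</tex-math></inline-formula> (<inline-formula><tex-math>P_i \in EMR</tex-math></inline-formula>), represented by a Discharge Summary
<inline-formula><tex-math>DS_i</tex-math></inline-formula>, to a Clinical Trial <inline-formula><tex-math>CT</tex-math></inline-formula>, represented by its Eligibility
Criteria <inline-formula><tex-math>EC</tex-math></inline-formula>. Formally, the solution to this task is to find
the top-K highest values of a function <inline-formula><tex-math>M</tex-math></inline-formula> that computes the
matching score:</p>
        <p><inline-formula><tex-math>M(P_i, CT) = v</tex-math></inline-formula>, where <inline-formula><tex-math>v</tex-math></inline-formula> is the score of matching
patient <inline-formula><tex-math>P_i</tex-math></inline-formula> to <inline-formula><tex-math>CT</tex-math></inline-formula>.</p>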
        <p>This list of the top-K highest-scores reduces the overall
number of patients that will need to be screened by clinicians
in order to identify eligible patients.</p>
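        <p>As a minimal sketch of this top-K selection, assuming the matching scores have already been computed and stored in a dictionary, the shortlist could be obtained as follows. The function name top_k_patients and the score values are hypothetical and serve only as an illustration.</p>
        <preformat><![CDATA[
```python
import heapq


def top_k_patients(scores, k):
    """Return the k (patient_id, score) pairs with the highest matching score."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical matching scores M(P_i, CT) for five patients (illustrative values only).
scores = {"P1": 3.3, "P2": 4.1, "P3": 3.0, "P4": 11.2, "P5": 0.0}

# Only the K best-scoring patients are passed on to clinicians for screening.
shortlist = top_k_patients(scores, k=2)
print(shortlist)
```
]]></preformat>
        <p>Using a heap-based selection avoids sorting the full patient list when K is small compared with the number of patients.</p>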
        <p>B. Data representation</p>
        <p>
          1) Clinical trial: A clinical trial is a type of research that
provides a longstanding foundation for the practice of medicine
and the evaluation of new medical treatments. Each trial has
eligibility criteria describing the characteristics required of
participants: a patient must meet all inclusion criteria
and none of the exclusion criteria. The criteria
differ from study to study. The authors of [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] analysed 1000
eligibility criteria and showed that 23% of the criteria are
simple, or can be reduced to simple criteria, and that 77% of
the criteria remain complex to evaluate. Therefore, a formally
computable representation of eligibility criteria would require
natural language processing techniques as part of automated
screening for patient eligibility.
        </p>
        <p>
          2) Patient medical records: An EMR typically collects
various types of patient information, including discharge
summaries, prior diagnoses, radiology reports, medication
history, and so on. Hospital discharge summaries are
physician-authored synopses of a patient’s hospital stay, which serve
as the main documents communicating a patient’s care plan
to the post-hospital care team [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Discharge summaries are
organized in several sections, which usually include
past medical history and history of present illness, as shown
in Fig. 1.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>C. Related work</title>
        <p>
          In the recent past, several projects have developed tools
and technologies for automated trial-patient matching. Milian
et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] used a template-based formalism to extract and
represent the semantics of the trial criteria in order to improve
their comparability. Patel et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] formulated the matching
process as a semantic retrieval problem by expressing each clinical
trial criterion as a semantic query, which a reasoner
can then use with a formal medical ontology (SNOMED CT)
to retrieve eligible patients. Other works such as EliIE [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and
Criteria2Query [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] have focused on identifying standardized
medical entities in eligibility criteria using machine learning
approaches, the extracted entities being then used to query
patient data. Shivade et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] constructed an annotated dataset
that determined whether the medical note contains text that
meets a criterion or not. Then, they implemented two lexical
methods and two semantic methods to determine a relevance
score of each sentence with a criterion statement, and found
that semantic methods gave better results than lexical methods.
Ni et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] evaluated a system using a combination of NLP,
information retrieval and machine learning methods to identify
a cohort of patients for clinical trial eligibility pre-screening.
Their system relies on both structured data and clinical notes
from EMRs.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>III. FRAMEWORK OVERVIEW</title>
      <p>In this section, we describe the framework we propose
for automating the matching process between patients and a
clinical trial. This framework addresses the following
challenges: (i) To handle complex sentences
in patient data as well as in clinical trials, we break
paragraphs down into sentences and then parse complex sentences
into phrases. These phrases are the basic units for
matching. (ii) To avoid costly comparisons without false
dismissals, phrases are partitioned using classification methods,
which limits the number of pairs to match. (iii) To match
phrases, we represent them as distributed vectors,
which enables calculating similarity for phrases that differ in form but
are semantically related. Fig. 2 shows an overview of
our Patients to Clinical Trial matching framework. Given a
Clinical Trial CT and a set of Patients P, our task is to calculate
a matching score M(Pi, CT).</p>
      <sec id="sec-2-1">
        <title>A. Paragraph and sentence decomposition</title>
        <p>
          In order to measure the similarity between two sentences,
we have to deal with a simple sentence representing a
linguistically-meaningful unit. This process requires
segmenting both paragraph-level and sentence-level structures into
phrase-level structures. According to [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], segmentation of
paragraphs and sentences is the process of breaking text into longer
processing units, each consisting of one or more words, and passing
them to further processing stages such as part-of-speech taggers
and morphological analyzers.
        </p>
        <p>In our model, we handle each phrase as a primitive semantic
unit and find matching phrases between patients and clinical
trials by calculating the similarity of each phrase in the
discharge summary to each phrase in the Eligibility Criteria (EC).</p>
        <table-wrap>
          <label>TABLE I</label>
          <caption>
            <title>EXAMPLE OF SENTENCES SEGMENTATION INTO PHRASES</title>
          </caption>
          <table>
            <thead>
              <tr><th>Paragraph</th><th>Phrases</th></tr>
            </thead>
            <tbody>
              <tr>
                <td>Eligibility Criteria NCT03484780: Previous open laparotomy or contraindications to laparoscopy, as determined by implanting physician.</td>
                <td>1- Previous open laparotomy; 2- contraindications to laparoscopy; 3- determined by implanting physician</td>
              </tr>
              <tr>
                <td>Discharge Summary (text not recoverable from the source)</td>
                <td>Phrases 1- to 4- (text not recoverable from the source)</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
          We used paragraph and sentence segmentation of
MetaMap [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. MetaMap was provided by the National
Library of Medicine (NLM) to map Medical Language
Processor (MLP) text to the UMLS Metathesaurus
concepts [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. MetaMap breaks text into paragraphs,
sentences, and then phrases. Table I presents a simple
example of segmenting sentences into phrases. The first row shows
an eligibility criterion (NCT03484780) and the second
shows an example from a patient discharge summary.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>B. Phrases classification</title>
        <p>
          A discharge summary report contains information about
different topics. Therefore, the large number of heterogeneous
phrases extracted from the patient reports may affect the
efficiency and effectiveness of pairwise phrase matching [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
        <p>To minimize the number of required comparisons, we
applied a filtering methodology: for a phrase of a given class,
all phrases that belong to other classes are filtered out,
which limits the number of pairs to match.</p>
        <p>Data classification techniques support this
filtering by separating the phrases extracted from patient data and
from the clinical trial into different medical categories. This
classification filters out non-matching pairs prior to verification, which
increases the efficiency of phrase similarity matching with
high precision and without sacrificing recall.</p>
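        <p>The effect of this class-based filtering can be illustrated with a minimal sketch. It assumes that every phrase has already been labelled with one of the clinical classes; the function name candidate_pairs and the example phrases are hypothetical and are not taken from the datasets used in this work.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def candidate_pairs(patient_phrases, criteria_phrases):
    """Yield (patient_phrase, criterion_phrase) pairs that share a class label."""
    by_class = defaultdict(list)
    for text, label in patient_phrases:
        by_class[label].append(text)
    for criterion_text, label in criteria_phrases:
        # Only patient phrases of the same class are compared with the criterion.
        for patient_text in by_class.get(label, []):
            yield patient_text, criterion_text


# Hypothetical labelled phrases (illustrative only).
patient_phrases = [
    ("history of stroke", "diagnosis"),
    ("aspirin 81 mg daily", "drug"),
    ("coronary artery bypass graft", "procedure"),
    ("hypertension", "diagnosis"),
]
criteria_phrases = [
    ("heart failure with preserved ejection fraction", "diagnosis"),
    ("current use of aldosterone antagonist", "drug"),
]

pairs = list(candidate_pairs(patient_phrases, criteria_phrases))
all_pairs = len(patient_phrases) * len(criteria_phrases)
print(len(pairs), "of", all_pairs, "pairs kept")
```
]]></preformat>
        <p>In this toy case only the same-class pairs are passed on to the similarity computation, instead of every patient phrase being compared with every criterion phrase.</p>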
        <p>In our study, a total of 1500 eligibility criteria were extracted
from a Clinical Trials database<sup>1</sup> and were manually labelled by
a certified nurse and a master’s student in data science according
to four classes (diagnosis, drug, procedure, observation).</p>
        <p>
          In this work, we have empirically explored and compared
four methods widely used in classification as our baseline:
SVM, CNN, LSTM, C-LSTM [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], in order to identify the
one with the best performance. For the SVM and CNN models,
we initialized word embeddings by averaging, over all words
in the sentence, the word embeddings provided by
PubMed-and-PMC-w2v [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>Our experiment indicates that the CNN + w2v model has the
best prediction performance among the models
we explored, with a precision of 0.87, a recall
of 0.88, and an F1-score of 0.875. We therefore adopted CNN +
PubMed-and-PMC-w2v to perform this classification task and
were able to categorize the phrases into the four aforementioned
categories.</p>
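        <p>As a quick consistency check of these figures, the F1-score is the harmonic mean of precision and recall. The short sketch below uses only the two reported values; the helper name f1_score is our own.</p>
        <preformat><![CDATA[
```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall of the CNN + w2v classifier.
f1 = f1_score(0.87, 0.88)
print(round(f1, 3))
```
]]></preformat>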
      </sec>
      <sec id="sec-2-3">
        <title>C. Phrase vector representations</title>
        <p>The purpose of this work is to allow the matching of patient
data and clinical trials by comparing unstructured data from
both sources. Our claim is that by measuring the similarity
between the primitive semantic medical units (medical phrases) of a
patient’s Discharge Summary and those of the Eligibility Criteria, we can
generate a score that supports the matching task.</p>
        <p>
          There are plenty of measures of semantic similarity between
sentences used in NLP. Unsupervised and supervised methods
have been used to calculate the semantic similarity between
two sentences in the biomedical domain [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Recently, a
number of novel approaches have been proposed to address
this problem by producing sentence vectors [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. As an
example, neural sentence-embedding methods [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] have been
shown to outperform traditional approaches, such as TF-IDF
and word overlap based measures.
        </p>
        <p>
          1) Universal sentence embeddings: The concept of
universal sentence embeddings has grown in popularity as it
leverages models trained on large text corpora. These
pre-trained models can be used in a wide range of downstream
tasks, for example as versatile sentence-embedding models
that convert sentences into vector representations. Notable
works include ELMo [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], GPT [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], and BERT [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ].
        </p>
        <p>
          2) BioBERT: BERT (Bidirectional Encoder
Representations from Transformers) is a neural network language model
trained on plain text for masked word prediction and next
sentence prediction tasks. BERT applies multi-layer bidirectional
transformer encoder with self-attention. According to [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ],
BERT achieved state-of-the-art performance in many
Natural Language Processing tasks and was significantly better
than other models. The more recent
XLNet [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] outperforms BERT and achieves better
prediction metrics on the GLUE benchmark [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], but is not yet
widely used in the medical field. Applying the same
architecture as BERT, Lee et al. [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] proposed the BioBERT language
model, trained on biomedical corpora including PubMed and
PMC. The BioBERT model showed promising results in the
biomedical domain.
        </p>
        <p>
          3) Phrase embedding: To generate
context-rich phrase embeddings, we chose BioBERT as the language
model in conjunction with the Bert-as-service library [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ].
Bert-as-service is a feature extraction service based on BERT
that can derive a fixed-size vector using one of two strategies. The
default strategy averages the vectors of all the tokens
of the second-to-last hidden layer, while the second strategy
uses the output of the special [CLS] token and is recommended
only after fine-tuning BERT on a downstream task.
        </p>
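        <p>The two pooling strategies can be sketched in a few lines. This is an illustration of the idea only, not the Bert-as-service implementation: the function names mean_pool and cls_pool, the toy values and the handling of padding through a mask are assumptions made for the example.</p>
        <preformat><![CDATA[
```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens (mask == 1), ignoring padding positions."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def cls_pool(token_vectors):
    """Use the vector at position 0, where BERT places the [CLS] token."""
    return list(token_vectors[0])


# Toy hidden-layer output: 4 positions x 3 dimensions; the last position is padding.
hidden = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0], [2.0, 4.0, 1.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]

phrase_vector = mean_pool(hidden, mask)
print(phrase_vector)
print(cls_pool(hidden))
```
]]></preformat>
        <p>Both functions return one fixed-size vector per phrase regardless of the number of tokens, which is what makes the phrases comparable with a vector similarity measure.</p>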
      </sec>
      <sec id="sec-2-4">
        <title>D. Phrases Similarity Measures</title>
        <p>The similarity between two vectors can be evaluated using
various similarity measures such as cosine similarity,
Euclidean distance, and Manhattan distance. Since these
metrics operate in a linear space in which all dimensions are
weighted equally, we match phrases by ranking them according to
their cosine similarity. The rank of similarity is
obtained from equations (1) and (2).</p>
        <p><disp-formula><tex-math>\cos(x, y) = \frac{x \cdot y}{\|x\| \cdot \|y\|} \qquad (1)</tex-math></disp-formula></p>
        <p><disp-formula><tex-math>\text{if } \cos(A, B) &gt; \cos(A, C) \text{ then } A \text{ is more similar to } B \text{ than to } C \qquad (2)</tex-math></disp-formula></p>
        <p>Although pre-trained BioBERT performs well on certain
tasks, as we shall see later on, this prior knowledge is not
sufficient to compute the similarity of sentences from their
embeddings. We first computed the cosine similarity of
expert-annotated sentences using embeddings extracted from pre-trained
BioBERT without any fine-tuning. The results were unsatisfactory
(Table II): the sentence ranked as most similar could be the exact
opposite of the query. For example, the sentence most similar to
“History of CVA” was “patient has normal brain MRI”, with a similarity
value of 0.91, although experts had annotated this pair as “contradiction”,
while the “entailment” sentence “patient has history of stroke” came
second, with a similarity value of 0.89. These experiments
reinforced our belief that it is necessary to fine-tune BioBERT
on our downstream task.</p>
        <p>
          1) Supervised Fine-tuning: Transfer learning is the process
of extending a pre-trained model by leveraging data from an
additional domain for better model generalization [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. The
most common transfer learning technique in NLP is
fine-tuning, which involves copying the weights from a
pre-trained network and tuning them using labeled data from the
downstream tasks. BERT is a fine-tuning-based representation
model that achieves state-of-the-art performance on a large
suite of sentence-level tasks with pre-trained representations,
thus reducing the need for many heavily-engineered
task-specific architectures.
        </p>
        <p>
          In the context of natural language understanding (NLU),
comparing the relationship between two sentences is based on
several downstream tasks such as Natural Language Inference (NLI)
and Semantic Textual Similarity (STS) [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. In addition, the authors of [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] have shown that
fine-tuning BERT on NLI and STS datasets creates sentence
embeddings that achieve an improvement of 11.7 points
compared to InferSent [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ] and 5.5 points compared to the
Universal Sentence Encoder [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. In this context, we first
fine-tuned BioBERT on the STS-B dataset, which produced our
BioBERT-based model. We then further fine-tuned it on the MedNLI
dataset. We used the fine-tuning classifier from BERT systems [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ].
        </p>
        <p>
          • MedNLI [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] is a large, publicly available, expert-annotated
dataset drawn from the medical history section of
MIMIC-III. It includes 14,049 clinical sentence
pairs, each annotated with one of
three classes: entailment, contradiction, or neutral.
        </p>
        <p>
          • STS-B [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ] is a collection of sentence pairs selected from
news headlines and other sources. The dataset consists of 8,628
sentence pairs labelled by humans with a similarity score
from 0 to 5 denoting how similar the two sentences are in
terms of semantic meaning.
        </p>
        <p>2) Evaluation of fine-tuned BioBERT: We evaluated the
new BioBERT model by computing the cosine similarity
between the phrase embeddings. We observed that the model
was not only able to rank phrases in terms of similarity, but
also gave more appropriate cosine values. A representative
sample of the results is shown in Table II.</p>
        <p>Fig. 2. Overview of the Patients to Clinical Trial matching framework. Stages shown: phrase segmentation of the Discharge Summary and the Eligibility Criteria, phrase classification (Diagnosis, Drug, Procedure), phrase embedding with fine-tuned BioBERT, pairwise cosine distance, maximum cosine similarity, and ranking and scoring, illustrated with five patients (P1 to P5) and four criteria (iec1, iec2, iec3, eec4).</p>
        <sec id="sec-2-4-14">
          <title>E. Matching Patients to Clinical Trials</title>
          <p>After fine-tuning the BioBERT model for optimized cosine
similarity and creating both Discharge Summary and Clinical
Trial phrase embeddings, we proceeded to find Clinical Trial
participants from an EMR dataset.</p>
          <p>Formally, we denote:</p>
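          <p>• <inline-formula><tex-math>DS_i = \{ph_{i,1}, ph_{i,2}, \ldots, ph_{i,r}\}</tex-math></inline-formula> as the phrases extracted
from the Discharge Summary of patient <inline-formula><tex-math>P_i</tex-math></inline-formula>.</p>
          <p>• <inline-formula><tex-math>IEC = \{iec_1, iec_2, \ldots, iec_p\}</tex-math></inline-formula> as the phrases extracted
from the Inclusion Eligibility Criteria.</p>
          <p>• <inline-formula><tex-math>EEC = \{eec_1, eec_2, \ldots, eec_q\}</tex-math></inline-formula> as the phrases extracted
from the Exclusion Eligibility Criteria.</p>
          <p>• <inline-formula><tex-math>EC = \{ec_1, ec_2, \ldots, ec_l\} = IEC \cup EEC</tex-math></inline-formula>, with <inline-formula><tex-math>l = p + q</tex-math></inline-formula>, as
all phrases extracted from the Eligibility Criteria.</p>
          <p>• <inline-formula><tex-math>S \in [0, 1]^{n \times l}</tex-math></inline-formula> as the cosine Similarity matrix, where
<inline-formula><tex-math>n</tex-math></inline-formula> and <inline-formula><tex-math>l</tex-math></inline-formula> are the number of Patients and of EC elements,
respectively.</p>
          <p>TABLE II. (Table body not recoverable from the source. Legible entries include: “History of hypercholesterolemia and”, “the patient was in a MVC.”, “the patient has no medical history.”, “the patient has no significant injuries.”)</p>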
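          <p>1) Matching Patient to Eligibility Criteria: Once phrase
embeddings are computed for the patients and the clinical
trial eligibility criteria, we calculate the similarity between
phrases of the same class (Diagnosis, Drug, Procedure, ...) as
defined in sub-section III-B. An element <inline-formula><tex-math>s_{i,j}</tex-math></inline-formula> of <inline-formula><tex-math>S</tex-math></inline-formula> represents
the similarity between patient <inline-formula><tex-math>P_i</tex-math></inline-formula> and a single eligibility
criterion <inline-formula><tex-math>ec_j</tex-math></inline-formula>. It is obtained by calculating
the cosine between each phrase <inline-formula><tex-math>ph_{i,r}</tex-math></inline-formula> extracted from <inline-formula><tex-math>DS_i</tex-math></inline-formula> and
<inline-formula><tex-math>ec_j</tex-math></inline-formula>; only the highest cosine value is retained
for <inline-formula><tex-math>s_{i,j}</tex-math></inline-formula> and all other values are discarded.</p>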
          <p><disp-formula><tex-math>s_{i,j} = \max_{\forall\, ph_{i,r} \in DS_i} \cos(ph_{i,r}, ec_j), \qquad i \in [1, n],\; j \in [1, l]</tex-math></disp-formula></p>
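          <p>A minimal sketch of this step is given below, assuming that phrase embeddings are available as plain lists of floats. The function names cosine and max_similarity, the toy two-dimensional vectors and the fallback value for a patient with no phrase of the relevant class are assumptions made for the illustration.</p>
          <preformat><![CDATA[
```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length vectors, as in equation (1)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / (norm_x * norm_y)


def max_similarity(patient_vectors, criterion_vector):
    """s_ij: highest cosine between any patient phrase and one criterion phrase."""
    # assumption: a patient with no phrase of the relevant class gets 0.0
    return max((cosine(ph, criterion_vector) for ph in patient_vectors), default=0.0)


# Toy 2-D embeddings (hypothetical; real phrase embeddings are much longer).
patient_vectors = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
criterion_vector = [6.0, 8.0]

s_ij = max_similarity(patient_vectors, criterion_vector)
print(s_ij)
```
]]></preformat>
          <p>Repeating this computation for every patient and every criterion fills the matrix S one entry at a time.</p>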
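          <p>Once the similarity values are obtained, the final representation
of S is as follows:</p>
          <p><disp-formula><tex-math><![CDATA[S = \begin{bmatrix} \max_{ph_{1,r}} \cos(ph_{1,r}, ec_1) & \cdots & \max_{ph_{1,r}} \cos(ph_{1,r}, ec_l) \\ \vdots & \ddots & \vdots \\ \max_{ph_{n,r}} \cos(ph_{n,r}, ec_1) & \cdots & \max_{ph_{n,r}} \cos(ph_{n,r}, ec_l) \end{bmatrix}]]></tex-math></disp-formula></p>
          <p>2) Ranking and Scoring Patients: The semantic cosine
similarity calculated in the previous paragraph yields a
proportional similarity rather than an exact semantic text match.</p>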
          <p>Therefore, when we compare similarity values obtained for
different features (eligibility criteria) in the generated matrix
S, a higher similarity value
does not necessarily mean that the similarity with the patient is
greater. For example, if sx,1 and sy,2 represent the highest values
of the features ec1 and ec2, respectively, and if sx,1 &gt; sy,2,
this does not mean that Px has a phrase more similar to ec1
than Py has to ec2 (equation 2 only compares similarities to the same phrase). It only means
that Px and Py are ranked at the top of the similarity
lists for ec1 and ec2, respectively. The same logic applies to the lowest
value, which represents the last rank of similarity.</p>
          <p>This variation in the similarity values between features
requires a range normalization step that enables rank similarity
instead of raw cosine similarity, which supports the
computation of a matching score between patients and the
Clinical Trial. To this end, we generated a new matrix R by
applying the following feature scaling normalization:</p>
          <p><disp-formula><tex-math><![CDATA[r_{i,j} = \begin{cases} n \times \dfrac{s_{i,j} - \min_{\forall i}(s_{i,j})}{\max_{\forall i}(s_{i,j}) - \min_{\forall i}(s_{i,j})}, & ec_j \in IEC \\[2ex] (-n) \times \dfrac{s_{i,j} - \min_{\forall i}(s_{i,j})}{\max_{\forall i}(s_{i,j}) - \min_{\forall i}(s_{i,j})}, & ec_j \in EEC \end{cases} \qquad (3)]]></tex-math></disp-formula></p>
          <p>Finally, the matching score M of Patient Pi with a Clinical
Trial is determined by:</p>
          <p><disp-formula><tex-math>M(P_i, CT) = \sum_{j=1}^{l} r_{i,j}</tex-math></disp-formula></p>
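          <p>The normalization and scoring steps can be sketched as follows. The similarity values are those that appear to be shown in the illustrative example of Fig. 2 (five patients, three inclusion criteria and one exclusion criterion), so they should be read as illustrative rather than experimental. The function name normalize_and_score, the scale parameter (set here to the number of patients) and the treatment of a constant column are assumptions.</p>
          <preformat><![CDATA[
```python
def normalize_and_score(S, exclusion_columns, scale):
    """Min-max scale each criterion column (equation 3) and sum each patient row."""
    n_rows, n_cols = len(S), len(S[0])
    R = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [S[i][j] for i in range(n_rows)]
        lo, hi = min(column), max(column)
        # Exclusion criteria contribute negatively to the matching score.
        sign = -1.0 if j in exclusion_columns else 1.0
        for i in range(n_rows):
            # assumption: a constant column carries no ranking information
            unit = 0.0 if hi == lo else (S[i][j] - lo) / (hi - lo)
            R[i][j] = sign * scale * unit
    scores = [sum(row) for row in R]
    return R, scores


# Rows: patients P1..P5; columns: iec1, iec2, iec3, eec4 (illustrative values).
S = [
    [0.9, 0.3, 0.1, 0.5],
    [0.7, 0.6, 0.3, 0.8],
    [0.2, 0.4, 0.2, 0.1],
    [0.5, 0.8, 0.7, 0.2],
    [0.1, 0.2, 0.8, 0.9],
]

R, scores = normalize_and_score(S, exclusion_columns={3}, scale=len(S))
best = max(range(len(scores)), key=scores.__getitem__)
print([round(value, 1) for value in scores], "best patient index:", best)
```
]]></preformat>
          <p>In this toy example the fourth patient obtains the highest score because it ranks high on the inclusion criteria and low on the exclusion criterion, while the fifth patient's strong match on one inclusion criterion is cancelled out by the exclusion criterion.</p>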
          <p>
            To validate our framework, we used two datasets:
MIMIC-III (Medical Information Mart for Intensive Care) [
            <xref ref-type="bibr" rid="ref38">38</xref>
            ],
comprising information relating to patients admitted to critical care
units, and Clinical Trials<sup>2</sup>, a Web-based resource providing
access to information on supported clinical studies.
          </p>
          <p>Fig. 3. The eligibility criteria specified in the NCT04078425 clinical trial</p>
          <p>The MIMIC-III Clinical Dataset is a critical care database that
contains 2,083,108 medical reports from 46,520 patients. We
experimented with a randomly selected set of 100
Discharge Summaries from patients’ last visits, excluding patients
under 18 years of age. The segmentation stage produces an
average of 400 phrases per report.</p>
          <p>We selected a clinical trial that examines the role of
aldosterone antagonists in patients with heart failure with preserved
ejection fraction (NCT04078425). Fig. 3 shows the five
eligibility criteria of this clinical trial.</p>
        </sec>
      </sec>
      <sec id="sec-2-5">
        <title>B. Evaluation of the obtained results</title>
        <p>Table III presents the results for a sample of ten patients. To
evaluate the clinical correctness of the patient matching
for the clinical trial (NCT04078425), a nurse and a computer science
student performed a manual validation.
Notably, the evaluation of the matching
did not reveal any false positives in the score results: the
similarity scores reflect the order of matching between patients
and the clinical trial. The scores ranged from -15
to 8, and the patients retained as eligible for further screening
by experts were those with a score greater than 5.</p>
        <p>We should note that the scores would be more realistic if
the segmentation process were more accurate. For instance, the
sentence “you were thought to have a blood clot in your right
leg” was segmented by MetaMap into “a blood clot in your
right leg”, which drops the uncertainty expressed by “were thought
to have” and would therefore result in a false outcome.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>V. CONCLUSION</title>
      <p>EMRs contain a large portion of unstructured data that needs
to be matched against eligibility criteria for trial-patient
enrollment. Advances in artificial intelligence
could reduce the number of physician-hours spent
screening patients for eligibility. To tackle this problem, we
proposed a framework designed to automatically recommend the
most suitable patients for a clinical trial. The framework adopts
a pre-trained language model (BioBERT) and uses the STS-B and
MedNLI datasets to improve the accuracy of the model via
transfer learning. This work showed that fine-tuning
BioBERT improves its performance in calculating the
similarity between two medical sentences using embedding-based
metrics. In future work, we will also explore the structured
tables of EMRs to further improve the performance and
accuracy of our trial-patient matching framework.</p>
    </sec>
    <sec id="sec-4">
      <title>ACKNOWLEDGMENT</title>
      <p>The authors would like to thank Marvin Moughabghab for
his efforts and contributions to this work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dhayne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Haque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kilany</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Taher</surname>
          </string-name>
          , “
          <article-title>In search of big medical data integration solutions-a comprehensive survey</article-title>
          ,
          <source>” IEEE Access</source>
          , vol.
          <volume>7</volume>
          , pp.
          <fpage>91265</fpage>
          -
          <lpage>91290</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>De Moor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kalra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dugas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Claerhout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Karakoyun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ohmann</surname>
          </string-name>
          , P.-Y. Lastic,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ammour</surname>
          </string-name>
          et al.,
          <article-title>“Using electronic health records for clinical research: the case of the ehr4cr project</article-title>
          ,
          <source>” Journal of biomedical informatics</source>
          , vol.
          <volume>53</volume>
          , pp.
          <fpage>162</fpage>
          -
          <lpage>173</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dugas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Müller-Tidow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kirchhof</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.-U.</given-names>
            <surname>Prokosch</surname>
          </string-name>
          , “
          <article-title>Routine data from hospital information systems can support patient recruitment for clinical studies</article-title>
          ,
          <source>” Clinical Trials</source>
          , vol.
          <volume>7</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>189</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Rosenbloom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Denny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lorenzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Stead</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , “
          <article-title>Data from clinical notes: a perspective on the tension between structure and flexible documentation</article-title>
          ,
          <source>” Journal of the American Medical Informatics Association</source>
          , vol.
          <volume>18</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>181</fpage>
          -
          <lpage>186</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dhayne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kilany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Haque</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Taher</surname>
          </string-name>
          , “
          <article-title>Sedie: A semantic-driven engine for integration of healthcare data</article-title>
          ,
          <source>” in 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)</source>
          . IEEE,
          <year>2018</year>
          , pp.
          <fpage>617</fpage>
          -
          <lpage>622</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fosler-Lussier</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Lai</surname>
          </string-name>
          , “
          <article-title>How essential are unstructured clinical narratives and information fusion to clinical trial recruitment?</article-title>
          <source>” AMIA Summits on Translational Science Proceedings</source>
          , vol.
          <volume>2014</volume>
          , p.
          <fpage>218</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Peleg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Carini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bobak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rubin</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Sim</surname>
          </string-name>
          , “
          <article-title>A practical method for transforming free-text eligibility criteria into computable criteria</article-title>
          ,
          <source>” Journal of biomedical informatics</source>
          , vol.
          <volume>44</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>239</fpage>
          -
          <lpage>250</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kripalani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>LeFevre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. O.</given-names>
            <surname>Phillips</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Basaviah</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Baker</surname>
          </string-name>
          , “
          <article-title>Deficits in communication and information transfer between hospital-based and primary care physicians: implications for patient safety and continuity of care</article-title>
          ,
          <source>” JAMA</source>
          , vol.
          <volume>297</volume>
          , no.
          <issue>8</issue>
          , pp.
          <fpage>831</fpage>
          -
          <lpage>841</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Milian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hoekstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bucur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>ten Teije</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Paulissen</surname>
          </string-name>
          , “
          <article-title>Enhancing reuse of structured eligibility criteria and supporting their relaxation</article-title>
          ,
          <source>” Journal of biomedical informatics</source>
          , vol.
          <volume>56</volume>
          , pp.
          <fpage>205</fpage>
          -
          <lpage>219</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cimino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dolby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fokoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalyanpur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kershenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          , E. Schonberg, and
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          , “
          <article-title>Matching patient records to clinical trials using ontologies</article-title>
          ,
          <source>” in The Semantic Web</source>
          . Springer,
          <year>2007</year>
          , pp.
          <fpage>816</fpage>
          -
          <lpage>829</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Hruby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rusanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Elhadad</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Weng</surname>
          </string-name>
          , “
          <article-title>Eliie: An open-source information extraction system for clinical trial eligibility criteria</article-title>
          ,
          <source>” Journal of the American Medical Informatics Association</source>
          , vol.
          <volume>24</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>1062</fpage>
          -
          <lpage>1071</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. B.</given-names>
            <surname>Ryan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hardin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Makadia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shang</surname>
          </string-name>
          , T. Kang et al.,
          <article-title>“Criteria2query: a natural language interface to clinical databases for cohort definition</article-title>
          ,
          <source>” Journal of the American Medical Informatics Association</source>
          , vol.
          <volume>26</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>294</fpage>
          -
          <lpage>305</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Shivade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hebert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lopetegui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>De Marneffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fosler-Lussier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Lai</surname>
          </string-name>
          , “
          <article-title>Textual inference for eligibility criteria resolution in clinical trials</article-title>
          ,
          <source>” Journal of biomedical informatics</source>
          , vol.
          <volume>58</volume>
          , pp.
          <fpage>S211</fpage>
          -
          <lpage>S218</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kennebeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Dexheimer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>McAneney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lingren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Solti</surname>
          </string-name>
          , “
          <article-title>Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department</article-title>
          ,
          <source>” Journal of the American Medical Informatics Association</source>
          , vol.
          <volume>22</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>166</fpage>
          -
          <lpage>178</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Palmer</surname>
          </string-name>
          , “
          <article-title>Tokenisation and sentence segmentation</article-title>
          ,
          <source>” Handbook of natural language processing</source>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>35</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Aronson</surname>
          </string-name>
          , “
          <article-title>Effective mapping of biomedical text to the umls metathesaurus: the metamap program</article-title>
          .”
          <source>in Proceedings of the AMIA Symposium. American Medical Informatics Association</source>
          ,
          <year>2001</year>
          , p.
          <fpage>17</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Aronson</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.-M.</given-names>
            <surname>Lang</surname>
          </string-name>
          , “
          <article-title>An overview of metamap: historical perspective and recent advances</article-title>
          ,
          <source>” Journal of the American Medical Informatics Association</source>
          , vol.
          <volume>17</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>229</fpage>
          -
          <lpage>236</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Papadakis</surname>
          </string-name>
          , E. Ioannou,
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Niederee</surname>
          </string-name>
          , and W. Nejdl, “
          <article-title>A blocking framework for entity resolution in highly heterogeneous information spaces</article-title>
          ,
          <source>” IEEE Transactions on Knowledge and Data Engineering</source>
          , vol.
          <volume>25</volume>
          , no.
          <issue>12</issue>
          , pp.
          <fpage>2665</fpage>
          -
          <lpage>2682</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Lau</surname>
          </string-name>
          , “
          <article-title>A c-lstm neural network for text classification</article-title>
          ,
          <source>” arXiv preprint arXiv:1511.08630</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Moen</surname>
          </string-name>
          and
          <string-name>
            <given-names>T. S. S.</given-names>
            <surname>Ananiadou</surname>
          </string-name>
          , “
          <article-title>Distributional semantics resources for biomedical text processing</article-title>
          .”
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>G.</given-names>
            <surname>Soğancıoğlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Öztürk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Özgür</surname>
          </string-name>
          , “
          <article-title>Biosses: a semantic sentence similarity estimation system for the biomedical domain</article-title>
          ,
          <source>” Bioinformatics</source>
          , vol.
          <volume>33</volume>
          , no.
          <issue>14</issue>
          , pp.
          <fpage>i49</fpage>
          -
          <lpage>i58</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          , S.-y. Kong,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Limtiaco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>John</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guajardo-Cespedes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tar</surname>
          </string-name>
          et al., “
          <article-title>Universal sentence encoder</article-title>
          ,
          <source>” arXiv preprint arXiv:1803.11175</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          , “
          <article-title>Biosentvec: creating sentence embeddings for biomedical texts</article-title>
          ,
          <source>” arXiv preprint arXiv:1810.09302</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iyyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gardner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , “
          <article-title>Deep contextualized word representations</article-title>
          ,
          <source>” arXiv preprint arXiv:1802.05365</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , “
          <article-title>Improving language understanding by generative pretraining</article-title>
          ,” URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , “
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>” arXiv preprint arXiv:1810.04805</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>A.</given-names>
            <surname>Talman</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chatzikyriakidis</surname>
          </string-name>
          , “
          <article-title>Testing the generalization power of neural network models across nli benchmarks</article-title>
          ,”
          <source>in Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
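          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          , and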
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          , “
          <article-title>XLNet: Generalized autoregressive pretraining for language understanding</article-title>
          ,” arXiv preprint arXiv:1906.08237,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Michael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          , “
          <article-title>GLUE: A multi-task benchmark and analysis platform for natural language understanding</article-title>
          ,” arXiv preprint arXiv:1804.07461,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>So</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          , “
          <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          ,” arXiv preprint arXiv:1901.08746,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiao</surname>
          </string-name>
          , “
          <article-title>bert-as-service</article-title>
          ,” https://github.com/hanxiao/bert-as-service,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Swayamdipta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          , “
          <article-title>Transfer learning in natural language processing</article-title>
          ,” in
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          , “
          <article-title>Sentence-BERT: Sentence embeddings using siamese BERT-networks</article-title>
          ,” arXiv preprint arXiv:1908.10084,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Barrault</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          , “
          <article-title>Supervised learning of universal sentence representations from natural language inference data</article-title>
          ,” arXiv preprint arXiv:1705.02364
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          “
          <article-title>google-research/bert: TensorFlow code and pre-trained models for BERT</article-title>
          ,” https://github.com/google-research/bert, (Accessed on 09/17/
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>A.</given-names>
            <surname>Romanov</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Shivade</surname>
          </string-name>
          , “
          <article-title>Lessons from natural language inference in the clinical domain</article-title>
          ,” arXiv preprint arXiv:1808.06752,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Diab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Lopez-Gazpio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Specia</surname>
          </string-name>
          , “
          <article-title>SemEval-2017 task 1: Semantic textual similarity multilingual and cross-lingual focused evaluation</article-title>
          ,” arXiv preprint arXiv:1708.00055
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
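          <string-name>
            <given-names>A. E.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Pollard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-W. H.</given-names>
            <surname>Lehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghassemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Moody</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szolovits</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Celi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Mark</surname>
          </string-name>
          , “
          <article-title>MIMIC-III, a freely accessible critical care database</article-title>
          ,”
          <source>Scientific Data</source>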
          , vol.
          <volume>3</volume>
          , p.
          <fpage>160035</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>