<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Query Taxonomy Describes Performance of Patient-Level Retrieval from Electronic Health Record Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Steven R. Chamberlin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steven D. Bedrick</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aaron M. Cohen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yanshan Wang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrew Wen</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sijia Liu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hongfang Liu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>William R. Hersh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Spoken Language Understanding, Oregon Health &amp; Science University</institution>
          ,
          <addr-line>Portland, OR</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Medical Informatics &amp; Clinical Epidemiology, Oregon Health &amp; Science University</institution>
          ,
          <addr-line>Portland, OR</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic</institution>
          ,
          <addr-line>Rochester, MN</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Performance of systems used for patient cohort identification with electronic health record (EHR) data is not well-characterized. The objective of this research was to evaluate factors that might affect information retrieval (IR) methods and to investigate the interplay between commonly used IR approaches and the characteristics of the cohort definition structure. We used an IR test collection containing 56 patient cohort definitions, 100,000 patient records originating from an academic medical institution EHR data warehouse, and automated word-based query tasks, varying four parameters. Performance was measured using B-Pref. We then designed 59 taxonomy characteristics to classify the structure of the 56 topics. In addition, six topic complexity measures were derived from these characteristics for further evaluation using a beta regression simulation. We did not find a strong association between the 59 taxonomy characteristics and patient retrieval performance, but we did find strong performance associations with the six topic complexity measures created from these characteristics, and interactions between these measures and the automated query parameter settings.</p>
      </abstract>
      <kwd-group>
<kwd>Information retrieval</kwd>
        <kwd>patient cohort discovery</kwd>
        <kwd>electronic health record</kwd>
        <kwd>topic taxonomy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Some of the characteristics derived from a query taxonomy
could lead to improved selection of approaches based on
the structure of the topic of interest. Insights gained here
will help guide future work to develop new methods for
patient-level cohort discovery with EHR data.</p>
    </sec>
    <sec id="sec-2">
      <title>1. Background</title>
      <p>
The intent of this research is to define and test a query
taxonomy, applied to patient cohort definitions, which can
explain the performance variations seen when retrieving
these cohorts from electronic health record (EHR) data
using automated methods. Also of interest is the possible
relationship between a query taxonomy and different
methods of retrieval and associated parameter settings.
(Copyright © 2019 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).)
Patient cohort discovery in health records is an important
task that is often used in academic institutions for research
purposes, such as recruiting for clinical trials [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This can
be a very labor-intensive task requiring time spent to design
custom queries for each cohort definition, or topic.
Automated methods could improve the efficiency of this
task, but information retrieval methods have not been
well-studied in this domain [
        <xref ref-type="bibr" rid="ref10 ref2 ref3 ref4 ref6">2-5</xref>
        ].
      </p>
      <p>
        There has been promising research in medical record
retrieval methods, some using publicly available EHR test
collections [
        <xref ref-type="bibr" rid="ref5 ref8 ref9">6, 7</xref>
        ]. The Text Retrieval Conference (TREC)
Medical Records Track, in 2011 and 2012, used one of
these public EHR sources for methods development, but
retrieval was only at the encounter level [
        <xref ref-type="bibr" rid="ref11 ref13">8, 9</xref>
        ]. Other
patient-level cohort identification research has used only
structured data [
        <xref ref-type="bibr" rid="ref15">10</xref>
        ], or focused on cohorts more broadly
defined than that seen for research recruitment [
        <xref ref-type="bibr" rid="ref16">11</xref>
        ].
Methods have also been developed to locate clinical study
inclusion criteria in EHR data, but not patient-level cohorts
[12]. Methods using natural language processing, deep
learning, and structured interfaces have been developed to
optimize queries by the addition of context categories to
EHR data, automate structuring of free text and
classification, and to automatically convert criteria into
structured queries [13-16]. Some methods focus on task
definitions that differ from cohort identification, such as
phenotyping [17-19]. For our purposes, cohort definitions
not only contain disease diagnoses, but other complex
features, such as lab tests, surgical procedures, medications,
lab values, temporal relationships as well as combinations
of structured and unstructured data.
      </p>
      <p>
        Our previous research studied the performance of
automated word-based queries used for complex
patient-level cohort discovery with raw EHR data [
        <xref ref-type="bibr" rid="ref6">5, 20</xref>
        ], testing
different parameter variations (n=48) for these queries
against 56 complex patient cohort definitions, or topics.
Performance was generally poor for these queries, with
86% of the topics having a median B-Pref [21] under 0.25
(scale 0-1) across the 48 query parameter variations. These
queries also underperformed when compared to custom
designed Boolean queries. There were also large
performance variations between and within the 56 topics.
The range of median B-Pref across topics was 0 at a
minimum and 0.895 at a maximum, and within-topic ranges
were seen as small as 0.03 points and as large as 0.60 points.
And finally, there were also differences in median B-Pref
between the 48 query parameter settings, although these
differences were not as dramatic as that seen for the topics.
The variation in performance seen across complex patient
cohorts in our previous research led us to this work: to
explain, and predict, that variation by decomposing a
patient-level cohort definition into a standard taxonomy. To
do this we use the query performance data from that
previous research to test our taxonomy definitions.
Research on query decomposition with the intent to predict
performance has been done in other domains [22], but to
our knowledge, this type of taxonomy has not been
previously defined or tested for this type of cohort
discovery task using EHR data.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Materials and Methods</title>
      <sec id="sec-3-1">
        <title>2.1 Test Data</title>
        <p>
          To test the taxonomy developed for this project, we used
the query performance data generated in a previous medical
IR study [
          <xref ref-type="bibr" rid="ref6">5</xref>
          ]. We applied the Cranfield IR evaluation
methodology [23] using the trec_eval program [
          <xref ref-type="bibr" rid="ref35">24</xref>
          ] (Fig 1).
We used the B-Pref statistic as the performance measure
for our evaluation. This statistic measures how many
relevant patients were retrieved before non-relevant
patients in ranked lists, and is used when relevance judging
is incomplete, which is the case with the data used in this
research. Due to the large volume of patients returned from
the queries, random samples had to be selected for judging.
        </p>
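<p>As a rough illustration of how B-Pref rewards ranking judged-relevant patients above judged non-relevant ones while ignoring unjudged patients, the statistic can be sketched as below. This is a simplified reimplementation for exposition; trec_eval remains the authoritative source.</p>

```python
def bpref(ranked_ids, qrels):
    """B-Pref for one topic.

    ranked_ids: patient IDs in ranked order, best first.
    qrels: dict mapping judged patient ID to 1 (relevant) or 0
           (non-relevant); unjudged patients are simply absent.
    """
    R = sum(1 for v in qrels.values() if v == 1)   # judged relevant
    N = sum(1 for v in qrels.values() if v == 0)   # judged non-relevant
    if R == 0:
        return 0.0
    nonrel_seen = 0
    total = 0.0
    for pid in ranked_ids:
        if pid not in qrels:
            continue                # unjudged: ignored by B-Pref
        if qrels[pid] == 0:
            nonrel_seen += 1
        else:
            # penalize by the fraction of judged non-relevant ranked above
            if N > 0:
                total += 1.0 - min(nonrel_seen, R) / min(R, N)
            else:
                total += 1.0
    return total / R
```

A ranking that places every judged relevant patient above every judged non-relevant one scores 1.0.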
        <p>Figure 1. Overview of the generation of performance data used to test and model the query taxonomy definition: EHR document types (demographics, vitals, medications, hospital/ambulatory encounters, clinical notes, problem lists, laboratory/microbiology results, surgery/procedure orders, result comments) are indexed in Elasticsearch (99,965 patients); the 56 test topics drive patient-level retrieval with automated word-based queries (four parameters varied); relevance is judged in the Patient Relevance Assessment Interface (PRAI); and trec_eval produces the B-Pref performance data.</p>
        <p>Patient data originated from an Epic (Verona, WI) EHR
system. A total of 99,965 unique patients with 6,273,137
associated encounters were stored in the Elasticsearch
(v1.7.6) IR platform to evaluate the retrieval methods.
There were a variety of document types associated with
each patient: demographics, vitals, medications
(administered, current, ordered), hospital and ambulatory
encounters with associated attributes and diagnoses,
clinical notes, problem lists, laboratory and microbiology
results, surgery and procedure orders, and result comments.
These patients had to have at least three primary care
encounters between 2009 and 2013.</p>
        <p>The 56 topics were derived from actual patient cohort
requests seen at two major medical research institutions,
Oregon Health &amp; Science University and The Mayo Clinic.
A detailed example of a topic, in three representations, can
be seen in Table 1. Examples of other topic summary
descriptions include ‘Adults with IBD who haven’t had GI
surgery’, ‘Adults with a Vitamin D lab result’,
‘Postherpetic neuralgia treated with topical and systemic
medication’, ‘Children seen in ED with oral pain’, and ‘ACE
inhibitor-induced cough’.</p>
        <sec id="sec-3-1-1">
          <title>Run Parameters</title>
          <p>Four parameters were varied across the automated word-based query runs, giving 48 settings per topic:</p>
          <p>Topic Representation (see Table 1) – A (summary statement), B (clinical case), or C (detailed criteria)</p>
          <p>Text Subset – only clinical notes, or all document types (including structured data reported as text)</p>
          <p>Aggregation Method – patient relevance score calculated by summation (sum) of all document scores or by maximum (max) value</p>
          <p>Retrieval Model – BM25, also known as Okapi [25]; Divergence from randomness (DFR) [<xref ref-type="bibr" rid="ref22">26</xref>]; Language modeling with Dirichlet smoothing (LMDir) [27]; and default Lucene scoring, based on the term frequency-inverse document frequency (TF*IDF) model [28]</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Relevance Judgments</title>
          <p>Stratified random samples of 45 patients were selected from the top-ranked 1,000 patients retrieved from each run parameter iteration for each topic. The samples from all 48 iterations were combined for each topic and duplicates removed. The final judgment pools ranged in size from 450 to 780 patients for the 56 topics. Manual relevance judgment by clinically trained reviewers was performed on these pools. After the relevance assessment, final performance statistics were generated with the trec_eval program.</p>
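<p>The two aggregation methods, collapsing per-document retrieval scores into a single patient-level score, can be sketched as follows. This is an illustrative reimplementation with names of our own choosing, not the study's Elasticsearch pipeline.</p>

```python
from collections import defaultdict

def aggregate_patient_scores(doc_hits, method="sum"):
    """Collapse per-document scores into one score per patient.

    doc_hits: iterable of (patient_id, doc_score) pairs, e.g. hits from a
    query over individual EHR documents.
    method: 'sum' adds all document scores for a patient; 'max' keeps only
    the single best-scoring document.
    Returns (patient_id, score) pairs ranked best-first.
    """
    scores = defaultdict(float)
    if method == "sum":
        for pid, s in doc_hits:
            scores[pid] += s
    elif method == "max":
        for pid, s in doc_hits:
            scores[pid] = max(scores[pid], s)
    else:
        raise ValueError(method)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Note how the choice matters: 'sum' favors patients with many moderately matching documents, while 'max' favors a single strongly matching document.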
          <p>The final dataset contained the B-Pref performance statistic for all combinations of topics and query run parameters (56 topics x 48 run iterations = 2,688 unique queries).</p>
          <p>2.2 Topic Taxonomy Characteristics</p>
          <p>Our first step in explaining and predicting the performance of the word-based queries was to create a topic taxonomy composed of 59 features (Table 2). Three of the authors, who were trained clinically, iteratively developed a list of features covering cohort inclusion or exclusion criteria for medical diagnoses and classifications, medications, procedures, lab tests, clinician information, patient demographics, information about the clinical setting, temporal measures, and other aspects. Each of the 56 topics was then classified by these 59 features by the same three individuals. Fleiss Kappa was used to test interrater reliability [29].</p>
          <p>We wanted to examine any possible association between query performance, as measured by B-Pref, and the 59 taxonomy characteristic classifications of the 56 topics. To do this we performed an exploratory data analysis by first creating a heatmap of run parameter settings by topics, with B-Pref as the performance metric. Using this heatmap, we clustered the 56 topics by query performance. Next, using this performance-based topic cluster order, we created a second heatmap of taxonomy characteristic assignment by topics, using the level of interrater agreement (0-3) as the metric. These heatmaps were compared for pattern similarities to see whether the B-Pref clustering patterns for topics were also seen with topic clusters associated with taxonomy characteristics.</p>
          <p>2.3 Topic Taxonomy Structural Binary Features</p>
          <p>To simplify the taxonomy definitions, to focus more strictly on structural complexity rather than content or representation, and to create features for model development and statistical testing, we defined six binary features by grouping some of the 59 taxonomy characteristics into categories (Table 3). We hypothesized that these six features capture the subset of taxonomy characteristics, and of topic structure, that would be more strongly correlated with performance. One investigator (SC) identified the taxonomy characteristics used to define these features as relevant to the topic, based on our experience designing and executing the manual Boolean queries associated with each topic. These assignments were reviewed by the other investigators.</p>
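<p>Fleiss' kappa for items rated into categories by a fixed number of raters can be computed as below. This is an illustrative implementation for the agreement statistic named above, not the study's analysis code.</p>

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for inter-rater agreement.

    ratings: one row per rated item; each row holds the count of raters
    assigning each category, e.g. [2, 1] means two raters chose category 0
    and one chose category 1. Every row must sum to the same rater count n.
    """
    N = len(ratings)          # number of items
    n = sum(ratings[0])       # raters per item
    k = len(ratings[0])       # number of categories
    # overall proportion of assignments falling in each category
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    # observed per-item agreement, averaged over items
    P_items = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_items) / N
    # expected agreement by chance
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)
```

A kappa of 1 indicates perfect agreement; values near 0 indicate chance-level agreement.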
          <p>The first binary feature was positive if there was a temporal
component in the topic (‘Temporal’, y/n). The 56 topics
contain a variety of temporal conditions, including age at
first diagnosis, time with diagnosis, chronological order of
disease onset for several diagnoses, and medication use
before or after first diagnosis. The second binary feature
was positive if the topic could not be defined exclusively
with the structured data present in the data set (ICD, CPT,
disease and drug names) and required some free text
(‘Text’, y/n). An example for this would be a topic that
checked for the presence of a side effect, only included in
clinical notes in the data set, associated with a medication.
The third binary feature was positive if the topic required a
medication list check, either exclusions or inclusions or
both (‘Medication’, y/n). The fourth binary feature was
positive if there was a procedure in the topic. This includes
any surgical or non-surgical procedure (‘Procedure’, y/n).
The fifth binary feature was positive if additional value
criteria were required from lab tests, imaging, or physical
exams beyond just having these tests in the record
(‘Additional’, y/n). And finally, the sixth binary feature
was positive if the topic required a specific disease
diagnosis or diagnoses. Some topics were defined for
cohorts who only received certain screening tests without
an explicit disease requirement (‘Condition’, y/n).
Using the example for Topic 15 in Table 1, there is a
medical condition explicitly required (rheumatoid arthritis,
Condition=Y), an included lab test (anti-CCP,
Procedure=Y), and an additional value criteria required for
the lab test (IgG&gt;40 units, Additional=Y). The other three
binary features would be ‘N’ for this topic.</p>
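<p>The Topic 15 example can be written as a six-element feature vector; the encoding below is illustrative (the variable name is ours), following the assignments just described.</p>

```python
# Binary taxonomy features for Topic 15 (Table 1), per the text:
# an explicit diagnosis, an included lab test, and a value criterion.
topic15 = {
    "temporal": 0,
    "text": 0,
    "medication": 0,
    "procedure": 1,   # included lab test (anti-CCP)
    "additional": 1,  # value criterion on the lab test (IgG > 40 units)
    "condition": 1,   # rheumatoid arthritis explicitly required
}
```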
          <p>The relationship between these six taxonomy features and
query performance was investigated by testing for any
performance-related interactions between these features and
the four word-based query parameters (topic representation,
text subset, aggregation method, and retrieval model).
These interactions capture the relationship between
word-based query parameters and inherent topic structure related
to complexity (binary taxonomy features).</p>
          <p>We used a beta regression model for this investigation. This
model was trained on the B-Pref performance data as the
dependent variable, with the four word-based query
parameters, the six binary taxonomy features and all
first-order interactions between the parameters as the
independent variables. Due to data limitations, we felt that
model coefficients and tests of significance might not be
generalizable beyond this data set. We instead used this
model to predict B-Pref on all possible permutations of
values of the parameters and features, and to investigate the
patterns of the predicted B-Pref in this predicted and
simulated parameter/feature space. Since this simulated
data contained all possible combinations of values of the
four word-based parameters and the six binary taxonomy
features, there were a total of 3,072 entries. Using the
simulated data, we estimated the effect of the six binary
taxonomy features individually, and the effect of the Topic
Representations. We also used this simulated data for an
exploratory data analysis, using a heatmap, to assess more
complex interactions between the parameter space
(interventions) and the binary feature space (inherent topic
structure).</p>
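<p>The simulated parameter/feature space is a full cross-product, which is why it contains 3,072 entries (48 parameter settings x 64 binary feature combinations). A sketch of building that grid (variable names are ours):</p>

```python
import itertools

# The four word-based query parameters varied in the study
params = {
    "representation": ["A", "B", "C"],
    "text_subset": ["notes_only", "all_docs"],
    "aggregation": ["sum", "max"],
    "model": ["BM25", "DFR", "LMDir", "TFIDF"],
}
# The six binary taxonomy features
features = ["temporal", "text", "medication", "procedure", "additional", "condition"]

# Every combination of parameter values and 0/1 feature flags
grid = [
    dict(zip(list(params) + features, combo))
    for combo in itertools.product(*params.values(), *[(0, 1)] * len(features))
]
print(len(grid))  # 3 * 2 * 2 * 4 * 2**6 = 3072
```

Each entry of the grid is one row fed to the fitted model to predict B-Pref.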
          <p>A beta regression mean model was selected because the response variable, B-Pref, is continuous, restricted to the unit interval [0,1], and asymmetrically distributed. The logit link function was used for these analyses. The regression was done with R (v3.3.1) using the package betareg (v3.1-2).</p>
          <p>3. Results</p>
          <p>3.1 Taxonomy Analysis – 59 Characteristics</p>
          <p>We found moderate, substantial, or almost perfect agreement by Fleiss kappa on 50 of the 56 topics, rated by the three clinically trained raters for the 59 query taxonomy characteristics (Fig 2). Topic distribution is in Fig 3.</p>
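<p>With the logit link, the model's linear predictor is mapped back to a mean B-Pref in (0,1) by the inverse logit. A minimal illustration of that prediction step (not the betareg internals):</p>

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor eta to (0, 1),
    matching the mean model of a logit-link beta regression."""
    return 1.0 / (1.0 + math.exp(-eta))
```

This guarantees every predicted B-Pref stays inside the unit interval, no matter how extreme the linear predictor is.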
          <p>Figure 2. Interrater agreement for the 59 taxonomy characteristics applied to 56 topics. Fleiss Kappa was calculated for each topic based on agreement between three clinically trained raters on the 59 taxonomy characteristic assignments.</p>
          <p>We next used heatmaps to investigate the relationship
between word-based query performance and assignment to
the 59 taxonomy characteristics (Fig 4). Topic clusters,
based on B-Pref performance (left heatmap), were
maintained for taxonomy characteristics (right heatmap),
but column clustering was allowed for this heatmap.
Performance-based clustering of topics can be seen for the
B-Pref heatmap, but there do not appear to be similar
patterns found in the taxonomy heatmap, while maintaining
the same topic order. There does not appear to be an
association between performance and the 59 taxonomy
characteristics.</p>
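<p>As a simplified stand-in for the performance-based row ordering used in the heatmaps (the study clustered full B-Pref profiles hierarchically; this sketch, with names of our own choosing, just orders topics by median B-Pref):</p>

```python
from statistics import median

def order_topics_by_performance(bpref_by_topic):
    """Order topics by median B-Pref across runs.

    bpref_by_topic: dict mapping topic id to a list of B-Pref values,
    one per run parameter setting. Returns topic ids from worst to
    best median performance, a usable row order for a heatmap.
    """
    return sorted(bpref_by_topic, key=lambda t: median(bpref_by_topic[t]))
```

Reusing one ordering across both heatmaps is what makes the side-by-side pattern comparison meaningful.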
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Taxonomy Analysis – 6 Binary Features</title>
        <p>For our data, the beta regression model output did show
that five of the six binary taxonomy features were
associated with poorer performance, as measured by
B-Pref. One feature, ‘text’, was associated with better
performance. Features associated with poorer performance
were designed to capture increased topic complexity in
various ways, so this result is not surprising. The feature
‘text’ captures the ability of purely structured data to
describe a medical topic, with or without added free text.
Our result indicates that topics that require text, in addition
to structured data, might perform better. And there were
notable interactions between the taxonomy features and the
run parameters, particularly between the feature ‘temporal’
and Topic Representation. Interestingly this analysis did
not point to any notable interactions between the four
word-based parameters. But it is not clear if these results
are generalizable due to the specific nature of our 56 topic
descriptions.</p>
        <p>We then used the beta regression model, containing the
four word-based parameters, six binary taxonomy features
and the interactions between the parameters and features, to
predict B-Pref with a simulated dataset. This dataset
contained all possible permutations of the ten predictors.
We varied each of the six binary taxonomy flags
independently, while holding all other values constant, to
estimate the impact of these flags. We also did this for
Topic Representation (Fig 5). We again saw that five of the
six binary taxonomy features were associated with poorer
performance, and the feature ‘text’ was associated with
improved performance. We also saw Topic Representation
B associated with improved performance.
We also created a heatmap of the predicted B-Pref values
generated from the simulated data (Fig 6). The x axis
contained all possible permutations of the four word-based
query parameters and the y axis contained all possible
permutations of the six binary taxonomy features, and
hierarchical clustering was done in both dimensions. Clear
patterns of performance clustering can be seen, particularly
around the combinations of three of the binary taxonomy
features, temporal, text and condition. These three features
are conceptually different from the other three (medication,
procedure, additional) in that the latter are simple additions
of information but the former represent more complex topic
structural aspects. In addition, within specific combinations
of these flags there are also clear variations in performance
across different word-based parameter settings. In the
bottom horizontal cluster, the best performance for topics
without a temporal, text and condition component (blue
rectangle) is seen with a completely different set of
parameter settings than for topics with all three of these
structural components (red rectangle). This performance
pattern is an example of a possible interaction between the
parameters and features, which could help guide the
selection of parameters to optimize retrieval results based
on the taxonomic attributes of the topics.
The findings in our previous research, and the performance
variation within and across topics, led us to pursue two
further methods to understand and improve our results. In
an attempt to understand our results, we developed a
taxonomy for the topics that we hoped would identify
characteristics associated with the differences in results.
We first developed an exhaustive 59-parameter taxonomy
that did not reveal any associations. However, when we
reduced the taxonomy to six binary variables, we did find
an association with performance. As also shown by
comparable work at Mayo Clinic [30], it may be possible
with further prospective analysis that query taxonomy
might lead to selection of different query approaches based
on characteristics of the topic.</p>
        <p>This work provides some evidence that applying a query
taxonomy might improve performance. Further work with
methods such as machine learning might yield
improvements, although it is not clear what features will
lead to performance improvement across varying topical
criteria for different queries.</p>
        <p>There were a number of limitations to this work. Our
records were limited to a single academic medical center.
There are many additional retrieval methods we could have
assessed, but we did not have the resources to carry out
the additional relevance judgments required as those
additional methods would add new patients to be judged. It
is also difficult to generalize our results due to the
specificity of the topics. This will always be a limitation for
this type of work since it would be extremely difficult to
represent all possible cohort requests that could be seen for
all forms of medical research. Finally, there is a global
limitation to work with EHR data for these sorts of use
cases in that raw, identifiable patient data is not easily
sharable such that other researchers could compare their
systems and algorithms with ours using our data [31].</p>
        <p>ACKNOWLEDGMENTS</p>
        <p>This work was supported by NIH Grant 1R01LM011934.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Obeid</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.,
          <article-title>A survey of practices for the use of electronic health records to support research recruitment</article-title>
          .
          <source>Journal of Clinical and Translational Science</source>
          ,
          <year>2017</year>
          . 1: p.
          <fpage>246</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ni</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , et al.,
          <article-title>Increasing the efficiency of trialpatient matching: automated clinical trial eligibility pre-screening for pediatric oncology patients</article-title>
          .
          <source>BMC Medical Informatics &amp; Decision Making</source>
          ,
          <year>2015</year>
          .
          <volume>15</volume>
          : p.
          <fpage>28</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ni</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , et al.,
          <article-title>Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          ,
          <year>2015</year>
          .
          <volume>22</volume>
          : p.
          <fpage>166</fpage>
          -
          <lpage>178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ni</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , et al.,
          <article-title>A real-time automated patient screening system for clinical trials eligibility in an emergency department: design and evaluation</article-title>
          .
          <source>JMIR Medical Informatics</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <volume>7</volume>
          (
          <issue>3</issue>
          ): p.
          <fpage>e14185</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chamberlin</surname>
            <given-names>SR</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>B.S.</given-names>
            ,
            <surname>Cohen</surname>
          </string-name>
          <string-name>
            <given-names>AM</given-names>
            ,
            <surname>Wang</surname>
          </string-name>
          <string-name>
            <given-names>Y</given-names>
            ,
            <surname>Wen</surname>
          </string-name>
          <string-name>
            <given-names>A</given-names>
            ,
            <surname>Liu</surname>
          </string-name>
          <string-name>
            <given-names>S</given-names>
            ,
            <surname>Liu</surname>
          </string-name>
          <string-name>
            <given-names>H</given-names>
            ,
            <surname>Hersh</surname>
          </string-name>
          <string-name>
            <surname>WR</surname>
          </string-name>
          <article-title>Electronic Health Record Data for a Cohort Discovery Task</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>medRxiv</surname>
          </string-name>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , et al.
          <article-title>Creation of a repository of automatically de-identied clinical reports: processes, people, and permission</article-title>
          .
          <source>in Proceedings of the American Medical Informatics Association Clinical Reserach Informatics</source>
          .
          <year>2011</year>
          . San Francisco, CA.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          7.
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>A.E.</given-names>
          </string-name>
          , et al.,
          <article-title>MIMIC-III, a freely accessible critical care database</article-title>
          .
          <source>Sci Data</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <volume>3</volume>
          : p.
          <fpage>160035</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          8.
          <string-name>
            <surname>Voorhees</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and R.T.
          <article-title>Overview of the TREC 2011 Medical Records Track</article-title>
          . in
          <source>The Twentieth Text REtrieval Conference Proceedings (TREC 2011)</source>
          .
          <year>2011</year>
          . Gaithersburg, MD: National Institute of Standards and Technology.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          9.
          <string-name>
            <surname>Voorhees</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and W.H.
          <article-title>Overview of the TREC 2012 Medical Records Track</article-title>
          . in
          <source>The Twenty-First Text REtrieval Conference Proceedings (TREC 2012)</source>
          .
          <year>2012</year>
          . Gaithersburg, MD: National Institute of Standards and Technology.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          10.
          <string-name>
            <surname>Glicksberg</surname>
            ,
            <given-names>B.S.</given-names>
          </string-name>
          , et al.,
          <article-title>Automated disease cohort selection using word embeddings from Electronic Health Records</article-title>
          .
          <source>Pac Symp Biocomput</source>
          ,
          <year>2018</year>
          .
          <volume>23</volume>
          : p.
          <fpage>145</fpage>
          -
          <lpage>156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sarmiento</surname>
            ,
            <given-names>R.F.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Dernoncourt</surname>
          </string-name>
          ,
          <article-title>Improving Patient Cohort Identification Using Natural Language Processing</article-title>
          ,
          in
          <source>Secondary Analysis of Electronic Health Records</source>
          .
          <year>2016</year>
          , Springer: Cham (CH). p.
          <fpage>405</fpage>
          -
          <lpage>417</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Stubbs</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.,
          <article-title>Cohort selection for clinical trials: n2c2 2018 shared task track 1</article-title>
          .
          <source>J Am Med Inform Assoc</source>
          ,
          <year>2019</year>
          .
          <volume>26</volume>
          (
          <issue>11</issue>
          ): p.
          <fpage>1163</fpage>
          -
          <lpage>1171</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Ateya</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>B.C.</given-names>
            <surname>Delaney</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.M.</given-names>
            <surname>Speedie</surname>
          </string-name>
          ,
          <article-title>The value of structured data elements from electronic health records for identifying subjects for primary care clinical trials</article-title>
          .
          <source>BMC Med Inform Decis Mak</source>
          ,
          <year>2016</year>
          .
          <volume>16</volume>
          : p.
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , et al.,
          <article-title>EliIE: An open-source information extraction system for clinical trial eligibility criteria</article-title>
          .
          <source>J Am Med Inform Assoc</source>
          ,
          <year>2017</year>
          .
          <volume>24</volume>
          (
          <issue>6</issue>
          ): p.
          <fpage>1062</fpage>
          -
          <lpage>1071</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <article-title>Automated classification of eligibility criteria in clinical trials to facilitate patient-trial matching for specific patient populations</article-title>
          .
          <source>J Am Med Inform Assoc</source>
          ,
          <year>2017</year>
          .
          <volume>24</volume>
          (
          <issue>4</issue>
          ): p.
          <fpage>781</fpage>
          -
          <lpage>787</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , et al.,
          <article-title>Criteria2Query: a natural language interface to clinical databases for cohort definition</article-title>
          .
          <source>J Am Med Inform Assoc</source>
          ,
          <year>2019</year>
          .
          <volume>26</volume>
          (
          <issue>4</issue>
          ): p.
          <fpage>294</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Denny</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bastarache</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Roden</surname>
          </string-name>
          ,
          <article-title>Phenome-wide association studies as a tool to advance precision medicine</article-title>
          .
          <source>Annual Review of Genomics and Human Genetics</source>
          ,
          <year>2016</year>
          .
          <volume>17</volume>
          : p.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Richesson</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.,
          <article-title>Clinical phenotyping in selected national networks: demonstrating the need for high-throughput, portable, and computational methods</article-title>
          .
          <source>Artificial Intelligence in Medicine</source>
          ,
          <year>2016</year>
          .
          <volume>71</volume>
          : p.
          <fpage>57</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Robinson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.,
          <article-title>Defining phenotypes from clinical data to drive genomic research</article-title>
          .
          <source>Annual Review of Biomedical Data Science</source>
          ,
          <year>2018</year>
          .
          <volume>1</volume>
          : p.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.,
          <article-title>Intra-institutional EHR collections for patient-level information retrieval</article-title>
          .
          <source>Journal of the American Society for Information Science &amp; Technology</source>
          ,
          <year>2017</year>
          .
          <volume>68</volume>
          : p.
          <fpage>2636</fpage>
          -
          <lpage>2648</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Buckley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          .
          <article-title>Retrieval evaluation with incomplete information</article-title>
          .
          <source>in Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          .
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , et al.,
          <article-title>From Evaluating to Forecasting Performance: How to Turn Information Retrieval, Natural Language Processing and Recommender Systems into Predictive Sciences</article-title>
          .
          <source>Dagstuhl Manifestos</source>
          ,
          <year>2018</year>
          .
          <volume>7</volume>
          (
          <issue>1</issue>
          ): p.
          <fpage>96</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>Cleverdon</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Keen</surname>
          </string-name>
          ,
          <article-title>Factors determining the performance of indexing systems</article-title>
          (Vol.
          <volume>1</volume>
          : Design
          , Vol.
          <volume>2</volume>
          : Results).
          <year>1966</year>
          , Aslib Cranfield Research Project: Cranfield, England.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>2011, San Rafael, CA: Morgan &amp; Claypool.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Walker</surname>
          </string-name>
          .
          <article-title>Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval</article-title>
          .
          <source>in Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          .
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <source>ACM Transactions on Information Systems</source>
          ,
          <year>2002</year>
          .
          <volume>20</volume>
          : p.
          <fpage>357</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Lafferty</surname>
          </string-name>
          ,
          <article-title>A study of smoothing methods for language models applied to information retrieval</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          ,
          <year>2004</year>
          .
          <volume>22</volume>
          : p.
          <fpage>179</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <source>Information Processing and Management</source>
          ,
          <year>1988</year>
          .
          <volume>24</volume>
          : p.
          <fpage>513</fpage>
          -
          <lpage>523</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <surname>Fleiss</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Levin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Paik</surname>
          </string-name>
          ,
          <article-title>The Measurement of Interrater Agreement</article-title>
          , in
          <source>Statistical Methods for Rates and Proportions</source>
          , Third Edition.
          <year>2003</year>
          , John Wiley &amp; Sons: Hoboken, NJ. p.
          <fpage>598</fpage>
          -
          <lpage>626</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , et al.,
          <article-title>Test collections for electronic health record-based clinical information retrieval</article-title>
          .
          <source>JAMIA Open</source>
          ,
          <year>2019</year>
          : p. Epub ahead of print.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <article-title>Overview of the Health Search and Data Mining (HSDM 2020) Workshop</article-title>
          .
          <year>2020</year>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>