<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Hypothetical</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Identifying Contextual Information in Clinical Texts: A Study of Two Domains</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emilia Apostolova</string-name>
          <email>emilia.aposto@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Noriko Tomuro</string-name>
          <email>tomuro@cs.depaul.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dina Demner-Fushman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DePaul University</institution>
          ,
          <addr-line>Chicago, IL</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Library of Medicine</institution>
          ,
          <addr-line>Bethesda, MD</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <volume>453</volume>
      <issue>264</issue>
      <fpage>99</fpage>
      <lpage>103</lpage>
      <abstract>
        <p>Contextual information, such as the affirmation or negation of medical problems, is key in Information Extraction (IE) from clinical texts. As very few annotated clinical datasets are available, the practical question of training corpora reusability arises. Clinical IE systems face the challenge of disparate clinical sub-domain narratives that typically lack targeted training/testing data. We evaluated the cross-domain reusability of a clinical dataset annotated with contextual information: medical concepts and their assertion status (present, absent, hypothetical, possible, conditional, not associated with the patient). Systems developed on a training dataset consisting of discharge summaries and progress notes were then evaluated on a new sub-domain - radiology reports. We developed a machine learning system and a rule-based system and observed a minor performance drop when they were applied to a new dataset of radiology reports. The rule-based system exhibited stable performance with no statistically significant change. While the machine learning approach had a statistically significant performance drop, it still outperformed the rule-based system. Results suggest that an annotated training dataset could be reused across clinical sub-domains for the task of identifying contextual information.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Biomedical IE systems, and specifically IE systems
targeting clinical texts, typically involve the
extraction of medical problems. Correctly identifying the
context of these medical problems is an equally
important task. Contextual information refers to
concept attributes such as negation status (is the
medical problem affirmed, negated, or speculated;
e.g. tumor was felt to be unlikely), temporality (is
the medical problem current or past; e.g. a prior
history of pneumonia 2 years ago), and subject
identification (is the medical problem associated with
the patient; e.g. his father had prostate cancer).</p>
      <p>
        The importance of correctly identifying
contextual information is attested by an increasing
interest in the task. A number of contextual
discovery algorithms have been developed. Most
notably, the NegEx<sup>1</sup>
        <xref ref-type="bibr" rid="ref3">(Chapman et al., 2001)</xref>
        negation discovery algorithm was developed and
subsequently implemented in a number of
Biomedical NLP systems, including MetaMap
        <xref ref-type="bibr" rid="ref1">(Aronson, 2001)</xref>
        , CaTIES<sup>2</sup>, and Mayo Clinic’s Clinical IE system
        <xref ref-type="bibr" rid="ref7">(Savova et al., 2008)</xref>
        . Subsequently, ConText
        <xref ref-type="bibr" rid="ref4">(Chapman et al., 2007)</xref>
        was developed as a
NegEx extension that identifies additional
contextual features from clinical documents, such as
temporality and subject identification. Meystre et al.
        <xref ref-type="bibr" rid="ref6">(2008)</xref>
        present a good overview of other systems
and algorithms aiming at contextual information
extraction from clinical texts.
      </p>
      <p>
        The availability of annotated clinical text
corpora is of crucial importance for the development
of contextual information extraction systems. Not
surprisingly, few annotated clinical text corpora
are publicly available due to patient privacy
restrictions and data ownership complications. The
BioScope corpus
        <xref ref-type="bibr" rid="ref8">(Szarvas et al., 2008)</xref>
        consists
of biomedical texts annotated for negation,
speculation and their linguistic scope, and includes
1,954 radiology report excerpts (typically 3 to 4
sentences) that were used in the Computational
Medicine Center’s 2007 Medical NLP Challenge<sup>3</sup>.
The 2010 i2b2 (Informatics for Integrating
Biology and the Bedside<sup>4</sup>) NLP Shared Task
released 826 clinical records (discharge summaries
and progress notes) annotated with medical
problems and their contextual information<sup>5</sup>.
      </p>
      <p>1 http://www.dbmi.pitt.edu/chapman/NegEx.html</p>
      <p>2 http://caties.cabig.upmc.edu/Wiki.jsp?page=Home</p>
      <p>3 http://www.computationalmedicine.org/challenge</p>
      <p>As publicly available clinical text corpora are
scarce, the acquisition of task-specific annotated
datasets remains problematic. Clinical IE
systems target various clinical sub-genres (radiology
reports, pathology reports, discharge summaries,
etc.) and possibly hospital-specific report
formatting and content characteristics. A very practical
concern for the development of such systems is
whether, and to what extent, the tools developed
using the available corpora annotated with
contextual information could be used in other clinical
sub-domains. In this paper we explore this
question by evaluating the performance of a contextual
IE system developed for the clinical sub-genre of
discharge summaries on a different sub-genre -
radiology reports.</p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <sec id="sec-2-1">
        <title>Dataset</title>
        <p>As participants in the 2010 i2b2 NLP Shared Task,
we developed a clinical IE system that
extracts contextual information of medical problems.
The system was developed using the 2010 i2b2
challenge training dataset consisting of 349
clinical records (discharge summaries and progress
notes). The challenge data was annotated for
concepts referring to medical problems, tests, and
treatments (Table 1). In addition, each medical
problem was annotated with its ‘assertion status’
- one of 6 categories of assertions as described
below:</p>
        <p>Present - it is asserted that the patient experiences
the problem (the default category).</p>
        <p>Absent - it is asserted that the problem does not
exist in the patient.</p>
        <p>Possible - it is asserted that the problem may be
present in the patient, but there is uncertainty
expressed.</p>
        <p>Conditional - it is asserted that the patient
experiences the problem only under certain conditions
(e.g. allergies).</p>
        <p>Hypothetical - it is asserted that the patient may
develop the medical problem.</p>
        <p>4 https://www.i2b2.org/NLP/Relations/</p>
        <p>5 The 2010 i2b2 Shared Task data (currently available to
challenge participants) will be made available to the research
community at large one year after the evaluation.</p>
      </sec>
      <sec id="sec-2-2">
        <p>Not associated with the patient - the medical
problem is associated with someone who is not the
patient.</p>
        <p>Assertion category examples are shown in Table
2. The distribution of the 6 assertion categories
within the i2b2 training dataset is shown in Figure
1.</p>
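        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>Concept Category</th><th>Example</th></tr>
            </thead>
            <tbody>
              <tr><td>Medical Problem (any abnormality observed in patient)</td><td>She developed diabetes.</td></tr>
              <tr><td>Test (procedures, panels, and measures)</td><td>Chest x-ray revealed clear lungs.</td></tr>
              <tr><td>Treatment (procedures, interventions, and substances)</td><td>He was placed on a morphine drip.</td></tr>
            </tbody>
          </table>
        </table-wrap>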
        <p>Systems developed with the training dataset
described above were tested against two datasets.
One was provided by the i2b2 challenge - 477
discharge summaries and progress notes similar to
the training dataset. The second dataset was
created by annotating 70 deidentified Lung CT
radiology reports<sup>6</sup>. The radiology reports were
annotated by a single annotator for tests, treatments,
medical problems and their assertion types
following the 2010 i2b2 challenge annotation guidelines.
The dataset comprised 2,322 sentences and 32,592
tokens, with an average report length of 33
sentences. Lung CT studies were chosen as the
nature of the procedure usually results in more
verbose reports with an abundance of findings (e.g.
as compared to routine exams such as
mammography). The annotator identified a total of 1,564
medical problems, 431 tests, and 92 treatments.
The distribution of the assertion categories across
the annotated medical problems is shown in
Figure 2. Unlike discharge summaries and progress
notes, radiology reports do not list allergies or ‘as
needed’ medication prescriptions, nor do they narrate
family history. As a result, there were no instances of
the categories conditional, hypothetical, and
not associated with the patient. Also notable is that
the percentage of present medical problems is
lower (60%) than in the dataset of
discharge summaries/progress notes (67%), while the
percentages of possible and absent medical
problems are higher (14% vs. 5% and 26% vs. 21%,
respectively). The difference is again due to the
nature of radiology reports: they are often used to
rule out conditions and to suggest further
investigation of possible medical problems.</p>
        <p>6 The reports were randomly selected from a proprietary
dataset of deidentified radiology reports spanning a period of
3 years from the Department of Radiology at Northwestern
University Medical School.</p>
      </sec>
      <sec id="sec-2-3">
        <title>System Description</title>
        <p>We evaluated the performance of a rule-based
approach and a machine learning approach against
the two datasets. Both systems were developed
using the i2b2 training dataset consisting of 349
discharge summaries and progress notes.</p>
        <p>
          The rule-based system is a slightly modified
implementation of the ConText algorithm
          <xref ref-type="bibr" rid="ref4">(Chapman et al., 2007)</xref>
          . The ConText algorithm relies
on hand-crafted sets of trigger terms in
proximity of clinical conditions to determine whether the
conditions are affirmed, negated, or possible; recent,
historical, or hypothetical; experienced by the
patient or by someone else. Trigger terms are phrases
preceding or following medical problems, such as ‘no
evidence to suggest’, ‘negative for’, ‘may be ruled out’,
etc. We slightly modified the algorithm by
extending the list of ‘possible’ trigger terms (118
additional expressions including morphological
variants), extending the lists of ‘absent’ and
‘hypothetical’ trigger terms (4 and 2 additional expressions
respectively), and introducing a small set of
‘conditional’ trigger terms (12 expressions). We also
disregarded ‘historical’ cues as the challenge task
does not differentiate between historical and
recent medical problems.
        </p>
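        <p>The trigger-term lookup described above can be sketched roughly as follows. This is a minimal illustration and not the ConText implementation: the three-phrase lexicon, the category assigned to each phrase, the fixed six-token scope, and the names TRIGGERS, WINDOW and assertion_status are all assumptions made for the example, and the scope-termination rules of the published algorithm are omitted.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical mini-lexicon. The phrases come from the text above, but their
# assignment to categories is an assumption made for illustration only.
TRIGGERS = {
    "absent": ["no evidence to suggest", "negative for"],
    "possible": ["may be ruled out"],
}
WINDOW = 6  # assumed scope: max tokens between a trigger and the problem


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_phrase(tokens, phrase):
    """Return every index at which the token list `phrase` starts."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def assertion_status(sentence, problem):
    """Return the category of the first trigger found near the problem."""
    tokens = tokenize(sentence)
    target = tokenize(problem)
    hits = find_phrase(tokens, target)
    if not hits:
        raise ValueError("problem not found in sentence")
    start = hits[0]
    end = start + len(target)
    for status, phrases in TRIGGERS.items():
        for phrase in phrases:
            trigger = tokenize(phrase)
            for i in find_phrase(tokens, trigger):
                trigger_end = i + len(trigger)
                precedes = 0 <= start - trigger_end <= WINDOW
                follows = 0 <= i - end <= WINDOW
                if precedes or follows:
                    return status
    return "present"  # default category when no trigger applies


print(assertion_status("There is no evidence to suggest pneumonia.", "pneumonia"))
print(assertion_status("Pneumonia may be ruled out.", "pneumonia"))
```
]]></preformat>
        <p>In this sketch a problem with no nearby trigger falls back to ‘present’, mirroring the default category of the annotation scheme.</p>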
        <p>
          The machine learning system models the
problem as a classification task that assigns each of the
annotated medical problems to one of the 6
categories. We trained a one-against-all SVM
          <xref ref-type="bibr" rid="ref2">(Chang
and Lin, 2001)</xref>
          classifier - a series of binary
classifiers, one for each assertion category against all other
categories. Empirically, we identified an optimum
set of features as described below. The GATE
framework
          <xref ref-type="bibr" rid="ref5">(Cunningham et al., 2002)</xref>
          was used to
generate and experiment with feature sets.
        </p>
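        <p>The one-against-all decomposition can be illustrated with a small self-contained sketch. To avoid external dependencies, a simple binary perceptron stands in for the SVM used in the paper, so this shows only the decomposition and the arg-max decision rule, not the actual classifier. The class and function names, the toy feature sets and the epoch count are assumptions made for the example.</p>
        <preformat><![CDATA[
```python
class BinaryPerceptron:
    """Tiny linear classifier over sets of binary features (SVM stand-in)."""

    def __init__(self, epochs=20):
        self.epochs = epochs
        self.weights = {}
        self.bias = 0.0

    def score(self, features):
        return self.bias + sum(self.weights.get(f, 0.0) for f in features)

    def fit(self, examples):
        # examples: list of (feature set, label) with label in {+1, -1}
        for _ in range(self.epochs):
            for features, label in examples:
                if label * self.score(features) <= 0:  # misclassified
                    for f in features:
                        self.weights[f] = self.weights.get(f, 0.0) + label
                    self.bias += label
        return self


def train_one_vs_rest(data, categories):
    """Train one binary model per category: that category against the rest."""
    models = {}
    for category in categories:
        binary = [(feats, 1 if gold == category else -1) for feats, gold in data]
        models[category] = BinaryPerceptron().fit(binary)
    return models


def predict(models, features):
    """Assign the category whose binary model gives the highest score."""
    return max(models, key=lambda c: models[c].score(features))


# Toy training data (hypothetical): context tokens paired with a category.
DATA = [
    ({"no", "evidence"}, "absent"),
    ({"negative", "for"}, "absent"),
    ({"may", "represent"}, "possible"),
    ({"cannot", "exclude"}, "possible"),
    ({"developed"}, "present"),
    ({"revealed"}, "present"),
]
MODELS = train_one_vs_rest(DATA, ["present", "absent", "possible"])
print(predict(MODELS, {"no", "evidence"}))
```
]]></preformat>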
        <p>Feature Set:</p>
        <p>1. Token window of size 5: Tokens
surrounding the medical problem (within sentence
boundaries). Numbers were normalized (converted to
the string $number). Tokens belonging to
concepts were converted to their corresponding
concept types (e.g. ‘coronary bypass surgery’ was
substituted by the concept’s category - treatment).</p>
        <p>2. Negative prefix: This feature targets the
discovery of absent medical problems identified as
such by the presence of a morphological prefix, as
in ‘afebrile’ or ‘nontender’. Possible values are a-,
ab-, un-, an-, anti-, dis-, non-, in-, il-, ir-, or im-.</p>
        <p>3. Section heading preceding the problem
concept: Section headings could be helpful in
identifying most assertion categories. For example,
problems that fall under the heading ‘Family
History’ typically fall into the ‘not associated with
patient’ assertion category. Headings were identified
as the last string preceding the problem concept
that matches the regular expression ‘Beginning of
line, One or more characters, Colon, White space,
End of line’.</p>
        <p>4. ConText Cues: Occasionally cues or trigger
terms outside the limitations of the 5-token
window were necessary for a human reader to identify
the assertion category. This feature was used to
identify ConText cues preceding or following the
medical concept outside the token window size.</p>
        <p>5. Semantic Type: Conditional medical
problems are typically related to allergy symptoms and
other temporary conditions (e.g. penicillin causes
a rash; dyspnea on exertion). A hand-crafted
dictionary was created to map problem concepts to
such semantic types. For example, expressions
such as rash, hive, itching, dyspepsia, etc. were
mapped to the semantic type allergy symptom.</p>
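        <p>Features 1 to 3 above could be extracted along the following lines. This is only a sketch under stated assumptions: the function names are hypothetical, the heading pattern is one possible reading of the verbal description (trailing white space is treated as optional), the number pattern is a guess at what counted as a number, and the substitution of concept tokens by their concept type is left out.</p>
        <preformat><![CDATA[
```python
import re

# Prefixes listed in the text, ordered longest first so that, for example,
# 'anti-' is preferred over 'an-' and 'a-'.
NEGATIVE_PREFIXES = ("anti", "non", "dis", "ab", "un", "an",
                     "in", "il", "ir", "im", "a")
# Assumed reading of: beginning of line, one or more characters, colon,
# white space, end of line.
HEADING = re.compile(r"^.+:\s*$")
NUMBER = re.compile(r"\d+(\.\d+)?")


def normalize(token):
    """Replace numeric tokens by the placeholder string $number."""
    return "$number" if NUMBER.fullmatch(token) else token


def token_window(tokens, start, end, size=5):
    """Return up to `size` tokens on each side of the span tokens[start:end]."""
    left = [normalize(t) for t in tokens[max(0, start - size):start]]
    right = [normalize(t) for t in tokens[end:end + size]]
    return left, right


def negative_prefix(problem):
    """Return the matching prefix such as 'non-', or None.

    This is a surface feature only: a word like 'infection' also matches.
    """
    word = problem.lower()
    for prefix in NEGATIVE_PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            return prefix + "-"
    return None


def preceding_heading(lines, line_index):
    """Return the last heading found above the line at `line_index`."""
    for line in reversed(lines[:line_index]):
        if HEADING.match(line):
            return line.strip().rstrip(":").strip()
    return None


report = ["Family History:", "Father had prostate cancer."]
print(preceding_heading(report, 1))
print(negative_prefix("nontender"))
```
]]></preformat>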
        <p>In addition, token-window Part-of-Speech,
UMLS<sup>7</sup> term and semantic type features were also
considered. However, these features had no
positive effect on the system performance and were
excluded from the final system.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>Results from applying the original and the
modified versions of the ConText algorithm on the
i2b2 dataset of 477 discharge summaries/progress
notes are shown in Tables 3 and 4 respectively. As
shown, the addition of hand-crafted trigger terms
improved performance across most assertion
categories. Table 5 shows results obtained on the same
dataset by the SVM-based system.</p>
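      <table-wrap>
        <table>
          <thead>
            <tr><th>Category</th><th>TP</th><th>FN</th><th>FP</th><th>R</th></tr>
          </thead>
          <tbody>
            <tr><td>Present</td><td>12663</td><td>362</td><td>2061</td><td>97.22</td></tr>
            <tr><td>Absent</td><td>2877</td><td>732</td><td>461</td><td>79.72</td></tr>
            <tr><td>Hypothetical</td><td>327</td><td>390</td><td>42</td><td>45.61</td></tr>
            <tr><td>Possible</td><td>53</td><td>830</td><td>15</td><td>6.0</td></tr>
            <tr><td>Conditional</td><td>0</td><td>171</td><td>0</td><td>0.0</td></tr>
            <tr><td>Not patient</td><td>84</td><td>61</td><td>13</td><td>57.93</td></tr>
            <tr><td>Overall</td><td>16004</td><td>2546</td><td>2592</td><td>86.27</td></tr>
          </tbody>
        </table>
      </table-wrap>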
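      <p>For reference, recall (R), precision (P) and F1 follow from the TP, FN and FP counts in the usual way. The short sketch below uses the standard definitions; the function name and the zero-denominator convention are choices made for the example. Applied to the overall counts listed above (TP 16004, FN 2546, FP 2592), it gives a recall of about 86.27 and an F1-score of about 86.17.</p>
      <preformat><![CDATA[
```python
def precision_recall_f1(tp, fn, fp):
    """Return (precision, recall, F1) as fractions in [0, 1].

    A ratio with a zero denominator is reported as 0.0, which is a
    convention chosen for this sketch.
    """
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = precision_recall_f1(tp=16004, fn=2546, fp=2592)
print(round(100 * r, 2), round(100 * f1, 2))
```
]]></preformat>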
      <p>Results from applying the original and the
modified versions of the ConText algorithm on the
test dataset of 70 CT Lung radiology reports are
shown in Tables 6 and 7 respectively. The
performance gain from the addition of hand-crafted
trigger terms is more notable in the radiology
dataset. The i2b2 dataset of discharge summaries
contained a very low proportion of ‘possible’
assertions (5%), and even though the new rules
improved the F1-score for this category from 11.14 to 50.57, the
overall performance gain was negligible. However, the
radiology report dataset contained a larger proportion
of ‘possible’ assertions (14%), which accounts for the
improved performance over the baseline. Table 8
shows results obtained on the same dataset by the
SVM-based system.</p>
      <p>7 Unified Medical Language System © The National
Library of Medicine</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>P</th><th>F1</th></tr>
          </thead>
          <tbody>
            <tr><td>93.23</td><td>95.71</td></tr>
            <tr><td>95.82</td><td>94.02</td></tr>
            <tr><td>93.72</td><td>85.86</td></tr>
            <tr><td>81.48</td><td>62.62</td></tr>
            <tr><td>75.86</td><td>38.42</td></tr>
            <tr><td>91.73</td><td>83.45</td></tr>
            <tr><td>93.32</td><td>93.32</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>As can be seen by comparing Tables 4 and
7, the overall performance of the rule-based
approach decreased from an F1-score of 86.90 to
86.32 when applied to a new clinical sub-domain
- radiology. The performance drop was not
statistically significant (we used a two-tailed Z-test
on two proportions with a confidence level of
95%). Introducing new heuristics to the ConText
rule-based system proved beneficial, as the original
ConText algorithm developed for discharge
summaries dropped from an F1-score of 86.17 to 83.12
(Tables 3 and 6) on the new domain (statistically
significant with a confidence level of 99%).</p>
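      <p>A two-tailed Z-test on two proportions, as named above, can be sketched as follows using the standard pooled-variance form. The paper does not state which counts entered the test, so the counts in the usage line are purely hypothetical, and the function name is an assumption. A 95% confidence level corresponds to rejecting the null hypothesis when the p-value is below 0.05.</p>
      <preformat><![CDATA[
```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-tailed p-value) for H0: both proportions are equal."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p_a - p_b) / math.sqrt(variance)
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 870 of 1000 correct versus 863 of 1000 correct.
z, p_value = two_proportion_z_test(870, 1000, 863, 1000)
print(p_value > 0.05)  # True here: not significant at the 95% level
```
]]></preformat>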
      <p>As shown in Tables 5 and 8, the machine
learning system dropped in performance from an
F1-score of 93.32 to 90.92 (statistically significant
with a confidence level of 99%). The
performance of the SVM classifier was hindered by the
new dataset as the system was trained on
‘genre-specific’ features such as Section Headings. Even
though performance dropped, the machine
learning system still significantly outperformed the
rule-based approach.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>Clinical IE systems face the challenge of content
and format differences across narrative sub-genres
and environments. As publicly available clinical
corpora are scarce, the practical question of re-use
of existing training corpora arises. While not all
clinical IE tasks would lend themselves to
annotated corpora re-use, the task of contextual
information extraction is common across clinical
sub-domains, and expectations were that tools
developed using the existing corpora could be ported to
different types of clinical texts.</p>
      <p>We developed two systems using the recently
released i2b2 corpus containing medical problems
annotated with contextual information. The
performance of the two systems was evaluated against
an independent dataset of clinical records from a
different domain - radiology. Performance of the
SVM-based machine learning system deteriorated,
while the performance of the rule-based system proved
more robust. However, the machine learning
system still significantly outperformed the
rule-based approach. Results suggest that adapting
systems for identifying contextual information can be
avoided as they can be successfully ported to new
clinical domains.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>A.R.</given-names>
            <surname>Aronson</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program</article-title>
          .
          <source>In Proceedings of the AMIA Symposium, page 17</source>
          . American Medical Informatics Association.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Chih-Chung Chang</surname>
          </string-name>
          and
          <string-name>
            <surname>Chih-Jen Lin</surname>
          </string-name>
          ,
          <year>2001</year>
          .
          <article-title>LIBSVM: a library for support vector machines</article-title>
          . Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>W.W.</given-names>
            <surname>Chapman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Bridewell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.F.</given-names>
            <surname>Cooper</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.G.</given-names>
            <surname>Buchanan</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>A simple algorithm for identifying negated findings and diseases in discharge summaries</article-title>
          .
          <source>Journal of biomedical informatics</source>
          ,
          <volume>34</volume>
          (
          <issue>5</issue>
          ):
          <fpage>301</fpage>
          -
          <lpage>310</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>W.W.</given-names>
            <surname>Chapman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.N.</given-names>
            <surname>Dowling</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>ConText: An algorithm for identifying contextual features from clinical text</article-title>
          .
          <source>In Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing</source>
          , pages
          <fpage>81</fpage>
          -
          <lpage>88</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>D.H.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.D.</given-names>
            <surname>Maynard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.V.</given-names>
            <surname>Tablan</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>GATE: A framework and graphical development environment for robust NLP tools and applications</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>S.M.</given-names>
            <surname>Meystre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.K.</given-names>
            <surname>Savova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.C.</given-names>
            <surname>Kipper-Schuler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.F.</given-names>
            <surname>Hurdle</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Extracting information from textual documents in the electronic health record: a review of recent research</article-title>
          .
          <source>Yearbook of medical informatics, page 128.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>G.K.</given-names>
            <surname>Savova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kipper-Schuler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.D.</given-names>
            <surname>Buntrock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.G.</given-names>
            <surname>Chute</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>UIMA-based Clinical Information Extraction System</article-title>
          .
          <source>In Proc. UIMA for NLP Workshop</source>
          . LREC.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Szarvas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vincze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Farkas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Csirik</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>The BioScope corpus: annotation for negation, uncertainty and their scope in biomedical texts</article-title>
          .
          <source>In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing</source>
          , pages
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>