Quality of Care Metric Reporting from Clinical Narratives: Assessing Ontology Components

Sina Madani
Department of Clinical Analytics & Informatics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
ahmadani@mdanderson.org

Reza Alemy
School of Health Information Science, University of Victoria, Victoria, BC, Canada
alemy@uvic.ca

Dean F. Sittig, Hua Xu
School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA

Abstract—The Institute of Medicine reports a growing demand in recent years for quality improvement within the healthcare industry. In response, numerous organizations have been involved in the development and reporting of quality measurement metrics. However, disparate data models from such organizations shift the burden of accurate and reliable metric extraction and reporting to healthcare providers. Furthermore, manual abstraction of quality metrics and diverse implementations of Electronic Health Record (EHR) systems deepen the complexity of consistent, valid, explicit, and comparable quality measurement reporting within healthcare provider organizations. The main objective of this research is to evaluate an ontology-based information extraction framework that uses unstructured clinical text to extract and report quality of care metrics that are interpretable and comparable across healthcare institutions.

Keywords—ontology; information extraction; quality of care metric; clinical narratives

I. INTRODUCTION

The Institute of Medicine reports a growing demand in recent years for quality improvement within the healthcare industry [1]. In response, numerous organizations have been involved in the development and reporting of quality of care measurement metrics. However, the quality metrics development process is subjective in nature [2], and competing interests exist among stakeholders. As a result, conflicting data definitions from different sources shift the burden of accurate and reliable quality of care metric extraction and reporting to healthcare providers [3, 4]. Furthermore, manual abstraction of quality of care metrics [4], diverse implementations of Electronic Health Record (EHR) systems [4, 5], and the lack of standards for integration across disparate clinical and research data sources [6] deepen the complexity of consistent, valid, explicit, and comparable quality of care extraction and reporting within healthcare provider organizations.

Current "standard" information extraction systems operate at the lexical or statistical layers of clinical narratives; however, the embedded semantic layers must also be addressed properly to improve the efficiency of such systems. It has been shown in fields outside healthcare that semantic modeling and ontological approaches can be used effectively for interoperability across diverse environments [7].

The development and application of ontologies in the domain of quality measurement have recently become a focus for some researchers. Lee et al. [8] evaluated a Virtual Medical Record (VMR) [9] method within the Standards-Based Sharable Active Guideline Environment (SAGE) [10] for extracting cancer quality metrics from EMR systems and concluded that the VMR approach requires additional extensions to capture temporal, workflow, and planned-procedure concepts. In another short study, Hung [11] evaluated ontological modeling for the National Quality Forum's endorsed cardiovascular quality metrics; the analysis was limited to the evaluation of modeling languages, identification of high-level domain concepts, and the percentage of reference terminology coverage for concept components. Soysal et al. [12] developed and evaluated an ontology-driven system for information extraction from radiology reports. Their objective was to derive an information model from narrative texts using an ontology-driven approach and manually created rules. Performance-wise, they evaluated only the class relationships extracted from the narrative texts.

The real meaning of a concept is relative to the context in which the concept is expressed and can therefore be represented in different ways in a given ontology. Identifying such contexts and their representational variations, and providing equivalencies among those representations, are crucial tasks in any knowledge modeling and information extraction activity. This is especially true in clinical expressions, where contexts are defined mostly by section headers (like Family Medical History or Assessment).

While transcription departments in relatively large hospitals tend to follow standards for documenting section headers, healthcare providers are often allowed to create their own versions of section headers in clinical notes. Denny et al. [13] trained a classifier on a dataset of 10,677 clinical notes based on boundary detection and manual annotation of section headers, reporting Precision and Recall of 95.6% and 99%, respectively. In another study, Li et al. [14] used a Hidden Markov Model for section header classification within clinical notes. They labeled sections with 15 pre-defined section header categories (like Past Medical History), and the classifier achieved per-section and per-note accuracies of 93% and 70%, respectively, on a dataset of 9,697 clinical notes.

The main objective of this research is to evaluate ontological components in a natural language processing (NLP) system for the unambiguous extraction of quality of care metrics. Such a complementary addition to an existing information extraction system makes enterprise data integration more efficient (in time and cost) by enabling unambiguous data exchange and more objective analytics as part of the enterprise reporting system.

II. METHODOLOGIES

A. Input Data

The dataset that we received from the MD Anderson (MDA) Quality Engineering Department included the National Surgical Quality Improvement Program (NSQIP) data elements abstracted from 2,085 patients who had undergone surgery in 2011. It includes a spreadsheet of quality of care metrics, such as a patient's Diabetes or Hypertension, as Boolean values (Yes/No) for each patient. We considered this reported operational dataset the gold standard for our study. All transcribed documents of the 2,085 patients were extracted from the MDA Electronic Medical Record (EMR) repository (46,835 notes). Python scripting was used to eliminate unwanted characters and extract section headers, as sketched below. A typical clinical note is composed of regions of text; each region consists of a section header (like Chief Complaint, History of Present Illness, or Physical Exam) and the relevant content in free-text format.
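The extraction script itself is not published here, so the following is a minimal sketch of the region-splitting step, assuming headers appear at the start of a line and end with a colon; the header pattern and cleanup rules are illustrative assumptions, not the study's actual implementation.

    import re

    # Assumed header shape: a capitalized phrase at the start of a line, ending in a colon.
    HEADER_RE = re.compile(r'^\s*([A-Z][A-Za-z /&-]{2,60}):', re.MULTILINE)

    def split_into_regions(note_text):
        """Return (section header, content) pairs for one clinical note."""
        # Strip non-printable transcription debris before matching.
        cleaned = re.sub(r'[^\x20-\x7e\n]', ' ', note_text)
        matches = list(HEADER_RE.finditer(cleaned))
        regions = []
        for i, m in enumerate(matches):
            end = matches[i + 1].start() if i + 1 < len(matches) else len(cleaned)
            regions.append((m.group(1).strip(), cleaned[m.end():end].strip()))
        return regions

    sample = "Chief Complaint: chest pain\nPast Medical History: diabetes mellitus"
    print(split_into_regions(sample))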
B. Metric Selection

Abstractors at MDA abstract quality of care metrics in the preoperative risk assessment section of the reporting form and send them to NSQIP. For the purpose of our research, we selected the top five of these variables in terms of frequency of positive cases (Boolean value = "Yes") in our gold standard. These metrics are Diabetes Mellitus, Hypertension, Transient Ischemic Attack (TIA), Cardiac Surgery, and Nervous System Tumor.

Quality of care metrics are generally documented by physicians in clinical notes; abstractors have to read such notes and manually extract and report the metrics to NSQIP. It should be mentioned that abstractors are nursing staff with extensive training in NSQIP abstraction protocols and guidelines who also actively participate in NSQIP certification, auditing, and training programs. Shiloach et al. [15] looked into inter-rater reliability metrics and found a 1.56% disagreement rate among abstractors at hospitals participating in the NSQIP program. NSQIP data also show that reliability has improved with continuous training and auditing since the program began in 2005.

C. Natural Language Processing Engine

We implemented the National Institutes of Health natural language processing engine (MetaMap v2012) [16], which is freely available to the research community. A Python script pulled clinical notes from the EMR repository and submitted the text content of each section of a given clinical note to MetaMap for NLP analysis. To reduce noise in the output, we limited MetaMap processing options to the RxNorm and SNOMED terminologies, a minimum evaluation score of 580, and specific Unified Medical Language System semantic groups (Disorders) and semantic types (Pharmacologic Substance) [17]. One XML file was generated for each note (46,835 in total), containing encrypted patient metadata and the NLP results for the section header contents of the note.
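As a rough illustration of this step, the sketch below drives MetaMap through a subprocess call. The install path is hypothetical, and the -R (restrict to sources), -J (restrict to semantic types), --negex, and --XMLf flags, along with the vocabulary and semantic-type labels, should be verified against the installed MetaMap release.

    import subprocess

    METAMAP = '/opt/public_mm/bin/metamap12'   # hypothetical install path
    SOURCES = 'SNOMEDCT,RXNORM'                # terminology restriction
    SEM_TYPES = 'dsyn,phsu'                    # e.g., Disease/Syndrome, Pharmacologic Substance

    def run_metamap(section_text):
        """Send one section's content to MetaMap and return its XML output."""
        cmd = [METAMAP, '-R', SOURCES, '-J', SEM_TYPES, '--negex', '--XMLf']
        result = subprocess.run(cmd, input=section_text, capture_output=True,
                                text=True, check=True)
        return result.stdout

In this sketch the 580-point score cutoff is assumed to be applied downstream, while parsing the XML output (see the conversion sketch in the next subsection).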
D. Data Format and Repository Type

To decrease the size of the XML data obtained from the previous step, we pruned unwanted XML elements from MetaMap's output. We then converted the XML files into RDF and loaded them into a local instance of an AllegroGraph® repository. We also used the SPARQL Protocol and RDF Query Language [18] to perform federated queries across the different ontologies and the RDF repository (Figure I).

FIGURE I. NLP PIPELINE & ONTOLOGY COMPONENTS
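A condensed sketch of this pruning and conversion step follows. It assumes MetaMap's XML carries Candidate, CandidateScore, and CandidateCUI elements (worth checking against the release's DTD), and the project namespace and property names are hypothetical; the resulting graph can be serialized to N-Triples and bulk-loaded into AllegroGraph.

    import xml.etree.ElementTree as ET
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    QC = Namespace('http://example.org/qcmetric#')   # hypothetical project namespace

    def xml_to_rdf(xml_path, note_id, min_score=580):
        """Convert one pruned MetaMap XML file into an RDF graph."""
        g = Graph()
        note_uri = URIRef(QC['note/' + note_id])
        g.add((note_uri, RDF.type, QC.Note))
        for cand in ET.parse(xml_path).getroot().iter('Candidate'):
            # MetaMap prints scores as negatives in some modes; compare magnitudes.
            score = abs(int(cand.findtext('CandidateScore', '0')))
            if score < min_score:                    # prune low-scoring mappings
                continue
            cui = cand.findtext('CandidateCUI')
            mapping = URIRef(QC['mapping/' + note_id + '/' + cui])
            g.add((mapping, RDF.type, QC.Mapping))
            g.add((mapping, QC.concept, Literal(cui)))
            g.add((mapping, QC.score, Literal(score)))
            g.add((note_uri, QC.hasMapping, mapping))
        return g

    # xml_to_rdf('note_0001.xml', '0001').serialize('note_0001.nt', format='nt')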
III. RESULTS

A. Section Header Ontology

To evaluate our section header extraction algorithm, we randomly selected 500 test notes (100 notes from each identified quality of care metric category) and evaluated them for Precision and Recall. The notes were examined by subject matter experts, annotated for section headers, and compared with the output of the automated section header extraction algorithm. Precision, Recall, and F-measure were calculated as 99%, 97%, and 98%, respectively.

To build our section header ontology from all extracted section headers, we used the SKOS narrower and broader properties to classify section headers into hierarchies, and the closeMatch and exactMatch properties [19] to assign synonyms. After getting feedback from subject matter experts, and for SPARQL query purposes, each section header was categorized as relevant (like Assessment, Medical History, or Impression) or irrelevant (like Family Medical History, Recommendation, or Complications).
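The fragment below sketches this SKOS arrangement with rdflib's built-in SKOS namespace; the header strings, the namespace, and the relevance property are illustrative assumptions rather than the study's full header inventory.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, SKOS

    SH = Namespace('http://example.org/sectionheader#')   # hypothetical namespace

    g = Graph()
    history = SH['MedicalHistory']
    pmh = SH['PastMedicalHistory']
    g.add((history, RDF.type, SKOS.Concept))
    g.add((pmh, RDF.type, SKOS.Concept))
    g.add((history, SKOS.narrower, pmh))            # hierarchy between headers
    g.add((pmh, SKOS.broader, history))
    g.add((pmh, SKOS.exactMatch, SH['PMH']))        # synonym/abbreviation
    g.add((pmh, SH.relevance, Literal('relevant'))) # relevant vs. irrelevant flag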
B. Quality of Care Metric Ontology

We identified the root concept for each of the selected quality of care metrics in the SNOMED terminology (January 2013 version) and extracted all of their children (or subtypes). The SNOMED root concepts are Cardiac Surgery Procedure, Tumor of Nervous System, Diabetes Mellitus, Hypertension, and Transient Ischemic Attack. According to the quality of care metric definition for Diabetes Mellitus, a patient must also be taking a diabetes-related medication to be reported as a diabetic patient. For this purpose, we included diabetes mellitus medications in the ontology, with mappings to RxNorm, from the same reference [20] that abstractors used to match patient medications with diabetes in their manual abstraction process. We also reviewed this ontology with abstractors and eliminated irrelevant concepts. For example, concepts like Maternal diabetes mellitus, Gestational diabetes mellitus, Maternal hypertension, Pre-eclampsia, Renal sclerosis with hypertension, and Diastolic hypertension were excluded from the quality of care metric ontology.

C. Clinical Note Ontology

For this ontology we created seven main classes, together with their relationships, in the Web Ontology Language: Patient, Note, Region, Utterance, Phrase, Mapping, and Negation. All 46,835 RDF instances described in the methods section were imported into the clinical note ontology within the AllegroGraph® repository. The number of instances and the associated data type properties for each class are shown in Table I. Counting relationships along with instances, the repository contained 70,907,728 triples. We used SPARQL to filter unwanted concepts (within the quality of care metric ontology), negated concepts, and irrelevant sections (within the section ontology) from our query results.

TABLE I. CLINICAL NOTE ONTOLOGY COMPONENTS

    Class      Instance count   Data type properties
    Patient    2,085            Patient id
    Note       46,835           Note type, date, service, id
    Region     475,692          Section header text
    Utterance  2,343,856        Utterance text
    Phrase     11,627,224       Phrase text
    Mapping    3,263,338        Semantic type, concept, code, score
    Negation   535,205          Negation trigger, type, concept, code
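To make this layered filtering concrete, here is a hedged sketch of such a SPARQL query, reusing the hypothetical predicates from the earlier conversion sketch; the study's actual graph shape and property names in AllegroGraph may differ.

    # Hypothetical layered query: metric ontology + section ontology + negation.
    QUERY = """
    PREFIX qc:   <http://example.org/qcmetric#>
    PREFIX sh:   <http://example.org/sectionheader#>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    SELECT DISTINCT ?patient WHERE {
      ?patient qc:hasNote       ?note .
      ?note    qc:hasRegion     ?region .
      ?region  qc:hasMapping    ?mapping .
      ?mapping qc:concept       ?concept .
      # Metric-ontology layer: the root metric concept or any SNOMED subtype.
      ?concept skos:broader*    qc:DiabetesMellitus .
      # Section-ontology layer: keep only regions with a relevant header.
      ?region  qc:sectionHeader ?header .
      ?header  sh:relevance     "relevant" .
      # Negation layer: drop concepts flagged by NegEx.
      FILTER NOT EXISTS { ?mapping qc:negated true }
    }
    """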
D. Evaluation of Quality Metric Extraction

We calculated Precision (P), Recall (R), and micro F-measure (F) to evaluate the percentage agreement between our approach and the gold standard. When there are multiple classes of contingency tables, averaging the evaluation scores provides a more general picture of all classes combined. Micro-averaging is the most common averaging method, in which each extracted instance is given the same weight. For each quality of care metric under study, we sequentially calculated Precision, Recall, and F-measure under four conditions to measure the cumulative effect of the two ontologies and the negation context on the base NLP output. For a given quality of care metric, we first performed a query that looked for the root quality metric concept (like Diabetes Mellitus); we captured the result of comparing this query's outcome with the gold standard as the base NLP output layer, in the form of Precision, Recall, and F-measure values. We then included the quality of care metric ontology in our query and recalculated the agreement measures. We executed the query two more times, after adding the negation context and then the section ontology to the previous queries, and calculated the agreement measures twice more (Table II). False Positives and False Negatives (FP, FN) were counted when there was a disagreement between a query result and the gold standard.

TABLE II. MICRO-AVERAGE RESULTS AFTER ADDITION OF EACH LAYER

    Layer             TP     FP     FN    TN     P     R     F
    Base NLP          1099   758    264   8309   0.59  0.81  0.68
    + Metric Ont      1256   1029   107   8038   0.55  0.92  0.69
    ++ Negation       1253   667    110   8400   0.65  0.92  0.76
    +++ Section Ont   1234   427    129   8640   0.74  0.91  0.82

To compare the isolated effect of each ontology and the negation context on the base NLP output, we also computed the agreement tests in a non-cumulative mode. The micro-average results of the agreement tests for each layer are compared separately with the gold standard, and the difference in F-measure from the base NLP output is calculated (Table III).

TABLE III. EFFECT OF EACH ONTOLOGY LAYER ON BASE NLP OUTPUT

    Layer         P     R     F     Difference with Base NLP Output
    Base NLP      0.59  0.81  0.68  -
    Metric Ont    0.55  0.92  0.69  0.01
    Negation      0.66  0.88  0.75  0.07
    Section Ont   0.75  0.87  0.80  0.12
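As a worked check of the micro-averaging described above, the function below pools TP/FP/FN counts across metrics; fed the pooled "Base NLP" counts from Table II, it reproduces that row's P = 0.59, R = 0.81, and F = 0.68.

    def micro_prf(counts):
        """counts: iterable of (TP, FP, FN) tuples, one per quality metric."""
        tp = sum(c[0] for c in counts)
        fp = sum(c[1] for c in counts)
        fn = sum(c[2] for c in counts)
        p = tp / (tp + fp)                 # micro Precision
        r = tp / (tp + fn)                 # micro Recall
        f = 2 * p * r / (p + r)            # micro F-measure
        return p, r, f

    # Pooled counts over the five metrics for the base NLP layer (Table II):
    print(micro_prf([(1099, 758, 264)]))   # -> (0.592, 0.806, 0.683)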
IV. DISCUSSION

Recent trends in health care information systems show an increase in requirements for reporting quality of care metrics by health care organizations, particularly for government-mandated programs with large financial incentives. Healthcare providers consider the EMR the best source for extracting patient information because it most accurately reflects the process of patient care. Nevertheless, such a valuable source of data is usually in narrative format and is therefore inaccessible to simple structured reporting and highly costly and time-consuming for manual extraction by clinical abstractors.

Our study introduced a framework that may contribute to advances in "complementary" components for existing information extraction systems. The application of ontology components to the NLP system in our study provided mechanisms for increasing the performance of such tools. The pivot point for extracting more meaningful quality of care metrics from clinical narratives is the abstraction of the contextual semantics hidden in the notes. We defined some of these semantics and quantified them in multiple layers to demonstrate the importance and applicability of an ontology-based approach in a quality of care metric extraction system. The application of ontology components introduces powerful new ways of querying context-dependent entities from clinical narratives.

It is apparent that the effect of ontology components on information retrieval metrics (Precision, Recall, F-measure) is largely dependent on the type of quality of care metric. Our study shows that the ontology layers added to the base NLP output generally improved performance by up to 63%. The cumulative increase in F-measure was highest for Nervous System Tumors, Cardiac Surgery, and TIA (63%, 57%, and 32%, respectively) and lowest for Hypertension and Diabetes (9% and 1%, respectively), which could be due to how these concepts are represented within the clinical narratives. We were also able to show and compare the effect of each ontology and the negation context in isolation against the base NLP output. The section header ontology appears to have a greater effect on the overall F-measure increase than the negation context and the quality of care metric ontology for all quality metrics except Nervous System Tumors and Cardiac Surgery. At the micro-average level, for all five concepts combined, the section header ontology shows values 11% and 5% higher than the quality of care metric ontology and the negation context, respectively.

Our ontology-based framework achieved an overall 0.82 micro F-measure, which may be high enough for it to be considered, at minimum, a decision support tool. Depending on the tolerable false positive or false negative rates for a given information extraction task, this framework can serve as an introductory or complementary abstraction method and significantly reduce abstractors' time for extracting quality of care metrics hidden in the clinical narratives.

We believe that an ontological approach toward knowledge modeling and information extraction of quality of care metrics from clinical narratives can provide a unique way of improving clarity of meaning by supplying the necessary layers of disambiguation for both human and computational systems. The use of ontologies in an information extraction system increases control over the expressivity of extraction and helps disambiguate the retrieved concepts. This study illustrates the importance of the "complementary" role of ontologies in existing natural language processing tools and how they can increase the overall performance of the quality metric extraction task.

Rigorous evaluations are still necessary to ensure the quality of these "complementary" NLP systems. Moreover, research is needed to create and update evaluation guidelines and criteria for assessing the performance and efficacy of ontology-based information extraction in healthcare and to provide a consistent baseline for comparing alternative approaches.

V. CONCLUSION

We have developed a framework that helps identify contextual semantics within clinical text and extract more meaningful and unambiguous quality of care metrics for the patient care process. Furthermore, by providing bindings to standard terminologies (like SNOMED), the current approach would help the quality of care metric extraction process become more objective in nature and deliver structured data for populating clinical warehouses, explicit benchmarking, cohort studies, and other clinical analytics where coded data is vital.

REFERENCES

[1] P. Maurette, and C. A. M. R. Sfa, "To err is human: building a safer health system," Annales Francaises d'Anesthesie et de Reanimation, vol. 21, no. 6, pp. 453-454, Jun. 2002.
[2] R. D. Miller, Miller's Anesthesia, 7th ed., pp. 81-82, Philadelphia, PA: Churchill Livingstone/Elsevier, 2010.
[3] P. C. Tang, M. Ralston, M. F. Arrigotti, L. Qureshi, and J. Graham, "Comparison of methodologies for calculating quality measures based on administrative data versus clinical data from an electronic health record system: implications for performance measures," J Am Med Inform Assoc, vol. 14, no. 1, pp. 10-15, Jan-Feb. 2007.
[4] S. Velamuri, "QRDA - Technology Overview and Lessons Learned," J Healthc Inf Manag, vol. 24, no. 3, pp. 41-48, Summer 2010.
[5] C. J. McDonald, "The barriers to electronic medical record systems and how to overcome them," J Am Med Inform Assoc, vol. 4, no. 3, pp. 213-221, May-Jun. 1997.
[6] Q. Chong, A. Marwadi, K. Supekar, and Y. Lee, "Ontology based metadata management in medical domains," Journal of Research and Practice in Information Technology, vol. 35, no. 2, pp. 139-154, 2003.
[7] B. Magoutas, C. Halaris, and G. Mentzas, "An ontology for the multi-perspective evaluation of quality in e-government services," Electronic Government, Proceedings, vol. 4656, pp. 318-329, 2007.
[8] W. N. Lee, S. W. Tu, and A. K. Das, "Extracting cancer quality indicators from electronic medical records: evaluation of an ontology-based virtual medical record approach," AMIA Annu Symp Proc, vol. 2009, pp. 349-353, 2009.
[9] P. D. Johnson, S. W. Tu, M. Musen, and I. Purves, "A virtual medical record for guideline-based decision support," p. 294.
[10] S. W. Tu, J. R. Campbell, J. Glasgow, M. A. Nyman, R. McClure, J. McClay, C. Parker, K. M. Hrabak, D. Berg, and T. Weida, "The SAGE Guideline Model: achievements and overview," Journal of the American Medical Informatics Association, vol. 14, no. 5, pp. 589-598, 2007.
[11] P. W. Hung, and P. D. Stetson, "Development of a quality measurement ontology in OWL," AMIA Annu Symp Proc, p. 984, 2007.
[12] E. Soysal, I. Cicekli, and N. Baykal, "Design and evaluation of an ontology based information extraction system for radiological reports," Comput Biol Med, vol. 40, no. 11-12, pp. 900-911, Nov-Dec. 2010.
[13] J. C. Denny, A. Spickard III, K. B. Johnson, N. B. Peterson, J. F. Peterson, and R. A. Miller, "Evaluation of a method to identify and categorize section headers in clinical documents," Journal of the American Medical Informatics Association, vol. 16, no. 6, pp. 806-815, 2009.
[14] Y. Li, S. Lipsky Gorman, and N. Elhadad, "Section classification in clinical notes using supervised hidden Markov model," pp. 744-750.
[15] M. Shiloach, S. K. Frencher Jr, J. E. Steeger, K. S. Rowell, K. Bartzokis, M. G. Tomeh, K. E. Richards, C. Y. Ko, and B. L. Hall, "Toward robust information: data quality and inter-rater reliability in the American College of Surgeons National Surgical Quality Improvement Program," Journal of the American College of Surgeons, vol. 210, no. 1, pp. 6-16, 2010.
[16] A. R. Aronson, and F. M. Lang, "An overview of MetaMap: historical perspective and recent advances," J Am Med Inform Assoc, vol. 17, no. 3, pp. 229-236, May-Jun. 2010.
[17] "MetaMap Semantic Groups and Types," 2014; http://metamap.nlm.nih.gov/SemanticTypesAndGroups.shtml.
[18] "SPARQL Query Language for RDF," 2014; http://www.w3.org/TR/rdf-sparql-query/.
[19] "SKOS Simple Knowledge Organization System Namespace Document - HTML Variant, 18 August 2009 Recommendation Edition," 2014; http://www.w3.org/2009/08/skos-reference/skos.html.
[20] "Patient Handout - Diabetes Medication," 2013; http://nursing.advanceweb.com/sharedresources/advancefornurses/resources/downloadableresources/n1020303_p32handout.pdf.