Quality of Care Metric Reporting from Clinical Narratives: Assessing Ontology Components

Sina Madani
Department of Clinical Analytics & Informatics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
ahmadani@mdanderson.org

Reza Alemy
School of Health Information Science, University of Victoria, Victoria, BC, Canada
alemy@uvic.ca

Dean F. Sittig, Hua Xu
School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA

Abstract—The Institute of Medicine reports a growing demand in recent years for quality improvement within the healthcare industry. In response, numerous organizations have been involved in the development and reporting of quality measurement metrics. However, disparate data models from such organizations shift the burden of accurate and reliable metric extraction and reporting to healthcare providers. Furthermore, manual abstraction of quality metrics and diverse implementations of Electronic Health Record (EHR) systems deepen the complexity of consistent, valid, explicit, and comparable quality measurement reporting within healthcare provider organizations. The main objective of this research is to evaluate an ontology-based information extraction framework that uses unstructured clinical text to extract and report quality of care metrics that are interpretable and comparable across healthcare institutions.

Keywords—ontology; information extraction; quality of care metric; clinical narratives

I. INTRODUCTION

The Institute of Medicine reports a growing demand in recent years for quality improvement within the healthcare industry [1]. In response, numerous organizations have been involved in the development and reporting of quality of care measurement metrics. However, the quality metrics development process is subjective in nature [2], and competing interests exist among stakeholders. As a result, conflicting data definitions from different sources shift the burden of accurate and reliable quality of care metric extraction and reporting to healthcare providers [3, 4]. Furthermore, manual abstraction of quality of care metrics [4], diverse implementations of Electronic Health Record (EHR) systems [4, 5], and the lack of standards for integration across disparate clinical and research data sources [6] deepen the complexity of consistent, valid, explicit, and comparable quality of care extraction and reporting within healthcare provider organizations.

Current "standard" information extraction systems operate at the lexical or statistical layers of clinical narratives; however, the embedded semantic layers must also be addressed properly to improve the efficiency of such systems. It has been shown in fields outside healthcare that semantic modeling and ontological approaches can be used effectively for interoperability across diverse environments [7].

The development and application of ontologies in the domain of quality measurement have recently become a focus for some researchers. Lee et al. [8] evaluated a Virtual Medical Record (VMR) [9] method within the Standards-Based Sharable Active Guideline Environment (SAGE) [10] for extracting cancer quality metrics from EMR systems and concluded that the VMR approach requires additional extensions to capture temporal, workflow, and planned-procedure concepts. In another short study, Hung [11] evaluated ontological modeling for the National Quality Forum's endorsed cardiovascular quality metrics; the analysis was limited to the evaluation of modeling languages, identification of high-level domain concepts, and the percentage of reference terminology coverage for concept components. Soysal et al. [12] developed and evaluated an ontology-driven system for information extraction from radiology reports. Their objective was to derive an information model from narrative texts using an ontology-driven approach and manually created rules. Performance-wise, they evaluated only the class relationships extracted from the narrative texts.

The real meaning of a concept is relative to the context in which the concept is expressed and can therefore be represented in different ways in a given ontology. Identifying such contexts and their representational variations, and providing equivalencies among those representations, are crucial tasks in any knowledge modeling and information extraction activity. This is especially true in clinical expressions, where contexts are defined mostly by section headers (like Family Medical History or Assessment).

While transcription departments in relatively large hospitals tend to follow standards for documenting section headers, healthcare providers are often allowed to create their own versions of section headers in clinical notes. Denny et al. [13] trained a classifier on a dataset of 10,677 clinical notes based on boundary detection and manual annotation of section headers, reporting Precision and Recall of 95.6% and 99%, respectively. In another study, Li et al. [14] used a Hidden Markov Model for section header classification within clinical notes. They labeled sections with 15 pre-defined section header categories (like Past Medical History), and the classifier achieved per-section and per-note accuracies of 93% and 70%, respectively, on a dataset of 9,697 clinical notes.

The main objective of this research is to evaluate ontological components in a natural language processing (NLP) system for the unambiguous extraction of quality of care metrics. Such a complementary addition to an existing information extraction system makes enterprise data integration more efficient (in time and cost) by enabling unambiguous data exchange and more objective analytics as part of the enterprise reporting system.

II. METHODOLOGIES

A. Input Data

The dataset that we received from the MD Anderson (MDA) Quality Engineering Department included the National Surgical Quality Improvement Program (NSQIP) data elements abstracted from 2,085 patients who had undergone surgery in 2011. It includes a spreadsheet of quality of care metrics, such as a patient's Diabetes or Hypertension, as Boolean values (Yes/No) for each patient. We considered this reported operational dataset the gold standard for our study. All transcribed documents of the 2,085 patients were extracted from the MDA Electronic Medical Record (EMR) repository (46,835 notes). Python scripting was used to eliminate unwanted characters and extract section headers, as sketched below. A typical clinical note is composed of regions of text; each region consists of a section header (like Chief Complaint, History of Present Illness, or Physical Exam) and the relevant content in free-text format.
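The extraction script itself is not published here, so the following is a minimal sketch of the region-splitting step, assuming headers appear at the start of a line and end with a colon; the header pattern and cleanup rules are illustrative assumptions, not the study's actual implementation.

    import re

    # Assumed header shape: a capitalized phrase at the start of a line, ending in a colon.
    HEADER_RE = re.compile(r'^\s*([A-Z][A-Za-z /&-]{2,60}):', re.MULTILINE)

    def split_into_regions(note_text):
        """Return (section header, content) pairs for one clinical note."""
        # Strip non-printable transcription debris before matching.
        cleaned = re.sub(r'[^\x20-\x7e\n]', ' ', note_text)
        matches = list(HEADER_RE.finditer(cleaned))
        regions = []
        for i, m in enumerate(matches):
            end = matches[i + 1].start() if i + 1 < len(matches) else len(cleaned)
            regions.append((m.group(1).strip(), cleaned[m.end():end].strip()))
        return regions

    sample = "Chief Complaint: chest pain\nPast Medical History: diabetes mellitus"
    print(split_into_regions(sample))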
B. Metric Selection

Abstractors at MDA abstract quality of care metrics in the preoperative risk assessment section of the reporting form and send them to NSQIP. For the purpose of our research, we selected the top five of these variables in terms of frequency of positive cases (Boolean value = "Yes") in our gold standard. These metrics are Diabetes Mellitus, Hypertension, Transient Ischemic Attack (TIA), Cardiac Surgery, and Nervous System Tumor.

Quality of care metrics are generally documented by physicians in clinical notes; abstractors have to read such notes and manually extract and report the metrics to NSQIP. It should be mentioned that abstractors are nursing staff with extensive training in NSQIP abstraction protocols and guidelines who also actively participate in NSQIP certification, auditing, and training programs. Shiloach et al. [15] looked into inter-rater reliability metrics and found a 1.56% disagreement rate among abstractors at hospitals participating in the NSQIP program. NSQIP data also show that reliability has improved with continuous training and auditing since the program began in 2005.

C. Natural Language Processing Engine

We implemented the National Institutes of Health natural language processing engine (MetaMap v2012) [16], which is freely available to the research community. A Python script pulled clinical notes from the EMR repository and submitted the text content of each section of a given clinical note to MetaMap for NLP analysis. To reduce noise in the output, we limited MetaMap processing options to the RxNorm and SNOMED terminologies, a minimum evaluation score of 580, and specific Unified Medical Language System semantic groups (Disorders) and semantic types (Pharmacologic Substance) [17]. One XML file was generated for each note (46,835 in total), containing encrypted patient metadata and the NLP results for the section header contents of the note.
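As a rough illustration of this step, the sketch below drives MetaMap through a subprocess call. The install path is hypothetical, and the -R (restrict to sources), -J (restrict to semantic types), --negex, and --XMLf flags, along with the vocabulary and semantic-type labels, should be verified against the installed MetaMap release.

    import subprocess

    METAMAP = '/opt/public_mm/bin/metamap12'   # hypothetical install path
    SOURCES = 'SNOMEDCT,RXNORM'                # terminology restriction
    SEM_TYPES = 'dsyn,phsu'                    # e.g., Disease/Syndrome, Pharmacologic Substance

    def run_metamap(section_text):
        """Send one section's content to MetaMap and return its XML output."""
        cmd = [METAMAP, '-R', SOURCES, '-J', SEM_TYPES, '--negex', '--XMLf']
        result = subprocess.run(cmd, input=section_text, capture_output=True,
                                text=True, check=True)
        return result.stdout

In this sketch the 580-point score cutoff is assumed to be applied downstream, while parsing the XML output (see the conversion sketch in the next subsection).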
D. Data Format and Repository Type

To decrease the size of the XML data obtained from the previous step, we pruned unwanted XML elements from MetaMap's output. We then converted the XML files into RDF and loaded them into a local instance of an AllegroGraph® repository. We also used the SPARQL Protocol and RDF Query Language [18] to perform federated queries across the different ontologies and the RDF repository (Figure I).

FIGURE I. NLP PIPELINE & ONTOLOGY COMPONENTS
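A condensed sketch of this pruning and conversion step follows. It assumes MetaMap's XML carries Candidate, CandidateScore, and CandidateCUI elements (worth checking against the release's DTD), and the project namespace and property names are hypothetical; the resulting graph can be serialized to N-Triples and bulk-loaded into AllegroGraph.

    import xml.etree.ElementTree as ET
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    QC = Namespace('http://example.org/qcmetric#')   # hypothetical project namespace

    def xml_to_rdf(xml_path, note_id, min_score=580):
        """Convert one pruned MetaMap XML file into an RDF graph."""
        g = Graph()
        note_uri = URIRef(QC['note/' + note_id])
        g.add((note_uri, RDF.type, QC.Note))
        for cand in ET.parse(xml_path).getroot().iter('Candidate'):
            # MetaMap prints scores as negatives in some modes; compare magnitudes.
            score = abs(int(cand.findtext('CandidateScore', '0')))
            if score < min_score:                    # prune low-scoring mappings
                continue
            cui = cand.findtext('CandidateCUI')
            mapping = URIRef(QC['mapping/' + note_id + '/' + cui])
            g.add((mapping, RDF.type, QC.Mapping))
            g.add((mapping, QC.concept, Literal(cui)))
            g.add((mapping, QC.score, Literal(score)))
            g.add((note_uri, QC.hasMapping, mapping))
        return g

    # xml_to_rdf('note_0001.xml', '0001').serialize('note_0001.nt', format='nt')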
III. RESULTS

A. Section Header Ontology

To evaluate our section header extraction algorithm, we randomly selected 500 test notes (100 notes from each identified quality of care metric category) and evaluated them for Precision and Recall. The notes were examined by subject matter experts, annotated for section headers, and compared with the output of the automated section header extraction algorithm. Precision, Recall, and F-measure were calculated as 99%, 97%, and 98%, respectively.

To build our section header ontology from all extracted section headers, we used the SKOS narrower and broader properties to classify section headers into hierarchies, and the closeMatch and exactMatch properties [19] to assign synonyms. After getting feedback from subject matter experts, and for SPARQL query purposes, each section header was categorized as relevant (like Assessment, Medical History, or Impression) or irrelevant (like Family Medical History, Recommendation, or Complications).
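The fragment below sketches this SKOS arrangement with rdflib's built-in SKOS namespace; the header strings, the namespace, and the relevance property are illustrative assumptions rather than the study's full header inventory.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, SKOS

    SH = Namespace('http://example.org/sectionheader#')   # hypothetical namespace

    g = Graph()
    history = SH['MedicalHistory']
    pmh = SH['PastMedicalHistory']
    g.add((history, RDF.type, SKOS.Concept))
    g.add((pmh, RDF.type, SKOS.Concept))
    g.add((history, SKOS.narrower, pmh))            # hierarchy between headers
    g.add((pmh, SKOS.broader, history))
    g.add((pmh, SKOS.exactMatch, SH['PMH']))        # synonym/abbreviation
    g.add((pmh, SH.relevance, Literal('relevant'))) # relevant vs. irrelevant flag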
B. Quality of Care Metric Ontology

We identified the root concept for each of the selected quality of care metrics in the SNOMED terminology (January 2013 version) and extracted all of their children (or subtypes). The SNOMED root concepts are Cardiac Surgery Procedure, Tumor of Nervous System, Diabetes Mellitus, Hypertension, and Transient Ischemic Attack. According to the quality of care metric definition for Diabetes Mellitus, a patient must also be taking a diabetes-related medication to be reported as a diabetic patient. For this purpose, we included diabetes mellitus medications in the ontology, with mappings to RxNorm, from the same reference [20] that abstractors used to match patient medications with diabetes in their manual abstraction process. We also reviewed this ontology with abstractors and eliminated irrelevant concepts. For example, concepts like Maternal diabetes mellitus, Gestational diabetes mellitus, Maternal hypertension, Pre-eclampsia, Renal sclerosis with hypertension, and Diastolic hypertension were excluded from the quality of care metric ontology.

C. Clinical Note Ontology

For this ontology we created seven main classes, together with their relationships, in the Web Ontology Language: Patient, Note, Region, Utterance, Phrase, Mapping, and Negation. All 46,835 RDF instances described in the methods section were imported into the clinical note ontology within the AllegroGraph® repository. The number of instances and the associated data type properties for each class are shown in Table I. Counting relationships along with instances, the repository contained 70,907,728 triples. We used SPARQL to filter unwanted concepts (within the quality of care metric ontology), negated concepts, and irrelevant sections (within the section ontology) from our query results.

TABLE I. CLINICAL NOTE ONTOLOGY COMPONENTS

    Class      Instance count   Data type properties
    Patient    2,085            Patient id
    Note       46,835           Note type, date, service, id
    Region     475,692          Section header text
    Utterance  2,343,856        Utterance text
    Phrase     11,627,224       Phrase text
    Mapping    3,263,338        Semantic type, concept, code, score
    Negation   535,205          Negation trigger, type, concept, code
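To make this layered filtering concrete, here is a hedged sketch of such a SPARQL query, reusing the hypothetical predicates from the earlier conversion sketch; the study's actual graph shape and property names in AllegroGraph may differ.

    # Hypothetical layered query: metric ontology + section ontology + negation.
    QUERY = """
    PREFIX qc:   <http://example.org/qcmetric#>
    PREFIX sh:   <http://example.org/sectionheader#>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    SELECT DISTINCT ?patient WHERE {
      ?patient qc:hasNote       ?note .
      ?note    qc:hasRegion     ?region .
      ?region  qc:hasMapping    ?mapping .
      ?mapping qc:concept       ?concept .
      # Metric-ontology layer: the root metric concept or any SNOMED subtype.
      ?concept skos:broader*    qc:DiabetesMellitus .
      # Section-ontology layer: keep only regions with a relevant header.
      ?region  qc:sectionHeader ?header .
      ?header  sh:relevance     "relevant" .
      # Negation layer: drop concepts flagged by NegEx.
      FILTER NOT EXISTS { ?mapping qc:negated true }
    }
    """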
D. Evaluation of Quality Metric Extraction

We calculated Precision (P), Recall (R), and micro F-measure (F) to evaluate the percentage agreement between our approach and the gold standard. When there are multiple classes of contingency tables, averaging the evaluation scores provides a more general picture of all classes combined. Micro-averaging is the most common averaging method, in which each extracted instance is given the same weight. For each quality of care metric under study, we sequentially calculated Precision, Recall, and F-measure under four conditions to measure the cumulative effect of the two ontologies and the negation context on the base NLP output. For a given quality of care metric, we first performed a query that looked for the root quality metric concept (like Diabetes Mellitus); we captured the result of comparing this query's outcome with the gold standard as the base NLP output layer, in the form of Precision, Recall, and F-measure values. We then included the quality of care metric ontology in our query and recalculated the agreement measures. We executed the query two more times, after adding the negation context and then the section ontology to the previous queries, and calculated the agreement measures twice more (Table II). False Positives and False Negatives (FP, FN) were counted when there was a disagreement between a query result and the gold standard.

TABLE II. MICRO-AVERAGE RESULTS AFTER ADDITION OF EACH LAYER

    Layer             TP     FP     FN    TN     P     R     F
    Base NLP          1099   758    264   8309   0.59  0.81  0.68
    + Metric Ont      1256   1029   107   8038   0.55  0.92  0.69
    ++ Negation       1253   667    110   8400   0.65  0.92  0.76
    +++ Section Ont   1234   427    129   8640   0.74  0.91  0.82

To compare the isolated effect of each ontology and the negation context on the base NLP output, we also computed the agreement tests in a non-cumulative mode. The micro-average results of the agreement tests for each layer are compared separately with the gold standard, and the difference in F-measure from the base NLP output is calculated (Table III).

TABLE III. EFFECT OF EACH ONTOLOGY LAYER ON BASE NLP OUTPUT

    Layer         P     R     F     Difference with Base NLP Output
    Base NLP      0.59  0.81  0.68  -
    Metric Ont    0.55  0.92  0.69  0.01
    Negation      0.66  0.88  0.75  0.07
    Section Ont   0.75  0.87  0.80  0.12
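As a worked check of the micro-averaging described above, the function below pools TP/FP/FN counts across metrics; fed the pooled "Base NLP" counts from Table II, it reproduces that row's P = 0.59, R = 0.81, and F = 0.68.

    def micro_prf(counts):
        """counts: iterable of (TP, FP, FN) tuples, one per quality metric."""
        tp = sum(c[0] for c in counts)
        fp = sum(c[1] for c in counts)
        fn = sum(c[2] for c in counts)
        p = tp / (tp + fp)                 # micro Precision
        r = tp / (tp + fn)                 # micro Recall
        f = 2 * p * r / (p + r)            # micro F-measure
        return p, r, f

    # Pooled counts over the five metrics for the base NLP layer (Table II):
    print(micro_prf([(1099, 758, 264)]))   # -> (0.592, 0.806, 0.683)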
IV. DISCUSSION

Recent trends in health care information systems show an increase in requirements for reporting quality of care metrics by health care organizations, particularly for government-mandated programs with large financial incentives. Healthcare providers consider the EMR the best source for extracting patient information because it most accurately reflects the process of patient care. Nevertheless, such a valuable source of data is usually in narrative format and is therefore inaccessible to simple structured reporting and highly costly and time-consuming for manual extraction by clinical abstractors.

Our study introduced a framework that may contribute to advances in "complementary" components for existing information extraction systems. The application of ontology components to the NLP system in our study provided mechanisms for increasing the performance of such tools. The pivot point for extracting more meaningful quality of care metrics from clinical narratives is the abstraction of the contextual semantics hidden in the notes. We defined some of these semantics and quantified them in multiple layers to demonstrate the importance and applicability of an ontology-based approach in a quality of care metric extraction system. The application of ontology components introduces powerful new ways of querying context-dependent entities from clinical narratives.

It is apparent that the effect of ontology components on information retrieval metrics (Precision, Recall, F-measure) is largely dependent on the type of quality of care metric. Our study shows that the ontology layers added to the base NLP output generally improved performance by up to 63%. The cumulative increase in F-measure was highest for Nervous System Tumors, Cardiac Surgery, and TIA (63%, 57%, and 32%, respectively) and lowest for Hypertension and Diabetes (9% and 1%, respectively), which could be due to how these concepts are represented within the clinical narratives. We were also able to show and compare the effect of each ontology and the negation context in isolation against the base NLP output. The section header ontology appears to have a greater effect on the overall F-measure increase than the negation context and the quality of care metric ontology for all quality metrics except Nervous System Tumors and Cardiac Surgery. At the micro-average level, for all five concepts combined, the section header ontology shows values 11% and 5% higher than the quality of care metric ontology and the negation context, respectively.

Our ontology-based framework achieved an overall 0.82 micro F-measure, which may be high enough for it to be considered, at minimum, a decision support tool. Depending on the tolerable false positive or false negative rates for a given information extraction task, this framework can serve as an introductory or complementary abstraction method and significantly reduce abstractors' time for extracting quality of care metrics hidden in the clinical narratives.

We believe that an ontological approach toward knowledge modeling and information extraction of quality of care metrics from clinical narratives can provide a unique way of improving clarity of meaning by supplying the necessary layers of disambiguation for both human and computational systems. The use of ontologies in an information extraction system increases control over the expressivity of extraction and helps disambiguate the retrieved concepts. This study illustrates the importance of the "complementary" role of ontologies in existing natural language processing tools and how they can increase the overall performance of the quality metric extraction task.

Rigorous evaluations are still necessary to ensure the quality of these "complementary" NLP systems. Moreover, research is needed to create and update evaluation guidelines and criteria for assessing the performance and efficacy of ontology-based information extraction in healthcare and to provide a consistent baseline for comparing alternative approaches.

V. CONCLUSION

We have developed a framework that helps identify contextual semantics within clinical text and extract more meaningful and unambiguous quality of care metrics for the patient care process. Furthermore, by providing bindings to standard terminologies (like SNOMED), the current approach would help the quality of care metric extraction process become more objective in nature and deliver structured data for populating clinical warehouses, explicit benchmarking, cohort studies, and other clinical analytics where coded data is vital.

REFERENCES

[1] P. Maurette, and C. A. M. R. Sfa, "To err is human: building a safer health system," Annales Francaises d'Anesthesie et de Reanimation, vol. 21, no. 6, pp. 453-454, Jun. 2002.
[2] R. D. Miller, Miller's Anesthesia, 7th ed., pp. 81-82, Philadelphia, PA: Churchill Livingstone/Elsevier, 2010.
[3] P. C. Tang, M. Ralston, M. F. Arrigotti, L. Qureshi, and J. Graham, "Comparison of methodologies for calculating quality measures based on administrative data versus clinical data from an electronic health record system: implications for performance measures," J Am Med Inform Assoc, vol. 14, no. 1, pp. 10-15, Jan-Feb. 2007.
[4] S. Velamuri, "QRDA - Technology Overview and Lessons Learned," J Healthc Inf Manag, vol. 24, no. 3, pp. 41-48, Summer 2010.
[5] C. J. McDonald, "The barriers to electronic medical record systems and how to overcome them," J Am Med Inform Assoc, vol. 4, no. 3, pp. 213-221, May-Jun. 1997.
[6] Q. Chong, A. Marwadi, K. Supekar, and Y. Lee, "Ontology based metadata management in medical domains," Journal of Research and Practice in Information Technology, vol. 35, no. 2, pp. 139-154, 2003.
[7] B. Magoutas, C. Halaris, and G. Mentzas, "An ontology for the multi-perspective evaluation of quality in e-government services," Electronic Government, Proceedings, vol. 4656, pp. 318-329, 2007.
[8] W. N. Lee, S. W. Tu, and A. K. Das, "Extracting cancer quality indicators from electronic medical records: evaluation of an ontology-based virtual medical record approach," AMIA Annu Symp Proc, vol. 2009, pp. 349-353, 2009.
[9] P. D. Johnson, S. W. Tu, M. Musen, and I. Purves, "A virtual medical record for guideline-based decision support," p. 294.
[10] S. W. Tu, J. R. Campbell, J. Glasgow, M. A. Nyman, R. McClure, J. McClay, C. Parker, K. M. Hrabak, D. Berg, and T. Weida, "The SAGE Guideline Model: achievements and overview," Journal of the American Medical Informatics Association, vol. 14, no. 5, pp. 589-598, 2007.
[11] P. W. Hung, and P. D. Stetson, "Development of a quality measurement ontology in OWL," AMIA Annu Symp Proc, p. 984, 2007.
[12] E. Soysal, I. Cicekli, and N. Baykal, "Design and evaluation of an ontology based information extraction system for radiological reports," Comput Biol Med, vol. 40, no. 11-12, pp. 900-911, Nov-Dec. 2010.
[13] J. C. Denny, A. Spickard III, K. B. Johnson, N. B. Peterson, J. F. Peterson, and R. A. Miller, "Evaluation of a method to identify and categorize section headers in clinical documents," Journal of the American Medical Informatics Association, vol. 16, no. 6, pp. 806-815, 2009.
[14] Y. Li, S. Lipsky Gorman, and N. Elhadad, "Section classification in clinical notes using supervised hidden Markov model," pp. 744-750.
[15] M. Shiloach, S. K. Frencher Jr, J. E. Steeger, K. S. Rowell, K. Bartzokis, M. G. Tomeh, K. E. Richards, C. Y. Ko, and B. L. Hall, "Toward robust information: data quality and inter-rater reliability in the American College of Surgeons National Surgical Quality Improvement Program," Journal of the American College of Surgeons, vol. 210, no. 1, pp. 6-16, 2010.
[16] A. R. Aronson, and F. M. Lang, "An overview of MetaMap: historical perspective and recent advances," J Am Med Inform Assoc, vol. 17, no. 3, pp. 229-236, May-Jun. 2010.
[17] "MetaMap Semantic Groups and Types," 2014; http://metamap.nlm.nih.gov/SemanticTypesAndGroups.shtml.
[18] "SPARQL Query Language for RDF," 2014; http://www.w3.org/TR/rdf-sparql-query/.
[19] "SKOS Simple Knowledge Organization System Namespace Document - HTML Variant, 18 August 2009 Recommendation Edition," 2014; http://www.w3.org/2009/08/skos-reference/skos.html.
[20] "Patient Handout - Diabetes Medication," 2013; http://nursing.advanceweb.com/sharedresources/advancefornurses/resources/downloadableresources/n1020303_p32handout.pdf.