ICBO 2014 Proceedings

Post-Traumatic Stress Disorder (PTSD) Ontology and Use Case

Bryan Travis Gamble (a,b,*), Matthew Brush (a), Aaron Cohen (a), Melissa Haendel (a), Thomas Rindflesch (e), Maryan Zirkle (b), Dezon Finch (c), Ruth Reeves (d), David Hickham (b), Stephen L Luther (c), Samah Jamal Fodeh (f,g,*), Jonathan Bates (f), Cynthia Brandt (f,g), Kei-Hoi Cheung (f,g,*)

(a) Oregon Health & Science University, Portland, OR, USA
(b) Portland VA Healthcare System, Portland, OR, USA
(c) James A Haley Veterans Hospital, Tampa, FL, USA
(d) Tennessee Valley VA Healthcare System, Nashville, Tennessee, USA
(e) National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
(f) VA Connecticut Healthcare System, West Haven, CT, USA
(g) Yale School of Medicine, Yale University, New Haven, CT, USA

gamble@ohsu.edu*, samah.fodeh@yale.edu*, kei.cheung@yale.edu*

* Co-first-authors who contributed equally. Correspondence should be addressed to Bryan Gamble, Samah Jamal Fodeh, and Kei-Hoi Cheung.

Abstract: Ontologies play an increasingly important role in annotation, integration, and analysis of biomedical data. In this paper, we describe the design and development of a Post-Traumatic Stress Disorder (PTSD) Ontology and how we can use this ontology as a controlled vocabulary for supporting automatic annotation of clinical text. The automated annotation is performed using a natural language processing (NLP) tool called YTEX. In addition, we demonstrate how we can use the concepts and relationships defined in the PTSD Ontology to perform data summarization and categorization.

Keywords: PTSD, mental disorder, natural language processing, data categorization, clinical note analysis

I. INTRODUCTION

Ontology development is motivated by providing semantic context, automated reasoning and annotation, data mining and analysis, and decision-making support. In addition to ontological efforts at the Unified Medical Language System (UMLS; http://www.nlm.nih.gov/research/umls), the National Center for Biomedical Ontology (NCBO) has developed a repository called “BioPortal” [1] that allows both manual and programmatic access to several hundred biomedical ontologies, including some of those from the UMLS. To promote quality and standard practice, the Open Biological and Biomedical Ontologies (OBO) Foundry [2] has established a set of principles for ontology development with the goal of creating a suite of orthogonal interoperable reference ontologies in the biomedical domain. While the development of ontologies is growing and maturing, there is still a need for expanding existing ontologies or developing new interoperable ontologies that describe new domains of knowledge in the biomedical domain. In addition, the utility of ontologies in clinical or health information applications has not yet been fully demonstrated.

In this paper, we describe the development of a new ontology in the domain of Post-Traumatic Stress Disorder (PTSD). The American Psychiatric Association defines PTSD as a condition occurring from exposure to a trauma that impacts the physical integrity or life of the individual or of another person [3]. It is considered normal for an individual to have a strong reaction to a traumatic event, but the effects should decrease over time when the threat is no longer present. However, people with PTSD continue to experience extreme reactions and symptoms even after the trauma is no longer present [4]. According to the National Center for PTSD, 7-8% of the population in the U.S. will have a form of this disorder at some point in their lives [5].

Many of the significant clinical details of a health condition are usually recorded in unstructured clinical notes as part of the electronic health record (EHR). This clinically useful information is typically abstracted using natural language processing (NLP) and machine learning techniques. However, because these automated methods are blind to the sublanguages used to describe the various health conditions in the notes, the knowledge coded in domain-specific ontologies can serve as a useful guide to the automated process of data abstraction. Just as consortia in other domains have worked toward standardizing the representation of phenotype data, for example eMERGE (electronic MEdical Records and GEnomics: https://www.mc.vanderbilt.edu/victr/dcc/projects/acc/index.php/Main_Page) [6], we are undertaking a similar effort to improve the limited existing coverage of PTSD.

In this paper, we provide a use case showing how the PTSD Ontology can be used to support automatic annotation of clinical text. In addition, we discuss how the PTSD concepts and relationships can be used to perform data categorization.

II. ONTOLOGY DEVELOPMENT

A. PTSD Ontology

Knowledge representation of PTSD is limited by heterogeneous yet overlapping terminologies and the subjective narrative patient information in electronic clinical notes. The lack of contextual analysis of this unstructured evidence is a barrier to the understanding of PTSD symptoms and the evaluation of treatment effectiveness. To surmount these hurdles, we are developing an ontology to capture knowledge relevant to PTSD symptoms and treatments. The PTSD Ontology is being built to share domain knowledge of relevant concepts in a formal framework representation and to capture the semantic relationships between those concepts. For instance, the semantic relation isA represents subclasses of specific PTSD symptom clusters defined within the framework. The ontology is being developed to specify the concepts, relationships, instances, and axioms explicitly to enable more precise search and reasoning about this data. Such an ontological framework allows domain knowledge to be shared and reused across applications.

B. Design elements and principles

This ontology is developed in the Web Ontology Language (OWL) using Protégé Version 4.3. Our approach reuses existing knowledge bases and ontologies via ontology imports. Currently, the PTSD Ontology includes imports from existing data collections such as Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT: http://purl.bioontology.org/ontology/SNOMEDCT), Symptom Ontology (SYMP: http://purl.bioontology.org/ontology/SYMP), Ontology for General Medical Science (OGMS: http://purl.bioontology.org/ontology/OGMS), National Cancer Institute Thesaurus (NCIT: http://purl.bioontology.org/ontology/NCIT), Medical Dictionary for Regulatory Activities (MEDDRA: http://purl.bioontology.org/ontology/MEDDRA), and the upper-level Basic Formal Ontology (BFO: http://purl.bioontology.org/ontology/BFO), among other resources [1].

Our design of the PTSD Ontology follows the principles of the OBO Foundry [2] in order to ensure interoperability with the existing reference ontologies. The PTSD Ontology shares many concepts that exist in other ontologies and reference terminologies. For example, the PTSD Ontology contains concepts defined in the Symptom Ontology but adds the context of traumatic exposure that accompanies the spectrum of behavioral and physiological symptomatology. An application that uses the Symptom Ontology as a reference terminology would be interoperable with the similar but modified concepts defined in the PTSD Ontology. The goal is for other researchers to be able to implement relevant concepts and relationships in order to systematically share and reuse data and to alleviate inconsistencies in disparate data sets across the PTSD community.

Controlled vocabulary resources, literature reviews and expert panels form the building blocks of our ontological foundation. The existing coverage was excellent for building a terminology base but was not complete enough to meet the needs of our current and future use case implementations. Lastly, for increased coverage, the PTSD Ontology incorporated annotations with symptom and treatment terms drawn from mental health notes of patients with PTSD, obtained from the Veterans Health Administration Corporate Data Warehouse (VHA CDW). Our ontological contents (including concepts, synonyms, relationships and their hierarchical organization) have been validated by clinicians and PTSD domain experts.

C. Ontological structure and content

The current in-progress version of the PTSD Ontology is available for public download at http://code.google.com/p/ptsd-ontology/. Coverage of the ontology is currently directed to variations in symptoms and treatments. While the work is ongoing, the purpose of the conceptual PTSD model design is to support: 1) retrieval, collection, and sharing of information; 2) natural language processing (NLP) tasks; and 3) ontology-driven information extraction (IE) for automated accumulation of symptoms and treatments located within the narrative portion of a patient’s EHR encounter data. The ontology is being designed to account for a wide range of treatments and to recognize the specificity and intensity of symptoms. Currently, the PTSD Ontology consists of 219 symptom classes and 367 treatment classes. PTSD symptoms are arranged in clusters according to definitions in the Diagnostic and Statistical Manual of Mental Disorders, 5th ed. [3]. Clusters include stressors, intrusion symptoms, avoidance and numbing, negative alterations in cognitions and mood, alterations in arousal and reactivity, functional significance and dissociative symptoms. A subset of avoidance subclasses is displayed in Figure 1. It is important to semantically distinguish these variations in symptoms as they translate directly into the diagnosis of disease and the type and breadth of clinical care. While the variations in symptoms are applicable to multiple cohorts, the context of this framework is derived from adult patients with traumatic stress reaction treated in a Veterans Health Administration (VHA) clinical setting. This symptom grouping establishes parameters necessary for the semantic understanding of assessment, diagnosis, and management of symptoms. Similarly, concepts describing treatment interventions are arranged in categories designated in the Veterans Affairs/Department of Defense (VA/DoD) PTSD evidence-based practice management guidelines [8].

Fig. 1. Subset of avoidance subclasses.

Five primary therapeutic categories (pharmacotherapy, psychological, psychoeducational, psychosocial, and case management) are shown in Figure 2. Knowledge about the variations in available prescribed treatments for PTSD can further enhance our ability to comparatively evaluate their relationships and their effectiveness in treating the symptoms of this illness. This organization provides structure for symptom-specific management, supporting precision of information retrieval and classification. The ontology can be customized to support personalized therapeutic approaches when treating heterogeneous symptoms that persist at the individual patient level. The hierarchical arrangement of symptoms and treatments allows representation of data using parent/child relationships and fosters organization of information to leverage automated retrieval. Subclass relations establish hierarchical relationships between classes, while other properties are used to classify data along other axes. Figure 3 shows some of the classes and a high-level overview of treatment classes, with the “treats” property establishing a non-hierarchical relationship between treatments and the specific symptoms described in the ontology. As the gaps in current understanding of the disorder are addressed, it is important for our structure to set parameters that foster contextual collaboration. The framework of the PTSD Ontology aids in establishing a consensus on the semantic understanding of terms and relationships used to describe this disorder.

Fig. 2. Subset of primary therapeutic categories classes.

Fig. 3. Established relationships between classes.

III. USE CASE

In this section, we discuss two applications of the PTSD Ontology: automatic annotation of PTSD clinical notes and categorization of the notes’ contents based on the hierarchical relationships defined in the ontology. A subset of the PTSD Ontology was obtained by loading the ontology into Protégé and retrieving the terms and relationships we wanted using SPARQL queries. The query output was produced in tab-delimited format.

A. NLP annotation use case

Projects like Annotator (http://bioportal.bioontology.org/annotator) and ODIE (http://bioontology.stanford.edu/ODIE-project) enable the use of biomedical ontologies in natural language processing (NLP). In the context of clinical NLP, YTEX [9] was used to automatically annotate clinical notes involving different medical conditions (e.g., fall and lung cancer). YTEX is an extension to the clinical Text Analysis and Knowledge Extraction System (cTAKES) to derive robust feature sets from NLP pipelines [10]. The output components generated by YTEX include words, concepts, phrases, sentences and annotations of concepts, which can be stored in a relational database. Among these components, only concepts were extracted and used as features. The dictionary component of YTEX is composed of UMLS clinical concepts; it feeds into the named entity recognition module to annotate concepts. This dictionary can be customized, and its contents can be replaced by the vocabulary of the user’s interest. Since the goal of this study is to identify only PTSD treatment and symptom related concepts, and because the UMLS includes non-PTSD concepts from a variety of sources, we replaced the built-in dictionary of YTEX with a pared-down set of concepts obtained from the PTSD Ontology.

An excerpt from an actual outpatient progress note for a patient with PTSD is shown in the YTEX annotation viewer in Figure 4. Spans of text that were mapped to concepts (symptoms or treatments) in the PTSD Ontology vocabulary are highlighted by the annotation viewer.

Fig. 4. Concept annotations generated by YTEX.

B. Data analysis

The controlled vocabulary developed for PTSD will serve as valid components for data categorization of PTSD treatments. As described in Section III.A, the vocabulary encoded in the newly established PTSD Ontology was utilized to detect treatments of PTSD mentioned in clinical notes. The treatment concepts detected and extracted for each clinical note were used to compose a bag of concepts (BOC) representation of the notes. In this representation the concepts are arranged in a matrix where the rows are the clinical notes and the columns are the treatment concepts. This representation can be effectively utilized in subsequent machine learning and data analysis tasks. In this work, we demonstrate the utility of the PTSD Ontology by building a condensed representation of clinical notes using the ontology’s hierarchical relationships.

The advantage of the new data representation is a reduction in dimensionality, i.e., in the size of the feature set (concepts), which reduces complexity in the analysis of large volumes of notes. Because the BOC extracted from YTEX output (see Section III.A) contains a variety of concepts, some of which are relevant to our interests and others not, we use the hierarchical structure of the PTSD Ontology to help remove the irrelevant concepts. We present a process to transform the concepts extracted from the notes into a more general, less granular set of concepts by integrating knowledge from the PTSD Ontology. This transforms the representation of the text notes from concepts to more abstract categories. The benefit of this transformation is three-fold. First, it reduces the complexity and sparsity of data analysis by decreasing the dimensionality of the space. Second, it provides a focused analysis of the notes by removing features that do not belong to the categories of interest and are not relevant to the clinical use case, which could otherwise obscure the analysis. Third, it may reveal new categories to capture and conceptualize the data for better understanding.

We transformed the BOC representation of the PTSD clinical notes to the Bag Of Categories (BOCat) representation, where the categories are the types of PTSD treatments. A significant reduction of dimensionality is achieved using BOCat. In the BOC representation, there were 367 concepts to describe the notes, whereas in the BOCat representation, the feature space is compressed into a higher ontological level consisting of six treatment categories (dimensions), and the notes are described using these higher-level PTSD treatment concepts in the ontology. To discard irrelevant concepts (i.e., symptoms) from the BOC and generate the focused BOCat of treatments only, each concept is mapped to its treatment category using the hierarchical relationships in the PTSD Ontology. This forms a filter, wherein concepts that do not belong to a treatment type are dropped from the analysis. In addition to noise and dimensionality reduction, the BOCat representation assigns weights to each category of treatments in the notes. The weight of a treatment type is calculated for a particular clinical note by summing the frequencies of all concepts belonging to that type of treatment. This information, typically documented exclusively in the narrative text, indicates how often a treatment type is documented in a clinical note and how effective it might be for a patient’s existing PTSD symptoms.

IV. DISCUSSION

The data used in this analysis consisted of complete sentences of narrative text from clinical documentation that included much “noise” such as abbreviations, misspellings, and negations. An important goal of this project is to make explicit the assumptions in PTSD clinical note data and thereby reduce the ambiguity in concepts that describe symptoms and treatments in this domain. In mental health, and more specifically in anxiety disorders, concepts are often shared with slight modifications corresponding to different contexts. For example, many PTSD re-experiencing symptoms are similar to obsessive-compulsive disorder (OCD) symptoms but with the differential details relating the recollections to the traumatic exposure. As we continue to develop more complex use cases, we are aware of the impedance mismatch between ontologies and information models. As described by Ceusters, “terms in ontologies refer to universals, where clinical histories consist overwhelmingly of representational units that refer to instances” [11]. The subjective nature of symptom identification can be problematic for ontology development within the PTSD domain inasmuch as the same symptoms may be associated with disparate formal diagnoses and treatment recommendations. Developing description logic within the ontology, customized to the domain, can help overcome another important obstacle to data interoperability in mental health research: the use of different assessments and scales for measuring symptoms and assisting in diagnosis [12].

Validation, feature reduction, and identification of PTSD clinical data categorization are currently underway. Future analysis will compare ontology coverage and accuracy with existing terminologies including the UMLS. Continued development will assist in analyses of clinical relationships between symptoms and treatments. The PTSD Ontology can potentially facilitate research collaborations on varied assessments and structured interviews. Ontological representation and reasoning may help improve prediction, prognosis, and understanding of this complex disorder.

ACKNOWLEDGMENT

We gratefully acknowledge support from Oregon Health & Science University, the Department of Veterans Affairs Healthcare System, and Yale University School of Medicine. This study was supported in part by the Intramural Research Program of the National Institutes of Health, National Library of Medicine.

REFERENCES

[1] Noy, N.F. et al. (2009) BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Research. 2, W170-173.
[2] Smith, B. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology. 11, 1251-1255.
[3] American Psychiatric Association. (2013) Diagnostic & Statistical Manual of Mental Disorders, 5th ed. American Psychiatric Association, Washington, DC.
[4] Norris, F., Tracy, M. and Galea, S. (2009) Looking for resilience: Understanding the longitudinal trajectories of responses to stress. Social Science & Medicine, 03(043).
[5] U.S. Department of Veterans Affairs. How Common is PTSD? 30 Jan. 2014. Web. 01 Jul. 2014. http://www.ptsd.va.gov/public/PTSD-overview/basics/how-common-is-ptsd.asp
[6] Pathak, J. et al. (2011) Mapping clinical phenotype data elements to standardized metadata repositories and controlled terminologies: the eMERGE Network experience. J Am Med Inform Assoc. 18(4):376-386.
[7] McGuinness, D.L. and Van Harmelen, F. (2004) OWL web ontology language overview. W3C recommendation, 10, 2004-03.
[8] Department of Veterans Affairs. (2004) VA/DoD clinical practice guideline for post-traumatic stress management. Department of Defense. 2010 update.
[9] Garla, V. et al. (2011) The Yale cTAKES extensions for document classification: architecture and application. J Am Med Inform Assoc. 18(5):614-20.
[10] Savova, G.K. et al. (2010) Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 17(5):507-13.
[11] Ceusters, W. and Smith, B. (2010) Foundations for a realist ontology of mental disease. J. Biomedical Semantics, 1, 10.
[12] Luther, S. et al. (2011) Using statistical text mining to supplement the development of an ontology. J Biomed Info. 44, S86-S93.
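
As an illustration of the BOC-to-BOCat transformation described in Section III.B, the following Python sketch maps each concept to its treatment category by walking isA (subclass) links upward, drops concepts that are not treatments, and sums concept frequencies per category. It is a minimal sketch under stated assumptions, not the implementation used in this study: the `PARENT` table, its class names, and the function names `treatment_category`, `boc_to_bocat`, and `bocat_matrix` are hypothetical and are not taken from the PTSD Ontology or from YTEX.

```python
from collections import Counter

# Hypothetical child -> parent (isA) edges, for illustration only.
# These names are NOT taken from the PTSD Ontology.
PARENT = {
    "sertraline": "SSRI",
    "SSRI": "pharmacotherapy",
    "prazosin": "pharmacotherapy",
    "prolonged exposure therapy": "psychological",
    "pharmacotherapy": "treatment",
    "psychological": "treatment",
    "nightmares": "intrusion symptoms",
    "intrusion symptoms": "symptom",
}
TREATMENT_ROOT = "treatment"  # assumed name of the top-level treatment class


def treatment_category(concept):
    """Return the top-level treatment category of a concept, or None.

    Walks isA edges upward until a direct child of the treatment root
    is found. Concepts outside the treatment branch (e.g. symptoms)
    return None, which acts as the filter described in the text.
    """
    node = concept
    seen = set()  # guards against accidental cycles in the edge table
    while node in PARENT and node not in seen:
        seen.add(node)
        parent = PARENT[node]
        if parent == TREATMENT_ROOT:
            return node
        node = parent
    return None


def boc_to_bocat(boc):
    """Convert one note's bag of concepts into a bag of categories.

    `boc` maps concept -> frequency in the note. The weight of each
    category is the sum of the frequencies of its concepts.
    """
    bocat = Counter()
    for concept, freq in boc.items():
        category = treatment_category(concept)
        if category is not None:
            bocat[category] += freq
    return dict(bocat)


def bocat_matrix(notes, categories):
    """Build the note-by-category matrix (rows: notes, columns: categories)."""
    rows = []
    for boc in notes:
        weights = boc_to_bocat(boc)
        rows.append([weights.get(c, 0) for c in categories])
    return rows


note = {
    "sertraline": 2,
    "prazosin": 1,
    "prolonged exposure therapy": 3,
    "nightmares": 4,  # a symptom, so it is filtered out
}
print(boc_to_bocat(note))
```

In this toy example the four concept columns collapse into two category columns, and the symptom concept contributes nothing, mirroring on a small scale the reduction from 367 concepts to six treatment categories reported in Section III.B.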