ICBO 2014 Proceedings

Ontological Representation of CDC Active Bacterial Core Surveillance Case Reports

Albert Goldfain
Dept. of Eng. and Computer Science, Syracuse University, Syracuse, USA
agoldfai@syr.edu

Barry Smith
National Center for Ontological Research, Buffalo, USA
phismith@buffalo.edu

Lindsay G. Cowell
Dept. of Clinical Science, UT Southwestern Medical Center, Dallas, USA
lindsay.cowell@utsouthwestern.edu

I. INTRODUCTION

The Centers for Disease Control and Prevention’s Active Bacterial Core Surveillance (CDC ABCs) Program is a collaborative effort between the CDC, state health departments, laboratories, and universities to track invasive bacterial pathogens of particular importance to public health [1]. The year-end surveillance reports produced by this program help to shape public policy and coordinate responses to emerging infectious diseases over time. The ABCs case report form (CRF) data represents an excellent opportunity for data reuse beyond the original surveillance purposes.

In this work, we focus on methicillin-resistant Staphylococcus aureus (MRSA), which has been tracked by the ABCs program since 2005. We use the Infectious Disease Ontology (IDO) Staphylococcus aureus extension ontology (IDO-Staph), along with other ontologies following the principles of the Open Biomedical Ontologies Foundry (OBOF), to represent the entities referenced by the MRSA-specific ABCs CRF. The goals of this effort are: (1) to demonstrate that infectious disease case report data can be positioned for reuse and linking to complementary data sources at the point of collection, (2) to identify any coverage gaps or limitations in the OBOF representation, and (3) to extend and reassess previous work in the ontology of infectious diseases [2,3,4].

II. ONTOLOGICAL REPRESENTATION OF CASE REPORTS

One of the unique problems for synthesizing surveillance data relating to any rapidly changing phenomenon of broad social impact, such as the rise of antibiotic resistance in bacteria, is that the range of information required changes over time. In addition to temporal queries, such data frequently must be interrogated along several other dimensions: across pathogens in the ABCs program, across geographical regions represented by different ABCs surveillance sites and beyond, across pathogens with different forms of antibiotic resistance (with our evolving understanding of the mode of action and genetic basis for this resistance), and across data models / systems with different case definition criteria and different semantics for data entry fields.

The semantic web stack of technologies, when applied towards metadata representation and resource linking, is a particularly good fit for this task. The SPARQL query language for such representations also allows data to be stored in a decentralized manner. This is of particular importance for an international problem such as bacterial surveillance.

We use OBOF ontologies to represent: (1) entities referenced in the CDC ABCs CRF for MRSA, (2) the CDC case definition for MRSA infectious disease, and (3) the CDC inclusion criteria for cases.

IDO-Staph is an extension of the Infectious Disease Ontology (IDO) covering entities specific to Staphylococcus aureus infectious disease. Classes in IDO-Staph have supertypes in IDO, the Ontology for General Medical Science (OGMS), and the Basic Formal Ontology (BFO). Within this framework, many of the logical implications are inherited by descendant types from their supertypes. In creating our ontological representation to cover the ABCs CRF, we use the most specific OBOF term (i.e., the lowest descendant of a BFO term) that is applicable for each relevant entity in the CRF.

As an illustration of the sorts of entities to which the CRF needs to refer, we represent the following information from a specific (hypothetical) case report:

John Doe is a 67 inch, 210 lb, 38 year-old patient at the Mayo Clinic with a case of Staphylococcus aureus infectious disease. Labwork identified MRSA in a sample of John’s blood (MRSA bacteremia) after spa typing the isolate. The isolate was found to be SCCmec Type IV, tested positive for toxic-shock syndrome toxin, and negative for Panton-Valentine leukocidin (PVL). This strain of isolate is known to be resistant to several antibiotics, including methicillin. The underlying condition that led to the initial infectious disorder was intravenous drug use.

Our ontological representation of this case report is expressed as RDF triples. Relations are drawn from the OBO Relation Ontology [5]. The appropriate relationship between individuals referred to by CRF fields and the universals in OBOF ontologies is made explicit. For example, the following triples establish relationships between John Doe’s particular disease and disorder, and the universal types they instantiate:

    ‘John Doe’s MRSA infectious disease’ instance of ido-staph:‘staphylococcus aureus infectious disease’
    ‘John Doe’s MRSA infectious disorder’ instance of ido-staph:‘staphylococcus aureus infectious disorder’
    ‘John Doe’s MRSA infectious disease’ has_material_basis ‘John Doe’s MRSA infectious disorder’

The material basis of the infectious disease is the infectious disorder, which has as proper parts an organism population of MRSA (i.e., the infection) and a portion of John Doe’s blood. The MRSA isolate is sampled from a part of John Doe (his blood) and placed into culture. The time at which entities exist is very important here. The material sample isolated from John’s disorder is no longer part of John (and thus no longer part of the material basis for his disease), but data derived from this sample can be predictive of the course of John Doe’s disease, prognosis, and outcome. In culture, only a sub-population of MRSA organisms will have ever been a part of John, but we have faith in the stability of the predictions it allows because the salient properties of John’s MRSA population are inherited by their immediate descendants. Thus, we can make inferences based on SCCmec and spa typing, toxin profiles, and other labwork assays performed on the isolate in culture, for example, the methicillin resistance of John’s MRSA:

    ‘John Doe’s MRSA isolate’ has_disposition ‘John Doe’s MRSA isolate’s antibiotic resistance to methicillin’
    ‘John Doe’s MRSA isolate’s antibiotic resistance to methicillin’ instance of ido-staph:‘PBP2a-mediated resistance to beta-lactam antibiotic’

IDO-Staph provides the ability to subtype specific drug resistance dispositions based on their mechanism of action. However, much broader coverage is needed for the mechanisms of action involved for different antibiotics. The Comprehensive Antibiotic Resistance Database (CARD) and its associated ontology [6] provide a good start along these lines. To be brought fully into alignment with OBOF, CARD data would have to be linked to a suitable drug ontology such as DrON [7].

In virtue of its physical makeup, John Doe’s MRSA infection (i.e., the population of MRSA organisms) has a particular antibiotic resistance towards methicillin. Moreover, resistance to methicillin can (and will) vary in degree across isolated samples. Laboratory personnel measure the degree of resistance by performing a minimal inhibitory concentration assay to produce a certain measurement datum.

We have elsewhere discussed the detailed representation of SCCmec types and toxin profiles for PVL and TSST in the context of the NARSA isolate repository, as well as the representational units for the lab processes and assays involved in classifying Staphylococcus aureus [4]. These representations are readily combined with the data from the CRF to enrich the clinical picture of John Doe’s disease. The entities and relationships required for the ontological representation of SCCmec type IV (as in this case) are presented.

The infection type in this case is bacteremia, which is differentiated from other types of infection solely by anatomical location of the isolate (i.e., the bloodstream). The underlying condition listed for the MRSA infectious disease is intravenous drug use. Ontologically, this can be modeled as a disposition towards certain behaviors that would be explanatory for how the MRSA came to be in John’s bloodstream. Depending on the modeling needs, John’s intravenous drug use can be associated with many other pieces of information (e.g., relating to the injection site bearing the portal of entry role for Staphylococcus aureus).

The final portion of the CRF is the classification of MRSA type. Absent any information that John Doe acquired MRSA while meeting the criteria for either HACO or HA, John’s case would be classified as community-associated MRSA. Lists of criteria such as this are well suited for OWL/RDF since the task is determining if an instance satisfies a description.

III. A WEB-BASED MRSA CASE REPORTING SYSTEM

We have implemented a large part of the ABCs CRF for MRSA as a standards-compliant (HTML5/CSS3) web-based form.¹ The current version of the web form is intended as a proof-of-concept for annotating CRF data at the point of collection. The web form is a custom solution rather than one built around a particular web framework. This allows for maximal flexibility in exporting to other data formats that are specifically required by external resources. The ultimate goal would be to implement such a system with direct EMR integration.

IV. CONCLUSION

Our annotation of such data with OBOF types and relations can provide several advantages, including: (1) precise semantics and definitions can be enforced during data entry, (2) linkage to other infectious disease resources, such as the CARD, to enable broader queries, (3) harmonization and comparability of multi-year CRF data (e.g., for a longitudinal study), (4) the possibility for retrospective application of new inclusion criteria, and (5) an OWL/RDF data model with which to build web applications around CRF data.

Most of the resources necessary to build an ontological representation of the entities referred to by the ABCs CRF are already part of ontologies conformant to OBOF principles. Some of the gaps in coverage include: (1) an ontological resource specifically for pathogen genes and gene products, (2) a drug ontology that classifies methods of action for different antibiotics, and (3) a good ontological relation template for how information about isolates in culture can lead to inferences about the disorders these bacteria are sampled from.

If case report data is properly represented and linked to other resources, this data can lead to insights beyond the original scope of CDC ABCs surveillance.

ACKNOWLEDGMENT

The authors would like to thank Dr. Vance Fowler and Dr. Alan Lesee for productive discussions on Staphylococcus aureus case report requirements.

¹ See http://www.awqbi.com/ido/abccrf/

REFERENCES

[1] National Center for Immunization and Respiratory Diseases, Division of Bacterial Diseases, “CDC – ABCs: Overview – Background”, http://www.cdc.gov/abcs/overview/background.html, Retrieved Jan 22, 2014.
[2] A. Goldfain, B. Smith, and L. G. Cowell, “Towards an Ontological Representation of Resistance: The Case of MRSA”, Journal of Biomedical Informatics, vol. 44(1), pp. 35–41, 2011.
[3] A. Goldfain, B. Smith, and L. G. Cowell, “Dispositions and the Infectious Disease Ontology”, Proceedings of the Sixth International Conference on Formal Ontology in Information Systems, pp. 400–413, 2010.
[4] A. Goldfain, B. Smith, and L. G. Cowell, “Constructing a Lattice of Infectious Disease Ontologies from a Staphylococcus aureus Isolate Repository”, Proceedings of the Third International Conference on Biomedical Ontology, 2012.
[5] B. Smith, W. Ceusters, B. Klagges, J. Köhler, A. Kumar, J. Lomax, C. Mungall, F. Neuhaus, A. L. Rector, and C. Rosse, “Relations in biomedical ontologies”, Genome Biology, vol. 6: R46.
[6] A. G. McArthur et al., “The Comprehensive Antibiotic Resistance Database”, Antimicrobial Agents and Chemotherapy, vol. 57, pp. 3348–3357.
[7] W. R. Hogan, J. Hanna, E. Joseph, and M. Brochhausen, “Towards a Consistent and Scientifically Accurate Drug Ontology”, Proceedings of the Fourth International Conference on Biomedical Ontology, 2013.

Ontological Representation of CDC Active Bacterial Core Surveillance Case Reports

Albert Goldfain (1), Barry Smith (2), Lindsay G. Cowell (3)
(1) Dept. of Eng. and Computer Science, Syracuse University, (2) National Center for Ontological Research, (3) Dept. of Clinical Science, UT Southwestern Medical Center

ABSTRACT

We propose an ontological representation to support the annotation of a CDC Active Bacterial Core surveillance (ABCs) case report form, specifically the form used for Methicillin-Resistant Staphylococcus aureus surveillance. The ontological representation is developed using source ontologies from the Open Biomedical Ontology Foundry. A prototype web-based case report form is implemented to demonstrate how the proposed ontology resource can support the automatic annotation of case report data. The prototype implementation can be found at http://www.awqbi.com/ido/abccrf/. We argue that the annotated data will enable reuse of the surveillance data beyond the original scope and purpose of its collection. Design considerations, benefits, and limitations of the ontological representation are described.

GOALS:

• Demonstrate that infectious disease case report data can be positioned for reuse and linking to complementary data sources at the point of collection,
• Identify any coverage gaps or limitations in the OBOF representation, and
• Extend and reassess previous work in the ontology of infectious diseases.

CDC ABCs SURVEILLANCE PROGRAM

The CDC Active Bacterial Core Surveillance (ABCs) program is a collaborative effort between the CDC, state health departments, laboratories, and universities to track invasive bacterial pathogens of particular importance to public health.

Case reports are produced for six emergent pathogens: group A and group B Streptococcus, Haemophilus influenzae, Neisseria meningitidis, Streptococcus pneumoniae, and methicillin-resistant Staphylococcus aureus (MRSA). The primary output of the ABCs program is a yearly epidemiological report on each of the pathogens covered.

[Figure: Case Reports; Yearend Report]

This is an excellent opportunity for case report data reuse.

In this work, we focus on ABCs MRSA case reports. The CDC lists some pathogen-specific objectives for MRSA surveillance in addition to the main ABCs program objectives:

“1. To evaluate changes in rates of hospital-onset (HO), healthcare-associated community onset (HACO), and community-associated (CA) invasive [MRSA] disease over time and across different geographic areas
2. To identify populations at risk for invasive MRSA disease,
3. To describe the molecular and microbiologic characteristics of [HA], [HACO], and [CA] MRSA”

Achieving these goals also requires linking case report data to relevant molecular and microbiological information.

In addition to querying case report data across time, the data may need to be queried across several other dimensions:

• Across pathogens in the ABCs program.
• Across geographical regions represented by different ABCs surveillance sites and beyond.
• Across pathogens with different forms of antibiotic resistance (with our evolving understanding of the mode of action and genetic basis for this resistance).
• Across data models / systems with different case definition criteria and different semantics for data entry fields.

IDO-STAPH AND THE OBO FOUNDRY

IDO-Staph is an extension of the Infectious Disease Ontology (IDO) covering entities specific to Staphylococcus aureus infectious disease. Classes in IDO-Staph have supertypes in IDO, the Ontology for General Medical Science (OGMS), and BFO. For example, the taxonomy leading to Staphylococcus aureus infectious disease is as follows:

    bfo:disposition
        ogms:disease
            ido:infectious disease
                ido-staph:staphylococcus aureus infectious disease

Within this framework, many of the logical implications are inherited by descendant types from their supertypes.

In creating our ontological representation to cover the ABCs CRF, we use the most specific OBOF term (i.e., the lowest descendant of a BFO term) that is applicable for each relevant entity in the CRF.

CASE REPORT REPRESENTATION

John Doe is a 67 inch, 210 lb, 38 year-old patient at the Mayo Clinic with a case of Staphylococcus aureus infectious disease. Labwork identified MRSA in a sample of John’s blood (MRSA bacteremia) after spa typing the isolate. The isolate was found to be SCCmec Type IV, tested positive for toxic-shock syndrome toxin, and negative for Panton-Valentine leukocidin (PVL). This strain of isolate is known to be resistant to several antibiotics, including methicillin. The underlying condition that led to the initial infectious disorder was intravenous drug use.

Link to antibiotic resistance:

    ‘John Doe’s MRSA isolate’ has_disposition ‘John Doe’s MRSA isolate’s antibiotic resistance to methicillin’
    ‘John Doe’s MRSA isolate’s antibiotic resistance to methicillin’ instance of ido-staph:‘PBP2a-mediated resistance to beta-lactam antibiotic’
    ‘John Doe’s MRSA isolate’s antibiotic resistance to methicillin’ has_qualitative_basis SOME (is_quality_measured_as SOME ‘methicillin minimal inhibitory concentration measurement datum of John Doe’s MRSA isolate’)
    ‘methicillin minimal inhibitory concentration measurement datum of John Doe’s MRSA isolate’ instance of obi:‘minimal inhibitory concentration’

Toxins and their dispositions:

    ido-staph:‘Panton-Valentine leukocidin’ instance of leukocidin
    ido-staph:‘Panton-Valentine leukocidin’ has_disposition SOME ido:‘invasion disposition’
    pro:‘toxic shock syndrome toxin-1’ instance of protein
    pro:‘toxic shock syndrome toxin-1’ has_disposition SOME ido:‘exotoxin disposition’

PROOF OF CONCEPT IMPLEMENTATION

There are several immediate benefits of migrating the ABCs CRF from a paper form to an electronic web form. A web form would allow for form validation (on the client and server side), allow certain fields to be labeled as required input, and help to prevent data entry errors.

We have implemented a large part of the ABCs CRF (see http://www.awqbi.com/ido/abccrf/):

• Web form implementation
    • Standards compliant (HTML5/CSS3)
    • Follow-up questions as needed (jQuery)
    • Client side logical constraints / required fields enforced
• RDF/XML output suitable for
    • Storage in a triplestore
    • SPARQL query
    • Input to a reasoner

CONCLUSIONS

An ontological representation can also facilitate the extension, specialization, and linking of the CRF with different resources. Our annotation of such data with OBOF types and relations can provide several advantages, including:

• Precise semantics and definitions can be enforced during data entry.
• Linkage to other infectious disease resources, such as the Comprehensive Antibiotic Resistance Database, to enable broader queries.
• Harmonization and comparability of multi-year CRF data (e.g., for a longitudinal study).
• The possibility for retrospective application of new inclusion criteria.
• An OWL/RDF data model with which to build web applications around CRF data.

As we have seen, most of the resources necessary to build an ontological representation of the entities referred to by the ABCs CRF are already part of ontologies conformant to OBOF principles. Some of the gaps in coverage include:

1. An ontological resource specifically for pathogen genes and gene products,
2. A drug ontology that classifies methods of action for different antibiotics,
3. A good ontological relation template for how information about isolates in culture can lead to inferences about the disorders these bacteria are sampled from.

ACKNOWLEDGEMENTS AND CONTACT

This work was funded by the National Institutes of Health through Grant R01 AI 77706-01.

The authors would like to thank Dr. Vance Fowler and Dr. Alan Lesee for productive discussions on Staphylococcus aureus case report requirements.

Contact Author Email: albertgoldfain@gmail.com
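The case report triples and the IDO-Staph taxonomy described in this work can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation and does not use an RDF library: triples are held as plain tuples, the relation key `instance_of` stands in for the "instance of" relation, and the helper names `match`, `ancestors`, and `inferred_types` are hypothetical. Under those assumptions it shows how a wildcard pattern match (a rough stand-in for a SPARQL triple pattern) and the supertype chain let an individual inherit types from its ancestors.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Child -> parent links, copied from the taxonomy listed for IDO-Staph.
SUBCLASS_OF: Dict[str, str] = {
    "ido-staph:staphylococcus aureus infectious disease": "ido:infectious disease",
    "ido:infectious disease": "ogms:disease",
    "ogms:disease": "bfo:disposition",
}

# Instance-level triples from the hypothetical John Doe case report.
# "instance_of" is an illustrative key for the "instance of" relation.
TRIPLES: Set[Triple] = {
    ("John Doe's MRSA infectious disease", "instance_of",
     "ido-staph:staphylococcus aureus infectious disease"),
    ("John Doe's MRSA infectious disorder", "instance_of",
     "ido-staph:staphylococcus aureus infectious disorder"),
    ("John Doe's MRSA infectious disease", "has_material_basis",
     "John Doe's MRSA infectious disorder"),
    ("John Doe's MRSA isolate", "has_disposition",
     "John Doe's MRSA isolate's antibiotic resistance to methicillin"),
    ("John Doe's MRSA isolate's antibiotic resistance to methicillin",
     "instance_of",
     "ido-staph:PBP2a-mediated resistance to beta-lactam antibiotic"),
}


def match(triples: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in sorted(triples):
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def ancestors(cls: str) -> List[str]:
    """Return the supertypes of cls, nearest first."""
    chain: List[str] = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def inferred_types(individual: str) -> List[str]:
    """Asserted types plus every supertype reached through SUBCLASS_OF."""
    types: List[str] = []
    for _, _, cls in match(TRIPLES, s=individual, p="instance_of"):
        types.append(cls)
        types.extend(ancestors(cls))
    return types


if __name__ == "__main__":
    for cls in inferred_types("John Doe's MRSA infectious disease"):
        print(cls)
    for subject, _, obj in match(TRIPLES, p="has_disposition"):
        print(subject, "->", obj)
```

In a real deployment the same triples would be loaded into a triplestore and queried with SPARQL, with the supertype inference delegated to a reasoner; the tuple-based version only makes the inheritance step visible.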