=Paper=
{{Paper
|id=None
|storemode=property
|title=Constructing a Lattice of Infectious Disease Ontologies from a Staphylococcus aureus Isolate Repository
|pdfUrl=https://ceur-ws.org/Vol-897/session1-paper03.pdf
|volume=Vol-897
|dblpUrl=https://dblp.org/rec/conf/icbo/GoldfainCS12
}}
==Constructing a Lattice of Infectious Disease Ontologies from a Staphylococcus aureus Isolate Repository==
Albert Goldfain¹*, Barry Smith² and Lindsay G. Cowell³
¹ Blue Highway Inc., Syracuse, NY, USA
² University at Buffalo, Buffalo, NY, USA
³ University of Texas Southwestern Medical Center, Dallas, TX, USA
ABSTRACT
A repository of clinically associated Staphylococcus aureus (Sa) isolates is used to semi-automatically generate a set of application ontologies for specific subfamilies of Sa-related disease. Each such application ontology is compatible with the Infectious Disease Ontology (IDO) and uses resources from the Open Biomedical Ontology (OBO) Foundry. The set of application ontologies forms a lattice structure beneath the IDO-Core and IDO-extension reference ontologies. We show how this lattice can be used to define a strategy for the construction of a new taxonomy of infectious disease incorporating genetic, molecular, and clinical data. We also outline how a lattice application ontology supports faceted browsing and querying of annotated data.

1    INTRODUCTION
One of the more ambitious goals of current clinical and biomedical research is the personalization of medicine, in which treatments are selected on the basis of patient-specific as well as disease-specific information. Recent advances in high-throughput technologies have resulted in a push for the use of patient-specific information in care decisions, particularly genomic and functional genomic data, but also proteomic, metabolomic, and cytometry data. It is widely believed that the increased precision of personalized medicine will yield more effective treatments, with better outcomes and fewer adverse side effects.

Personalized medicine requires that genomic (and other) data be effectively classified and associated with known clinical phenotypes and disease types. Currently available taxonomies of disease do not support this, however, and are in general not well suited for the integration and analysis of high-throughput molecular and cellular data together with clinical data, such as the data found in electronic medical records. Current disease taxonomies were developed primarily to support diagnosis and reimbursement coding rather than as biological representations of disease. As a consequence, they are based on single, rigid hierarchies that do not reflect the complex interconnections between disease types; they lack links to molecular- and cellular-level data and information; and they lack the sort of formal structure that would support their use for the kinds of computational analyses applied in biological and clinical research. For example, the International Classification of Diseases (ICD), version 9, includes catch-all codes such as “[041.19] Other Staphylococcus” and scattered exclusions such as “[041] Bacterial infection in conditions classified elsewhere and of unspecified site. Excludes: septicemia (038.0 – 038.9)”.

The National Academies have recently called for a new taxonomy of disease, along with informatics tools to support its construction (Committee on the Framework for Developing a New Taxonomy of Disease, 2011). In support of such a taxonomy, an information commons would be developed to store “bedside” clinical data collected during clinical encounters, effectively treating each patient as a participant in a clinical study, and this information would be integrated into a knowledge network that formalizes the relationships between different disease data sets. The long-term goal is to produce the new taxonomy of disease from a validated subset of the knowledge network.

We believe that biomedical ontologies will be essential to the construction of the envisioned taxonomy of disease, especially the ontologies in the Open Biomedical Ontology (OBO) Foundry (Smith et al., 2007). The OBO Foundry (OBOF) represents a coordinated effort to construct reference biomedical ontologies according to best practices and principles and to use these ontologies as the basis for OBOF-conformant application ontologies. The coordinated development of these ontologies and their use of a common formalism increase data interoperability and consistency for datasets annotated in their terms. The use of OBOF ontologies in the construction of the new disease taxonomy can bring significant benefits. For example, the widespread use of OBOF ontologies for data annotation would link the disease taxonomy to many existing databases and information resources, and their underlying formalism allows the dynamic inference of different views and multiple interconnected hierarchies. In addition, many analysis algorithms for high-throughput data already utilize these ontologies.

The Infectious Disease Ontology (IDO) suite of ontologies is being developed within the OBO Foundry framework and includes a hub, the IDO-Core, consisting of terms and relations relevant to infectious diseases generally, together with a set of disease-specific extensions derived therefrom. The IDO ontologies are interoperable and jointly cover the infectious disease domain. Here we illustrate how the IDO ontologies can be used to construct part of the new taxonomy of disease and to integrate clinically relevant phenotypic and genotypic data.

* To whom correspondence should be addressed: agoldfain@blue-highway.com
Goldfain, Smith, and Cowell
We take as our case study infectious diseases caused by Staphylococcus aureus (Sa) infection. We show how isolate data from the Network on Antimicrobial Resistance in Staphylococcus aureus (NARSA) can be annotated using IDO and its extensions. We then demonstrate a faceted browser in which both phenotypic and genotypic aspects of the IDO-annotated isolate data can be exposed and queried. Our goal is to provide a resource from which an IDO-conformant application ontology can be derived for a specific Sa infectious disease type. Such application ontologies can be generated in a semi-automated way and collectively form a lattice structure beneath IDO-Core (described below). While our example focuses narrowly on properties of infectious agents, this work is part of a larger effort to create an ontological representation of Sa diseases, and we believe the same approach can be applied to host data and to the integration of host and pathogen data.
2   INFECTIOUS DISEASE ONTOLOGY
IDO-Core includes terms relevant for infectious diseases generally, such as ‘host’, ‘infectious agent’, ‘fomite’, and ‘virulence factor’, and the relations between the corresponding types. Disease- and pathogen-specific extensions are developed by extending the core to include terms and relations relevant to the corresponding infectious disease(s). For example, the IDO extension for Sa (IDO-Sa) includes terms such as ‘Staphylococcus aureus bacteremia’ and ‘Staphylococcal cassette chromosome mec’.

IDO extensions are currently being developed for influenza, malaria, brucellosis, HIV, and Sa. Further extensions will involve the creation of specific application ontologies by IDO user groups. These ontologies will need to import terms from several OBO Foundry ontologies, as well as from existing IDO extension ontologies. This will give rise to a lattice structure beneath IDO-Core and its extensions, as illustrated in Figure 1. At the bottom of the lattice is IDO-ALL, the (pre-inference) closure of the possible IDO ontologies.

When a new application ontology is needed, its position in the lattice will be determined by the terms it needs to import. IDO-Core is agnostic to biological scale, host organism, and disciplinary perspective, but it will be desirable for some of the application ontologies in the lattice to hold some of these fixed (e.g., genetic aspects of influenza in birds), thus serving as granular partitions of the domain ontology they extend. The lattice represents some of the interdependencies in the existing IDO set of ontologies as well as the intended overall domain coverage.

Fig 1. A possible lattice expansion of IDO

2.1   OGMS/IDO Disease Model
The IDO ontologies represent disease according to the disorder – disease – disease course framework provided by the Ontology for General Medical Science (OGMS), in which a disorder is the physical basis of a disease, which is itself a disposition to pathological processes realized in a disease course. For example, in IDO-Sa we assert the following in OWL-DL:

      Sa subClassOf obi:organism AND ido:‘infectious agent’
      SaI =def ido:‘infectious disorder’ AND has_part SOME Sa
      SaID =def ido:‘infectious disease’ AND has_material_basis_in SOME SaI
      SaID realized_by ONLY SaIDC

where ‘Staphylococcus aureus’ = Sa, ‘Sa Infectious Disorder’ = SaI, ‘Sa Infectious Disease’ = SaID, and ‘Sa Infectious Disease Course’ = SaIDC.

The primary classification of Sa is as an organism, but Sa bacteria are also infectious agents because they have a disposition to cause infectious disease in some hosts. Note that we define Sa infectious disorder as an infectious disorder that has Sa as part, but we do not assert “Sa part_of SOME SaI”, because Sa can be among a host’s normal flora, for example on the skin or nasal mucosa.

We use the shortcut relation has_material_basis here to establish a link between the disease (a disposition) and the disorder (a material entity) (Goldfain, Smith and Cowell, under review). An infectious disorder is both an infection (a material entity composed of infectious agents) and a disorder (a material entity that has reached the threshold of clinical significance needed to dispose a host to infectious disease).
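A toy, non-OWL sketch may help show what the SaI definition does and does not license. The Python below is an illustration only, with no reasoner and hypothetical individual names: an individual counts as SaI when it is typed as an infectious disorder and has some Sa part, while a commensal Sa cell on the nasal mucosa remains an Sa instance without being part of any SaI, which is why “Sa part_of SOME SaI” is not asserted.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """Toy individual: asserted type labels plus direct has_part links (hypothetical model)."""
    name: str
    types: set
    has_part: list = field(default_factory=list)


def is_sai(x):
    """SaI =def ido:'infectious disorder' AND has_part SOME Sa, checked by direct lookup only."""
    return "infectious disorder" in x.types and any("Sa" in p.types for p in x.has_part)


def part_of_some_sai(x, individuals):
    """True if x is a direct part of some individual that satisfies the SaI definition."""
    return any(x is p for d in individuals if is_sai(d) for p in d.has_part)


# Hypothetical individuals: an Sa cell in a bloodstream infection and a commensal nasal Sa cell.
sa_blood = Individual("sa_cell_blood", {"Sa", "infectious agent"})
sa_nasal = Individual("sa_cell_nasal", {"Sa", "infectious agent"})
bacteremia = Individual("bacteremia_1", {"infectious disorder"}, [sa_blood])
individuals = [sa_blood, sa_nasal, bacteremia]

for ind in individuals:
    print(ind.name, is_sai(ind), part_of_some_sai(ind, individuals))
# prints:
# sa_cell_blood False True
# sa_cell_nasal False False
# bacteremia_1 True False
```

The asymmetry is the point of the definition: every SaI has an Sa part, but not every Sa is part of an SaI.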
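The rule stated at the start of Section 2, that an application ontology's position in the lattice is determined by what it imports, can be sketched in a few lines. This is a simplified, hypothetical Python model rather than a procedure used for IDO: each ontology is reduced to the set of modules it imports, an ontology sits below every ontology whose import set it contains, and IDO-ALL is taken to be the union of all import sets. The labels ‘IDO-Flu’ and ‘IDO-Sa+SO’ are invented for the example.

```python
# Hypothetical import sets: each ontology is reduced to the modules it imports.
LATTICE = {
    "IDO-Core": frozenset({"IDO-Core"}),
    "IDO-Sa": frozenset({"IDO-Core", "IDO-Sa"}),
    "IDO-Flu": frozenset({"IDO-Core", "IDO-Flu"}),
    "IDO-Sa+SO": frozenset({"IDO-Core", "IDO-Sa", "SO"}),
}


def ido_all(lattice):
    """Bottom element: the union of every import set (a pre-inference closure)."""
    return frozenset().union(*lattice.values())


def parents(new_imports, lattice):
    """Most specific existing ontologies whose imports are all contained in new_imports.

    An ontology sits below another when its import set is a superset of the other's.
    An existing ontology with exactly the same imports is also returned, since it
    occupies the same position in this simplified order.
    """
    above = {name: imps for name, imps in lattice.items() if imps <= new_imports}
    return sorted(
        name
        for name, imps in above.items()
        if not any(imps < other for other in above.values())
    )


# A hypothetical application ontology combining Sa and influenza terms has two parents.
print(parents(frozenset({"IDO-Core", "IDO-Sa", "IDO-Flu"}), LATTICE))  # ['IDO-Flu', 'IDO-Sa']
print(sorted(ido_all(LATTICE)))  # ['IDO-Core', 'IDO-Flu', 'IDO-Sa', 'SO']
```

Multiple parents are what make the structure a lattice rather than a tree: an ontology that imports two extensions sits beneath both.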
2.2       Classifying Staphylococcus aureus diseases
Infectious diseases can usefully be classified in terms of a number of differentiae, including host type, (sub-)species of infectious agent, route of transmission, antibiotic resistance, and anatomical site of infection.

For many species of infectious agent, including Sa, a further classification into strain categories is useful. Many different typing systems are used, including Pulsed-Field Gel Electrophoresis (into strains), Multi-Locus Sequence Typing (into sequence types), BURST clustering (into clonal complexes), and Gram staining (into Gram-positive and Gram-negative classes). Each of these typing systems is tied to a particular type of assay that can be described using the Ontology for Biomedical Investigations (OBI).

For our present purpose, we are interested in a typing system created specifically to differentiate Sa isolates: the Staphylococcal cassette chromosome mec (SCCmec) typing system. SCCmec is further differentiated by its subparts: (a) the cassette chromosome recombinase (ccr) gene complex and (b) the mec gene complex (mec). SCCmec is a mobile genetic element that carries the central determinant for broad-spectrum beta-lactam antibiotic resistance, encoded by the mecA gene (Katayama, Ito and Hiramatsu, 2000). The genetic characteristics of SCCmec are of critical importance to the type of treatment and the Sa disease course an infected host may undergo. The International Working Group on the Staphylococcal Cassette Chromosome elements¹ maintains a list with definitions of the latest known SCCmec types. At the time of this writing, there are 11 known SCCmec types. We include this information in IDO-Sa by leveraging the Sequence Ontology (SO) to assert the following:

      SCCMec subClassOf so:gene_cassette
      SCCMec subClassOf so:mobile_genetic_element
      ‘mec gene complex’ subClassOf so:gene_cassette_member
      ‘ccr gene complex’ subClassOf so:gene_cassette_member
      SCCMec has_part SOME ‘mec gene complex’
      SCCMec has_part SOME ‘ccr gene complex’

The classification of SCCmec as a gene cassette is to be preferred over its classification as a mobile genetic element, because the former tells us what SCCmec is, while the latter tells us what SCCmec can do. However, we include both here, because most descriptions of SCCmec highlight its mobility. The description of an SCCmec subtype then proceeds as follows:

      SCCMecIV subClassOf SCCMec
      ‘mec Class B’ subClassOf ‘mec gene complex’
      ‘ccr Type 2’ subClassOf ‘ccr gene complex’
      SCCMecIV has_part SOME ‘mec Class B’
      SCCMecIV has_part SOME ‘ccr Type 2’

More fine-grained sequence information about the ccr and mec complexes can be captured using SO terms and relations.

3     CASE STUDY
We will now show how a lattice of Sa isolates can be constructed using IDO-Sa and isolate metadata indicating properties such as the mec and ccr gene complex types. The isolate lattice is then used as the basis for our desired lattice of infectious disease application ontologies. Ontologically speaking, isolates are particulars that instantiate the organism type Sa and have been extracted from a host organism. Here we do not represent the distinction between Sa as an ‘isolate’ and Sa as part of a ‘cell culture’; however, we believe these terms are general enough to infectious disease research to warrant inclusion in IDO-Core.

The ontology generated for this case study is stored across several OWL files. The full ontology, including external imports and automatically generated isolate information, is currently available in OWL-DL format at http://www.awqbi.com/LATTICE/narsa-complete.owl. The ontology was developed using Protégé 4.1 and was checked for consistency using the HermiT 1.3.5 and FaCT++ reasoners.

3.1     Resources
Wherever possible, we import and reuse terms (and URIs) from OBO Foundry ontologies via the MIREOT technique (Courtot et al., 2011) and use relations from the OBO Relation Ontology (RO) or proposed extensions thereto. The OBO Foundry ontologies we require for our case study are: Ontology for General Medical Science (OGMS²), Ontology for Biomedical Investigations (OBI³), Sequence Ontology (SO), Infectious Disease Ontology (IDO⁴), Information Artifact Ontology (IAO⁵), NCBI Taxonomy (NCBITaxon⁶), and Foundational Model of Anatomy (FMA⁷).

We also import drug names from the National Drug File Reference Terminology (NDF-RT) to represent antibiotic resistance, and we create links to two other resources: (1) the Antibiotic Resistance Ontology⁸ and (2) the Antibiotic Resistance Database Ontology⁹. Various other stakeholders (such as the DebugIT European Union initiative) have ontologies and databases of antimicrobial resistance, but for our case study we link only to open resources.

¹ http://www.sccmec.org/Pages/SCC_ClassificationEN.html
² http://code.google.com/p/ogms/
³ http://obi-ontology.org/page/Main_Page
⁴ http://infectiousdiseaseontology.org/page/Main_Page
⁵ http://code.google.com/p/information-artifact-ontology/
⁶ http://www.ncbi.nlm.nih.gov/Taxonomy/
⁷ http://sig.biostr.washington.edu/projects/fm/
⁸ http://arpcard.mcmaster.ca
⁹ http://ardb.cbcb.umd.edu/antibio_resis.obo
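As described by Courtot et al. (2011), MIREOT asks for only a minimum of information to reference an external term: the term's URI, the URI of its source ontology, and the URI of the class in the importing ontology under which it is placed. The hypothetical Python sketch below records these three values and renders a small Turtle fragment. The field names, the placeholder URIs, and the use of rdfs:isDefinedBy to record the source ontology are illustrative choices, not the tooling used in this case study.

```python
from dataclasses import dataclass

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


@dataclass(frozen=True)
class MireotReference:
    """The three MIREOT values for one external term (field names are hypothetical)."""
    term_uri: str
    source_ontology_uri: str
    target_superclass_uri: str

    def to_turtle(self) -> str:
        """Declare the external term as an OWL class placed under the target superclass."""
        return (
            f"<{self.term_uri}> a owl:Class ;\n"
            f"    rdfs:subClassOf <{self.target_superclass_uri}> ;\n"
            f"    rdfs:isDefinedBy <{self.source_ontology_uri}> .\n"
        )


# Placeholder URIs only; real OBO term IRIs would be substituted here.
ref = MireotReference(
    term_uri="http://example.org/source/TERM_0001",
    source_ontology_uri="http://example.org/source.owl",
    target_superclass_uri="http://example.org/target/PARENT_0001",
)
print(PREFIXES + ref.to_turtle())
```

Keeping only these three values, rather than importing the whole source ontology, is what lets an application ontology stay lightweight while still pointing at the canonical term.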
3.2        NARSA Isolate Repository
The Network on Antimicrobial Resistance in Staphylococcus aureus¹⁰ maintains a repository of Sa isolates for clinical research, which includes genetic, phenotypic, and demographic information on each isolate. For this example, we use a subset of 101 NARSA isolates, namely those listed in the “Known Clinically Associated Strains – ABCs Collection from CDC” repository. All of the isolates in this subset have an SCCmec type annotation in the NARSA repository and have diverse geographic origins within the United States.¹¹

The NARSA subset was selected to demonstrate how a disease lattice could be constructed starting only from structured HTML content about isolates. NARSA maintains a database of extended information about such isolates; however, we used only the information publicly available on the web.

A script was created to extract each isolate’s NARSA id (NRSnnn), culture source, toxin profile, and antimicrobial profile. The script was implemented in Ruby and used the Hpricot HTML library and regular expressions to extract information. First, the NARSA id was used to assert the existence of an instance of Sa. Then the culture source data were extracted. The culture source was sometimes unspecified (‘other’) or underspecified (‘blood’ vs ‘wound’). Only culture sources for which FMA types existed were asserted to exist as such, but IDO allows for an even more complete representation of host anatomical entities if such information is known. For example, the anatomical location from which the infectious organism is isolated may also be a portal of entry.

The toxin profile for NARSA subset isolates included the presence or absence of Panton-Valentine Leukocidin (PVL) and Toxic Shock Syndrome Toxin (TSST). These toxins are strong determinants of the virulence and clinical manifestation of Sa disease. We classify PVL and TSST as ido:exotoxin. The presence or absence of a toxin is not usually associated with drug resistance, but by representing both pieces of information we are able to query the application ontology for correlations between the presence of toxins and resistance to certain drug types.

The antimicrobial profile for the NARSA subset includes 15 drugs (see Figure 2 for a subset of these). For each drug, NARSA reports a minimum inhibitory concentration (a range or an exact value) along with an interpretation of the antibiotic resistance indicated by this value, following the Clinical and Laboratory Standards Institute (CLSI) guidelines.

Fig 2. Antimicrobial profile for an isolate in the NARSA subset

The NDF-RT was used to validate this profile by making sure that the set of drugs in the profile is a subset of:

      {d | ndf-rt:‘Staph Infection’ ndf-rt:may_be_treated_by d}

For NARSA, or any other resource on antimicrobial resistance, there may be good reason to restrict attention to a subset of antimicrobials. However, since resistance evolves rapidly, a resource such as NDF-RT can be used to keep the set of antibiotics permissible in such a profile up to date.

Minimum inhibitory concentration (MIC) data are represented using IAO and OBI as follows:

      ‘MIC assay’ subClassOf iao:assay
      ‘MIC assay’ has_specified_output SOME ‘MIC data item’
      ‘MIC scalar measurement datum’ is_about SOME ‘drug susceptibility of infectious agent’

Resistance is a disposition that an infectious agent bears towards some drugs and that is realized in their presence. We have elsewhere modeled resistance in terms of pairwise complementary dispositions on the part of both the infectious agent and the drug (Goldfain, Smith and Cowell, 2011). Here we link resistance to MIC measurement data using the shortcut relation has_qualitative_basis as follows:

      ido:‘resistance to drug’ has_qualitative_basis SOME (is_quality_measured_as SOME ‘MIC measurement datum’)

Finally, for each drug D to which the isolate is resistant, we assert:

      ‘resistance to D’ subClassOf ido:‘resistance to drug’
      Sa has_disposition SOME ‘resistance to D’
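The profile check and the per-drug resistance assertions described above can be sketched as follows. This is a hypothetical Python illustration, not the Ruby script used for the case study: the treatable-drug set and the isolate profile are invented stand-ins for data that would come from NDF-RT and the NARSA pages, and the sketch assumes each drug carries a single interpretation code (‘S’, ‘I’ or ‘R’), with only ‘R’ producing resistance axioms.

```python
# Hypothetical stand-in for {d | ndf-rt:'Staph Infection' ndf-rt:may_be_treated_by d};
# the real set would be retrieved from NDF-RT.
MAY_BE_TREATED_BY = {"clindamycin", "erythromycin", "oxacillin", "vancomycin"}

# Hypothetical antimicrobial profile: drug -> (reported MIC, CLSI-style interpretation).
PROFILE = {
    "oxacillin": (">=4", "R"),
    "clindamycin": ("<=0.5", "S"),
    "vancomycin": ("1", "S"),
}


def invalid_drugs(profile, treatable):
    """Drugs in the profile that fall outside the NDF-RT-derived set (empty means valid)."""
    return set(profile) - set(treatable)


def resistance_axioms(subject, profile):
    """Emit the two axioms from the text for each drug interpreted as resistant ('R')."""
    axioms = []
    for drug, (_mic, interpretation) in sorted(profile.items()):
        if interpretation == "R":
            axioms.append(f"'resistance to {drug}' subClassOf ido:'resistance to drug'")
            axioms.append(f"{subject} has_disposition SOME 'resistance to {drug}'")
    return axioms


assert not invalid_drugs(PROFILE, MAY_BE_TREATED_BY), "profile contains drugs outside NDF-RT set"
for axiom in resistance_axioms("Sa", PROFILE):
    print(axiom)
```

Checking the profile against the externally maintained set, rather than a hard-coded drug list, is what allows the permissible antibiotics to track NDF-RT updates.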
3.3       From an Isolate Lattice to a Disease Lattice
The lattice of infectious diseases mirrors the isolate lattice by representing the types of infectious disease that different isolates can give rise to. Infectious agents are parts of those infectious disorders which are the material basis for infectious disease. Using the representation developed above, we can begin to make assertions about the specific types of disease the isolates give rise to and about the profiles of the disease courses which realize these diseases. For example, the presence of the PVL toxin in Sa can lead to necrotic lesions (ogms:disorder) and necrotizing pneumonia (ogms:disease).

¹⁰ See http://www.narsa.net/
¹¹ See http://www.cdc.gov/abcs/reports-findings/surv-reports.html

4      FACETED BROWSING OF THE LATTICE
A faceted browser of the ontologically annotated NARSA isolates was constructed using the MIT Exhibit 2.0 library (http://www.awqbi.com/LATTICE/narsa-complete.html). This tool allows the user to visualize and correlate isolate information across different dimensions (see Figure 3).

Fig 3. Faceted browsing illustrates that most isolates resistant to clindamycin are of SCCmec type II and lack PVL

Linking to external resources is facilitated by the fact that such facets are assigned ontology types from the IDO lattice. These are exactly the kinds of links that will be needed for the knowledge network supporting a new taxonomy of disease.

5      CONCLUSION
A lattice of infectious disease ontologies can serve as a mechanism to integrate pathogen-specific typing systems such as SCCmec with phenotypic data such as drug resistance. Such genotype-phenotype relations will be key to a more effective taxonomy of disease that enables truly personalized medicine. The lattice of infectious diseases is expected to grow along predictable dimensions (host organism, infectious agent organism, drug resistance), but it can also accommodate lightweight application ontologies that are created for very specific purposes. Each such application ontology will have a place in the lattice on the basis of which IDO terms it imports.

We have shown that IDO-conformant annotation of isolate data (such as that in the NARSA repository) is possible without the need to reassemble OBO Foundry resources for new applications. Other benefits of our approach include exposing currently accepted SCCmec types in a computable format via an ontology and validating the NARSA antimicrobial profile using the NDF-RT.

We hope to apply a technique similar to the one outlined in this paper to isolate repositories across the infectious disease domain. In so doing, we hope to broaden the lattice and integrate organism-specific typing systems with the IDO suite of ontologies. We believe that such an effort can be a powerful enabler for a new taxonomy of infectious disease and its supporting knowledge network.

ACKNOWLEDGEMENTS
This work was funded by the National Institutes of Health through Grant R01 AI 77706-01. Smith’s contributions were funded through the NIH Roadmap for Medical Research, Grant U54 HG004028 (National Center for Biomedical Ontology).

REFERENCES
Committee on the Framework for Developing a New Taxonomy of Disease (2011). Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease. The National Academies’ Findings Report.

Courtot, M., Gibson, F., Lister, A. L., Malone, J., Schober, D., Brinkman, R. R., and Ruttenberg, A. (2011). MIREOT: The minimum information to reference an external ontology term. Applied Ontology, 6(1), 23-33.

Goldfain, A., Smith, B., and Cowell, L. G. (under review). BFO Dispositions and their Bases: Two Shortcut Relations.

Goldfain, A., Smith, B., and Cowell, L. G. (2011). Towards an Ontological Representation of Resistance: The Case of MRSA. Journal of Biomedical Informatics, 44(1), 35-41.

Katayama, Y., Ito, T., and Hiramatsu, K. (2000). A New Class of Genetic Element, Staphylococcus Cassette Chromosome mec, Encodes Methicillin Resistance in Staphylococcus aureus. Antimicrobial Agents and Chemotherapy, 44(6), 1549-1555.

Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L. J., Eilbeck, K., Ireland, A., Mungall, C. J., The OBI Consortium, Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S.-A., Scheuermann, R. H., Shah, N., Whetzel, P. L., and Lewis, S. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol, 25(11), 1251-1255.