=Paper= {{Paper |id=Vol-222/paper-3 |storemode=property |title=Using Ontology Graphs to Understand Annotations and Reason about Them |pdfUrl=https://ceur-ws.org/Vol-222/krmed2006-p03.pdf |volume=Vol-222 |dblpUrl=https://dblp.org/rec/conf/krmed/DolanB06 }} ==Using Ontology Graphs to Understand Annotations and Reason about Them== https://ceur-ws.org/Vol-222/krmed2006-p03.pdf
KR-MED 2006 "Biomedical Ontology in Action"
November 8, 2006, Baltimore, Maryland, USA


              Using ontology visualization to understand annotations and
                                  reason about them
                       Mary E. Dolan, Ph.D., and Judith A. Blake, Ph.D.,
                   Mouse Genome Informatics [MGI], The Jackson Laboratory,
                                Bar Harbor, ME 04609 USA
                             mdolan@informatics.jax.org

Biomedical ontologies not only capture a wealth of biological knowledge but also provide a representational system to support the integration and retrieval of biological information. Various biomedical ontologies are used by model organism databases to annotate biological entities to the literature and have become an essential part of high throughput experiments and bioinformatics research. We are exploring the power of ontology visualization to enhance the understanding of annotations by placing annotations in the graph context of the broader biological knowledge the ontology provides. Presenting annotations in this context provides a better understanding of the annotations because humans are adept at extracting patterns and information from graphical representations of complex data.

INTRODUCTION

Biological systems can be very complex but many aspects of biological system characterization have a wealth of biomedical knowledge accumulated over years of clinical and laboratory experience. Ontologies provide a shared understanding of a domain that is human intelligible and computer readable and, consequently, a representational system to support the integration and retrieval of this knowledge.

As techniques of large-scale genomic analysis and functional gene annotation have progressed and are becoming more common, it is essential to find approaches to provide a comprehensive view of annotation sets. We are exploring the power of several widely used ontologies to provide a comprehensive graphical view of annotations by presenting the annotations visualized within an ontology relationship structure. By presenting annotations in the graph context we hope to provide a better understanding of the annotations because humans are adept at extracting patterns and information from graphical representations of complex data.

BACKGROUND

Ontologies can be used to abstract knowledge of a domain in a way that can be used by both humans and computers by providing an explicit representation of the entities of interest and the relationships among them. In particular, biomedical ontologies representing various aspects of biology are being used for annotating entities to the literature and for integrating the diverse information resulting from the analysis of high-throughput experiments.

Open Biomedical Ontologies (OBO) is an umbrella repository for well-structured controlled vocabularies for shared use across different biological and medical domains [1]. The OBO website contains a range of ontologies that are designed for biomedical domains. Some of the OBO ontologies, such as the Gene Ontology (GO), apply across all organisms. Others are more restricted in scope; for example, the Mammalian Phenotype Ontology (MP) is a phenotype ontology designed for specific taxonomic groups.

The GO Project was established to provide structured, controlled, organism-independent vocabularies to describe gene functions [2] and, as a consequence, provides semantic standards for annotation of molecular attributes in different databases. Members of the GO Consortium supply annotations of gene products using this vocabulary. The GO and annotations made to GO provide consistent descriptions of gene products and a valuable resource for comparative functional analysis research.

Currently, the three ontologies of GO contain nearly 20,000 terms [3]. The terms are organized in structures called directed acyclic graphs (DAGs) which differ from strict hierarchies in that a more specialized (granular) child term can have more than one less specialized parent term. In the GO a child can be related to a parent by either a ‘part of’ or ‘is a’ relationship. Mouse Genome Informatics (MGI) curators use the GO to annotate mouse genes from the literature. Currently, MGI has more than 100,000 annotations to more than 17,000 genes; approximately half of the annotations are manual
annotations from the literature, the balance from automated data loads. An MGI user has the option of viewing the full set of GO annotations for a particular gene in three formats: as a table, as automatically generated text, and as a graph. The graph presents relevant parts of the GO with direct annotations indicated as colored nodes, as shown in figure 1. The graphical format allows a user to easily see, for example, whether a gene product appears to participate in a broad range of molecular functions or in only a narrow, specialized function.

Figure 1. GO annotation graph for mouse Hgs (HGF-regulated tyrosine kinase substrate) provides an alternative to tabular or text views. Blue/shaded nodes in the GO graph indicate mouse annotations. Full graph and annotation set available at: http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=GOMarkerGraph&id=MGI:104681

Genes that share close evolutionary relationships are likely to function in similar ways. As a complement to our previous work [4] on the assessment of annotation consistency of independently developed annotation sets for curated mammalian orthologs [5], we provided comparative graphical visualizations of annotations, one graph for each mouse-human-rat ortholog triple with nodes colored according to organism annotated. Coloring nodes to distinguish among annotations extends the usefulness of the visualization for pattern recognition by users. The graphical format, as shown in figure 2, allows a user to assess the consistency, inconsistency and level of detail of annotations made to different model organisms.

Figure 2. GO comparative graph for MGI curated orthologs to mouse Pax6 (paired box gene 6). The nodes are color-coded according to organism: mouse annotations shown in blue/lighter shading, human annotations in red/darker shading, multiple organisms in gray. Full graph available at: http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=GOOrthologyGraph&id=MGI:97490

Our examination of the comparative graphs led to the observation that annotations are often complementary, reflecting the fact that the different model organisms are used to study different aspects of biology. Since biologists are often species-blind and assemble their initial picture of a gene and its function without regard to the taxonomic origin of the gene that was studied in a particular experiment, this suggested the broader application of such graphs as ‘summary’ rather than ‘comparative’ graphs that might be used to answer the request: “Show me everything that is known about this gene.” The power of this representation is that it provides a summary view of information derived from species-specific experimental results.

In addition to the ability to visualize comparative annotation sets, graphs can be used to coordinate information for animal models of human
diseases. The primary purpose of performing experiments that study the consequences of mutations in a particular organism is that these experiments provide valuable models for the understanding of human disease. We have extended our ontology visualization approach [6] to the orthology sets developed in the resource OrthoDisease [7], a comprehensive database of model organism genes that are orthologous to human disease genes derived from the OMIM database [8], a continuously updated catalog of human genes and inherited, or heritable, genetic diseases. We have abstracted orthology information on thirteen organisms for which curated GO annotation sets are publicly available. By combining all GO annotations for the orthologs associated with each disease gene or with each disease, we obtain a comprehensive annotation set for each disease gene and for each disease. Each annotation set is presented on the GO graph with nodes having annotation colored according to the organism that is the source of the annotation. Figure 3 shows part of the graph for OMIM gene CAT that demonstrates the degree of similarity annotations to diverse organisms can show. Of course, in some sense, it is the differences that are of more interest in this case since we are interested in collecting together as much information as possible.

Figure 3. GO annotation graph for OMIM gene CATALASE; CAT. The graph coordinates GO annotations for thirteen model organisms with nodes colored by organism. Full graph and annotation set available at: http://www.spatial.maine.edu/~mdolan/OrthoDisease_Graphs/OMIM_GeneGraphs/CAT.html

DESCRIPTION OF CURRENT WORK

While each annotation group develops curation standards to meet the needs of their community, one of the important results of various ontology projects has been an attempt to develop a common vocabulary and shared annotation standards that enhance the utility of these annotations for analysis. We have found that regardless of the ontology, presenting terms in a graphical context makes the relationships of ontology terms clear, provides context for annotations, and makes the examination of large annotation sets feasible. The long-term objective, now, is to build consensus for curation standards that will strengthen the utility of data integration capabilities of this approach.

We have generalized our GO visualization approach to other ontologies and annotation data sets. First, we construct a complete graph to represent the ontology. Second, we color nodes that have annotations and limit the graph to the sections necessary to show all annotations. By limiting the graphs to annotated sections we do not have to deal with scalability issues that might arise if we were to attempt to represent an entire ontology that includes thousands of terms. Finally, we build a web page for each gene that includes an image of the graph and a table of annotations. In addition, to facilitate the examination
of larger graphs, we provide scalable vector graphics (SVG) images, which include pan-zoom-search functionality that allows a user to examine specific sections of the graphs. The graph images are generated using GraphViz, a freely available, open source graph layout program [9].

Gene expression data sets describe when and where particular genes are active. Providing a comprehensive picture of the level of gene expression across developmental stages and anatomical structures will facilitate investigation of regulation of gene expression.

We have applied our simple graphical display approach to gene expression data with annotations to both the Adult Mouse Anatomical Dictionary (MA) [10] and the Edinburgh atlas of mouse embryonic development (EMAP) [11]. For each gene with annotation data, the resulting graph shows the mouse anatomy ontology with anatomical structure nodes colored to indicate where that gene is expressed. In addition, in the case of the EMAP graphs, we have attempted to tease apart time dependence of gene expression patterns by separating annotations to different developmental stages and producing a graph for each Theiler stage.

The laboratory mouse is an important model organism for a broad range of human diseases and disorders, including diabetes, heart disease, and cancer. Genomic and genetic investigations of particular mouse models (phenotypes) reveal the contribution of particular genomic variants (alleles) to the presentation of disease phenotypes. The annotation of genotype-phenotype associations is an essential part of assessing mouse models for human disease.

We have adapted our comparative GO annotation approach to phenotype annotations made to different mouse gene alleles to create Mammalian Phenotype (MP) Ontology [13] graphs. As in the case of GO comparative graphs (figure 2), the generalized approach to comparative graphs requires three things: an ontology to provide the relationship structure, a grouping idea to connect the annotated objects, and a distinguishing idea (see figure 4). First, we construct a complete graph to represent the ontology. Second, we color nodes that have annotations according to the distinguishing characteristic and limit the graph to the sections necessary to show all annotations. Finally, we build a web page for each gene that includes an image of the graph and a table of annotations.

Figure 4. The comparative graph paradigm: an ontology provides the relationship structure among terms; a grouping idea defines the object set; and a discriminating idea distinguishes objects whose annotations will be color-coded in the graph.

In the case of the GO comparative graphs the grouping idea is orthology and the distinguishing idea is organism: mouse annotations in blue, human annotations in red and so forth. In the case of MP graphs the grouping idea is the gene and the distinguishing idea is the allele: each allele’s annotated nodes are colored differently. In a similar way to color coding of GO nodes by organism, color-coding of MP nodes by allele allows a user to easily
see similarities and differences in alleles annotated to different phenotypes. Our purpose in creating such graphs is to move beyond simply providing another representation of a phenotype data set to add potential value to this data set as a method of assessing mouse models for human disease.

RESULTS

Graphical representations of expression data sets using anatomy ontologies

The Mouse Anatomical Dictionary comprises ontologies that provide a standardized nomenclature for anatomical parts to describe the complex patterns of gene expression in the developing and adult mouse and how they relate to the emerging tissue structure. Terms that describe embryonic developmental stages (Theiler Stages 1 through 26) have been developed by the Edinburgh Mouse Atlas Project (EMAP) [11]. Terms that describe mice at postnatal stages, including adult (Theiler stage 28), have been developed as the Adult Mouse Anatomical Dictionary (MA) [10].

Adult Mouse Anatomical Dictionary graphs display relationships of annotations

The Adult Mouse Anatomical Dictionary (MA) is an anatomy ontology that can be used to provide standardized nomenclature for anatomical terms in the postnatal mouse. It was developed as part of the Gene Expression Database (GXD) resource of information from the mouse [12]. The Adult Mouse Anatomical Dictionary organizes anatomical structures for the postnatal mouse spatially and functionally. Each MGI gene detail page includes links to gene expression data; the user can select data for the postnatal mouse and obtain a tabular view of available expression data.

Our graphical representations present another view of the data, as shown in figure 5. This partial view of the graph for Abcg2 (ATP-binding cassette, sub-family G (WHITE), member 2) clearly shows the relationship of three annotations as variations in granularity. Note that the colored nodes indicate only direct annotations made by curators from the literature, although indirect annotation can be inferred from the ontology structure.

Figure 5. Part of the Adult Mouse Anatomical Dictionary (MA) annotation graph for postnatal expression data for mouse gene Abcg2 (ATP-binding cassette, sub-family G (WHITE), member 2). Full graph and annotation set available at: http://www.spatial.maine.edu/~mdolan/GXD_Graphs/Abcg2.html

EMAP graphs provide information on developmental stage specific expression

The Edinburgh Mouse Atlas Project (EMAP) annotation of gene expression data can be used to capture the complex and ever-changing patterns throughout the development of the mammalian embryo and how they relate to the emerging tissue structure at each developmental stage.

We have adapted the EMAP ontology to separate annotations associated with different Theiler stages and created EMAP annotation graphs for each stage, effectively treating each stage as a separate ontology structure. With this approach we can, within the limits of incomplete annotation, see stage separated annotations as a time series of expression patterns. For example, figure 6 shows expression annotations for mouse gene Shh (Sonic hedgehog) for Theiler stages 11 (figure 6, upper panel) and 12 (figure 6, lower panel). A user might consult such graphs to explore changes in expression pattern between stages or determine the earliest stage at which the gene is known to be expressed in a particular anatomical structure. As these graphs are presented on our web site, a user can move forward or back to an adjacent Theiler stage.
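The stage-separated view described above can be illustrated with a small sketch. It is not the code used to build the published graphs; the annotation records, the structure names (`structure_a`, `structure_b`) and the helper functions are hypothetical placeholders chosen only to show how annotations might be grouped by Theiler stage and queried as a time series.

```python
from collections import defaultdict

# Hypothetical direct annotations for one gene: (Theiler stage, structure).
# These records are placeholders for illustration, not curated GXD data.
ANNOTATIONS = [
    (11, "structure_a"),
    (12, "structure_a"),
    (12, "structure_b"),
    (13, "structure_b"),
]


def group_by_stage(records):
    """Map each Theiler stage to the set of structures annotated at that stage."""
    by_stage = defaultdict(set)
    for stage, structure in records:
        by_stage[stage].add(structure)
    return dict(by_stage)


def earliest_stage(records, structure):
    """Return the earliest stage with an annotation to `structure`, or None."""
    stages = [stage for stage, name in records if name == structure]
    return min(stages) if stages else None


def changes_between(by_stage, earlier, later):
    """Return (gained, lost) structures between two stages, by direct annotation."""
    before = by_stage.get(earlier, set())
    after = by_stage.get(later, set())
    return after - before, before - after


by_stage = group_by_stage(ANNOTATIONS)
for stage in sorted(by_stage):
    print(stage, sorted(by_stage[stage]))

gained, lost = changes_between(by_stage, 11, 12)
print("gained:", sorted(gained), "lost:", sorted(lost))
```

Because annotation is incomplete, a structure missing from a stage in such a grouping means only that no annotation was recorded, not that the gene is unexpressed there.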




Figure 6. EMAP ontology graphs for Theiler stage 11 (upper) and Theiler stage 12 (lower) displaying expression patterns for mouse Shh (Sonic hedgehog). (Annotations available from GXD.) Full graph and annotation set available at: http://www.spatial.maine.edu/~mdolan/GXD_Graphs/TimeSlices/TS11.html




Figure 7. Detail of the Mammalian Phenotype (MP) Ontology annotation graph for two alleles of mouse gene Arx representing allelic compositions Arxtm1Kki/Y (blue/lighter shading) and Arxtm1Pgr/Y (red/darker shading). We observe that the allele annotations segregate in separate ontology branches. Only the allelic composition Arxtm1Kki/Y corresponds to high-level nervous system and reproductive system phenotypes, while only the allelic composition Arxtm1Pgr/Y corresponds to homeostasis/metabolism and growth/size phenotypes. Full graph and annotation set available at: http://www.spatial.maine.edu/~mdolan/GenoPheno_Graphs/Arx.html

Using graphical representations to reason about annotations: assess mouse models for human disease

The Mammalian Phenotype (MP) Ontology [13] is used by MGI to represent phenotypic data. The MP Ontology enables annotation of mammalian phenotypes in the context of mutations and strains that are used as models of human disease and supports different levels of phenotypic knowledge. For example, among the highest levels of the MP Ontology are terms for: growth/size phenotype, homeostasis/metabolism phenotype, nervous system phenotype, and reproductive system phenotype.

So for example, the mouse gene Arx (aristaless related homeobox gene (Drosophila)) has 2 alleles, Arxtm1Kki and Arxtm1Pgr, both of which have been annotated to MP by curators at MGI. We might ask: how do the annotations to the different alleles compare? Applying the comparative graph methodology and indicating MP annotations to terms by color-coding according to allelic composition Arxtm1Kki/Y and Arxtm1Pgr/Y results in the graph detail shown in figure 7. (Information on mouse strain background is not indicated in the graph but is given in a complete annotation table that accompanies the graph.) We observe that in the graph the allele annotations segregate in separate branches reflecting the fact that the phenotype annotations associated with the two alleles fall into distinct high-level phenotypes. Only the allelic composition Arxtm1Kki/Y corresponds to high-level nervous system and reproductive system phenotypes, while only the allelic composition Arxtm1Pgr/Y corresponds to homeostasis/metabolism and growth/size phenotypes.
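The comparison of two alleles by high-level phenotype can also be expressed programmatically. The following is a minimal sketch under stated assumptions: the four high-level term names come from the text, but the root label, the lower-level terms (`term_n1` and so on), the allele labels (`allele_A`, `allele_B`) and all function names are hypothetical, and the toy annotations merely mimic the segregation pattern reported for Arx rather than reproduce MGI data.

```python
from collections import defaultdict

ROOT = "root"  # placeholder label for the top of the toy ontology

# Toy ontology: child term -> list of parent terms (a DAG).
PARENTS = {
    "nervous system phenotype": [ROOT],
    "reproductive system phenotype": [ROOT],
    "homeostasis/metabolism phenotype": [ROOT],
    "growth/size phenotype": [ROOT],
    "term_n1": ["nervous system phenotype"],        # hypothetical term
    "term_r1": ["reproductive system phenotype"],   # hypothetical term
    "term_h1": ["homeostasis/metabolism phenotype"],  # hypothetical term
    "term_g1": ["growth/size phenotype"],           # hypothetical term
}

# Hypothetical direct annotations for two alleles of one gene.
ANNOTATIONS = {
    "allele_A": {"term_n1", "term_r1"},
    "allele_B": {"term_h1", "term_g1"},
}


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def high_level_profile(terms, parents, root):
    """High-level terms (direct children of the root) covering `terms`."""
    profile = set()
    for term in terms:
        for candidate in ancestors(term, parents) | {term}:
            if root in parents.get(candidate, []):
                profile.add(candidate)
    return profile


profiles = {
    allele: high_level_profile(terms, PARENTS, ROOT)
    for allele, terms in ANNOTATIONS.items()
}
segregated = profiles["allele_A"].isdisjoint(profiles["allele_B"])

# Inverting the profiles gives a lookup from high-level phenotype to alleles.
index = defaultdict(set)
for allele, profile in profiles.items():
    for phenotype in profile:
        index[phenotype].add(allele)

print(segregated)
print(sorted(index["nervous system phenotype"]))
```

Disjoint profiles correspond to alleles whose colored nodes fall in separate branches of the graph; overlapping profiles would indicate shared high-level phenotypes.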







The distinction between the two alleles is confirmed by the fact that Arxtm1Kki is a known mouse model for OMIM human disease, “Lissencephaly, X-Linked, with Ambiguous Genitalia; XLAG” (see figure 8), which is characterized by nervous system and reproductive system phenotypes. The visualization methodology outlined here is consistent with the known association of this particular human disease and the Arxtm1Kki mouse model. Our hope is that examination of the MP graphs for specific disease associated phenotypes would help point to good mouse models. To facilitate this, we have created an index to all genes and alleles indicating high-level phenotypes. For example, a user can search the index for all genes and alleles annotated for “nervous system phenotype” and examine the linked MP graphs for segregation of allele phenotypes and a potential novel mouse model for a human disease characterized by nervous system abnormality. In this way we have extended the usefulness of the graphical representations beyond just another way of presenting the data to a method that allows a user to reason about annotations.

Figure 8. MGI integrates data on mouse models of human disease from OMIM with existing data for mouse genes and strains. For example, as shown on this “Associated Human Diseases” information page for Arx, Arxtm1Kki/Y on the strain background 129P2/OlaHsd * C57BL is a known mouse model for OMIM human disease, “Lissencephaly, X-Linked, with Ambiguous Genitalia; XLAG” characterized by nervous system and reproductive system phenotypes. The visualization methodology as shown in figure 7 is consistent with the known association of this particular human disease and the Arxtm1Kki mouse model. (This page is available at: http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=humanDisease&key=850912 )

Availability of graphs

All graphs presented in this work are publicly available.
•    The GO graphs are available for each gene from the gene detail pages at MGI.
•    The OrthoDisease graphs are available at: http://www.spatial.maine.edu/~mdolan/OrthoDisease_Graphs/
•    The Adult Mouse Anatomical Dictionary (MA) graphs for GXD data for selected genes are available at: http://www.spatial.maine.edu/~mdolan/GXD_Graphs/
•    The Theiler stage separated Edinburgh Mouse Atlas Project (EMAP) graphs displaying GXD data for Shh are available at: http://www.spatial.maine.edu/~mdolan/GXD_Graphs/TimeSlices
•    The Mammalian Phenotype (MP) graphs for all MGI genes with phenotype annotations are available at: http://www.spatial.maine.edu/~mdolan/GenoPheno_Graphs/
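The construction steps given under DESCRIPTION OF CURRENT WORK (represent the ontology as a graph, color annotated nodes, and limit the graph to the sections needed to show all annotations) can be sketched as follows. This is an illustrative reconstruction, not the code behind the graphs listed above: the toy ontology, the term names, the color map and the functions `annotation_subgraph` and `to_dot` are all assumptions, and relationship types such as ‘is a’ and ‘part of’ are not modelled. The output is text in the GraphViz DOT language.

```python
# Toy ontology: child term -> list of parent terms. Term names are hypothetical.
PARENTS = {
    "b": ["root"],
    "c": ["root"],
    "d": ["b", "c"],  # a DAG: one child with two parents
    "e": ["c"],
    "f": ["e"],
}

# Hypothetical direct annotations: term -> set of annotating sources.
ANNOTATED = {"d": {"mouse"}, "b": {"mouse", "human"}}

# Assumed color scheme, following the figure captions (gray for several sources).
COLORS = {"mouse": "blue", "human": "red"}


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def annotation_subgraph(parents, annotated):
    """Restrict the ontology to annotated terms plus all of their ancestors."""
    keep = set(annotated)
    for term in annotated:
        keep |= ancestors(term, parents)
    edges = sorted(
        (child, parent) for child in keep for parent in parents.get(child, [])
    )
    return keep, edges


def to_dot(parents, annotated, colors, shared_color="gray"):
    """Emit DOT text with directly annotated nodes filled by source color."""
    keep, edges = annotation_subgraph(parents, annotated)
    lines = ["digraph ontology {", "  rankdir=BT;"]
    for term in sorted(keep):
        sources = annotated.get(term, set())
        if len(sources) > 1:
            fill = shared_color
        elif len(sources) == 1:
            fill = colors[next(iter(sources))]
        else:
            fill = None  # ancestor shown for context only, left uncolored
        attrs = f' [style=filled, fillcolor="{fill}"]' if fill else ""
        lines.append(f'  "{term}"{attrs};')
    for child, parent in edges:
        lines.append(f'  "{child}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(PARENTS, ANNOTATED, COLORS)
print(dot_text)
```

If the output is saved to a file such as `graph.dot`, the GraphViz command `dot -Tsvg graph.dot -o graph.svg` renders it as an SVG image. Terms `e` and `f` are omitted because neither they nor any of their descendants carry an annotation, which is how limiting the graph to annotated sections keeps it small.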


CONCLUSIONS

Biological systems can be very complex but many aspects of biological system characterization have a wealth of biomedical knowledge accumulated over years of clinical and laboratory experience. Ontologies provide a shared understanding of a domain that is human intelligible and computer readable, which can help support the integration and retrieval of this knowledge.

Here we provide a methodology to visualize sets of annotations as provided by a model organism database curation system to aid researchers in better comprehending and navigating the data. The result is a comprehensive view of available knowledge. As more annotations are made and become available, such tools will be both more necessary, to handle larger data sets, and more useful, as annotation approaches completeness. We believe that this approach to coordinating biological knowledge available in model organism resources will provide a valuable resource in medical research and contribute to understanding these systems.

Acknowledgements

This work is funded by NIH/NHGRI (HG-002273).

References

1.   Open Biomedical Ontologies (OBO) [http://obo.sourceforge.net/]
2.   The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genetics 2000, 25: 25-29.
3.   The Gene Ontology Consortium. The Gene Ontology (GO) project in 2006. Nucleic Acids Res 2006, 34: D322-D326.
4.   Dolan ME, Ni L, Camon E, and Blake JA. A procedure for assessing GO annotation consistency. Bioinformatics 2005, 21(Suppl 1):i136-i143.
5.   Eppig JT, Bult CJ, Kadin JA, Richardson JE, Blake JA, et al. The Mouse Genome Database (MGD): from genes to mice -- a community resource for mouse biology. Nucleic Acids Research 2005, 33: D471-5.
6.   Dolan ME and Blake JA. Using Ontology Visualization to Coordinate Cross-species Functional Annotation for Human Disease Genes. Proceedings Nineteenth IEEE International Symposium on Computer-based Medical Systems: Ontologies for Biomedical Systems 2006, 583-587.
7.   O'Brien KP, Westerlund I, Sonnhammer EL. OrthoDisease: a database of human disease orthologs. Human Mutation 2004, 24(2):112-9.
8.   Hamosh A, Scott AF, Amberger JS, Bocchini CA and McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research 2005, 33: D514-D517.
9.   GraphViz [http://www.graphviz.org/]
10.  Hayamizu TF, Mangan M, Corradi JP, Kadin JA and Ringwald M. The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data. Genome Biology 2005, 6:R29 1-8.
11.  Baldock RA, Bard JB, Burger A, Burton N, Christiansen J, Feng G, Hill B, Houghton D, Kaufman M, Rao J, et al. EMAP and EMAGE: a framework for understanding spatially organized data. Neuroinformatics 2003, 1:309-325.
12.  Hill DP, Begley DA, Finger JH, Hayamizu TF, McCright IJ, Smith CM, Beal JS, Corbani LE, Blake JA, Eppig JT, et al. The mouse Gene Expression Database (GXD): updates and enhancements. Nucleic Acids Res 2004, 32: D568-D571.
13.  Smith CL, Goldsmith CW and Eppig JT. The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biology 2004, 6:R7 1-9.