=Paper=
{{Paper
|id=Vol-222/paper-3
|storemode=property
|title=Using Ontology Graphs to Understand Annotations and Reason about Them
|pdfUrl=https://ceur-ws.org/Vol-222/krmed2006-p03.pdf
|volume=Vol-222
|dblpUrl=https://dblp.org/rec/conf/krmed/DolanB06
}}
==Using Ontology Graphs to Understand Annotations and Reason about Them==
KR-MED 2006 "Biomedical Ontology in Action"
November 8, 2006, Baltimore, Maryland, USA
Using ontology visualization to understand annotations and
reason about them
Mary E. Dolan, Ph.D., and Judith A. Blake, Ph.D.,
Mouse Genome Informatics [MGI], The Jackson Laboratory,
Bar Harbor, ME 04609 USA
mdolan@informatics.jax.org
Biomedical ontologies not only capture a wealth of biological knowledge but also provide a representational system to support the integration and retrieval of biological information. Various biomedical ontologies are used by model organism databases to annotate biological entities to the literature and have become an essential part of high-throughput experiments and bioinformatics research. We are exploring the power of ontology visualization to enhance the understanding of annotations by placing annotations in the graph context of the broader biological knowledge the ontology provides. Presenting annotations in this context provides a better understanding of the annotations because humans are adept at extracting patterns and information from graphical representations of complex data.

INTRODUCTION

Biological systems can be very complex but many aspects of biological system characterization have a wealth of biomedical knowledge accumulated over years of clinical and laboratory experience. Ontologies provide a shared understanding of a domain that is human intelligible and computer readable and, consequently, a representational system to support the integration and retrieval of this knowledge.

As techniques of large-scale genomic analysis and functional gene annotation have progressed and are becoming more common, it is essential to find approaches to provide a comprehensive view of annotation sets. We are exploring the power of several widely used ontologies to provide a comprehensive graphical view of annotations by presenting the annotations visualized within an ontology relationship structure. By presenting annotations in the graph context we hope to provide a better understanding of the annotations because humans are adept at extracting patterns and information from graphical representations of complex data.

BACKGROUND

Ontologies can be used to abstract knowledge of a domain in a way that can be used by both humans and computers by providing an explicit representation of the entities of interest and the relationships among them. In particular, biomedical ontologies representing various aspects of biology are being used for annotating entities to the literature and for integrating the diverse information resulting from the analysis of high-throughput experiments.

Open Biomedical Ontologies (OBO) is an umbrella repository for well-structured controlled vocabularies for shared use across different biological and medical domains [1]. The OBO website contains a range of ontologies that are designed for biomedical domains. Some of the OBO ontologies, such as the Gene Ontology (GO), apply across all organisms. Others are more restricted in scope; for example, the Mammalian Phenotype Ontology (MP) is a phenotype ontology designed for specific taxonomic groups.

The GO Project was established to provide structured, controlled, organism-independent vocabularies to describe gene functions [2] and, as a consequence, provides semantic standards for annotation of molecular attributes in different databases. Members of the GO Consortium supply annotations of gene products using this vocabulary. The GO and annotations made to GO provide consistent descriptions of gene products and a valuable resource for comparative functional analysis research.

Currently, the three ontologies of GO contain nearly 20,000 terms [3]. The terms are organized in structures called directed acyclic graphs (DAGs), which differ from strict hierarchies in that a more specialized (granular) child term can have more than one less specialized parent term. In the GO a child can be related to a parent by either a ‘part of’ or ‘is a’ relationship. Mouse Genome Informatics (MGI) curators use the GO to annotate mouse genes from the literature. Currently, MGI has more than 100,000 annotations to more than 17,000 genes; approximately half of the annotations are manual
annotations from the literature, the balance from automated data loads. An MGI user has the option of viewing the full set of GO annotations for a particular gene in three formats: as a table, as automatically generated text, and as a graph. The graph presents relevant parts of the GO with direct annotations indicated as colored nodes, as shown in figure 1. The graphical format allows a user to easily see, for example, whether a gene product appears to participate in a broad range of molecular functions or in only a narrow, specialized function.

Figure 1. GO annotation graph for mouse Hgs (HGF-regulated tyrosine kinase substrate) provides an alternative to tabular or text views. Blue/shaded nodes in the GO graph indicate mouse annotations. Full graph and annotation set available at: http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=GOMarkerGraph&id=MGI:104681

Genes that share close evolutionary relationships are likely to function in similar ways. As a complement to our previous work [4] on the assessment of annotation consistency of independently developed annotation sets for curated mammalian orthologs [5], we provided comparative graphical visualizations of annotations, one graph for each mouse-human-rat ortholog triple with nodes colored according to organism annotated. Coloring nodes to distinguish among annotations extends the usefulness of the visualization for pattern recognition by users. The graphical format, as shown in figure 2, allows a user to assess the consistency, inconsistency and level of detail of annotations made to different model organisms.

Figure 2. GO comparative graph for MGI curated orthologs to mouse Pax6 (paired box gene 6). The nodes are color-coded according to organism: mouse annotations shown in blue/lighter shading, human annotations in red/darker shading, multiple organisms in gray. Full graph available at: http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=GOOrthologyGraph&id=MGI:97490

Our examination of the comparative graphs led to the observation that annotations are often complementary, reflecting the fact that the different model organisms are used to study different aspects of biology. Since biologists are often species-blind and assemble their initial picture of a gene and its function without regard to the taxonomic origin of the gene that was studied in a particular experiment, this suggested the broader application of such graphs as ‘summary’ rather than ‘comparative’ graphs that might be used to answer the request: “Show me everything that is known about this gene.” The power of this representation is that it provides a view of the summary of information derived from species-specific experimental results.

In addition to the ability to visualize comparative annotation sets, graphs can be used to coordinate information for animal models of human
diseases. The primary purpose of performing experiments that study the consequences of mutations in a particular organism is that these experiments provide valuable models for the understanding of human disease. We have extended our ontology visualization approach [6] to the orthology sets developed in the resource OrthoDisease [7], a comprehensive database of model organism genes that are orthologous to human disease genes derived from the OMIM database [8], a continuously updated catalog of human genes and inherited, or heritable, genetic diseases. We have abstracted orthology information on thirteen organisms for which curated GO annotation sets are publicly available. By combining all GO annotations for the orthologs associated with each disease gene or with each disease, we obtain a comprehensive annotation set for each disease gene and for each disease. Each annotation set is presented on the GO graph with annotated nodes colored according to the organism that is the source of the annotation. Figure 3 shows part of the graph for OMIM gene CAT, which demonstrates how similar the annotations to diverse organisms can be. Of course, in some sense, it is the differences that are of more interest in this case since we are interested in collecting together as much information as possible.

Figure 3. GO annotation graph for OMIM gene CATALASE; CAT. The graph coordinates GO annotations for thirteen model organisms with nodes colored by organism. Full graph and annotation set available at: http://www.spatial.maine.edu/~mdolan/OrthoDisease_Graphs/OMIM_GeneGraphs/CAT.html

DESCRIPTION OF CURRENT WORK

While each annotation group develops curation standards to meet the needs of their community, one of the important results of various ontology projects has been an attempt to develop a common vocabulary and shared annotation standards that enhance the utility of these annotations for analysis. We have found that regardless of the ontology, presenting terms in a graphical context makes the relationships of ontology terms clear, provides context for annotations, and makes the examination of large annotation sets feasible. The long-term objective, now, is to build consensus for curation standards that will strengthen the utility of data integration capabilities of this approach.

We have generalized our GO visualization approach to other ontologies and annotation data sets. First, we construct a complete graph to represent the ontology. Second, we color nodes that have annotations and limit the graph to the sections necessary to show all annotations. By limiting the graphs to annotated sections we do not have to deal with scalability issues that might arise if we were to attempt to represent an entire ontology that includes thousands of terms. Finally, we build a web page for each gene that includes an image of the graph and a table of annotations. In addition, to facilitate the examination
of larger graphs, we provide scalable vector graphics (SVG) images, which include pan-zoom-search functionality that allows a user to examine specific sections of the graphs. The graph images are generated using GraphViz, a freely available, open source graph layout program [9].

Gene expression data sets describe when and where particular genes are active. Providing a comprehensive picture of the level of gene expression across developmental stages and anatomical structures will facilitate investigation of regulation of gene expression.

We have applied our simple graphical display approach to gene expression data with annotations to both the Adult Mouse Anatomical Dictionary (MA) [10] and the Edinburgh atlas of mouse embryonic development (EMAP) [11]. For each gene with annotation data, the resulting graph shows the mouse anatomy ontology with anatomical structure nodes colored to indicate where that gene is expressed. In addition, in the case of the EMAP graphs, we have attempted to tease apart the time dependence of gene expression patterns by separating annotations to different developmental stages and producing a graph for each Theiler stage.

The laboratory mouse is an important model organism for a broad range of human diseases and disorders, including diabetes, heart disease, and cancer. Genomic and genetic investigations of particular mouse models (phenotypes) reveal the contribution of particular genomic variants (alleles) to the presentation of disease phenotypes. The annotation of genotype-phenotype associations is an essential part of assessing mouse models for human disease.

We have adapted our comparative GO annotation approach to phenotype annotations made to different mouse gene alleles to create Mammalian Phenotype (MP) Ontology [13] graphs. As in the case of GO comparative graphs (figure 2), the generalized approach to comparative graphs requires three things: an ontology to provide the relationship structure, a grouping idea to connect the annotated objects, and a distinguishing idea (see figure 4). First, we construct a complete graph to represent the ontology. Second, we color nodes that have annotations according to the distinguishing characteristic and limit the graph to the sections necessary to show all annotations. Finally, we build a web page for each gene that includes an image of the graph and a table of annotations.

Figure 4. The comparative graph paradigm: an ontology provides the relationship structure among terms; a grouping idea defines the object set; and a discriminating idea distinguishes objects whose annotations will be color-coded in the graph.

In the case of the GO comparative graphs the grouping idea is orthology and the distinguishing idea is organism: mouse annotations in blue, human annotations in red and so forth. In the case of MP graphs the grouping idea is the gene and the distinguishing idea is the allele: each allele’s annotated nodes are colored differently. In a similar way to color-coding of GO nodes by organism, color-coding of MP nodes by allele allows a user to easily
see similarities and differences in alleles annotated to different phenotypes. Our purpose in creating such graphs is to move beyond simply providing another representation of a phenotype data set to add potential value to this data set as a method of assessing mouse models for human disease.

RESULTS

Graphical representations of expression data sets using anatomy ontologies

The Mouse Anatomical Dictionary provides ontologies that supply a standardized nomenclature for anatomical parts to describe the complex patterns of gene expression in the developing and adult mouse and how they relate to the emerging tissue structure. Terms that describe embryonic developmental stages (Theiler Stages 1 through 26) have been developed by the Edinburgh Mouse Atlas Project (EMAP) [11]. Terms that describe mice at postnatal stages, including adult (Theiler stage 28), have been developed as the Adult Mouse Anatomical Dictionary (MA) [10].

Adult Mouse Anatomical Dictionary graphs display relationships of annotations

The Adult Mouse Anatomical Dictionary (MA) is an anatomy ontology that can be used to provide standardized nomenclature for anatomical terms in the postnatal mouse. It was developed as part of the Gene Expression Database (GXD) resource of information from the mouse [12]. The Adult Mouse Anatomical Dictionary organizes anatomical structures for the postnatal mouse spatially and functionally. Each MGI gene detail page includes links to gene expression data; the user can select data for the postnatal mouse and obtain a tabular view of available expression data.

Our graphical representations present another view of the data, as shown in figure 5. This partial view of the graph for Abcg2 (ATP-binding cassette, sub-family G (WHITE), member 2) clearly shows the relationship of three annotations as variations in granularity. Note that the colored nodes indicate only direct annotations made by curators from the literature, although indirect annotation can be inferred from the ontology structure.

Figure 5. Part of the Adult Mouse Anatomical Dictionary (MA) annotation graph for postnatal expression data for mouse gene Abcg2 (ATP-binding cassette, sub-family G (WHITE), member 2). Full graph and annotation set available at: http://www.spatial.maine.edu/~mdolan/GXD_Graphs/Abcg2.html

EMAP graphs provide information on developmental stage specific expression

The Edinburgh Mouse Atlas Project (EMAP) annotation of gene expression data can be used to capture the complex and ever-changing patterns throughout the development of the mammalian embryo and how they relate to the emerging tissue structure at each developmental stage.

We have adapted the EMAP ontology to separate annotations associated with different Theiler stages and created EMAP annotation graphs for each stage, effectively treating each stage as a separate ontology structure. With this approach we can, within the limits of incomplete annotation, see stage separated annotations as a time series of expression patterns. For example, figure 6 shows expression annotations for mouse gene Shh (Sonic hedgehog) for Theiler stages 11 (figure 6, upper panel) and 12 (figure 6, lower panel). A user might consult such graphs to explore changes in expression pattern between stages or determine the earliest stage at which the gene is known to be expressed in a particular anatomical structure. As the graphs are presented on our web site, a user can move forward or back to the adjacent Theiler stage.
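The stage-separated graphs can be pictured with a small sketch. The Python below is only an illustration under stated assumptions, not the code used to build the graphs described here: the toy anatomy terms, the annotation list and the function names (`stage_subgraphs`, `earliest_stage`) are all hypothetical, and a single shared toy ontology stands in for the per-stage EMAP structures. It groups direct annotations by Theiler stage, limits each stage's graph to the annotated terms plus their ancestors, and then asks for the earliest stage at which a structure is annotated either directly or by inference from a more specific term.

```python
from collections import defaultdict

# Hypothetical toy anatomy DAG: child term -> list of parent terms.
# Term names are illustrative and are not taken from EMAP.
PARENTS = {
    "embryo": [],
    "ectoderm": ["embryo"],
    "neural ectoderm": ["ectoderm"],
    "mesoderm": ["embryo"],
    "notochord": ["mesoderm"],
}

# Hypothetical direct annotations for one gene: (Theiler stage, term).
ANNOTATIONS = [(11, "notochord"), (12, "notochord"), (12, "neural ectoderm")]


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def stage_subgraphs(annotations, parents):
    """Group direct annotations by stage and limit each stage's graph to
    the annotated terms plus the ancestors that connect them to the root."""
    direct = defaultdict(set)
    for stage, term in annotations:
        direct[stage].add(term)
    graphs = {}
    for stage, terms in direct.items():
        shown = set(terms)
        for term in terms:
            shown |= ancestors(term, parents)
        graphs[stage] = {"direct": terms, "shown": shown}
    return graphs


def earliest_stage(structure, graphs):
    """Earliest stage at which `structure` is annotated directly or
    indirectly (through a more specific annotated term); None if never."""
    stages = [s for s, g in graphs.items() if structure in g["shown"]]
    return min(stages) if stages else None


graphs = stage_subgraphs(ANNOTATIONS, PARENTS)
for stage in sorted(graphs):
    print(stage, sorted(graphs[stage]["shown"]))
print(earliest_stage("ectoderm", graphs))
```

In this toy data the "direct" sets correspond to the colored nodes, while the "shown" sets also contain the uncolored ancestor terms that give the annotations their context.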
Figure 6. EMAP ontology graphs for Theiler stage 11 (upper) and Theiler stage 12 (lower) displaying expression patterns for mouse Shh (Sonic hedgehog). (Annotations available from GXD.) Full graph and annotation set available at: http://www.spatial.maine.edu/~mdolan/GXD_Graphs/TimeSlices/TS11.html
Using graphical representations to reason about annotations: assess mouse models for human disease

The Mammalian Phenotype (MP) Ontology [13] is used by MGI to represent phenotypic data. The MP Ontology enables annotation of mammalian phenotypes in the context of mutations and strains that are used as models of human disease and supports different levels of phenotypic knowledge. For example, among the highest levels of the MP Ontology are terms for: growth/size phenotype, homeostasis/metabolism phenotype, nervous system phenotype, and reproductive system phenotype.

Consider the mouse gene Arx (aristaless related homeobox gene (Drosophila)), which has 2 alleles, Arxtm1Kki and Arxtm1Pgr, both of which have been annotated to MP by curators at MGI. We might ask: how do the annotations to the different alleles compare? Applying the comparative graph methodology and indicating MP annotations to terms by color-coding according to allelic composition Arxtm1Kki/Y and Arxtm1Pgr/Y results in the graph detail shown in figure 7. (Information on mouse strain background is not indicated in the graph but is given in a complete annotation table that accompanies the graph.) We observe that in the graph the allele annotations segregate in separate branches, reflecting the fact that the phenotype annotations associated with the two alleles fall into distinct high-level phenotypes. Only the allelic composition Arxtm1Kki/Y corresponds to high-level nervous system and reproductive system phenotypes, while only the allelic composition Arxtm1Pgr/Y corresponds to homeostasis/metabolism and growth/size phenotypes.

Figure 7. Detail of the Mammalian Phenotype (MP) Ontology annotation graph for two alleles of mouse gene Arx representing allelic compositions Arxtm1Kki/Y (blue/lighter shading) and Arxtm1Pgr/Y (red/darker shading). We observe that the allele annotations segregate in separate ontology branches. Only the allelic composition Arxtm1Kki/Y corresponds to the high-level nervous system and reproductive system phenotypes, while only the allelic composition Arxtm1Pgr/Y corresponds to the homeostasis/metabolism and growth/size phenotypes. Full graph and annotation set available at: http://www.spatial.maine.edu/~mdolan/GenoPheno_Graphs/Arx.html
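The segregation check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation behind the MP graphs: the four high-level term names come from the text, but the lower-level terms, the allele labels ("allele A", "allele B"), their annotations, the color names and the function names are all hypothetical. The sketch maps each allele's annotated terms to the high-level branches they fall under, tests whether the two alleles share a branch, and emits GraphViz DOT text for the annotated sections with nodes colored by allele (gray for a term annotated by more than one allele).

```python
# Hypothetical fragment of a phenotype DAG: child term -> parent terms.
# The four high-level terms are named in the text; the lower-level terms
# and all annotations below are invented for illustration.
ROOT = "mammalian phenotype"
PARENTS = {
    ROOT: [],
    "nervous system phenotype": [ROOT],
    "reproductive system phenotype": [ROOT],
    "growth/size phenotype": [ROOT],
    "homeostasis/metabolism phenotype": [ROOT],
    "abnormal brain morphology": ["nervous system phenotype"],
    "abnormal testis morphology": ["reproductive system phenotype"],
    "decreased body size": ["growth/size phenotype"],
    "abnormal glucose homeostasis": ["homeostasis/metabolism phenotype"],
}

# Grouping idea: one gene. Distinguishing idea: the allele (hypothetical).
ANNOTATIONS = {
    "allele A": {"abnormal brain morphology", "abnormal testis morphology"},
    "allele B": {"decreased body size", "abnormal glucose homeostasis"},
}
COLORS = {"allele A": "lightblue", "allele B": "red"}
SHARED_COLOR = "gray"


def ancestors(term):
    """Return every term reachable from `term` by following parent links."""
    seen, stack = set(), list(PARENTS[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def high_level(term):
    """High-level terms (direct children of the root) at or above `term`."""
    return {t for t in ancestors(term) | {term} if ROOT in PARENTS[t]}


def branches(allele):
    """All high-level branches that contain an annotation for `allele`."""
    return set().union(*(high_level(t) for t in ANNOTATIONS[allele]))


def to_dot():
    """Emit DOT text for the annotated sections only, colored by allele."""
    shown = set()
    for terms in ANNOTATIONS.values():
        for term in terms:
            shown |= {term} | ancestors(term)
    lines = ["digraph comparative {", "  node [style=filled, fillcolor=white];"]
    for term in sorted(shown):
        owners = [a for a, terms in ANNOTATIONS.items() if term in terms]
        if owners:
            color = COLORS[owners[0]] if len(owners) == 1 else SHARED_COLOR
            lines.append(f'  "{term}" [fillcolor={color}];')
        for parent in PARENTS[term]:
            lines.append(f'  "{term}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


a, b = branches("allele A"), branches("allele B")
print(sorted(a), sorted(b), a.isdisjoint(b))
print(to_dot())
```

If the DOT text is saved to a file, a command such as `dot -Tsvg graph.dot -o graph.svg` should render it as an SVG image; the file names here are placeholders.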
Figure 8. MGI integrates data on mouse models of human disease from OMIM with existing data for mouse genes
and strains. For example, as shown on this “Associated Human Diseases” information page for Arx, Arxtm1Kki/Y on
the strain background 129P2/OlaHsd * C57BL is a known mouse model for OMIM human disease, “Lissencephaly,
X-Linked, with Ambiguous Genitalia; XLAG” characterized by nervous system and reproductive system phenotypes.
The visualization methodology as shown in figure 7 is consistent with the known association of this particular
human disease and the Arxtm1Kki mouse model. (This page is available at:
http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=humanDisease&key=850912 )
This distinction is confirmed by seeing that, indeed, Arxtm1Kki is a known mouse model for OMIM human disease, “Lissencephaly, X-Linked, with Ambiguous Genitalia; XLAG” (see figure 8), which is characterized by nervous system and reproductive system phenotypes. The visualization methodology outlined here is consistent with the known association of this particular human disease and the Arxtm1Kki mouse model. Our hope is that examination of the MP graphs for specific disease associated phenotypes would help point to good mouse models. To facilitate this, we have created an index to all genes and alleles indicating high-level phenotypes. For example, a user can search the index for all genes and alleles annotated for “nervous system phenotype” and examine the linked MP graphs for segregation of allele phenotypes and a potential novel mouse model for a human disease characterized by nervous system abnormality. In this way we have extended the usefulness of the graphical representations beyond just another way of presenting the data to a method that allows a user to reason about annotations.

Availability of graphs

All graphs presented in this work are publicly available.
• The GO graphs are available for each gene from the gene detail pages at MGI.
• The OrthoDisease graphs are available at: http://www.spatial.maine.edu/~mdolan/OrthoDisease_Graphs/
• The Adult Mouse Anatomical Dictionary (MA) graphs for GXD data for selected genes are available at: http://www.spatial.maine.edu/~mdolan/GXD_Graphs/
• The Theiler stage separated Edinburgh Mouse Atlas Project (EMAP) graphs displaying GXD data for Shh are available at: http://www.spatial.maine.edu/~mdolan/GXD_Graphs/TimeSlices
• The Mammalian Phenotype (MP) graphs for all MGI genes with phenotype annotations are available at: http://www.spatial.maine.edu/~mdolan/GenoPheno_Graphs/
CONCLUSIONS

Biological systems can be very complex but many aspects of biological system characterization have a wealth of biomedical knowledge accumulated over years of clinical and laboratory experience. Ontologies provide a shared understanding of a domain that is human intelligible and computer readable that can help support the integration and retrieval of this knowledge.

Here we provide a methodology to visualize sets of annotations as provided by a model organism database curation system to aid researchers in comprehending and navigating the data. The result is a comprehensive view of available knowledge. As more annotations are made and become available, such tools will be both more necessary, to handle larger data sets, and more useful, as annotation approaches completeness. We believe that this approach to coordinating biological knowledge available in model organism resources will provide a valuable resource in medical research and contribute to understanding these systems.

Acknowledgements

This work is funded by NIH/NHGRI (HG-002273).

References

1. Open Biomedical Ontologies (OBO) [http://obo.sourceforge.net/]
2. The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genetics 2000, 25: 25-29.
3. The Gene Ontology Consortium. The Gene Ontology (GO) project in 2006. Nucleic Acids Res 2006, 34: D322-D326.
4. Dolan ME, Ni L, Camon E, and Blake JA. A procedure for assessing GO annotation consistency. Bioinformatics 2005, 21(Suppl 1):i136-i143.
5. Eppig JT, Bult CJ, Kadin JA, Richardson JE, Blake JA, et al. The Mouse Genome Database (MGD): from genes to mice -- a community resource for mouse biology. Nucleic Acids Research 2005, 33: D471-5.
6. Dolan ME and Blake JA. Using Ontology Visualization to Coordinate Cross-species Functional Annotation for Human Disease Genes. Proceedings Nineteenth IEEE International Symposium on Computer-based Medical Systems: Ontologies for Biomedical Systems 2006, 583-587.
7. O'Brien KP, Westerlund I, Sonnhammer EL. OrthoDisease: a database of human disease orthologs. Human mutation 2004, 24(2):112-9.
8. Hamosh A, Scott AF, Amberger JS, Bocchini CA and McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research 2005, 33: D514-D517.
9. GraphViz [http://www.graphviz.org/]
10. Hayamizu TF, Mangan M, Corradi JP, Kadin JA and Ringwald M. The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data. Genome Biology 2005, 6:R29 1-8.
11. Baldock RA, Bard JB, Burger A, Burton N, Christiansen J, Feng G, Hill B, Houghton D, Kaufman M, Rao J, et al. EMAP and EMAGE: a framework for understanding spatially organized data. Neuroinformatics 2003, 1:309-325.
12. Hill DP, Begley DA, Finger JH, Hayamizu TF, McCright IJ, Smith CM, Beal JS, Corbani LE, Blake JA, Eppig JT, et al. The mouse Gene Expression Database (GXD): updates and enhancements. Nucleic Acids Res 2004, 32: D568-D571.
13. Smith CL, Goldsmith CW and Eppig JT. The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biology 2004, 6:R7 1-9.