Using Biomedical Ontologies for Data Representation and Management in the Mouse Genome Informatics (MGI) System

Li Ni,* Carol J. Bult, Jim A. Kadin, Joel E. Richardson, Martin Ringwald, Janan T. Eppig, Judith A. Blake

The Jackson Laboratory, Bar Harbor, Maine, U.S.A.

ABSTRACT

Structured vocabularies and ontologies are increasingly utilized for managing, annotating, and analyzing complex biological data. The Mouse Genome Informatics (MGI, http://www.informatics.jax.org) resource contributes to and utilizes a variety of vocabularies and ontologies to robustly capture and provide biological information about the laboratory mouse and its use as a model for human disease. Here we report on the power and complexity of using biomedical ontologies for data representation and management at MGI. MGI is a highly integrated database and software system that integrates mouse genetic, genomic, and phenotypic information, including data on gene characterization, pathways, protein classifications, sequence, gene expression, alleles, phenotypes, diseases, and comparative gene data for mouse, human, rat, and other mammals.

The Gene Ontology (GO, http://www.geneontology.org) is the most widely used vocabulary for providing connections between proteins and their roles in biological organization. MGI is a founding member of the GO Consortium and actively participates in the ongoing development of GO, as well as in applying GO to the functional annotation of mouse gene products. MGI provides an automatically generated text report, a tabular view, and a graphical display of GO annotations on the gene function detail page. MGI also provides comparative graphs of GO annotations for mammalian orthologs. In addition, MGI hosts graphical and statistical tools that exploit the hierarchical structure of the GO to aid in the visualization of annotations and the analysis of large data sets such as microarray data.

MGI has spearheaded the development of two other major structured vocabularies in support of annotation and analysis of mouse biology. The first is the Mammalian Phenotype (MP) Ontology, a widely adopted ontological model that enables phenotype annotations to background-specific allelic genotypes at varying degrees of granularity. This structured vocabulary allows mouse genotypes to be annotated consistently with standard phenotype terms. MGI is also the originator of the Adult Mouse Anatomy (MA), which allows us to navigate through the extensive dictionary hierarchies for the different developmental stages, to locate specific anatomical structures within those hierarchies, and to annotate and obtain the expression results associated with those structures.

MGI also incorporates Online Mendelian Inheritance in Man (OMIM) terms to make similarity assertions between mouse models and human disease, and incorporates the Mouse Anatomical Dictionary to describe expression data during mouse embryonic development.

Additionally, for the functional annotation of mouse gene products, MGI has started to use the Cell Ontology, the Mouse Embryonic and Adult Anatomy Ontologies, the Evidence Code Ontology, and PSI-MOD to add further information to a Gene Ontology annotation. These combined ontology annotations will be loaded into the database in the near future.

The structured vocabulary-based annotations assist in robust and accurate data mining when complex questions are posed at MGI, in both computational and individual formats. Through the use of multiple ontologies, MGI is able to robustly represent many components of knowledge about the mouse model system. While these ontologies are independently developed, their concurrent use within the MGI system illuminates some challenges at the intersection of ontologies, which MGI ontology developers and curators work to address. For example, the MP and the GO terminologies incorporate mouse anatomical terms from the MA. Testing and updating the MP and the GO so that they accurately reflect the canonical anatomy organization in the MA requires resources and attention on a regular basis.

MGI is a consortium of database resources with funding obtained from NHGRI (HG00330 and HG02273), NIH/NICHD (HD062499), and NCI (CA89713).

* To whom correspondence should be addressed: li.ni@jax.org