Proceedings of the 9th International Conference on Biological Ontology (ICBO 2018), Corvallis, Oregon, USA
ICBO 2018, August 7-10, 2018

Adapting Disease Vocabularies for Curation at the Rat Genome Database

Laulederkind SJ, Hayman GT, Wang SJ, Bolton ER, Smith JR, Tutaj M, De Pons J, Shimoyama M
Department of Biomedical Engineering
Medical College of Wisconsin and Marquette University
Milwaukee, WI, USA

Dwinell MR
Genomic Sciences and Precision Medicine Center
Medical College of Wisconsin
Milwaukee, WI, USA

Abstract: The Rat Genome Database (RGD) has been annotating genes, QTLs, and strains to disease terms for over 15 years. During that time the controlled vocabulary used for disease curation has changed a few times. The changes were necessary because no single vocabulary or ontology was freely accessible and complete enough to cover all of the disease states described in the biomedical literature.

The first disease vocabulary used at RGD was the “C” branch of the National Library of Medicine’s Medical Subject Headings (MeSH). For at least a few years it was the most publicly accessible, complete, and useful vocabulary to describe diseases and disease processes. However, it still had many holes in its coverage of disease vocabulary, and an improved vocabulary was much desired.

By 2011 RGD had switched disease curation to the use of MEDIC (MErged DIsease voCabulary), a combination of MeSH and OMIM (Online Mendelian Inheritance in Man) constructed by curators at the Comparative Toxicogenomics Database (CTD). MEDIC was an improvement over MeSH because of the added coverage of OMIM terms, but it was not long before RGD curators saw the need for more disease terms. So, within a couple of years, RGD began to add terms to MEDIC under the name RGD Disease Ontology (RDO). Since RGD assigned a unique ID to every MEDIC term imported from CTD, it was easy to add specially coded IDs to indicate those additional terms from a separate, supplemental file.

Meanwhile, the Human Disease Ontology (DO) had slowly been developing and expanding. As early as 2010, members of RGD were contributing to the development of DO. However, five years went by before MGD (Mouse Genome Database) and RGD joined with DO in an organized attempt to make DO useful for the model organism community. From that collaboration came a large addition of OMIM-based terms, expansion of multi-parentage of terms through axiomatic extension, and expansion of cross-references to clinical vocabularies. Based on the promise of those improvements, it was determined that the Alliance of Genome Resources could use the DO as a unifying disease vocabulary across model organism databases.

Despite the improvements in DO, RGD still had more than 1000 custom terms and 3800 MEDIC terms with annotations to deal with if RGD were to convert to the use of DO. Those extra terms originated from OMIM, MeSH, and the biomedical literature. If RGD mapped those non-DO disease terms to DO, much granularity of meaning would be lost. For instance, the term “osteoarthritis” in DO has no child terms, but the RGD version of DO has 11 child terms, or variations of “osteoarthritis” (Figure 1). The extra details of those terms are lost to users of DO. To avoid the loss of granularity, it was decided to extend the DO beyond the merged, axiomatized DO file. After all DO terms were mapped or added to the RGD version of MEDIC, a broader, deeper disease vocabulary was achieved, with more term branches throughout the ontology and more child terms within those branches.

Keywords: Rat Genome Database, disease, vocabularies, online resource

Figure 1. RDO child terms of osteoarthritis.

Figure 2. Osteoarthritis with Mild Chondrodysplasia.
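
The loss of granularity described in the abstract, where annotations to specific child terms are re-pointed at a parent term that has no children, can be illustrated with a minimal sketch. This is not RGD's curation pipeline. The object symbols (`GeneA`, `GeneB`, `GeneC`), the second child term, and all function names are hypothetical and chosen only for illustration; only "osteoarthritis" and "osteoarthritis with mild chondrodysplasia" are term names taken from the text and figures.

```python
from collections import defaultdict

# Hypothetical child-to-parent relationships. Only the first child term name
# appears in the source (Figure 2); the second is an invented placeholder.
PARENT_OF = {
    "osteoarthritis with mild chondrodysplasia": "osteoarthritis",
    "hypothetical osteoarthritis subtype B": "osteoarthritis",
}

# Hypothetical annotations as (annotated object, disease term) pairs.
ANNOTATIONS = [
    ("GeneA", "osteoarthritis with mild chondrodysplasia"),
    ("GeneB", "hypothetical osteoarthritis subtype B"),
    ("GeneC", "osteoarthritis"),
]


def collapse_to_parent(annotations, parent_of):
    """Re-point each annotation at its parent term when one is recorded."""
    return [(obj, parent_of.get(term, term)) for obj, term in annotations]


def objects_by_term(annotations):
    """Group annotated objects under the term they are annotated to."""
    grouped = defaultdict(set)
    for obj, term in annotations:
        grouped[term].add(obj)
    return dict(grouped)


before = objects_by_term(ANNOTATIONS)
after = objects_by_term(collapse_to_parent(ANNOTATIONS, PARENT_OF))

print(len(before), "distinct terms before mapping to the parent")
print(len(after), "distinct term(s) after mapping to the parent")
```

In this toy data, the collapsed annotations can no longer distinguish which object was associated with which variation of osteoarthritis; keeping the child terms in an extended vocabulary preserves that distinction.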