Developing the Animals in Context Ontology

Suzanne L. Santamaria (1,*), Maureen Fallon (1), Julie M. Green (1), Stefan Schulz (2,3) and Jeffrey R. Wilcke (1)

(1) Veterinary Medical Informatics Laboratory at Virginia Tech, Blacksburg, Virginia, US
(2) Institute for Medical Informatics, Statistics and Documentation, Medical University, Graz, Austria
(3) Institute of Medical Biometry and Medical Informatics, University Medical Center, Freiburg, Germany
(*) Address correspondence to: slsantamaria@vt.edu.

ABSTRACT

Animals are classified by any number of characteristics, including Linnaean rank, physiologic features, purpose and place. The Animals in Context Ontology (ACO) was developed by editing a subset of the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT©) to follow the Open Biological and Biomedical Ontologies (OBO) Foundry Principles. It includes animals classified by Linnaean ranking as well as by practical uses that are of interest to science, medicine and agriculture. ACO was built within the ontological framework of the Basic Formal Ontology (BFO) and the Relations Ontology (RO) and uses classes from other ontologies including the Phenotypic Quality Ontology (PATO), the National Center for Biotechnology Information (NCBI) Taxonomy, the Environment Ontology (EnvO), and the Gene Ontology (GO). ACO includes 216 unique classes in an OWL format.

Availability: http://vtsl.vetmed.vt.edu/aco/Ontology/aco.zip.

1 INTRODUCTION

Animal classification needs vary by user and purpose. The Linnaean hierarchy is the international standard for animal nomenclature. However, it has notable shortcomings for many applications. It lacks the common identifying characteristics for sex, production role (such as meat or milk for human food), age, diet and living environment that are necessary to describe many animals that are subjects in scientific research, patients in veterinary clinics, and animals in production units such as farms. At the same time, it is too specific for some common animal classes which do not correspond with a single Linnaean taxonomic equivalent and which could refer to more than one taxonomic group. For example, in the United States, "cattle" could refer to Bos taurus or Bison-Bos taurus crosses. Elsewhere, "cattle" might refer to non-Bos taurus species. However, all cattle throughout the world are members of Bovinae that have some use.

An ontology that represents the way animals in practical uses are described, from "Cattle" to "Beef heifer raised in confinement," is needed for various applications. These applications include vaccine and drug labels, gene set mapping, species preservation, and veterinary medical records. Two ontologies listed on the Open Biological and Biomedical Ontologies (OBO) Foundry [1] (Smith et al., 2007) website carry animal classifications, the National Center for Biotechnology Information (NCBI) Taxonomy [2] and the National Cancer Institute (NCI) Thesaurus [3], but both are inadequate for representing animals in practical use. The reason is that both lack a structure and mechanism for representing animal classes with non-Linnaean defining characteristics such as sex and production role. In addition, these ontologies contain some imprecise classes unsuitable for use in animal production and husbandry.

The Animals in Context Ontology (ACO) [4] was developed to fill this need for identifying animals in extra-Linnaean ways. In this paper we describe the development of ACO, the resulting ontology, and future work.

2 METHOD

2.1 Source for ACO Development

A subset of animal classes was previously developed using the organism hierarchy of the Veterinary Terminology Services Laboratory (VTSL) [5] extension of the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT©) [6], a large, international medical terminology. The subset was originally populated from animals needed by the United States Food and Drug Administration's Center for Veterinary Medicine (FDA CVM) and the United States Department of Agriculture - Animal and Plant Health Inspection Service - Veterinary Services (USDA APHIS VS) and was stored in VTSL's database. Current known users of a portion of the animal classes include two branches of USDA APHIS VS for animal disease surveillance and the Virginia Department of Health for rabies reporting.

The current organism hierarchy in SNOMED-CT core does not contain any non-taxonomic defining relationships; however, the organism classes in the VTSL extension were defined using additional characteristics including sex, age group, production role, and taxonomic rank. The subset had a stated poly-hierarchical structure so animals could be classified by taxonomy (e.g., Bovinae) and common role grouping (e.g., Food animal), but lacked text definitions.

2.2 Importing External Ontology Classes

ACO imports the upper level ontologies Basic Formal Ontology (BFO) [7] and BioTopLite [8] as well as a bridge between them. BioTopLite was chosen because it is a top-domain ontology for biomedicine, and because it includes numerous object properties (relations), some of them mapped to the relations from the OBO Relations Ontology [9], together with numerous constraints such as domain/range restrictions. See Figure 1 for placements of ACO classes in BioTopLite.

Fig. 1. Placement of ACO classes in an upper level ontology. Dashed boxes contain BioTopLite classes. Bold boxes contain classes imported from other ontologies and the thin lined boxes are classes in ACO. Image composed with CmapTools.

The taxdemo ontology [10] (Schulz et al., 2008) had been proposed as an example of how to build organism ontologies that refer to biological taxa. The basic idea had been to represent taxa as qualities, which can inhere in populations, in a single organism, as well as in organism parts. For the purpose of ACO, taxon quality classes were created and related to the ACO animal classes as proposed in taxdemo.

The Ontology Lookup Service [11] and BioPortal [12] were used to locate appropriate external ontologies for class re-use. We used OntoFox [13] to create files to import classes from external ontologies such as the Phenotypic Quality Ontology (PATO) [14] and the Gene Ontology (GO) [15] into ACO. See Table 1 for a summary of the classes in ACO. Collaboration with OBO members was necessary in some instances to pick the appropriate classes from external ontologies and to learn the OntoFox program.

Ontology                             No.   Use in ACO                            Example
ACO                                  216   Practical animal classes              Female adult horse
ACO                                  58    Taxon qualities                       Subfamily bovinae quality
ACO                                  12    Roles                                 Produces milk for human food
ACO                                  1     Disposition                           Disposition to ruminate
Imported Full Ontologies
Basic Formal Ontology (BFO)          39    Upper level hierarchy                 Process
BioTopLite                           49    Upper level hierarchy and relations   participates in
BioTopLite-BFO bridge                39    Connects BFO and BioTopLite
From External Ontologies
NCBI Taxonomy                        40    Organism taxonomy (Linnaean)          Bovinae
Gene Ontology (GO)                   7     Biological processes                  Lactation
Environment Ontology (EnvO)          10    Environment sites                     Aquatic habitat
Phenotypic Quality Ontology (PATO)   13    Phenotypic qualities                  Female

Table 1. Summary of classes created for or imported into ACO.

2.3 ACO-Specific Classes

We developed ACO following the OBO Foundry principles. ACO-specific classes and related definitions were entered manually into the Protégé 4.1 ontology editor [16]. All classes unique to ACO are given URIs. The original SNOMED-CT class identifier has been retained as a cross reference using the alternativeId annotation property. The preferred names are in plain English and are mainly singular nouns, with the exception of cattle (explained in Discussion). The preferred description in SNOMED-CT was used as the preferred name in ACO. The scope includes classes of those animals that are put to practical use. Text definitions in the genus-species differentia format were added with the hasDefinition annotation property for each ACO-specific class. ACO uses a common shared syntax of OWL-DL. Description logic definitions were added for most of the classes in ACO. Appropriate, commonly used synonyms were added using the hasExactSynonym annotation property.

3 RESULTS

ACO contains 510 classes, 286 of which are unique to ACO. See Table 1 for a listing of classes by ontology. We imported classes from external ontologies to avoid duplication of existing content. Table 2 shows an example of a class from ACO and its associated axioms.

Preferred name:     Castrated male cattle for beef production
Synonym:            "Beef steer"
Text definition:    "Beef cattle which are male and castrated"
Formal definition:  equivalentTo Cattle for beef production and bearer of some Castrated male quality
Inherited:          subClassOf bearer of some Subfamily bovinae quality
                    subClassOf bearer of some Disposition to ruminate

Table 2. Example of ACO class. For brevity, many of the inherited anonymous classes are excluded from this table.

3.1 ACO Top Structure

The animal classes in ACO denote descendants of Metazoa from the NCBI Taxonomy. Metazoa corresponds with Kingdom Animalia and is the class that encompasses all potential animal classes in ACO. Metazoa is imported as a direct child of Organism in BioTopLite. Originally, we imported all classes from the needed distal taxonomic class in NCBI Taxonomy (superclass of an ACO-specific subclass) up to Metazoa in NCBI Taxonomy. This included many intermediate classes and a mixture of Linnaean and cladistic classes that proved unwieldy. We then reimported in OntoFox, attaching the most distal taxonomic class needed from NCBI directly as a child of Metazoa and eliminating the intermediates. The following rules were used. Refer to Figure 2 for an illustration of the examples.

(1) An ACO animal class is a subclass of the most distal NCBI Taxonomy class that includes all members of the animal class. As shown in Figure 2, the ACO class Antelope is a subclass of Bovidae because that is the most distal NCBI Taxonomy class that denotes all members considered antelope by mammalogists and taxonomists of authority such as Mammal Species of the World [17] (Four-horned antelope in Bovinae, grey rhebok of Peleinae, etc.). Bovidae is imported as a direct child of Metazoa.

(2) If two needed NCBI Taxonomy classes are part of a natural hierarchy in the NCBI Taxonomy, they are imported retaining the hierarchy. In Figure 2, the ACO class Cattle is a subclass of Bovinae from the NCBI Taxonomy because that is the most distal NCBI Taxonomy class that includes all members considered cattle by taxonomists throughout the world (Bos taurus, Bison, etc.). Because Bovidae is needed for a different ACO class (Antelope, as described above), Bovinae is imported as a child of Bovidae, which is imported as a direct child of Metazoa.

(3) If NCBI Taxonomy does not include the most distal known taxonomic ancestor that includes all members taxonomists consider to belong to the ACO animal class, then we created the needed distal class in ACO and imported the most distal NCBI Taxonomy class that subsumes this needed distal class as a direct child of Metazoa. As shown in Figure 2, Suinae is the most distal known taxonomic class that includes all species considered to be pigs. Suinae does not exist in NCBI Taxonomy, so the most distal taxonomic ancestor (Suidae) was imported from NCBI Taxonomy and a class Suinae was created in ACO as a child of Suidae, which is imported as a direct child of Metazoa. The ACO class Pig is a child of Suinae.

Fig. 2. Choice of upper level classes from other ontologies. Organism is from BioTopLite. Bolded boxes are classes imported from NCBI Taxonomy and thin lined boxes are classes in ACO.

3.2 ACO Defining Classes

Many classes used in the formal definitions of ACO-specific classes were imported from external ontologies or were created in ACO but identified as probable additions to external ontologies. An appropriate source for the animal roles in external ontologies was not found, so they remain in ACO. Taxon quality classes were created in ACO. See Table 3 for an example of a formal definition of an ACO class.

BioTopLite Relation   Value Example                  Ontology Source
participates in       Lactation                      GO
bearer of             Female                         PATO
bearer of             Produces milk for human food   ACO
bearer of             Subfamily caprinae quality     ACO

Table 3. Defining relationships for ACO class: Lactating ewe for milk production. The relationship in the first row distinguishes this class from its parent. Other relationships are inherited. Some inherited relationships are not shown due to space limitations.

3.3 Added Classes to Infer Structure

ACO has a single asserted is-a inheritance structure, expressed by subclass relations in OWL-DL. Animal classification and organization that does not obey a biological taxonomy is desired for grouping by common classes such as Food animal. This provides useful classification hierarchies for the users of the ontology. ACO includes the following classes that infer members based on formal definitions: Animal for breeding, Animal in fiber production, Exhibition animal, Aquarium animal, Zoo animal, Food animal, Laboratory animal, and Wildlife. ACO classifies with both the FaCT++ and HermiT reasoners in Protégé 4.1.

3.4 General Class Axioms

ACO includes some general class axioms to further define the animals in roles. See Table 4 for an example. It shows how animals bearing a certain role can be considered equivalent to animals that participate in certain processes.

bearer of some Produces fiber
EquivalentTo
participates in some (Production and (has outcome some Fiber product))

Table 4. Shown is the general class axiom for an animal that is bearer of the role class Produces fiber. Fiber product will be requested as an addition to the Environment Ontology (EnvO).

3.5 Development Time

Discussion of conceptual issues, including upper level ontology placement, re-use of external ontology classes, and text definition creation, took place over the period of one year. The actual manual creation of the ontology in Protégé took one month. The linkage to a well-constrained upper-level ontology like BioTopLite was of considerable heuristic value, due to iterative validation steps using DL classifiers for consistency checking.

3.6 Availability

ACO is open and available online.

4 DISCUSSION

ACO was developed as an ontology of animal classes within the OBO Foundry framework to maximize resources, data integration, reusability and interoperability. This proved both challenging and rewarding. Tools to assist with ontology development were available without charge, including the OBO Foundry website, Protégé, OntoFox, the Ontology Lookup Service and the NCBO BioPortal. Collaboration with OBO members was very effective. Multiple people offered their opinions on questions posed to the listservs. Responses were provided within 24 hours and in some cases almost immediately. We found that ontologies listed on the OBO website are at varying stages of development, compliance with OBO principles, and curation level. We encountered several classes that need work and identified several necessary additions to the ontologies.

An example of a class that could be improved is Pasture in the Environment Ontology (EnvO) [18]. Its parent is Grassland and its text definition is "Grassland used for grazing of ungulate livestock as part of a farm or ranch." Pastures can consist of grasses or legumes and are not always part of a managed farm or ranch. There are pastures in certain parts of the world that are open, public areas. Therefore we suggest the EnvO curators should either: 1. edit this class name to "grassland ranch pasture" and leave the text definition as is, or 2. move this class from Grassland to Terrestrial habitat and edit the text definition to: "Terrestrial habitat used for grazing, foraging or browsing by animals."

Ontology        No. of Additions Needed   Example
GO              3                         Rumination
EnvO            15                        Feedlot
PATO            5                         Castrated male
NCBI Taxonomy   3                         Suinae

Table 5. Summary of additional classes needed in OBO ontologies for ACO. Some of these additions have been submitted through the appropriate tracker.

We identified numerous classes as additions to existing ontologies so that other ontologists can draw similar content from the same external ontology. See Table 5 for a summary of these additions and the ACO site [19] for a list of all the needed additions. We believe it is more desirable for the taxon quality classes to be included as formal definitions of the NCBI Taxonomy classes rather than included directly in ACO. Since this is a significant and debatable request, we did not include these in the additions list to NCBI Taxonomy. Another option is to interpret the NCBI Taxonomy classes as taxon qualities themselves rather than organisms. However, we did not choose this because NCBI's documentation explicitly states that the taxonomy refers to organisms and because including ACO classes as subclasses of NCBI Taxonomy classes enables reasoning and subsumption with other ontologies using the same taxonomic resource.

Animals bearing roles were given additional general class axioms relating their production role to an outcome of a specific product. The EnvO class Food product includes food for human or animal consumption in its text definition; therefore additional EnvO classes specific to products for human consumption (e.g., Egg product for human consumption) are needed to fulfill these axioms. Classes for Wool product and Fiber product also need to be added to EnvO.

We reviewed each class in ACO to check for compliance with the OBO Foundry singular noun principle. Three categories of non-compliance were identified: 1) plural noun where a singular form exists ("eggs"); 2) singular noun and plural noun are the same ("deer"); 3) plural noun where a singular form does not exist ("cattle" and "broodstock"). All classes with the plural "eggs" in the preferred name were changed to the singular "egg." All classes with "broodstock" in the preferred name were edited to include "breeding" instead, and broodstock terms were retained as synonyms. "Deer" was left as is, as there is no exclusive singular form. The issue of a singular form of cattle was presented to the OBO listserv. Multiple suggestions were given and "head of cattle" seemed the most logical and accurate of the suggestions for a singular count noun. Although this is technically correct, it is not how people engaged in animal husbandry or veterinary medicine talk, and it would violate the OBO Foundry principle that preferred terms should be in ordinary English as extended by technical terms already established in the relevant discipline. Therefore, we chose to keep "cattle" in our singular classes.

We built ACO manually because one researcher needed experience in ontology building and in using Protégé. An effective automated transfer method between the SNOMED-CT subset and the ontology in OWL would have decreased some development time. This was investigated superficially, and problems were encountered with SNOMED-CT's description logic and with the extension classes' use of relationships not sanctioned in SNOMED-CT.

In addition to the improved format and increased interoperability, this development work resulted in improvements in the original subset. We identified and corrected simple and logical errors and omissions in the original subset. Examples include retiring a class from the original subset (Animal in context) because it could not be instantiated, adding a missing definition of the quality neonatal to Newborn sheep for milk production, and removing a redundant parent (Cattle) from Cattle on pasture for human food, leaving Cattle for human food as its only parent. The original subset classes had a taxon rank attribute and value ("genus" level). This was deprecated and we plan on using the structure of the taxdemo ontology to communicate taxon quality and rank instead. We added a role of Pre-production to better define replacement animals and increase the number of fully defined classes in the subset.

Identifying animal information at various levels, from breed and utility to Linnaean classification, is needed for various electronic record applications from science to medicine. ACO integrates within the Linnaean classification system but provides common non-Linnaean groupings such as Duck and extends them to practical animal classes such as Duck laying egg for human food. Animal data recorded with ACO classes integrate and interoperate with other OBO-based scientific and medical ontologies, allowing for reasoning and classification of data captured from multiple sources and with multiple ontologies. This should encourage biomedical researchers to access animal science and veterinary research as well as production and health records for comparative analysis purposes, including discovering new associations between phenotypic and gene traits. Because it is expensive to build and maintain biomedical ontologies, collaborating and using common resources may help to decrease costs associated with ontology development and maintenance. Collaborators from multiple OBO ontologies, including the Vaccine Ontology, have expressed interest in using ACO. ACO's format is more accessible to the broader scientific community while still maintaining its SNOMED-CT subset origin.

5 FUTURE WORK

Community use of ACO will result in the addition of classes and other changes needed to improve the ontology. Future work of the ACO development process includes: 1) analyzing representation of animal taxa specific production classes like broilers and fryers in chickens and starters, growers, and finishers in pigs; 2) considering formal definition with Linnaean and other classes for useful grouping classes such as Antelope, Shellfish, Cold blooded animal, Duck and Nonhuman primate; and 3) investigating the need to divide ACO into multiple ontologies. Formal evaluation for inclusion into the OBO Foundry, assignment of an OBO Foundry namespace, documentation development and tracker creation are future goals.

ACKNOWLEDGEMENTS

We acknowledge and appreciate funding from the US Food and Drug Administration's Center for Veterinary Medicine (FDA CVM) and the US Department of Agriculture - Animal and Plant Health Inspection Service - Veterinary Services (USDA APHIS VS) in support of this work.

FOOTNOTES

[1] http://obofoundry.org
[2] http://www.ncbi.nlm.nih.gov/taxonomy
[3] http://www.cancer.gov/cancertopics/cancerlibrary/terminologyresources
[4] http://vtsl.vetmed.vt.edu/aco
[5] http://vtsl.vetmed.vt.edu
[6] http://www.ihtsdo.org/snomed-ct
[7] http://www.ifomis.org/bfo
[8] http://purl.org/biotop
[9] http://obofoundry.org/ro
[10] http://purl.org/biotop/taxdemo/dev
[11] http://www.ebi.ac.uk/ontology-lookup/
[12] http://bioportal.bioontology.org
[13] http://ontofox.hegroup.org
[14] http://www.obofoundry.org/cgi-bin/detail.cgi?id=quality
[15] http://www.geneontology.org
[16] http://protege.stanford.edu
[17] http://www.vertebrates.si.edu/msw/mswcfapp/msw/index.cfm
[18] http://environmentontology.org
[19] http://code.google.com/p/animalnamesontology/downloads/list

REFERENCES

Schulz, S., Stenzhorn, H. and Boeker, M. (2008). The ontology of biological taxa. Bioinformatics, 24(13), i313-i321.
Smith, B. et al. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol, 25(11), 1251-1255.
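
SUPPLEMENTARY SKETCH

As a supplementary illustration of rules (1) and (2) in Section 3.1, the choice of the "most distal" taxonomy class can be read as a lowest-common-ancestor search, and the elimination of intermediate classes as re-attaching each needed class to its nearest needed ancestor or, failing that, to Metazoa. The Python below is only a minimal sketch under stated assumptions: the parent map is a toy fragment (the intermediate node "Ruminantia" stands in for the many intermediate classes in the real NCBI Taxonomy), and the function names are hypothetical. It is not the OntoFox import procedure used by the authors, and rule (3) is not modelled.

```python
# Toy child -> parent map. Illustrative only (an assumption of this sketch):
# the real NCBI Taxonomy has many more intermediate classes.
PARENT = {
    "Ruminantia": "Metazoa",
    "Bovidae": "Ruminantia",
    "Bovinae": "Bovidae",
    "Peleinae": "Bovidae",
    "Bos taurus": "Bovinae",
    "Bison": "Bovinae",
    "Four-horned antelope": "Bovinae",
    "Grey rhebok": "Peleinae",
}


def lineage(taxon):
    """Return [taxon, parent, grandparent, ..., root]."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def most_distal_common_class(members):
    """Rule (1): deepest class that includes every listed member."""
    first, *rest = members
    other_lineages = [set(lineage(m)) for m in rest]
    # lineage(first) runs from the member toward the root, so the first
    # shared class found is the most distal one.
    for candidate in lineage(first):
        if all(candidate in other for other in other_lineages):
            return candidate
    raise ValueError("members share no common ancestor")


def pruned_parents(needed, root="Metazoa"):
    """Rule (2): keep hierarchy among needed classes, drop intermediates."""
    result = {}
    for cls in needed:
        ancestors = lineage(cls)[1:]
        # Nearest ancestor that is itself needed, otherwise the root.
        result[cls] = next((a for a in ancestors if a in needed), root)
    return result


cattle_parent = most_distal_common_class(["Bos taurus", "Bison"])
antelope_parent = most_distal_common_class(["Four-horned antelope", "Grey rhebok"])
imports = pruned_parents({cattle_parent, antelope_parent})
print(cattle_parent, antelope_parent, imports)
```

With the toy map, Cattle is placed under Bovinae and Antelope under Bovidae, and the pruned import keeps Bovinae as a child of Bovidae while attaching Bovidae directly to Metazoa, which matches the examples given for Figure 2.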