Improvements to the Drosophila anatomy ontology

Marta Costa*, David Osumi-Sutherland, Steven Marygold and Nick Brown
FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK

* To whom correspondence should be addressed: m.costa@gen.cam.ac.uk
Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

ABSTRACT

The Drosophila anatomy ontology (DAO) defines the broad anatomy of the fruitfly Drosophila melanogaster, a genetic model organism. It contains over 8700 classes, with close to half of these corresponding to neuroanatomical terms.

We are systematically reviewing the DAO classes, improving the textual information and classification. This includes adding definitions, comments and synonyms, as well as formal definitions, which results in a full classification in some cases. Classes belonging to each of the defined organ systems are reviewed together to improve consistency of free text and formalisation. So far we have reviewed 7 of the 11 organ system classes, resulting in 83% of classes having a definition.

1 INTRODUCTION

The Drosophila anatomy ontology (DAO) (Costa et al., 2013) is an ontology that describes the wild-type anatomy of Drosophila, containing over 8700 classes. It is used by FlyBase (Dos Santos et al., 2015), the gene and genomic database for Drosophila, for manual curation of phenotypes and expression patterns. Users are also able to query for this type of data, either through FlyBase or Virtual Fly Brain (Milyaev et al., 2012). Having an accurate, encompassing and human-readable ontology is therefore essential to enable curators to choose the correct anatomy term, and for users to easily navigate the data.

When the DAO was first developed over 20 years ago, it did not include textual information or significant formalisation. A large effort has been undertaken in the last 9 years to improve this situation (Costa et al., 2013). This work has resulted in 83% of classes now having a definition and the classification having been greatly improved. The DAO currently contains 46 object properties, over 18,000 subclass axioms and over 2,500 equivalent class axioms, with around 50% of over 10,000 classifications being inferred.

New DAO classes are curated from the published literature, if enough evidence regarding their morphological characterisation, identity and, if appropriate, function is provided. Every class includes a definition, synonyms and comments, each attributed to a source reference.

The neuroanatomy field has grown massively in the last few years thanks to technical advances, enabling researchers to identify and characterize the function of individual neurons. Several projects are currently underway to map all neurons in a variety of model organisms. We, in collaboration with the Virtual Fly Brain project, have focused on capturing this information in Drosophila. Currently, 46% of the DAO comprises terms that are part of the nervous system, including close to 2300 distinct neuron classes, over 220 neuroblast lineage clones and neuropils.

Here, we present our most recent work in improving the textual information and formalisation patterns of DAO classes.

2 RESULTS

In order to maintain consistency between related terms, we are reviewing existing classes by making use of their current classification into 11 different organ systems, such as the tracheal, muscle, adipose, etc. Around 80% of classes in the DAO had previously been classified as part of an organ system, thus making it easy to retrieve a list of classes to review. Work has proceeded class by class, improving both the textual information and formalisation. When necessary, we have sought advice from expert researchers.

The systematic review of classes uncovered several cases of redundancy and duplication, which were resolved by obsoleting one of the terms. For example, the classes Malpighian tubule Type II cell (see section 2.1) and excretory star cell were found to refer to the same entity. In this case, the latter was obsoleted, and the name added as a synonym to the former.

We have concluded this review for 7 of the 11 organ systems (muscle, tracheal, reproductive, digestive, circulatory, excretory and adipose), corresponding to 570 classes. Work is ongoing to complete the remaining systems (muscle, nervous, endocrine and sensory).

2.1 Improving textual information

We have added textual definitions to 83% of DAO classes, an improvement of 10% since October 2013. The definition describes the general classification of the anatomical entity, its properties, and when appropriate, any distinguishing traits. These statements are supported by references, either cited in the text, or listed at the end with a publication identifier (mostly a FlyBase one: prefix FBrf followed by 7 digits).

Comments are added when appropriate, for one of two reasons. The first is to provide relevant information relating to the experimental setup when, for example, investigating the function of a neuron. The second is to clarify the relationship between competing nomenclatures.

Synonyms from the published literature are added to each class, together with references. The addition of synonyms has particular relevance to anatomy ontologies, for which competing nomenclatures often exist.

An example of the textual information for a class is below:

name: Malpighian tubule Type II cell
definition: "Morphologically distinct cell type found only in the initial, transitional and main segments of the Malpighian tubules interspersed with Type I cells. Type II cells are smaller and flatter than Type I cells, with shorter (main segment) or no (initial region) apical microvilli. Type II cells originate from a subset of caudal visceral mesoderm cells that overlie the tubule primordia as they evert from the hindgut. By stage 15, Type II cells have been incorporated in the tubules and adopt epithelial characteristics. In the mature tubules there are on average 110 Type II cells." [FlyBase:FBrf0064792, FlyBase:FBrf0102373, FlyBase:FBrf0160477, FlyBase:FBrf0222532]
comment: These cells are involved in primary urine production via the presence of ion channels that allow chloride and water to enter the tubule lumen (O'Donnell et al., 1998).
synonyms: "excretory star cell" EXACT; "Malpighian tubule stellate cell" EXACT [FlyBase:FBrf0030988]

2.2 Improving formal definitions

In systems such as the tracheal, in which certain structures are repeated in each metameric unit, adding a formal definition to each of these terms significantly increases the robustness of error checking procedures. An example of some of the relationships that are added is below:

name: adult abdominal spiracular branch
intersection_of: FBbt:00003071 ! adult spiracular branch
intersection_of: connected_to FBbt:00003040 ! adult lateral trunk
intersection_of: connected_to FBbt:00004814 ! adult abdominal spiracle
intersection_of: part_of FBbt:00003024 ! adult abdominal segment

In other instances, adding a formal definition allows for full classification of terms. This becomes particularly relevant for neuroanatomy, a field in which new neuron types are being frequently described. Having a formal definition for a class ensures that new terms are correctly classified, provided that enough information is available, such as developmental origin.

An example of a neuron class that can be fully classified based on expression (which identifies this subset of very well studied neurons) and developmental origin is below. This formalisation pattern, or a similar one (excluding only the neuroblast information), was used to define the 104 classes of adult fruitless neurons.

name: adult fruitless aDT-b (female) neuron
intersection_of: FBbt:00005106 ! neuron
intersection_of: develops_from FBbt:00050148 ! neuroblast CREa1 (female)
intersection_of: expresses FlyBase:FBgn0004652 ! fruitless
intersection_of: part_of FBbt:00110416 ! adult fruitless aDT-b (female) lineage clone

3 DISCUSSION

We have reviewed terms that belong to 7 of the 11 organ systems in the DAO, improving the textual information essential for casual users, and the formalisation necessary to easily maintain a correct classification and to prevent the introduction of errors. Reviewing related classes as a group helps to maintain consistency in the ontology, both in terms of free text and the formalisation patterns used.

Future work will focus on completing the systematic review of the DAO by revising the classes in the remaining 4 organ systems.

REFERENCES

Costa, M., Reeve, S., Grumbling, G. and Osumi-Sutherland, D. (2013). The Drosophila anatomy ontology. J. Biomedical Semantics 4, 32.
Dos Santos, G., Schroeder, A. J., Goodman, J. L., Strelets, V. B., Crosby, M. A., Thurmond, J., Emmert, D. B., Gelbart, W. M. and the FlyBase Consortium (2015). FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations. Nucleic Acids Res. doi: 10.1093/nar/gku1099
Milyaev, N., Osumi-Sutherland, D., Reeve, S., Burton, N., Baldock, R. A. and Armstrong, J. D. (2012). The Virtual Fly Brain browser and query interface. Bioinformatics 28, 411-5.
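
As an illustration of the formalisation pattern described in section 2.2, the following minimal Python sketch splits the intersection_of lines of one stanza into a genus (the parent class) and its differentiating relationships, and checks the FBrf identifier shape mentioned in section 2.1. The names parse_intersections, FBRF_PATTERN and STANZA are hypothetical and are not part of any FlyBase tooling. The parsing assumes one tag per line with an optional "! label" comment, which is a simplification of the real ontology file format.

```python
import re

# Assumption: FlyBase reference IDs are "FBrf" followed by exactly 7 digits,
# as described in section 2.1.
FBRF_PATTERN = re.compile(r"FBrf\d{7}")


def parse_intersections(stanza):
    """Return (genus, differentia) from the intersection_of lines of a stanza.

    genus is the bare class ID; differentia is a list of
    (relation, target ID) pairs in the order they appear.
    """
    genus = None
    differentia = []
    prefix = "intersection_of:"
    for line in stanza.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Drop the trailing "! label" comment and keep only the identifiers.
        parts = line[len(prefix):].split("!", 1)[0].split()
        if len(parts) == 1:
            genus = parts[0]
        elif len(parts) == 2:
            differentia.append((parts[0], parts[1]))
    return genus, differentia


# Stanza copied from the adult fruitless neuron example in section 2.2.
STANZA = """\
name: adult fruitless aDT-b (female) neuron
intersection_of: FBbt:00005106 ! neuron
intersection_of: develops_from FBbt:00050148 ! neuroblast CREa1 (female)
intersection_of: expresses FlyBase:FBgn0004652 ! fruitless
intersection_of: part_of FBbt:00110416 ! adult fruitless aDT-b (female) lineage clone
"""

genus, differentia = parse_intersections(STANZA)
for relation, target in differentia:
    print(relation, target)
```

Separating the genus from the relationships in this way makes it simple to compare stanzas that should follow the same pattern, for example to flag a repeated tracheal structure that lacks a part_of relationship.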