=Paper=
{{Paper
|id=Vol-1515/poster5
|storemode=property
|title=2015 Disease Ontology update: DO's expanded curation activities to connect disease-related data
|pdfUrl=https://ceur-ws.org/Vol-1515/poster5.pdf
|volume=Vol-1515
|dblpUrl=https://dblp.org/rec/conf/icbo/MitrakaS15
}}
==2015 Disease Ontology update: DO's expanded curation activities to connect disease-related data==
Elvira Mitraka1 and Lynn M. Schriml1,*

1 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA

To whom correspondence should be addressed (*): lschriml@som.umaryland.edu

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

===ABSTRACT===
The Human Disease Ontology is a widely used biomedical resource, which standardizes and classifies common and rare human diseases. Its latest iteration makes use of the OWL language to facilitate easier curation between a variety of working groups and to take advantage of the analyses available using OWL. The DO integrates disease concepts from ICD-9, ICD-10, the National Cancer Institute Thesaurus, SNOMED-CT, MeSH, OMIM, EFO and Orphanet. The DO Team is focused on enabling mapping and curation of large disease datasets for major Biomedical Resource Centers and integration of their disease terms into DO. Constant updates and additions to the ontology allow for coverage of the vast field of human diseases. Through close collaborations with a variety of research groups, such as MGD, EBI and NCI, the Disease Ontology has established itself as the go-to tool for human disease curation. By combining informatic tools with manual curation, DO ensures that it maintains the highest standard possible.

===1 INTRODUCTION===
The Disease Ontology (DO) (Schriml et al., 2012) is the core disease data resource for the biomedical community. Human disease data is a cornerstone of biomedical research for identifying drug targets, connecting genetic variations to phenotypes, understanding molecular pathways relevant to novel treatments and coupling clinical care and biomedical research. Consequently, across the multitude of biomedical resources there is a significant need for a standardized representation of human disease to map disease concepts across resources, to connect gene variation to phenotypes and drug targets and to support development of computational tools that will enable robust data analysis and integration.

===2 CURRENT STATUS===
DO has proven to be an invaluable genomics and genetic disease data resource used for evaluating and connecting diverse sets of data, used by diverse curation groups to connect human disease to animal models and genomic resources and used to informatically identify representative phenotype sets (Köhler et al., 2014; Schofield et al., 2010), functionally similar genes (Fang and Gough, 2013; Singleton et al., 2014), human gene and genome annotations (Peng et al., 2013; Osborne et al., 2009), pathways, cancer variants (Wu et al., 2014) and immune epitopes (Vita et al., 2014). The DO website (http://www.disease-ontology.org) is a web-based application that allows users to query, browse, and visualize the Disease Ontology and disease concept data.

The latest version of DO has close to 9,000 disease terms, more than 16,000 synonyms and almost 39,000 cross-references to other biomedical resources. Those resources include ICD-9 and ICD-10, the National Cancer Institute (NCI) Thesaurus (Sioutos et al., 2007), SNOMED-CT (Donnelly, 2006) and MeSH (https://www.nlm.nih.gov/mesh/MBrowser.html), extracted from the Unified Medical Language System (UMLS) (Bodenreider, 2004) based on the UMLS Concept Unique Identifiers for each disease term. DO also includes disease terms extracted directly from Online Mendelian Inheritance in Man (OMIM) (Amberger et al., 2011), the Experimental Factor Ontology (EFO, http://www.ebi.ac.uk/efo/) and Orphanet (Maiella et al., 2013).

The DO files are available in both OBO and OWL format from DO's SourceForge site (http://sourceforge.net/p/diseaseontology/code/HEAD/tree/trunk) and can be found at http://purl.obolibrary.org/obo/doid.obo and http://purl.obolibrary.org/obo/doid.owl. DO's OBO and OWL files are also available from the OBO Foundry (http://www.obofoundry.org/cgi-bin/detail.cgi?id=disease_ontology) and GitHub (https://github.com/obophenotype/human-disease-ontology/tree/master/src/ontology).

===3 CURRENT WORK===
Due to the huge amount of data generated at an increasingly rapid pace, the genomics community is trying to streamline its data processing efforts. Ontologies are one avenue toward this, but even they can become too big and unwieldy in their effort to capture all available data. There are instances where the full breadth of captured information is not needed.

The Gene Ontology is one of the most widely used ontologies and one of the most comprehensive. It covers the molecular functions, biological processes and location in cellular components of gene products, containing more than 40,000 terms. In order to make it more accessible and less resource intensive, the Gene Ontology Consortium has created slim versions of the ontology. These "GO slims" are smaller versions of GO that contain only a subset of the terms, representing a general knowledge of a specific field, without going too deep into the hierarchy.

Due to the breadth of its user base, the DO team decided to create its own slim files. The DO Cancer Slim (Wu et al., 2015) is the most prominent, containing terms needed by the pan-cancer community. In the same vein, a DO MGI slim is being created. It contains all the terms that were modified or created during an intensive curatorial effort to map concepts between DO and OMIM. It will give insight into the overlap between DO and OMIM, as well as which disease types are more heavily featured in the MGD. It can therefore also indicate which diseases do not have a mouse model to represent them.

===4 UPCOMING WORK===
Future plans include the definition of all disease terms in DO and the creation of DO slims for every major curation project of all the MODs. These slims will enable DO users to review the representation and classification of MOD-associated diseases, to compare the diseases represented between MODs and to compare the different animal models associated with a particular disease or types of diseases across species.

===ACKNOWLEDGEMENTS===
This work was supported in part by the National Institutes of Health – National Center for Research Resources (R01RR025341) and NIH/NIGMS (R01 GM 089820-06).

===REFERENCES===
Alexandrescu,A. (2001) Modern C++ Design: Generic Programming and Design Patterns Applied. Addison Wesley Professional, Boston.

Amberger,J., Bocchini,C. and Hamosh,A. (2011) A new face and new challenges for Online Mendelian Inheritance in Man (OMIM). Hum. Mutat., 32, 564–567.

Bodenreider,O. (2004) The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res., 32, D267–D270.

Donnelly,K. (2006) SNOMED-CT: the advanced terminology and coding system for eHealth. Stud. Health Technol. Inform., 121, 279–290.

Fang,H. and Gough,J. (2013) DcGO: database of domain-centric ontologies on functions, phenotypes, diseases and more. Nucleic Acids Res., 41, D536–D544.

Köhler,S., Doelken,S.C., Mungall,C.J., Bauer,S., Firth,H.V., Bailleul-Forestier,I., Black,G.C., Brown,D.L., Brudno,M., Campbell,J., FitzPatrick,D.R., Eppig,J.T., Jackson,A.P., Freson,K., Girdea,M., Helbig,I., Hurst,J.A., Jähn,J., Jackson,L.G., Kelly,A.M., Ledbetter,D.H., Mansour,S., Martin,C.L., Moss,C., Mumford,A., Ouwehand,W.H., Park,S.M., Riggs,E.R., Scott,R.H., Sisodiya,S., Van Vooren,S., Wapner,R.J., Wilkie,A.O., Wright,C.F., Vulto-van Silfhout,A.T., de Leeuw,N., de Vries,B.B., Washington,N.L., Smith,C.L., Westerfield,M., Schofield,P., Ruef,B.J., Gkoutos,G.V., Haendel,M., Smedley,D., Lewis,S.E. and Robinson,P.N. (2014) The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res., 42, D966–D974.

Maiella,S., Rath,A., Angin,C., Mousson,F. and Kremp,O. (2013) Orphanet and its consortium: where to find expert-validated information on rare diseases. Rev. Neurol. (Paris), 169, S3–S8.

Osborne,J.D., Flatow,J., Holko,M., Lin,S.M., Kibbe,W.A., Zhu,L.J., Danila,M.I., Feng,G. and Chisholm,R.L. (2009) Annotating the human genome with Disease Ontology. BMC Genomics, 10 Suppl 1, S6.

Peng,K., Xu,W., Zheng,J., Huang,K., Wang,H., Tong,J., Lin,Z., Liu,J., Cheng,W., Fu,D., Du,P., Kibbe,W.A., Lin,S.M. and Xia,T. (2013) The Disease and Gene Annotations (DGA): an annotation resource for human disease. Nucleic Acids Res., 41, D553–D560.

Schofield,P.N., Gkoutos,G.V., Gruenberger,M., Sundberg,J.P. and Hancock,J.M. (2010) Phenotype ontologies for mouse and man: bridging the semantic gap. Dis. Model Mech., 3, 281–289.

Schriml,L.M., Arze,C., Nadendla,S., Chang,Y.W., Mazaitis,M., Felix,V., Feng,G. and Kibbe,W.A. (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40, D940–D946.

Singleton,M.V., Guthery,S.L., Voelkerding,K.V., Chen,K., Kennedy,B., Margraf,R.L., Durtschi,J., Eilbeck,K., Reese,M.G., Jorde,L.B., Huff,C.D. and Yandell,M. (2014) Phevor combines multiple biomedical ontologies for accurate identification of disease-causing alleles in single individuals and small nuclear families. Am. J. Hum. Genet., 94, 599–610.

Sioutos,N., de Coronado,S., Haber,M.W., Hartel,F.W., Shaiu,W.L. and Wright,L.W. (2007) NCI Thesaurus: a semantic model integrating cancer-related clinical and molecular information. J. Biomed. Inform., 40, 30–43.

Vita,R., Overton,J.A., Greenbaum,J.A., Ponomarenko,J., Clark,J.D., Cantrell,J.R., Wheeler,D.K., Gabbard,J.L., Hix,D., Sette,A. and Peters,B. (2014) The immune epitope database (IEDB) 3.0. Nucleic Acids Res., 43, D405–D412.

Wu,T.J., Shamsaddini,A., Pan,Y., Smith,K., Crichton,D.J., Simonyan,V. and Mazumder,R. (2014) A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE). Database (Oxford), bau:022.

Wu,T.J., Schriml,L.M., Chen,Q.R., Colbert,M., Crichton,D.J., Finney,R., Hu,Y., Kibbe,W.A., Kincaid,H., Meerzaman,D., Mitraka,E., Pan,Y., Smith,K.M., Srivastava,S., Ward,S., Yan,C. and Mazumder,R. (2015) Generating a focused view of disease ontology cancer terms for pan-cancer data integration and analysis. Database (Oxford), bav:032.
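
The term, synonym and cross-reference totals quoted in the text for the latest DO release are the kind of figures that can be tallied directly from an OBO-format file. The following Python snippet is a minimal sketch of such a tally, not the DO team's tooling: the helper name <code>summarize_obo</code>, the sample stanzas and every identifier in them are hypothetical and are not taken from <code>doid.obo</code>. It assumes only the general OBO flat-file layout, in which each <code>[Term]</code> stanza carries <code>synonym:</code> and <code>xref:</code> tag lines.

```python
from collections import Counter

# Hypothetical OBO excerpt for illustration only; these stanzas and
# identifiers are invented and do not come from the real doid.obo file.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: DOID:0000001
name: example disease A
synonym: "disease A syndrome" EXACT []
xref: MESH:D000001
xref: OMIM:000001

[Term]
id: DOID:0000002
name: example disease B
synonym: "B disorder" EXACT []
synonym: "B disease" RELATED []
xref: MESH:D000002
is_a: DOID:0000001 ! example disease A

[Typedef]
id: part_of
name: part of
"""


def summarize_obo(text):
    """Count [Term] stanzas, synonym lines and xref lines in OBO text.

    Returns (term_count, synonym_count, xref_counter), where xref_counter
    maps each cross-reference prefix (the text before the first colon)
    to the number of xref lines that use it.
    """
    terms = 0
    synonyms = 0
    xrefs = Counter()
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are counted.
            in_term = line == "[Term]"
            if in_term:
                terms += 1
        elif in_term and line.startswith("synonym:"):
            synonyms += 1
        elif in_term and line.startswith("xref:"):
            value = line[len("xref:"):].strip()
            xrefs[value.split(":", 1)[0]] += 1
    return terms, synonyms, xrefs


terms, synonyms, xrefs = summarize_obo(SAMPLE_OBO)
print(terms, synonyms, sum(xrefs.values()), dict(xrefs))
```

To apply the same tally to a real release, the text of a downloaded OBO file could be passed to <code>summarize_obo</code> in place of the sample string. Grouping cross-references by prefix gives a per-resource breakdown alongside the overall total.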