=Paper= {{Paper |id=Vol-1515/poster5 |storemode=property |title=2015 Disease Ontology update: DO's expanded curation activities to connect disease-related data |pdfUrl=https://ceur-ws.org/Vol-1515/poster5.pdf |volume=Vol-1515 |dblpUrl=https://dblp.org/rec/conf/icbo/MitrakaS15 }} ==2015 Disease Ontology update: DO's expanded curation activities to connect disease-related data== https://ceur-ws.org/Vol-1515/poster5.pdf
 2015 Disease Ontology update: DO’s expanded curation activities
                to connect disease-related data
                                                                      Elvira Mitraka1 and Lynn M. Schriml1,*
1 Institute for Genome Science, University of Maryland School of Medicine, Baltimore, MD, USA


* To whom correspondence should be addressed: lschriml@som.umaryland.edu

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

ABSTRACT
The Human Disease Ontology is a widely used biomedical resource that standardizes and
classifies common and rare human diseases. Its latest iteration makes use of the OWL
language to facilitate easier curation across a variety of working groups and to take
advantage of the analyses available using OWL. The DO integrates disease concepts from
ICD-9, ICD-10, the National Cancer Institute Thesaurus, SNOMED-CT, MeSH, OMIM, EFO and
Orphanet. The DO Team is focused on enabling mapping and curation of large disease
datasets for major Biomedical Resource Centers and integration of their disease terms
into DO. Constant updates and additions to the ontology allow for coverage of the vast
field of human diseases. Through close collaborations with a variety of research groups,
such as MGD, EBI and NCI, the Disease Ontology has established itself as the go-to tool
for human disease curation. By combining informatic tools with manual curation, DO
ensures that it maintains the highest standard possible.

1 INTRODUCTION
The Disease Ontology (DO) (Schriml et al., 2012) is the core disease data resource for
the biomedical community. Human disease data is a cornerstone of biomedical research for
identifying drug targets, connecting genetic variations to phenotypes, understanding
molecular pathways relevant to novel treatments and coupling clinical care and biomedical
research. Consequently, across the multitude of biomedical resources there is a
significant need for a standardized representation of human disease to map disease
concepts across resources, to connect gene variation to phenotypes and drug targets and
to support development of computational tools that will enable robust data analysis and
integration.

2 CURRENT STATUS
DO has proven to be an invaluable genomics and genetic disease data resource. It is used
for evaluating and connecting diverse sets of data, and diverse curation groups use it to
connect human disease to animal models and genomic resources. It is also used to
informatically identify representative phenotype sets (Köhler et al., 2014; Schofield et
al., 2010), functionally similar genes (Fang and Gough, 2013; Singleton et al., 2014),
human gene and genome annotations (Peng et al., 2013; Osborne et al., 2009), pathways,
cancer variants (Wu et al., 2014) and immune epitopes (Vita et al., 2014). The DO website
(http://www.disease-ontology.org) is a web-based application that allows users to query,
browse, and visualize the Disease Ontology and disease concept data.

The latest version of DO has close to 9,000 disease terms, more than 16,000 synonyms and
almost 39,000 cross-references to other biomedical resources. Those resources include
ICD-9 and ICD-10, the National Cancer Institute (NCI) Thesaurus (Sioutos et al., 2007),
SNOMED-CT (Donnelly, 2006) and MeSH (https://www.nlm.nih.gov/mesh/MBrowser.html),
extracted from the Unified Medical Language System (UMLS) (Bodenreider, 2004) based on
the UMLS Concept Unique Identifiers for each disease term. DO also includes disease terms
extracted directly from Online Mendelian Inheritance in Man (OMIM) (Amberger et al.,
2011), the Experimental Factor Ontology (EFO, http://www.ebi.ac.uk/efo/) and Orphanet
(Maiella et al., 2013).

The DO files are available in both OBO and OWL format from DO's SourceForge site
(http://sourceforge.net/p/diseaseontology/code/HEAD/tree/trunk) and can be found at
http://purl.obolibrary.org/obo/doid.obo and http://purl.obolibrary.org/obo/doid.owl.
DO's OBO and OWL files are also available from the OBO Foundry
(http://www.obofoundry.org/cgi-bin/detail.cgi?id=disease_ontology) and GitHub
(https://github.com/obophenotype/human-disease-ontology/tree/master/src/ontology).

3 CURRENT WORK
Due to the huge amount of data generated at an increasingly rapid pace, the genomics
community is trying to streamline its data processing efforts. Ontologies are one avenue
toward this goal, but even they can become too big and unwieldy in their effort to
capture all available data. There are instances where the multitude of information
captured is not needed.

The Gene Ontology is one of the most widely used ontologies and one of the most
comprehensive. It covers the molecular functions, biological processes and location in
cellular components of gene products, and contains more than 40,000 terms. To make it
more accessible and less resource intensive, the Gene Ontology Consortium has created
slim versions of the ontology. These "GO slims" are smaller versions of GO that contain
only a subset of the terms, representing a general knowledge of a specific field, without
going too deep into the hierarchy.

Due to the breadth of its user base, the DO team decided to create its own slim files.
The most prominent is the DO Cancer Slim (Wu et al., 2015), which contains the terms
needed by the pan-cancer community. In the same vein, a DO MGI slim is being created. It
contains all the terms that were modified or created during an intensive curatorial
effort to map concepts between DO and OMIM. It will give insight into the overlap between
DO and OMIM, as well as which disease types are more heavily featured in the MGD. It can
therefore also show which diseases do not have a mouse model to represent them.

4 UPCOMING WORK
Future plans include the definition of all disease terms in DO and the creation of DO
slims for every major curation project of all the MODs. These slims will enable DO users
to review the representation and classification of MOD associated diseases, to compare
the diseases represented between MODs and to compare the different animal models
associated with a particular disease or types of diseases across species.

ACKNOWLEDGEMENTS
This work was supported in part by the National Institutes of Health – National Center
for Research Resources (R01RR025341) and NIH/NIGMS (R01 GM 089820-06).

REFERENCES
Alexandrescu,A. (2001) Modern C++ Design: Generic Programming and Design Patterns
   Applied. Addison Wesley Professional, Boston.
Amberger,J., Bocchini,C. and Hamosh,A. (2011) A new face and new challenges for Online
   Mendelian Inheritance in Man (OMIM). Hum. Mutat., 32, 564–567.
Bodenreider,O. (2004) The Unified Medical Language System (UMLS): integrating biomedical
   terminology. Nucleic Acids Res., 32, D267–D270.
Donnelly,K. (2006) SNOMED-CT: the advanced terminology and coding system for eHealth.
   Stud. Health Technol. Inform., 121, 279–290.
Fang,H. and Gough,J. (2013) DcGO: database of domain-centric ontologies on functions,
   phenotypes, diseases and more. Nucleic Acids Res., 41, D536–D544.
Köhler,S., Doelken,S.C., Mungall,C.J., Bauer,S., Firth,H.V., Bailleul-Forestier,I.,
   Black,G.C., Brown,D.L., Brudno,M., Campbell,J., FitzPatrick,D.R., Eppig,J.T.,
   Jackson,A.P., Freson,K., Girdea,M., Helbig,I., Hurst,J.A., Jähn,J., Jackson,L.G.,
   Kelly,A.M., Ledbetter,D.H., Mansour,S., Martin,C.L., Moss,C., Mumford,A.,
   Ouwehand,W.H., Park,S.M., Riggs,E.R., Scott,R.H., Sisodiya,S., Van Vooren,S.,
   Wapner,R.J., Wilkie,A.O., Wright,C.F., Vulto-van Silfhout,A.T., de Leeuw,N.,
   de Vries,B.B., Washington,N.L., Smith,C.L., Westerfield,M., Schofield,P., Ruef,B.J.,
   Gkoutos,G.V., Haendel,M., Smedley,D., Lewis,S.E. and Robinson,P.N. (2014) The Human
   Phenotype Ontology project: linking molecular biology and disease through phenotype
   data. Nucleic Acids Res., 42, D966–D974.
Maiella,S., Rath,A., Angin,C., Mousson,F. and Kremp,O. (2013) Orphanet and its
   consortium: where to find expert-validated information on rare diseases. Rev. Neurol.
   (Paris), 169, S3–S8.
Osborne,J.D., Flatow,J., Holko,M., Lin,S.M., Kibbe,W.A., Zhu,L.J., Danila,M.I., Feng,G.
   and Chisholm,R.L. (2009) Annotating the human genome with Disease Ontology. BMC
   Genomics, 10 (Suppl 1), S6.
Peng,K., Xu,W., Zheng,J., Huang,K., Wang,H., Tong,J., Lin,Z., Liu,J., Cheng,W., Fu,D.,
   Du,P., Kibbe,W.A., Lin,S.M. and Xia,T. (2013) The Disease and Gene Annotations (DGA):
   an annotation resource for human disease. Nucleic Acids Res., 41, D553–D560.
Schofield,P.N., Gkoutos,G.V., Gruenberger,M., Sundberg,J.P. and Hancock,J.M. (2010)
   Phenotype ontologies for mouse and man: bridging the semantic gap. Dis. Model Mech.,
   3, 281–289.
Schriml,L.M., Arze,C., Nadendla,S., Chang,Y.W., Mazaitis,M., Felix,V., Feng,G. and
   Kibbe,W.A. (2012) Disease Ontology: a backbone for disease semantic integration.
   Nucleic Acids Res., 40, D940–D946.
Singleton,M.V., Guthery,S.L., Voelkerding,K.V., Chen,K., Kennedy,B., Margraf,R.L.,
   Durtschi,J., Eilbeck,K., Reese,M.G., Jorde,L.B., Huff,C.D. and Yandell,M. (2014)
   Phevor combines multiple biomedical ontologies for accurate identification of
   disease-causing alleles in single individuals and small nuclear families. Am. J. Hum.
   Genet., 94, 599–610.
Sioutos,N., de Coronado,S., Haber,M.W., Hartel,F.W., Shaiu,W.L. and Wright,L.W. (2007)
   NCI Thesaurus: a semantic model integrating cancer-related clinical and molecular
   information. J. Biomed. Inform., 40, 30–43.
Vita,R., Overton,J.A., Greenbaum,J.A., Ponomarenko,J., Clark,J.D., Cantrell,J.R.,
   Wheeler,D.K., Gabbard,J.L., Hix,D., Sette,A. and Peters,B. (2014) The immune epitope
   database (IEDB) 3.0. Nucleic Acids Res., 43, D405–D412.
Wu,T.J., Shamsaddini,A., Pan,Y., Smith,K., Crichton,D.J., Simonyan,V. and Mazumder,R.
   (2014) A framework for organizing cancer-related variations from existing databases,
   publications and NGS data using a High-performance Integrated Virtual Environment
   (HIVE). Database (Oxford), bau:022.
Wu,T.J., Schriml,L.M., Chen,Q.R., Colbert,M., Crichton,D.J., Finney,R., Hu,Y.,
   Kibbe,W.A., Kincaid,H., Meerzaman,D., Mitraka,E., Pan,Y., Smith,K.M., Srivastava,S.,
   Ward,S., Yan,C. and Mazumder,R. (2015) Generating a focused view of disease ontology
   cancer terms for pan-cancer data integration and analysis. Database (Oxford), bav:032.
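
As a supplementary illustration of the term, synonym and cross-reference counts reported under CURRENT STATUS, and of the slim files described under CURRENT WORK, the following is a minimal sketch of how such figures might be derived from an OBO-format file. It is not the DO team's tooling. The sample stanzas, the identifiers and the subset name `Example_slim` are hypothetical, and the parser handles only the simple `tag: value` lines shown here (no trailing `!` comments, escapes or obsolete-term filtering).

```python
from collections import Counter

# Hand-written OBO excerpt. All terms, IDs and the subset name are
# hypothetical and exist only to exercise the functions below.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: DOID:0000001
name: example disease
synonym: "example disorder" EXACT []
xref: MESH:D000001
xref: OMIM:000001
subset: Example_slim

[Term]
id: DOID:0000002
name: another example disease
synonym: "another disorder" EXACT []
synonym: "second disorder" RELATED []
xref: ICD10CM:A00

[Typedef]
id: part_of
name: part of
"""


def parse_terms(obo_text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    terms = []
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return terms


def summarize(terms):
    """Count terms, synonyms and cross-references across all parsed terms."""
    counts = Counter()
    for term in terms:
        counts["terms"] += 1
        counts["synonyms"] += len(term.get("synonym", []))
        counts["xrefs"] += len(term.get("xref", []))
    return counts


def slim_ids(terms, subset_name):
    """Return the IDs of terms tagged with the given subset (a 'slim')."""
    return [t["id"][0] for t in terms if subset_name in t.get("subset", [])]


if __name__ == "__main__":
    parsed = parse_terms(SAMPLE_OBO)
    print(dict(summarize(parsed)))
    print(slim_ids(parsed, "Example_slim"))
```

Applied to a real download such as doid.obo, the same functions would need to cope with details this sketch ignores, so a dedicated OBO parsing library is likely the better choice for production use.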