Supporting database annotations and beyond with the Evidence & Conclusion Ontology (ECO)

Marcus C. Chibucos1*, Suvarna Nadendla1, James B. Munro1, Elvira Mitraka1, Dustin Olley1, Nicole A. Vasilevsky2, Matthew H. Brush2, Michelle Giglio1

1 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD United States of America
2 Ontology Development Group, Library, Oregon Health & Science University, Portland, OR United States of America
*Corresponding author: mchibucos@som.umaryland.edu; (410) 705-0885; 801 W. Baltimore St., Baltimore, MD, 21201

This work is supported by funding from the National Science Foundation Division of Biological Infrastructure under award number 1458400.

Abstract—The Evidence & Conclusion Ontology (ECO) is a community standard for summarizing evidence in scientific research in a controlled, structured way. Annotations at the world's most frequented biological databases (e.g. model organisms, UniProt, Gene Ontology) are supported using ECO terms. ECO describes evidence derived from experimental and computational methods, author statements curated from the literature, inferences drawn by curators, and other types of evidence. Here, we describe recent ECO developments and collaborations, most notably: (i) a new ECO website containing user documentation, up-to-date news, and visualization tools; (ii) improvements to the ontology structure; (iii) implementing logic via an ongoing collaboration with the Ontology for Biomedical Investigations (OBI); (iv) addition of numerous experimental evidence types; and (v) addition of new evidence classes describing computationally derived evidence. Due to its utility, popularity, and simplicity, ECO is now expanding into realms beyond the protein annotation community, for example the biodiversity and phenotype communities. As ECO continues to grow as a resource, we are seeking new users and new use cases, with the hope that ECO will continue to be a broadly used and easy-to-implement community standard for representing evidence in diverse biological applications. Feel free to visit two ECO-sponsored workshops at ICBO 2016 to learn more: 1. “An introduction to the Evidence and Conclusion Ontology and representing evidence in scientific research” and 2. “OBI-ECO Interactions & Evidence”.

Keywords—annotation; biodiversity; biomedical investigation; conclusion; confidence; curation; evidence; experimental evidence; inference; provenance; sequence similarity.

I. INTRODUCTION

The Evidence & Conclusion Ontology (ECO) [1] summarizes types of scientific evidence associated with biological research. Evidence can arise from laboratory experiments, computational methods, manual literature curation, or other means. Researchers, biocurators, and database managers use this evidence to justify their conclusions and support resulting assertions, for example stating that a given protein has a particular function.

Summarizing evidence with ECO allows projects such as the UniProt-Gene Ontology Annotation (UniProt-GOA) project [2] to manage large volumes of annotations in a convenient fashion, as both data management and query applications are supported by systematically describing evidence. Because ECO terms are ontology terms, they contain standard definitions and are networked using defined relationships. Thus, associating research data with descriptions of evidence using ECO can allow, for example, faceted queries of large datasets and implementations of customized quality control mechanisms.

II. ESSENTIALS OF ECO

A. Basic ECO structure

As depicted in Fig. 1, ECO comprises two high-level classes, ‘evidence’ (ECO:0000000) and ‘assertion method’ (ECO:0000217). The definition of ‘evidence’ is “a type of information that is used to support an assertion” and ‘assertion method’ is defined as “a means by which a statement is made about an entity” [1]. Together ‘evidence’ and ‘assertion method’ can be combined to describe both the support for an assertion and whether the assertion was generated by manual or automatic means. ECO terms descend mainly from the ‘evidence’ hierarchy. However, ‘evidence’ leaf terms are related to the ‘assertion method’ terms by the ‘used_in’ relationship. Thus, one can assert not only what evidence is used to support a particular assertion, but also whether the assertion was made by a human being or a computer (Fig. 1).

Fig. 1. ECO root classes and combinatorial terms. Leaf terms depicted are logically defined as the ‘evidence’ parent class (‘match to InterPro member signature evidence’) related to the ‘assertion method’ class via the ‘used_in’ relationship (gray boxes).

B. Traditional uses of ECO

Some traditional example applications of ECO are found in uses by the Gene Ontology [3]: (a) hierarchical ECO classes are used to support structured data queries; (b) when a protein is annotated based on sequence similarity to another annotated protein, the identity of that protein must be recorded in the annotation file along with the evidence from ECO; (c) quality control can be enforced by allowing annotations to terms from a given ontology to be supported only by particular evidence types, with annotations that do not comply flagged for review; and (d) circular annotations based on computational predictions alone can be detected, and thus avoided. In the ways described above, ECO has been used by many databases (e.g. UniProt, model organisms, Gene Ontology, et cetera) to support protein annotations. However, ECO has additional uses.

C. Recent ECO term development

A growing number of resources/applications use ECO (more than 40 of which we are aware). ECO has recently expanded its evidence representation through collaborations with many groups, for example: IntAct [4] (biological system reconstruction), CollecTF [5] (motif prediction), Ontology of Microbial Phenotypes [6] (microbial assays), Planteome (http://planteome.org; genotype-phenotype associations), Gene Ontology [3] (logical inference & synapse research techniques), SwissProt [7] (diverse experimental assays), and UniProt [2,7] (detection techniques).

III. THE FUTURE OF ECO

A. Increasing the logic within ECO

In May 2016, 14 people met in person at the Institute for Genome Sciences in Baltimore, MD, while approximately seven others joined remotely, to discuss modeling scientific research evidence [8]. An objective of the meeting, titled “OBI-ECO Baltimore 2016: Evidence,” was to devise strategies for cross-ontology coordination between ECO and the Ontology for Biomedical Investigations (OBI) [9]. One decided outcome of the meeting was to logically define ECO ‘experimental evidence’ classes using OBI classes. This work has been under way, and a cataloging of issues and areas for development in both ontologies has been undertaken. Follow-up discussions and a review of this ongoing work will take place at ICBO 2016 at workshop W08 titled “OBI-ECO Interactions & Evidence” and participation by any interested users is welcome.

B. Beyond protein annotation

Although ECO was originally created circa 2000 to support gene product annotation by the Gene Ontology, today ECO is used by many groups concerned with evidence, and even provenance, in scientific research. While numerous experimental and computational evidence types have been added to ECO on behalf of a number of resources (see above and www.evidenceontology.org), the ECO user base and diversity of applications continue to increase.

Some examples of new/potential ECO users include WikiData (https://www.wikidata.org), the deep sea community (https://github.com/geneontology/deep_sea), the biodiversity and phenotype communities, and the Disease Ontology [10]. Specific examples of these will be addressed at the ICBO 2016 workshop titled “An introduction to the Evidence and Conclusion Ontology and representing evidence in scientific research” (workshop W11) and new users and adopters are especially encouraged to attend to learn more.

ACKNOWLEDGMENT

The authors acknowledge the Ontology for Biomedical Investigations (OBI) Consortium and, in particular, Bjoern Peters for ongoing collaboration with ECO. We thank Christian J. Stoeckert, Jr. and Jie Zheng for co-organizing the ICBO 2016 workshop W08 titled “OBI-ECO Interactions & Evidence.”

REFERENCES

[1] M.C. Chibucos, C.J. Mungall, R. Balakrishnan, K.R. Christie, R.P. Huntley, O. White, J.A. Blake, S.E. Lewis, and M. Giglio, “Standardized description of scientific evidence using the Evidence Ontology (ECO),” Database (Oxford), v.2014:bau075, 2014.
[2] E.C. Dimmer, R.P. Huntley, Y. Alam-Faruque, T. Sawford, C. O'Donovan, M.J. Martin, et al., “The UniProt-GO Annotation database in 2011,” Nucleic Acids Res., 40, D565–D570, 2012.
[3] The Gene Ontology Consortium, “Gene Ontology Consortium: going forward,” Nucleic Acids Res., 43(Database issue):D1049-1056, 2015.
[4] B.H.M. Meldal, O. Forner-Martinez, M.C. Costanzo, J. Dana, J. Demeter, M. Dumousseau, et al., “The complex portal – an encyclopaedia of macromolecular complexes,” Nucleic Acids Res., nar.gku975, 2014.
[5] S. Kılıç, D.M. Sagitova, S. Wolfish, B. Bely, M. Courtot, S. Ciufo, et al., “From data repositories to submission portals: rethinking the role of domain-specific databases in CollecTF,” Database, v.2016:baw055, 2016.
[6] M.C. Chibucos, A.E. Zweifel, J. Herrera, W. Meza, S. Eslamfam, P. Uetz, et al., “An ontology for microbial phenotypes,” BMC Microbiology, 14(1):294, 2014.
[7] The UniProt Consortium, “UniProt: a hub for protein information,” Nucleic Acids Res., 43(Database issue):D204-212, 2015.
[8] The OBI Consortium, et al., “Cross-community ontological modeling of scientific evidence,” unpublished.
[9] A. Bandrowski, R. Brinkman, M. Brochhausen, M.H. Brush, B. Bug, M.C. Chibucos, et al., “The Ontology for Biomedical Investigations,” PLoS One, 11(4):e0154556, 2016.
[10] W.A. Kibbe, C. Arze, V. Felix, E. Mitraka, E. Bolton, G. Fu, et al., “Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data,” Nucleic Acids Res., Oct 27. pii: gku1011, 2014.
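
As an illustration of the structure and traditional uses described in Section II (Essentials of ECO), the following is a minimal Python sketch, not an ECO or Gene Ontology implementation. It models a toy hierarchy with a single is_a parent per term, links combinatorial leaf terms to an assertion method through ‘used_in’, and shows a hierarchical query and one quality control rule. Only the identifiers ECO:0000000 and ECO:0000217 come from the text; every identifier beginning with "EX:", the shape of the toy hierarchy, the `Annotation` record, and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Term:
    term_id: str
    label: str
    parent: Optional[str] = None   # single is_a parent (a simplification)
    used_in: Optional[str] = None  # assertion method linked via 'used_in'


EVIDENCE = "ECO:0000000"          # from the text
ASSERTION_METHOD = "ECO:0000217"  # from the text

# All "EX:" identifiers are hypothetical placeholders, not real ECO IDs.
_TERM_LIST = [
    Term(EVIDENCE, "evidence"),
    Term(ASSERTION_METHOD, "assertion method"),
    Term("EX:manual", "manual assertion", parent=ASSERTION_METHOD),
    Term("EX:auto", "automatic assertion", parent=ASSERTION_METHOD),
    Term("EX:experimental", "experimental evidence", parent=EVIDENCE),
    Term("EX:similarity", "sequence similarity evidence", parent=EVIDENCE),
    Term("EX:interpro", "match to InterPro member signature evidence",
         parent="EX:similarity"),
    Term("EX:interpro_auto",
         "match to InterPro member signature evidence used in automatic assertion",
         parent="EX:interpro", used_in="EX:auto"),
    Term("EX:exp_manual", "experimental evidence used in manual assertion",
         parent="EX:experimental", used_in="EX:manual"),
]
TERMS: Dict[str, Term] = {t.term_id: t for t in _TERM_LIST}


def is_a(term_id: str, ancestor_id: str) -> bool:
    """True if term_id equals ancestor_id or descends from it via is_a."""
    current: Optional[str] = term_id
    while current is not None:
        if current == ancestor_id:
            return True
        current = TERMS[current].parent
    return False


def assertion_method(term_id: str) -> Optional[str]:
    """Label of the assertion method a term is 'used_in', if any."""
    target = TERMS[term_id].used_in
    return TERMS[target].label if target else None


@dataclass(frozen=True)
class Annotation:
    protein: str
    assertion: str
    evidence: str                   # ID of a term in TERMS
    matched_protein: Optional[str] = None


def supported_by(annotations: List[Annotation], evidence_id: str) -> List[Annotation]:
    """Hierarchical query: annotations whose evidence falls under evidence_id."""
    return [a for a in annotations if is_a(a.evidence, evidence_id)]


def qc_problems(a: Annotation) -> List[str]:
    """Flag similarity-based annotations that omit the matched protein."""
    problems = []
    if is_a(a.evidence, "EX:similarity") and a.matched_protein is None:
        problems.append(
            "similarity-based annotation lacks the identity of the matched protein")
    return problems


# Hypothetical annotations, for illustration only.
EXAMPLES = [
    Annotation("P1", "kinase activity", "EX:exp_manual"),
    Annotation("P2", "kinase activity", "EX:interpro_auto", matched_protein="Q9"),
    Annotation("P3", "kinase activity", "EX:similarity"),
]

if __name__ == "__main__":
    for a in EXAMPLES:
        print(a.protein, assertion_method(a.evidence), qc_problems(a))
```

Querying on a parent class such as the placeholder "EX:similarity" returns annotations made with any of its descendants, which is the kind of structured query that a hierarchy of evidence classes supports. A real application would load the published ontology rather than hard-code terms, and would need to handle terms with more than one parent.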