=Paper=
{{Paper
|id=Vol-2137/paper_25.pdf
|storemode=property
|title=Logical Axiomatization of the Evidence & Conclusion Ontology (ECO) by Integrating External Ontology Classes
|pdfUrl=https://ceur-ws.org/Vol-2137/paper_25.pdf
|volume=Vol-2137
|authors=Rebecca Tauber,Marcus C. Chibucos
|dblpUrl=https://dblp.org/rec/conf/icbo/TauberC17
}}
==Logical Axiomatization of the Evidence & Conclusion Ontology (ECO) by Integrating External Ontology Classes==
Rebecca Tauber (1) & Marcus C. Chibucos (1, 2, *)

(1) Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, 21201 USA

(2) Department of Microbiology and Immunology

(*) Correspondence: mchibucos@som.umaryland.edu

Funding acknowledgement: This material is based upon work supported by the National Science Foundation (NSF) Division of Biological Infrastructure (DBI) under Award Number 1458400 to MCC.

===ABSTRACT===
Mapping semantically equivalent classes across ontologies is a crucial step toward increasing interoperability and is necessary to enable the leveraging of existing external ontologies during ontology development. Interoperability can allow the adoption of logical design patterns, which can enhance ontology manageability, improve structural consistency, and reduce development time, in addition to facilitating knowledge discovery. The Evidence & Conclusion Ontology (ECO) and the Ontology for Biomedical Investigations (OBI) began a loose collaboration, i.e. talking, in 2011. Recently, however, great strides have been made toward harmonizing these two ontologies through integrating components of OBI into ECO, i.e. creating logical definitions in ECO using imported OBI classes. As these are two orthogonal OWL ontologies, enabling such integration required creation of a logical design pattern to transform OBI classes (which define instruments, assays, etc.) into equivalent ECO evidence classes. This design pattern allows ECO to harness the expressivity of OBI in capturing complex experimental workflows that generate “evidence” that is cited in scientific publications. The goals of this effort are to increase consistency in the structure of ECO, facilitate further ECO and OBI development, better describe the methodologies that produce evidence, and discover new relationships between ECO evidence types. Here, we present the methods for integration and discuss this work as a model for future ontology harmonization efforts.

===1 STRUCTURING SCIENTIFIC EVIDENCE===
When interpreting the findings of any scientific investigation, “evidence” is an important aspect to consider. What methods were employed? What types of data were generated? How were findings interpreted? Documenting aspects of the scientific methodology employed in a given study affords investigators a basis for interpreting the results.

Ultimately, researchers use evidence to support a variety of conclusions. In the biomedical realm, one such conclusion might be the interpretation that a protein has a particular function. Professional biocurators meticulously extract such information (about methods, evidence, and conclusions) from the scientific literature using a variety of manual and automated methods. This information is represented variously so that it can be stored at databases where it can be readily manipulated and used by researchers.

====1.1 The Evidence & Conclusion Ontology====
The Evidence & Conclusion Ontology (ECO) [1] systematically describes types of scientific evidence in biological research, such as evidence generated from laboratory experiments, computational methods, or statements curated from literature (Fig. 1). ECO represents a range of evidence categories spanning from broad (e.g. ‘sequence similarity evidence’ or ‘author statement evidence’) to specific (e.g. ‘sodium dodecyl sulfate polyacrylamide gel electrophoresis evidence’). Evidence types, as summarized by over 800 ECO classes, become important pieces of metadata associated with annotations at databases that are used by researchers worldwide to support their investigations.

Fig. 1. ECO’s current highest-level evidence classes as depicted at http://evidenceontology.org/browse

ECO terms, as ontology classes, contain standard definitions and synonyms and are networked with relationships. Thus, associating research data with ECO evidence terms allows bioinformatics resources to manage large volumes of annotation data by providing mechanisms for sorting, querying, and performing quality control checks. For example, UniProt-Gene Ontology Annotation (UniProt-GOA) uses ECO to support searching of more than 365 million evidence-linked GO annotations [2], and the Gene Ontology [3] resource itself uses ECO in support of various quality control mechanisms including annotation consistency [4].

=====1.1.1 Axes of classification=====
The axes of classification in ECO are ‘evidence’ and ‘assertion method’, which are disjoint from one another (Fig. 2).

Fig. 2. The two ECO root classes.

======‘evidence’ (ECO:0000000)======
‘Evidence’ (ECO:0000000), defined as “a type of information that is used to support an assertion”, can be thought of as a description that may be representative of both the broad methods employed and any outputs generated by such methods. For example, ‘clinical study evidence’ (ECO:0000180) may refer both to the protocols used and to the types of data generated during a controlled investigation that uses human subjects.

Consider ‘chromatography evidence’ (ECO:0000325), which is defined as “a type of experimental evidence that is based on separation of constituent parts of a mixture (the mobile phase) as they pass differentially through a stationary phase due to differences in partition coefficient and retention on the stationary phase.” A researcher considering some scientific conclusion supported by chromatography evidence might be evaluating a graph generated during a chromatography experiment that depicts a peak, which represents light absorbance and elution time from a stationary column. But the peak alone is not taken as the evidence: the results are considered within a particular context. Experimental conditions, such as the type of solvent or column used, and observations, such as how the chromatograph peak compares to peaks made with known standards, are considered as well.

Thus, ECO classes are considered summary in nature. Each class can be seen as a type of ‘information content entity’ (IAO:0000030) from the Information Artifact Ontology (IAO) [5], which defines ‘information content entity’ as “a generically dependent continuant that is about some thing.”

======‘assertion method’ (ECO:0000217)======
‘Assertion method’ (ECO:0000217) is the second root class of ECO in addition to ‘evidence’, and it is used to describe whether a human being (e.g. a professional biocurator) or a machine (e.g. a computational pipeline) generated a particular evidence-based annotation that is stored at a biological database. This class and its node within the ECO ontology have a complex history outside the present discussion (see Chibucos, et al. 2014 [1] for a more thorough discussion). Briefly, ‘assertion method’ has only two subclasses, ‘manual assertion’ and ‘automatic assertion’, which refer to statements made by humans and machines, respectively.

======Connecting ‘evidence’ and ‘assertion method’======
‘Evidence’ is logically tied to ‘assertion method’ through the ‘used in’ relationship, enabling one to state whether a person or machine applied a particular piece of evidence in making an annotation (Fig. 2). For example, a human biocurator reading the literature to generate biological database annotations might read a scientific article where some ‘experimental evidence’ (ECO:0000006) was presented about some metabolic pathway and its association with some disease in some organism. After carefully interpreting the methods and results presented in the paper, the biocurator might draw a conclusion such as “metabolic pathway x is involved in disease y”.

This conclusion might be asserted by the curator, typically as a database annotation that could include multiple other pieces of information, depending on the database. Because a person made the annotation, i.e. ‘manual assertion’ (ECO:0000218), and the evidence supporting the annotation was ‘experimental evidence’, these two disjoint classes become connected as ‘experimental evidence used in manual assertion’ (ECO:0000269).

Simultaneously recording both ‘evidence’ and ‘assertion method’ gives databases another dimension for interpreting and presenting data. (Note: the ‘used in’ relationship is under review and this structure of ECO is subject to continued development.)

=====1.1.2 Current ECO status=====
As ECO’s user base has continued to grow, so has the number of classes. As of July 2017, there were 513 pure ‘evidence’ classes, i.e. those not linked logically to ‘assertion method’ but which have a subclass that is so linked. A further 316 classes were of the ‘used in manual assertion’ type, meaning that they are children of one of the approximately 500 pure evidence classes, combined with the ‘used in’ logical definition for a ‘manual assertion’. Finally, there were 54 ‘used in automatic assertion’ terms.

Up to this point, ECO has primarily been a class hierarchy, only utilizing a ‘used in’ property to logically define how the evidence was generated. The addition of more logical definitions through incorporation of the Ontology for Biomedical Investigations (OBI) [6] can lead to discovery of new relationships through reasoning and can improve development speed and consistency. It has also helped to further clarify ECO’s axes of classification and standardize ECO’s English definitions.

===2 INTRODUCTION TO OBI===
The Ontology for Biomedical Investigations (OBI) [6] describes scientific investigations, e.g. study design & execution, instruments & processes, data analysis, and so on, and can be used to model how aspects of an investigation interrelate. OBI, like ECO, is developed in Web Ontology Language (OWL). OBI uses upper-level Basic Formal Ontology (BFO) [7] classes to guide development. BFO top-level classes include ‘continuant’ and ‘occurrent’.

Fig. 3. Selected BFO [7] classes (dark blue) and OBI [6] classes (light blue).

OBI uses logical axioms to describe different parts of biomedical investigations, which allows for very detailed modeling of such investigations. As shown in Fig. 3, the parts of an investigation may include a study design, independent and dependent variables, and the assay conducted. These are important components of the ECO-OBI mappings.

===3 MAPPING ONTOLOGY CLASSES===
In order to make use of the logic already inherent to OBI, ECO classes must be mapped to their equivalent OBI class(es), which already utilize various logical definitions. The mappings import that logic to be used in reasoning for structural analysis and future knowledge discovery. Not only can the ECO structure be reviewed and revised, but these mappings also provide benefits to ECO users who annotate evidence from complex workflows (and would like to see a tidy summary class).

Ideally, mapping classes between ontologies can be a straightforward process. A class axiom using the owl:equivalentClass property is added to link a class in one ontology to an equivalent class in another. However, this is only possible and logically correct between heterogeneous ontologies. In the case of orthogonal ontologies, it is easy to see a correlation between two terms, but it is much more difficult to transform this into a class axiom. For example, while ECO may define ‘microscopy evidence’, OBI defines the process of ‘microscopy’. How does one state that the process of microscopy results in microscopy evidence?

To make this logical transformation, an alignment Ontology Design Pattern (ODP) must be created. This serves as an OWL template to be inserted as the object of the equivalence class axiom. In reality, even the simple axiom ‘x owl:equivalentClass y’ is an ODP, but, out of necessity, the ODPs for orthogonal ontologies tend to be more complex.

====3.1 Ontology Design Pattern====
The ECO-OBI ODP consists of four distinct components that are combined to create the mappings (Fig. 4). The % symbol is replaced with the OBI class for the mapping, and ‘evidence’ is replaced with the direct parent of the evidence class being mapped. These axioms are either equivalence or subclass statements, depending on the degree of specificity that can be achieved with existing OBI classes.

Fig. 4. ECO-OBI Ontology Design Pattern (ODP) components with OWL axiom (blue text)

It is important to note that each mapping may use anywhere from one to all of the components, depending on the complexity of the processes involved in generating the evidence. Specifically, many ECO evidence classes may not include an independent variable that has been manipulated to assess a dependent variable. This is true for assays that measure, detect, prepare, or simply visualize specimens, such as microscopy.

Many classes have completed mappings to all four ODP components (Fig. 5).

Fig. 5. ECO class (light blue) with completed mapping to OBI (dark blue). This particular mapping is a subclass statement, as there are no OBI classes specific enough to make an equivalence axiom logically correct.

The actual subclass statement for ‘tissue grafting evidence’ utilizes all four ECO mappings to OBI (Fig. 6).

Fig. 6. Subclass statement relating ECO ‘tissue grafting evidence’ to OBI

====3.2 Mapping process====
Before any mappings could begin, we needed to retrieve a set of ECO classes for testing. The ‘experimental evidence’ node of ECO was chosen because these evidence classes can more easily be associated with various assays found in OBI. A SPARQL query was performed to get all children of ‘experimental evidence’ and associated axioms as a CSV.

In order to facilitate the workflow, the CSV was exported to a Google spreadsheet and headers were added with space for each component of the design pattern. This way, we were able to go through, row by row, and determine the best fit for each. This required manual review of the ECO class and manual searches of both OBI and GO. After we determined the design pattern was feasible, it was time to test the axioms in the ontology itself.

ROBOT [8] is a versatile tool for working with OWL ontologies and was created to work with biomedical ontologies, although it can easily be applied to any ontology development. It allows developers to perform a variety of tasks, from filtering, to merging, and even converting ontology formats. One of the most useful features of ROBOT (and the one that was utilized for our harmonization efforts) is the template command. The spreadsheet created in the previous step was formatted with specific headers that ROBOT uses to transform the cell contents into axioms. The ROBOT template we used is demonstrated in Table 1, with two examples of mappings.

Table 1. ROBOT template displaying mapping between two ECO classes and respective OBI classes (explained in text).

As shown in Table 1, the first row contains human-readable labels for each column that are not parsed by ROBOT. The second row contains the template strings. If a cell in the second row begins with a ‘C’, all entries in that column will be parsed as logical axioms. On the other hand, if it were to begin with an ‘A’, it would be parsed as an annotation. For the OBI columns, the ‘...’ in row two contains the OWL axioms shown in the design pattern, and the % symbol is replaced by the content in a given cell. The column ‘CLASS_TYPE’ specifies whether the generated axiom is a subclass or an equivalence statement.

After populating the table, for each ECO class in the ID column of a ROBOT template, ROBOT will parse the contents of that row and build an axiom based on the information in each cell that corresponds to the template strings in the column headers.

=====3.2.1 Results of mapping ECO-OBI=====
The axioms created by the ROBOT template were immediately merged into ECO and reviewed in Protégé.

Throughout the mapping process, we detected areas of OBI to expand. In some cases, OBI did not have enough terms to create an accurate mapping, so term suggestions were made. We are currently in the process of requesting the addition of 40 assay classes and 24 non-assay classes. Once these new terms have been accepted into OBI, 161 mappings using them will be added to the ECO working branch on GitHub [9] for review.

We believe that the effort expended to map ECO and OBI has already been worthwhile. It has identified areas for OBI development, resulted in greater logic within ECO, and helped disentangle confused axes of classification within ECO. Work will continue on harmonizing ECO and OBI using the experimental node of ECO initially but expanding eventually to other areas, e.g. sequence similarity. After ECO and OBI have robust mappings, we believe that eventually ECO can leverage other external ontologies in a similar fashion.

===ACKNOWLEDGEMENTS===
Special thanks to: Elvira Mitraka for term review assistance; Matthew Brush for contributing the ODP, which was conceived during a 2016 joint OBI-ECO meeting on evidence in Baltimore; and Bjoern Peters, James Overton, Christian J. Stoeckert, Jr. and other OBI developers for collaborating.

===REFERENCES===
1. Chibucos, M.C., Mungall, C.J., Balakrishnan, R., Christie, K.R., Huntley, R.P., White, O., Blake, J.A., Lewis, S.E., and Giglio, M. (2014) Standardized description of scientific evidence using the Evidence Ontology (ECO). Database (Oxford), v.2014:bau075.

2. Dimmer, E.C., Huntley, R.P., Alam-Faruque, Y., Sawford, T., O'Donovan, C., Martinet, M.J., et al. (2012) The UniProt-GO Annotation database in 2011. Nucleic Acids Research. 40:D565–D570.

3. The Gene Ontology Consortium. (2016) Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Research. 45(D1):D331-D338.

4. Chibucos, M.C., Siegele, D.A., Hu, J.C., Giglio, M. (2017) The Evidence and Conclusion Ontology (ECO): Supporting GO Annotations. In Christophe Dessimoz & Nives Škunca (eds.), The Gene Ontology Handbook, Methods in Molecular Biology, vol. 1446, pp. 245-259. New York City: Humana Press (Springer). ISBN 978-1-4939-3743-1

5. https://github.com/information-artifact-ontology/IAO

6. Bandrowski, A., Brinkman, R., Brochhausen, M., Brush, M.H., Bug, B., Chibucos, M.C., et al. (2016) The Ontology for Biomedical Investigations. PLoS One. 11(4):e0154556.

7. Arp, R., Smith, B., Spear, A.D. (2015) Building Ontologies with Basic Formal Ontology. Cambridge: The MIT Press.

8. ROBOT on GitHub: https://github.com/ontodev/robot

9. ECO on GitHub: https://github.com/evidenceontology/evidenceontology
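
===Illustrative sketch: template-string substitution===
The template mechanism described in Section 3.2 (a label row, a row of template strings, and a % placeholder filled from each cell) can be sketched in a few lines of Python. This is only an illustration of the substitution idea, not ROBOT's implementation and not the authors' actual template: the identifiers under the EX: prefix, the labels, and the pattern 'example relation' some % are hypothetical placeholders, since the real ODP axioms appear only in Fig. 4 and Table 1.

```python
import csv
import io

# Hypothetical sheet in the style described in Section 3.2.
# Row 1: human-readable labels (ignored). Row 2: template strings.
# The EX: identifiers, labels, and the relation name are placeholders.
SHEET = """\
ECO ID,ECO label,Class type,Assay
ID,A rdfs:label,CLASS_TYPE,C 'example relation' some %
EX:0000001,example microscopy evidence,subclass,example microscopy assay
EX:0000002,example grafting evidence,equivalent,example grafting assay
"""


def expand(sheet_text):
    """Turn each data row into a record by applying the template strings."""
    rows = list(csv.reader(io.StringIO(sheet_text)))
    templates, data = rows[1], rows[2:]  # rows[0] holds labels only
    records = []
    for row in data:
        record = {"id": None, "class_type": "subclass",
                  "annotations": [], "expressions": []}
        for template, cell in zip(templates, row):
            if not cell:
                continue  # empty cell: this component is not used
            if template == "ID":
                record["id"] = cell
            elif template == "CLASS_TYPE":
                record["class_type"] = cell
            elif template.startswith("A "):
                # Annotation column: keep (property, value)
                record["annotations"].append((template[2:], cell))
            elif template.startswith("C "):
                # Logical column: replace % with the quoted cell content
                filled = template[2:].replace("%", "'" + cell + "'")
                record["expressions"].append(filled)
        records.append(record)
    return records


def render(record):
    """Format one record as a Manchester-syntax-like line for inspection."""
    if record["class_type"] == "equivalent":
        keyword = "EquivalentTo"
    else:
        keyword = "SubClassOf"
    return record["id"] + " " + keyword + ": " + " and ".join(record["expressions"])


if __name__ == "__main__":
    for rec in expand(SHEET):
        print(render(rec))
```

Leaving a cell empty simply omits that component, which mirrors the statement in Section 3.1 that a mapping may use anywhere from one to all of the ODP components.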