=Paper= {{Paper |id=Vol-2137/paper_25.pdf |storemode=property |title=Logical Axiomatization of the Evidence & Conclusion Ontology (ECO) by Integrating External Ontology Classes |pdfUrl=https://ceur-ws.org/Vol-2137/paper_25.pdf |volume=Vol-2137 |authors=Rebecca Tauber,Marcus C. Chibucos |dblpUrl=https://dblp.org/rec/conf/icbo/TauberC17 }} ==Logical Axiomatization of the Evidence & Conclusion Ontology (ECO) by Integrating External Ontology Classes== https://ceur-ws.org/Vol-2137/paper_25.pdf
Logical axiomatization of the Evidence & Conclusion Ontology (ECO) by integrating external ontology classes
Rebecca Tauber (1) & Marcus C. Chibucos (1,2,*)
(1) Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, 21201 USA
(2) Department of Microbiology and Immunology
(*) correspondence: mchibucos@som.umaryland.edu
ABSTRACT
Mapping semantically equivalent classes across ontologies is a crucial step toward increasing interoperability and is necessary to enable the leveraging of existing external ontologies during ontology development. Interoperability can allow the adoption of logical design patterns, which can enhance ontology manageability, improve structural consistency, and reduce development time, in addition to facilitating knowledge discovery. The Evidence & Conclusion Ontology (ECO) and the Ontology for Biomedical Investigations (OBI) began a loose collaboration, i.e. talking, in 2011. Recently, however, great strides have been made toward harmonizing these two ontologies through integrating components of OBI into ECO, i.e. creating logical definitions in ECO using imported OBI classes. As these are two orthogonal OWL ontologies, enabling such integration required creation of a logical design pattern to transform OBI classes (which define instruments, assays, etc.) into equivalent ECO evidence classes. This design pattern allows ECO to harness the expressivity of OBI in capturing complex experimental workflows that generate “evidence” that is cited in scientific publications. The goals of this effort are to increase consistency in the structure of ECO, facilitate further ECO and OBI development, better describe the methodologies that produce evidence, and discover new relationships between ECO evidence types. Here, we present the methods for integration and discuss this work as a model for future ontology harmonization efforts.

1     STRUCTURING SCIENTIFIC EVIDENCE
When interpreting the findings of any scientific investigation, “evidence” is an important aspect to consider. What methods were employed? What types of data were generated? How were findings interpreted? Documenting aspects of the scientific methodology employed in a given study affords investigators a basis for interpreting the results.

Ultimately researchers use evidence to support a variety of conclusions. In the biomedical realm, one such conclusion might be the interpretation that a protein has a particular function. Professional biocurators meticulously extract such information (about methods, evidence, and conclusions) from the scientific literature using a variety of manual and automated methods. This information is represented variously so that it can be stored at databases where it can be readily manipulated and used by researchers.

1.1     The Evidence & Conclusion Ontology
The Evidence & Conclusion Ontology (ECO) [1] systematically describes types of scientific evidence in biological research, such as evidence generated from laboratory experiments, computational methods, or statements curated from literature (Fig. 1). ECO represents a range of evidence categories spanning from broad (e.g. ‘sequence similarity evidence’ or ‘author statement evidence’) to specific (e.g. ‘sodium dodecyl sulfate polyacrylamide gel electrophoresis evidence’). Evidence types, as summarized by over 800 ECO classes, become important pieces of metadata associated with annotations at databases that are used by researchers worldwide to support their investigations.

Fig. 1. ECO’s current highest-level evidence classes as depicted at http://evidenceontology.org/browse

ECO terms, as ontology classes, contain standard definitions and synonyms and are networked with relationships. Thus, associating research data with ECO evidence terms allows bioinformatics resources to manage large volumes of annotation data by providing mechanisms for sorting, querying, and performing quality control checks. For example, UniProt-Gene Ontology Annotation (UniProt-GOA) uses ECO to support searching of more than 365 million evidence-linked GO annotations [2] and the Gene Ontology [3] resource itself uses ECO in support of various quality control mechanisms including annotation consistency [4].
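As a rough illustration of the querying and quality control uses mentioned above, the following Python sketch filters a handful of annotation records by evidence class. It is a minimal sketch under stated assumptions: the ECO identifiers are those quoted in this paper, but the direct parent links are simplified, and the annotation records and function names are hypothetical rather than part of any database's interface.

```python
# Toy child -> parent links. IDs and labels are quoted in this paper; the direct
# links are a simplification (intermediate ECO classes are omitted).
PARENT = {
    "ECO:0000006": "ECO:0000000",  # experimental evidence -> evidence
    "ECO:0000325": "ECO:0000006",  # chromatography evidence -> experimental evidence
    "ECO:0000269": "ECO:0000006",  # experimental evidence used in manual assertion
}

# Hypothetical annotation records; a real database would hold many more fields.
ANNOTATIONS = [
    {"id": "ann1", "evidence": "ECO:0000325"},
    {"id": "ann2", "evidence": "ECO:0000269"},
    {"id": "ann3", "evidence": "ECO:0000000"},
    {"id": "ann4", "evidence": "ECO:9999999"},  # deliberately unknown ID
]


def lineage(term):
    """Return the term followed by its ancestors, nearest first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT.get(term)
    return chain


def supported_by(annotations, evidence_id):
    """Select annotations whose evidence is evidence_id or one of its subclasses."""
    return [a["id"] for a in annotations if evidence_id in lineage(a["evidence"])]


def unknown_evidence(annotations):
    """Quality control check: flag annotations citing an ID outside the hierarchy."""
    known = set(PARENT) | set(PARENT.values())
    return [a["id"] for a in annotations if a["evidence"] not in known]


if __name__ == "__main__":
    print(supported_by(ANNOTATIONS, "ECO:0000006"))
    print(unknown_evidence(ANNOTATIONS))
```

Querying by a broad class such as ‘experimental evidence’ also returns annotations made with its more specific subclasses, which is the practical benefit of storing evidence as ontology terms rather than free text.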
Funding acknowledgement: This material is based upon work supported by the National Science Foundation (NSF) Division of Biological Infrastructure (DBI) under Award Number 1458400 to MCC.



Tauber and Chibucos



         1.1.1. Axes of classification
The axes of classification in ECO are ‘evidence’ and ‘assertion method’, which are disjoint from one another (Fig. 2).

Fig. 2. The two ECO root classes.

         ‘evidence’ (ECO:0000000)
‘Evidence’ (ECO:0000000), defined as “a type of information that is used to support an assertion”, can be thought of as a description that may be representative of both the broad methods employed and any outputs generated by such methods. For example, ‘clinical study evidence’ (ECO:0000180) may refer both to the protocols used and types of data generated during a controlled investigation that uses human subjects.

Consider ‘chromatography evidence’ (ECO:0000325), which is defined as “a type of experimental evidence that is based on separation of constituent parts of a mixture (the mobile phase) as they pass differentially through a stationary phase due to differences in partition coefficient and retention on the stationary phase.” A researcher considering some scientific conclusion supported by chromatography evidence might be evaluating a graph generated during a chromatography experiment that depicts a peak, which represents light absorbance and elution time from a stationary column. But the peak alone is not taken as the evidence: the results are considered within a particular context. Experimental conditions such as the type of solvent or column used, or observations such as how the chromatograph peak compares to peaks made with known standards, are considered as well.

Thus, ECO classes are considered summary in nature. Each class can be seen as a type of ‘information content entity’ (IAO:0000030) from the Information Artifact Ontology (IAO) [5], which defines ‘information content entity’ as “a generically dependent continuant that is about some thing.”

         ‘assertion method’ (ECO:0000217)
‘Assertion method’ (ECO:0000217) is the second root class of ECO in addition to ‘evidence’, and it is used to describe whether a human being (e.g. a professional biocurator) or a machine (e.g. a computational pipeline) generated a particular evidence-based annotation that is stored at a biological database. This class and its node within ECO have a complex history outside the present discussion (see Chibucos et al. 2014 [1] for a more thorough discussion). Briefly, ‘assertion method’ has only two subclasses, ‘manual assertion’ and ‘automatic assertion’, which refer to statements made by humans and machines, respectively.

         Connecting ‘evidence’ and ‘assertion method’
‘Evidence’ is logically tied to ‘assertion method’ through the ‘used in’ relationship, enabling one to state whether a person or machine applied a particular piece of evidence in making an annotation (Fig. 2). For example, a human biocurator reading the literature to generate biological database annotations might read a scientific article where some ‘experimental evidence’ (ECO:0000006) was presented about some metabolic pathway and its association with some disease in some organism. After carefully interpreting the methods and results presented in the paper, the biocurator might draw a conclusion such as “metabolic pathway x is involved in disease y”.

This conclusion might be asserted by the curator, typically as a database annotation that could include multiple other pieces of information, depending on the database. Because a person made the annotation, i.e. ‘manual assertion’ (ECO:0000218), and the evidence supporting the annotation was ‘experimental evidence’, these two disjoint classes become connected as ‘experimental evidence used in manual assertion’ (ECO:0000269).

Simultaneously recording both ‘evidence’ and ‘assertion method’ gives databases another dimension for interpreting and presenting data. (Note: the ‘used in’ relationship is under review and this structure of ECO is subject to continued development.)

         1.1.2. Current ECO status
As ECO’s user base has continued to grow, so has the number of classes. As of July 2017, there were 513 pure ‘evidence’ classes, i.e. those not linked logically to ‘assertion method’ but which have a subclass that is so linked. An additional 316 classes were of the ‘used in manual assertion’ type, meaning that they are children of one of the approximately 500 pure evidence classes, combined with the ‘used in’ logical definition for a ‘manual assertion’. Finally, there were 54 ‘used in automatic assertion’ terms.
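The pairing of an evidence class with an assertion method, and the class counts reported above, can be sketched in a few lines of Python. This is only an illustration: the identifiers and labels come from this paper, while the `EcoClass` type, the function names, and the Manchester-style rendering of the ‘used in’ definition are assumptions made for the example, not ECO's published axioms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EcoClass:
    eco_id: str
    label: str


EXPERIMENTAL = EcoClass("ECO:0000006", "experimental evidence")
MANUAL = EcoClass("ECO:0000218", "manual assertion")


def combined_label(evidence, assertion):
    """Label of the class that joins an evidence type to an assertion method."""
    return f"{evidence.label} used in {assertion.label}"


def used_in_expression(evidence, assertion):
    """Assumed Manchester-style rendering of the 'used in' logical definition."""
    return f"'{evidence.label}' and ('used in' some '{assertion.label}')"


# Class counts as of July 2017, as reported in the text.
CLASS_COUNTS = {
    "pure evidence": 513,
    "used in manual assertion": 316,
    "used in automatic assertion": 54,
}

if __name__ == "__main__":
    print(combined_label(EXPERIMENTAL, MANUAL))
    print(used_in_expression(EXPERIMENTAL, MANUAL))
    print(sum(CLASS_COUNTS.values()))
```

The three reported counts sum to 883, which is consistent with the "over 800 ECO classes" figure given in Section 1.1.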


                                                                                         ECO & OBI ONTOLOGY HARMONIZATION



Up to this point, ECO has primarily been a class hierarchy, only utilizing a ‘used in’ property to logically define how the evidence was generated. The addition of more logical definitions through incorporation of the Ontology for Biomedical Investigations (OBI) [6] can lead to discovery of new relationships through reasoning and improve development speed and consistency. It has also helped to further clarify ECO’s axes of classification and standardize ECO’s English definitions.

2   INTRODUCTION TO OBI
The Ontology for Biomedical Investigations (OBI) [6] describes scientific investigations, e.g. study design & execution, instruments & processes, data analysis, and so on, and can be used to model how aspects of an investigation interrelate. OBI, like ECO, is developed in Web Ontology Language (OWL). OBI uses upper-level Basic Formal Ontology (BFO) [7] classes to guide development. BFO top-level classes include ‘continuant’ and ‘occurrent’.

Fig. 3. Selected BFO [7] classes (dark blue) and OBI [6] classes (light blue).

OBI uses logical axioms to describe different parts of biomedical investigations, which allows for very detailed modeling of such investigations. As shown in Fig. 3, the parts of an investigation may include a study design, independent and dependent variables, and the assay conducted. These are important components of the ECO-OBI mappings.

3   MAPPING ONTOLOGY CLASSES
In order to make use of the logic already inherent to OBI, ECO classes must be mapped to their equivalent OBI class(es), which already utilize various logical definitions. The mappings import that logic to be used in reasoning for structural analysis and future knowledge discovery. Not only can the ECO structure be reviewed and revised, but these mappings also provide benefits to ECO users who annotate evidence from complex workflows (and would like to see a tidy summary class).

Ideally, mapping classes between ontologies can be a straightforward process. A class axiom using the owl:equivalentClass property is added to link a class in one ontology to an equivalent class in another. However, this is only possible and logically correct between heterogeneous ontologies. In the case of orthogonal ontologies, it is easy to see a correlation between two terms, but it is much more difficult to transform this into a class axiom. For example, while ECO may define ‘microscopy evidence’, OBI defines the process of ‘microscopy’. How does one state that the process of microscopy results in microscopy evidence?

To make this logical transformation, an alignment Ontology Design Pattern (ODP) must be created. This serves as an OWL template to be inserted as the object of the equivalence class axiom. In reality, even the simple axiom ‘x owl:equivalentClass y’ is an ODP, but, out of necessity, the ODPs for orthogonal ontologies tend to be more complex.

3.1    Ontology Design Pattern
The ECO-OBI ODP consists of four distinct components that are combined to create the mappings (Fig. 4). The % symbol is replaced with the OBI class for the mapping, and ‘evidence’ is replaced with the direct parent of the evidence class being mapped. These axioms are either equivalence or subclass statements, depending on the degree of specificity that can be achieved with existing OBI classes.

Fig. 4. ECO-OBI Ontology Design Pattern (ODP) components with OWL axiom (blue text)

It is important to note that each mapping may use anywhere from one to all of the components, depending on the complexity of the processes involved in generating the evidence. Specifically, many ECO evidence classes may not include an independent variable that has been manipulated to assess a dependent variable. This is true for assays that measure, detect, prepare, or simply visualize specimens, such as microscopy.

Many classes have completed mappings to all four ODP components (Fig. 5).

Fig. 5. ECO class (light blue) with completed mapping to OBI (dark blue). This particular mapping is a subclass statement, as there are no OBI classes specific enough to make an equivalence axiom logically correct.

The actual subclass statement for ‘tissue grafting evidence’ utilizes all four ECO mappings to OBI (Fig. 6).

Fig. 6. Subclass statement relating ECO ‘tissue grafting evidence’ to OBI

3.2    Mapping process
Before any mappings could begin, we needed to retrieve a set of ECO classes for testing. The ‘experimental evidence’ node of ECO was chosen because these evidence classes can more easily be associated with various assays found in OBI. A SPARQL query was performed to get all children of ‘experimental evidence’ and associated axioms as a CSV.

In order to facilitate the workflow, the CSV was exported to a Google spreadsheet and headers were added with space for each component of the design pattern. This way, we were able to go through, row by row, and determine the best fit for each. This required manual review of the ECO class and manual searches of both OBI and GO. After we determined the design pattern was feasible, it was time to test the axioms in the ontology itself.

ROBOT [8] is a versatile tool for working with OWL ontologies and was created to work with biomedical ontologies, although it can easily be applied to any ontology development. It allows developers to perform a variety of tasks, from filtering, to merging, and even converting ontology formats. One of the most useful features of ROBOT (and the one that was utilized for our harmonization efforts) is the template command. The spreadsheet created in the previous step was formatted with specific headers that ROBOT uses to transform the cell contents into axioms. The ROBOT template we used is demonstrated in Table 1, with two examples of mappings.

Table 1. ROBOT template displaying mapping between two ECO classes and respective OBI classes (explained in text).

As shown in Table 1, the first row contains human-readable labels for each column that are not parsed by ROBOT. The second row contains the template strings. If a cell in the second row begins with a ‘C’, all entries in that column will be parsed as logical axioms. If it instead begins with an ‘A’, the entries will be parsed as annotations. For the OBI columns, the ‘...’ in row two contains the OWL axioms shown in the design pattern, and the % symbol is replaced by the content in a given cell. The column ‘CLASS_TYPE’ specifies whether the generated axiom is a subclass or an equivalence statement.

After the table is populated, ROBOT parses the row for each ECO class listed in the ID column and builds an axiom from the information in each cell, according to the template string in the corresponding column header.

         3.2.1. Results of mapping ECO-OBI
The axioms created by the ROBOT template were immediately merged into ECO and reviewed in Protégé.
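The template mechanics described in Section 3.2 can be imitated in a few lines of Python. This is a simplified emulation for illustration only: it does not call ROBOT, and the `ex:` identifiers, the ‘example assay’ class, and the pattern `'evidence' and ('is output of' some %)` are hypothetical stand-ins rather than the actual ODP components of Fig. 4 or the contents of Table 1. Only the general conventions (a label row, a template-string row, `ID`, `A`, `C` with `%`, and `CLASS_TYPE`) follow the description in the text.

```python
import csv
import io

# Row 1: human-readable labels (ignored). Row 2: template strings.
# The pattern after "C " is an illustrative placeholder, not the real ODP.
TEMPLATE = """\
ECO ID,Label,Type,OBI process
ID,A rdfs:label,CLASS_TYPE,C 'evidence' and ('is output of' some %)
ex:0000001,microscopy evidence,equivalent,'microscopy'
ex:0000002,tissue grafting evidence,subclass,'example assay'
"""

# Manchester-syntax keyword for each CLASS_TYPE value.
KEYWORD = {"subclass": "SubClassOf", "equivalent": "EquivalentTo"}


def expand(template_text):
    """Build one axiom string per logical ('C') cell in each data row."""
    rows = list(csv.reader(io.StringIO(template_text)))
    template_strings, data_rows = rows[1], rows[2:]
    axioms = []
    for row in data_rows:
        if not row:
            continue  # skip blank lines
        cells = dict(zip(template_strings, row))
        keyword = KEYWORD[cells["CLASS_TYPE"]]
        for template_string, value in cells.items():
            if template_string.startswith("C ") and value:
                # Substitute the cell content for the % placeholder.
                expression = template_string[2:].replace("%", value)
                axioms.append(f"{cells['ID']} {keyword}: {expression}")
    return axioms


if __name__ == "__main__":
    for axiom in expand(TEMPLATE):
        print(axiom)
```

Keeping the pattern in the header row means a curator only fills in one OBI class per cell, and switching a mapping between a subclass and an equivalence statement is a one-cell change in the ‘CLASS_TYPE’ column.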





    Throughout the mapping process, we detected areas of OBI to expand. In some cases, OBI did not have enough terms to create an accurate mapping, so term suggestions were made. We are currently in the process of requesting the addition of 40 assay classes and 24 non-assay classes. Once these new terms have been accepted into OBI, 161 mappings using them will be added to the ECO working branch on GitHub [9] for review.
    We believe that mapping ECO to OBI has already been worth the effort. It has identified areas for OBI development, resulted in greater logic within ECO, and helped disentangle confused axes of classification within ECO. Work will continue on harmonizing ECO and OBI, initially using the experimental node of ECO but eventually expanding to other areas, e.g. sequence similarity.
    After ECO and OBI have robust mappings, we believe that ECO can eventually leverage other external ontologies in a similar fashion.

ACKNOWLEDGEMENTS
Special thanks to: Elvira Mitraka for term review assistance; Matthew Brush for contributing the ODP, which was conceived during a 2016 joint OBI-ECO meeting on evidence in Baltimore; and Bjoern Peters, James Overton, Christian J. Stoeckert, Jr. and other OBI developers for collaborating.

REFERENCES
1. Chibucos, M.C., Mungall, C.J., Balakrishnan, R., Christie, K.R., Huntley, R.P., White, O., Blake, J.A., Lewis, S.E., and Giglio, M. (2014) Standardized description of scientific evidence using the Evidence Ontology (ECO). Database (Oxford), v.2014:bau075.
2. Dimmer, E.C., Huntley, R.P., Alam-Faruque, Y., Sawford, T., O'Donovan, C., Martinet, M.J., et al. (2012) The UniProt-GO Annotation database in 2011. Nucleic Acids Research. 40:D565–D570.
3. The Gene Ontology Consortium. (2016) Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Research. 45(D1):D331-D338.
4. Chibucos, M.C., Siegele, D.A., Hu, J.C., Giglio, M. (2017) The Evidence and Conclusion Ontology (ECO): Supporting GO Annotations. In Christophe Dessimoz & Nives Škunca (eds.), The Gene Ontology Handbook, Methods in Molecular Biology, vol. 1446, pp. 245-259. New York City: Humana Press (Springer). ISBN 978-1-4939-3743-1
5. https://github.com/information-artifact-ontology/IAO
6. Bandrowski, A., Brinkman, R., Brochhausen, M., Brush, M.H., Bug, B., Chibucos, M.C., et al. (2016) The Ontology for Biomedical Investigations. PLoS One. 11(4):e0154556.
7. Arp, R., Smith, B., Spear, A.D. (2015) Building Ontologies with Basic Formal Ontology. Cambridge: The MIT Press.
8. ROBOT on GitHub: https://github.com/ontodev/robot
9. ECO on GitHub: https://github.com/evidenceontology/evidenceontology



