=Paper= {{Paper |id=Vol-1747/IP17_ICBO2016 |storemode=property |title=Comparison of Ontology Mapping Techniques to Map Plant Trait Ontologies |pdfUrl=https://ceur-ws.org/Vol-1747/IP17_ICBO2016.pdf |volume=Vol-1747 |authors=Marie-Angélique Laporte,Léo Valette,Laurel Cooper,Chris Mungall,Austin Meier,Pankaj Jaiswal,Elizabeth Arnaud |dblpUrl=https://dblp.org/rec/conf/icbo/LaporteVCMMJA16 }} ==Comparison of Ontology Mapping Techniques to Map Plant Trait Ontologies == https://ceur-ws.org/Vol-1747/IP17_ICBO2016.pdf
Comparison of ontology mapping techniques to map plant trait ontologies

Marie-Angélique Laporte, Léo Valette, Elizabeth Arnaud
Bioversity International
Montpellier, France
m.a.laporte@cgiar.org

Laurel Cooper, Austin Meier, Pankaj Jaiswal
Department of Botany and Plant Pathology
Oregon State University
Corvallis, USA

Christopher J. Mungall
Berkeley Bioinformatics Open-Source Projects
Lawrence Berkeley National Laboratory
Berkeley, USA

    Abstract—Crop-specific ontologies for phenotype annotation in breeding
have proliferated over the last 10 years. Cross-crop data interoperability
requires linking those ontologies together. For this purpose, the Planteome
project is mapping the Crop Ontology traits (www.cropontology.org) to the
reference ontology for plant traits, the Trait Ontology (TO). Manual mapping
is time-consuming and not sustainable in the long run, as ontologies keep
evolving and multiplying. We are thus developing reliable automated mapping
techniques to assist curators in performing semantic integration. Our study
shows the benefit of an ontology matching technique based on formal
definitions and shared ontology design patterns, compared to a standard
automatic ontology matching algorithm such as AML (AgreementMakerLight).

    Keywords—ontology mapping; ontology design patterns; reference ontologies

                      I. INTRODUCTION

    The development of improved crop varieties relies on both traditional
breeding methods and next-generation methods such as high-throughput
sequencing, molecular breeding and automated scoring of traits. In that
context, a number of ontologies have been developed to address data
interoperability issues. They fulfill the needs of specific communities, but
they are species- or clade-specific [1] and therefore hinder data
harmonization across disciplines and communities.

    The crop breeding community, in particular, widely uses the Crop Ontology
(CO; www.cropontology.org), which is composed of species-specific ontologies
for fieldbook editing and data annotation [1]. Because these ontologies grow
in size and number, it is essential to develop efficient and reliable
automated concept mapping techniques to enable semantics-based data
integration and discovery.

    The Planteome project (www.planteome.org) aims to support comparative
plant biology and provides integrated access to annotated datasets generated
by inter- and intra-specific comparative analysis of transcriptomics,
proteomics, phenomics and genome annotation. To address this objective,
Planteome is developing and promoting the use of a set of reference
ontologies for plants, which propose species-neutral concepts, as well as
common data annotation standards. Harmonization between the species-specific
ontologies and the Planteome reference ontologies is currently done by
mapping the Crop Ontology to the Plant Trait Ontology (TO) [2], the reference
species-neutral ontology for plant traits, which aims at integrating the many
crop-specific trait ontologies.

    The purpose of our study is to generate mappings efficiently in order to
ease the work of ontology curators, who otherwise create the mappings
manually. To this end, we have compared two automatic ontology mapping
techniques. The first technique is widely used to align ontologies and
consists in applying a standard automatic matching algorithm. One such
algorithm, AML (AgreementMakerLight), produces mappings based on both the
string similarity of the ontology terms and the ontology structure.
Considering the number of ontologies to be mapped and the tendency of
ontologies to evolve over time, it can be hard to automatically maintain the
mappings created using such a technique. Therefore, the second technique
relies on formal definitions and shared ontology design patterns. The
ontology design patterns are created using Web Ontology Language (OWL) axioms
based on Entity-Quality (EQ) statements, leading to a post-composition of
terms, similar to what has been proposed by the Ontology of Biological
Attributes (OBA) [3]. The Entity (E) and Quality (Q) are sourced from the
reference ontologies promoted by
Planteome. The Q comes from the Phenotype and Trait Ontology (PATO), whereas
the E comes from the Plant Ontology (PO) when it relates to plant structures,
the Gene Ontology (GO) for subcellular components, Chemical Entities of
Biological Interest (ChEBI) for chemical entities, or the Environment
Ontology (EO) for environmental conditions. Automated reasoning engines are
then used to generate the mappings between the species-specific ontologies
and the reference ontologies, while guaranteeing the validity of the unified
merged ontology (i.e. TO plus the species-specific CO). As a result, TO is
being enriched with well-defined crop-specific terms from the Crop Ontology,
and Planteome can integrate additional data annotated in a unified way by the
breeding and genetics communities.

                 II. RESULTS AND DISCUSSION

    The AML algorithm and the design patterns approach have so far been
applied to four crop Trait Dictionaries of the Crop Ontology: the cereals
rice and wheat, the legume lentil, and the root and tuber crop cassava. Those
ontologies differ greatly in terms of plant anatomy and morphology, and also
in the number and complexity of phenotypic traits. Table I summarizes the
results of the mapping process on trait terms.

                    TABLE I.  MAPPING RESULTS

                     Rice         Wheat        Lentil       Cassava
  # trait classes    157          238          66           175
  AML                84 (54%)     73 (30%)     28 (42%)     59 (34%)
  Design patterns    121 (77%)    199 (84%)    47 (71%)     118 (67%)

    Mapping using formal definitions resulted in a roughly two-fold increase
in successful mappings. On average, AML was able to propose mappings for ~40%
of the CO classes in each ontology, compared to ~75% of terms mapped using
the formal definition approach. This can be explained by the fact that
crop-specific ontologies use very specific terminologies, especially for the
Entity part of the EQ statement. Although the specific plant entities are
defined in the Plant Ontology (PO) as synonyms of species-neutral entities,
not all of these synonyms were added to TO and CO when the terms were
pre-composed. The AML algorithm was thus not able to use this information,
whereas the PO synonyms were used to build the formal definitions of the CO
terms. Furthermore, because the class hierarchy is quite simple in the
different CO ontologies, AML was not able to use the ontology structures to
improve the mapping results: only equivalent terms were found using AML.

    Disease resistance traits are important for breeders. Because a disease
results from the combination of a host species, a pathogen and an
environment, disease resistance traits are crop-specific. Biotic stress
traits include disease-related traits and can cover as much as 20% of an
individual CO ontology. Those traits cannot have an exact correspondence in
TO, so AML was not able to find mappings for those terms. Based on the formal
definitions, a reasoner classified those terms as subclasses of one of the TO
stress traits.

    Finally, not all the classes in TO have been formally defined. Indeed,
design patterns are hard to develop for very complex traits such as
yield-related traits. This is why not all the CO classes can be mapped to TO
classes using the design pattern technique. Manual mapping is still needed to
map those traits. The mapping coverage will be improved in the future. The
mapped ontologies are available on www.planteome.org as well as on
Planteome's GitHub repository (https://github.com/Planteome).

                      III. CONCLUSION

    In an era of ontology proliferation, it is of vital importance to have
reference ontologies and powerful tools that reduce the effort of ontology
alignment. Standard mapping techniques do not fit the need of ontology
evolution over time, as their results are difficult to maintain
automatically. Developing the mapping process based on ontology design
patterns and logical axioms gives confidence in the validity and accuracy of
the resulting ontology mappings. Scientists from the breeding community can
continue to use their preferred standards to annotate and record their data,
which reduces the effort required of them. Planteome, through the TO,
provides unified access to the breeding and genetic data, opening up the
possibility to perform large-scale analyses such as comparative genomics by
promoting a species-neutral approach.

                      ACKNOWLEDGMENT

    This work was supported by IOS:1340112 from the NSF. Additionally, CJM
acknowledges the support of the Director, Office of Science, Office of Basic
Energy Sciences, of the U.S. Department of Energy under Contract No.
DE-AC02-05CH11231.

                        REFERENCES

[1]   Shrestha, R., Arnaud, E., Mauleon, R., Senger, M., Davenport, G.F.,
      Hancock, D., Morrison, N., Bruskiewich, R. and McLaren, G., 2010.
      Multifunctional crop trait ontology for breeders' data: field book,
      annotation, data discovery and semantic enrichment of the literature.
      AoB plants, 2010, p.plq008.
[2]   Arnaud, E., Cooper, L., Shrestha, R., Menda, N., Nelson, R.T., Matteis,
      L., Skofic, M., Bastow, R., Jaiswal, P., Mueller, L.A. and McLaren, G.,
      2012, October. Towards a Reference Plant Trait Ontology for Modeling
      Knowledge of Plant Traits and Phenotypes. In KEOD (pp. 220-225).
[3]   https://github.com/obophenotype/bio-attribute-ontology,
      DOI:10.5281/zenodo.47337
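
    As a supplementary illustration of the design-pattern approach, the
following Python sketch shows how Entity-Quality definitions, combined with
entity synonyms and a small is-a hierarchy, can yield both equivalence and
subclass mappings. It is a minimal sketch under stated assumptions, not the
Planteome pipeline and not AML: the identifiers (EX_TO:1, EX_TO:2), the
synonym table, the hierarchy and the function names are all hypothetical,
and the subsumption check is only a simplified stand-in for what an OWL
reasoner does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """An Entity-Quality definition: a quality (Q) that inheres in an entity (E)."""
    entity: str
    quality: str


# Hypothetical synonym table: crop-specific entity label -> species-neutral entity.
ENTITY_SYNONYMS = {
    "panicle": "inflorescence",
    "storage root": "root",
}

# Hypothetical is-a links (child -> parent) for entities and qualities.
IS_A = {
    "inflorescence": "plant structure",
    "root": "plant structure",
    "length": "size",
}

# Hypothetical reference trait classes, each defined by an EQ statement.
REFERENCE = {
    "EX_TO:1": EQ("inflorescence", "length"),
    "EX_TO:2": EQ("plant structure", "size"),
}


def normalize(entity):
    """Replace a crop-specific entity label by its species-neutral entity, if known."""
    return ENTITY_SYNONYMS.get(entity, entity)


def ancestors(term):
    """Return the term followed by all of its is-a ancestors."""
    chain = [term]
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def classify(co_def, reference):
    """Return (relation, reference_id) pairs for one crop-specific EQ definition.

    'equivalent' means both E and Q match after synonym normalization;
    'subclass' means E and Q are each subsumed by the reference E and Q.
    """
    entity = normalize(co_def.entity)
    results = []
    for ref_id, ref_def in reference.items():
        if ref_def.entity == entity and ref_def.quality == co_def.quality:
            results.append(("equivalent", ref_id))
        elif (ref_def.entity in ancestors(entity)
              and ref_def.quality in ancestors(co_def.quality)):
            results.append(("subclass", ref_id))
    return results


if __name__ == "__main__":
    print(classify(EQ("panicle", "length"), REFERENCE))
    print(classify(EQ("storage root", "colour"), REFERENCE))
```

    In this sketch an empty result corresponds to a term that would still
need manual mapping, and a label-only comparison would miss the "panicle"
case because the match depends on the synonym table rather than on string
similarity.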