Reusing the NCBO BioPortal technology for agronomy to build AgroPortal

Clément Jonquet,1,2,6 Anne Toulet,1,2 Elizabeth Arnaud,3 Sophie Aubin,4 Esther Dzalé Yeumo,4 Vincent Emonet,1 John Graybeal,6 Mark A. Musen,6 Cyril Pommier,4 Pierre Larmande2,5

1 Laboratory of Informatics, Robotics and Microelectronics of Montpellier (LIRMM), University of Montpellier & CNRS, France
2 Computational Biology Institute (IBC) of Montpellier, France
3 Bioversity International, Montpellier, France
4 INRA Versailles, France
5 UMR DIADE, IRD Montpellier, France
6 Center for BioMedical Informatics Research (BMIR), Stanford, USA
jonquet@lirmm.fr

Abstract: Many vocabularies and ontologies are produced to represent and annotate agronomic data. By reusing the NCBO BioPortal technology, we have already designed and implemented an advanced prototype ontology repository for the agronomy domain. We plan to turn that prototype into a real service to the community. The AgroPortal project aims at reusing the scientific outcomes and experience of the biomedical domain in the context of plant, agronomic, food, environment (perhaps animal) sciences. We offer an ontology portal that features ontology hosting, search, versioning, visualization, comment and recommendation, enables semantic annotation, and supports storing and exploiting ontology alignments. All of these are provided within a fully semantic web compliant infrastructure. AgroPortal specifically pays attention to the requirements of the agronomic community in terms of ontology formats (e.g., SKOS, trait dictionaries) and supported features. In this paper, we present our prototype as well as preliminary outputs of four driving agronomic use cases. With the experience acquired in the biomedical domain and building atop an already existing technology, we think that AgroPortal offers a robust and stable reference repository that will become highly valuable for the agronomic domain.

Keywords: ontology repository, ontology mapping, semantic annotation, agronomic sciences.

I. INTRODUCTION

Similarly to what happens in biomedicine, communities engaged in agronomic research need to access specific sets of ontologies for data annotation and integration. For instance, it has been established that the scientific challenges in plant breeding have switched from genetics to phenotyping and that standard traits/phenotypes vocabularies are necessary to facilitate breeders’ data integration and comparison. In parallel with very specific crop dictionaries [1], important organizations have produced large reference vocabularies such as AGROVOC (Food and Agriculture Organization), NAL Thesaurus (National Agricultural Library) or the CAB Thesaurus (Centre for Agricultural Bioscience International) and are currently working on integrating them [2]. The more ontologies are produced in the domain, the more important the need to create, store and retrieve alignments between those ontologies becomes. In fact, there is a need for a one-stop shop for agronomical, environmental and plant sciences ontologies that enables users to identify and select an ontology for a specific task and that offers generic services to exploit ontologies in search, annotation or other scientific data management processes. Therefore, our goal is to enable straightforward use of agronomic related ontologies, sparing data managers and researchers the burden of dealing with complex knowledge engineering issues.

In the biomedical domain, the NCBO BioPortal (http://bioportal.bioontology.org) [3] is a well-known open repository for biomedical ontologies originally spread out over the web and in different formats. The NCBO BioPortal functionalities have been progressively extended in the last 10 years, and the platform is fully semantic web compliant (ontologies, mappings and annotations are stored in an RDF triple store). However, the BioPortal is specific to health and biomedical ontologies and, even if an overlap exists, the portal does not extend to the agronomic, environment or animal domains. An important aspect is that NCBO technology is domain-independent and open source. A BioPortal virtual appliance¹ is available as a server machine embedding the complete code and deployment environment, allowing anyone to set up a local ontology repository and customize it.

¹ www.bioontology.org/wiki/index.php/Category:NCBO_Virtual_Appliance

In this paper we present an advanced prototype of an ontology repository to address these challenges in agronomy and plant sciences. The portal is built atop the NCBO BioPortal technology. The main objective of the AgroPortal project is to develop and support a reference ontology repository for the agronomic domain.

II. RELATED WORK

In the biomedical or agronomic domains there exist several “knowledge organization systems” listings such as BioSharing (biosharing.org) or the VEST Registry (aims.fao.org/vest-registry). They usually register ontologies and provide a few metadata about them. However, because they are registries for different kinds of resources, they do not support the level of features that an ontology repository offers. More specifically to the plant domain, the Crop Ontology web application (www.cropontology.org) [4] publishes online sets of ontologies and dictionaries required for describing crop germplasm, traits and evaluation trials. It contains 18 species-specific ontologies in addition to ontologies related to the crop germplasm domain. The current web application facilitates the complete ontology-engineering life cycle starting with collaborative construction, publishing, use and modification. However, it requires important improvements to its current versioning, curation, multilingual aspects and user interface, as well as to its data annotation and mapping features. The Planteome portal (www.planteome.org) [5] reuses the Gene Ontology project's AmiGO technology to build a database of searchable and browsable annotations for plant traits, phenotypes, diseases, genomes and gene expression data across a wide range of plant species. Although the portal hosts the reference ontologies in plant biology (e.g., PO, TO, EO), its focus is on data (not ontologies) and its scope is not as large as the one we envision for AgroPortal.

III. A PORTAL FOR AGRONOMIC RELATED ONTOLOGIES

We have identified the NCBO technology as the one that implements most of the features (ontology and mapping repository, annotator, recommender, community support, etc.) that the community would be interested in, while being aware of the technical challenges of developing such varied and complex software. In addition, our vision is to adopt, as the NCBO did, an open and generic approach where users can themselves easily participate in the platform, and upload and comment on content (ontologies, mappings, projects). Furthermore, there are two major motivations for AgroPortal to reuse the outcomes of biomedicine: (i) to avoid re-developing technologies that have already been designed and extensively used; (ii) to offer the same tools, services and formats to both communities to facilitate the interface and interaction between the domains, e.g., to enable a user to query the BioPortal or the AgroPortal without changing a line of code.

We have developed and deployed an advanced prototype platform (v1.0 beta released in January 2016) – http://agroportal.lirmm.fr – that currently hosts 49 ontologies – including 28 not originally present in BioPortal² – and we are working on 37 candidate ontologies. The platform already counts 38 registered users. The features offered by the portal are for example: (i) to search across all the ontologies, (ii) to annotate a piece of text with all the ontologies, (iii) to store and serve mappings between ontologies within the portal and with the NCBO BioPortal. All other features from BioPortal are generically available for the AgroPortal: ontology versioning, UI widget, ontology metrics, ontology recommender service, projects listing, community feedback (comment, subscription to ontology changes), users’ management (and public or private access to ontologies). In addition, two endpoints allow automatic querying of the content of the portal: (i) a REST web service API (http://data.agroportal.lirmm.fr) and (ii) a SPARQL endpoint (http://sparql.agroportal.lirmm.fr). While ensuring the day-to-day maintenance and monitoring of the portal and keeping it up-to-date with the NCBO technology, we have started to work on customizations and specific services for the agronomic/plant community. For instance: organizing the content of the portal, working on multilingual support, interconnecting BioPortal and AgroPortal, scoring annotations, supporting different formats, adding new metadata.

² As of now, for technical reasons, we had to duplicate a few ontologies but the long term vision is an interconnected network of bioportals that will enable anyone to access easily an ontology independently from where it’s actually hosted.

Fig. 1. Screenshots from the AgroPortal user interface

IV. DRIVING AGRONOMIC USE CASES

A. Agronomic Linked Data (AgroLD) within IBC

The Computational Biology Institute of Montpellier (IBC – http://www.ibc-montpellier.fr) develops methods for data integration and knowledge management within agronomic sciences to improve information accessibility and interoperability. The project is interested in identifying genes controlling root and panicle branching as well as orthologous relationships for rice gene families. Using 8 ontologies for annotation, the project has built the AgroLD RDF knowledge base (http://agrold.org) that integrates data from a variety of plant resources (e.g., Gramene, SouthGreen, UniProtKB, OryGeneDB) and provides a portal for bioinformaticians to exploit the homogenized data models to efficiently build research hypotheses [6].

B. RDA Wheat Data Interoperability (WDI) working group

The WDI working group is part of the Research Data Alliance (RDA – https://rd-alliance.org). Its goal is to provide a common framework for describing, representing, linking and publishing wheat data with respect to open standards. One of the needs identified by the group is to offer a repository of linked vocabularies and ontologies that are relevant for wheat. NCBO technology has been identified as a suitable tool to address this need, allowing one to search for terms across multiple vocabularies and ontologies, browse mappings between terms, receive recommendations on which vocabularies and ontologies are most relevant for a corpus and annotate text with terms. The WDI is maintaining a list of vocabularies and ontologies within an AgroPortal specific slice (http://wheat.agroportal.lirmm.fr) which has been reported in the WDI’s set of guidelines for wheat data description (http://datastandards.wheatis.org). More recently, two other RDA working groups (Rice Data Interoperability and AgriSemantics) have expressed interest in using AgroPortal as a backbone for data integration and/or standardization.

C. INRA Linked Open Vocabularies (LovInra)

LovInra is an effort to publish vocabularies produced or co-produced by INRA scientists and foster their reuse beyond the original researchers. Many such resources developed within specific focus projects remain unknown to the research community despite their value. To achieve this goal, there is a clear need to publish the vocabularies with respect to open standards and link them to existing resources. Here again, NCBO technology has been identified as a suitable repository for this third use case. A specific group of ontologies has been set up in AgroPortal for ontologies produced or used by INRA and we are helping ontology editors to follow the semantic web standards when making their ontologies sharable and available.

D. The Crop Ontology project

The Crop Ontology project (www.cropontology.org) of the Consultative Group on International Agricultural Research (CGIAR) is AgroPortal’s fourth use case. The main goals of this project are: to publish online fully documented lists of breeding traits used for producing standard field books; and to support data analysis and integration of genetic and phenotypic data through harmonized breeders’ data annotation. The project also offers a forum for scientists to discuss their variables, methods and scales of measurement, and field books. We work on leveraging the backend of the cropontology.org web application with the AgroPortal web service API, while keeping the current web application as the primary point of access. We offer new functionalities to the Crop Ontology community such as versioning, a SPARQL endpoint, notes and the annotation tool, while not breaking the uses of the current application. In addition, we work on supporting the alignment (or mapping) of terms within and across different plant related ontologies: both within the crop ontologies themselves (in different crop branches) and with other reference ontologies commonly used in plant biology.

V. CONCLUSION

In this paper we have briefly introduced AgroPortal, an open ontology repository for the agronomy domain. We have discussed four use cases that are already using the portal to support their work on data interoperability. The thematic boundaries of the portal are not precisely defined yet (e.g., agriculture also includes animals) and it will be up to the users to express what they expect to find in such a repository. Although multiple questions need to be addressed, we do believe that the NCBO technology is a good candidate for this project and we see here an opportunity to capitalize on the technology and scientific outcomes of the last ten years.

Considering the position of the current NCBO BioPortal and the importance of having an equivalent repository of ontologies for the agronomic, environment and plant sciences, we expect a broad adoption of the AgroPortal in the community. The involvement of associated partners (IBC, IRD, CIRAD, INRA, Bioversity International) illustrates the impact and interest first in France, but also internationally (e.g., Planteome, Elixir, BioSharing, EBI, FAO). Making such a portal available allows researchers to focus on the development of new ontologies and mappings between ontologies with the perspective of leveraging them in their research, without being afraid of producing an additional piece in the big data cake. Exporting NCBO research results and technology contributes to long term support of that technology while reinforcing the connections with the biomedical domain.

In the future we will identify more potential users for the portal and support new research scenarios. For each ontology available in the portal, we will go through an extensive description of its metadata in order for the portal to facilitate the comprehension of the landscape of agronomical ontologies.

ACKNOWLEDGMENT

This work is partly achieved within the Semantic Indexing of French biomedical Resources (SIFR – www.lirmm.fr/sifr) project funded by the French National Research Agency, grant ANR-12-JS02-01001, the NUMEV Labex (www.lirmm.fr/numev), grant ANR-10-LABX-20, the Computational Biology Institute of Montpellier (www.ibc-montpellier.fr), grant ANR-11-BINF-0002 as well as by University of Montpellier and the CNRS. We also thank the National Center for Biomedical Ontology for help and time spent with us in deploying the AgroPortal.

REFERENCES

[1] R. Shrestha, E. Arnaud, R. Mauleon, M. Senger, G. F. Davenport, D. Hancock, N. Morrison, R. Bruskiewich, and G. McLaren, “Multifunctional crop trait ontology for breeders’ data: field book, annotation, data discovery and semantic enrichment of the literature,” AoB Plants, vol. 2010, May 2010.
[2] T. Baker and O. Suominen, “Global Agricultural Concept Scheme (GACS): A multilingual thesaurus hub for Linked Data.” Report, 2014.
[3] N. F. Noy, N. H. Shah, P. L. Whetzel, B. Dai, M. Dorf, N. B. Griffith, C. Jonquet, D. L. Rubin, M.-A. Storey, C. G. Chute, and M. A. Musen, “BioPortal: ontologies and integrated data resources at the click of a mouse,” Nucleic Acids Research, vol. 37, pp. 170–173, May 2009.
[4] L. Matteis, P. Chibon, H. Espinosa, M. Skofic, R. Finkers, R. Bruskiewich, and E. Arnaud, “Crop ontology: vocabulary for crop-related concepts,” in 1st International Workshop on Semantics for Biodiversity (P. Larmande, E. Arnaud, I. Mougenot, C. Jonquet, T. Libourel, and M. Ruiz, eds.), CEUR Workshop Proceedings, (Montpellier, France), pp. 37–46, May 2013.
[5] P. Jaiswal, L. Cooper, J. L. Elser, A. Meier, M.-A. Laporte, C. Mungall, B. Smith, E. K. Johnson, M. Seymour, J. Preece, X. Xu, R. S. Kitchen, B. Qu, E. Zhang, E. Arnaud, S. Carbon, S. Todorovic, and D. W. Stevenson, “Planteome: A resource for Common Reference Ontologies and Applications for Plant Biology,” in 24th Plant and Animal Genome Conference, PAG’16, (San Diego, USA), January 2016.
[6] A. Venkatesan, P. Larmande, C. Jonquet, M. Ruiz, and P. Valduriez, “Facilitating efficient knowledge management and discovery in the Agronomic Sciences,” in 4th Plenary Meeting of the Research Data Alliance, (Amsterdam, The Netherlands), September 2014.
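
As an illustration of the claim in Section III that a user can query the BioPortal or the AgroPortal without changing a line of code, the following is a minimal sketch and not the authors' client. It only builds request URLs and performs no network call. The AgroPortal base URL is the REST endpoint cited in the paper; the BioPortal base URL, the `/search` path and the `q`, `ontologies` and `apikey` parameter names are assumptions taken from the public NCBO REST API and may differ between deployments. The function name `build_search_url` and the placeholder key are hypothetical.

```python
from urllib.parse import urlencode

# REST endpoint cited in the paper for AgroPortal.
AGROPORTAL_API = "http://data.agroportal.lirmm.fr"
# Assumption: the corresponding NCBO BioPortal REST endpoint.
BIOPORTAL_API = "http://data.bioontology.org"


def build_search_url(base_url, query, apikey, ontologies=None):
    """Build a term-search URL for a BioPortal-style REST API.

    The '/search' path and the parameter names are assumed from the
    public NCBO REST API; only the base URL changes between portals.
    """
    params = {"q": query, "apikey": apikey}
    if ontologies:
        # Restrict the search to a comma-separated list of ontology acronyms.
        params["ontologies"] = ",".join(ontologies)
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


if __name__ == "__main__":
    # The same call targets either portal; only the base URL differs.
    for base in (AGROPORTAL_API, BIOPORTAL_API):
        print(build_search_url(base, "plant height", "YOUR_API_KEY"))
```

Keeping the portal address as a parameter is the design point: client code written against one portal can be pointed at the other by swapping a single constant.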