=Paper=
{{Paper
|id=Vol-1747/IT406-IP35_ICBO2016
|storemode=property
|title=The Planteome Project
|pdfUrl=https://ceur-ws.org/Vol-1747/IT406-IP35_ICBO2016.pdf
|volume=Vol-1747
|authors=Laurel Cooper,Austin Meier,Justin Elser,Justin Preece,Xu Xu,Ryan Kitchen,Botong Qu,Eugene Zhang,Sinisa Todorovic,Pankaj Jaiswal,Marie-Angélique Laporte,Elizabeth Arnaud,Seth Carbon,Chris Mungall,Barry Smith,Georgios Gkoutos,John Doonan
|dblpUrl=https://dblp.org/rec/conf/icbo/CooperMEPXKQZTJ16
}}
==The Planteome Project ==
Laurel Cooper, Austin Meier, Justin L. Elser, Justin Preece, Xu Xu, Ryan S. Kitchen, Botong Qu, Eugene Zhang, Sinisa Todorovic, Pankaj Jaiswal: Oregon State University, Corvallis, OR, USA

Marie-Angélique Laporte, Elizabeth Arnaud: Bioversity International, Montpellier, France

Seth Carbon, Chris Mungall: Lawrence Berkeley National Laboratory, Berkeley, CA, USA

Barry Smith: University at Buffalo, Buffalo, NY, USA

Georgios Gkoutos: University of Birmingham, UK and University of Aberystwyth, UK

John Doonan: University of Aberystwyth, UK

Abstract: The Planteome project is a centralized online plant informatics portal which provides semantic integration of widely diverse datasets with the goal of plant improvement. Traditional plant breeding methods for crop improvement may be combined with next-generation analysis methods and automated scoring of traits and phenotypes to develop improved varieties. The Planteome project (www.planteome.org) develops and hosts a suite of reference ontologies for plants associated with a growing corpus of genomics data. Data annotations linking phenotypes and germplasm to genomics resources are achieved by data transformation and by mapping species-specific controlled vocabularies to the reference ontologies. Analysis and annotation tools are being developed to facilitate studies of plant traits, phenotypes, diseases, gene function and expression and genetic diversity data across a wide range of plant species. The project database and online resources give researchers tools to search and browse the data, and APIs for remote access, supporting semantic integration in annotation tools and data repositories that provide resources for plant biology, breeding, genomics and genetics.

Keywords: ontology; traits; phenotype; semantic; data integration; plants

===I. INTRODUCTION===

====A. Rationale====

The world population is projected to reach 9.6 billion people in the next few decades (http://www.wri.org/blog/2013/12/global-food-challenge-explained-18-graphics). The challenge is therefore how to feed this growing population while protecting the earth’s environment. Traditional plant breeding methods for plant improvement may be combined with next-generation analysis methods, including the high-throughput and automated scoring of traits and phenotypes, to develop improved varieties. Data from high-throughput sequencing, transcriptomic, proteomic, phenomic and genome annotation projects can be linked to germplasm resources through the use of interoperable reference vocabularies (ontologies). In this way, the knowledge gained from the next-generation data can be widely utilized for crop improvement.

====B. What is the Planteome?====

The Planteome Project (www.planteome.org) is a centralized online informatics portal and database, consisting of a suite of reference ontologies for plants, an associated corpus of plant genomics and phenomics data, and tools for data analysis and annotation. Analyses of these data sets from genetic and genomic studies have the potential to improve our understanding of the molecular basis of economically relevant traits. In order to utilize these data, researchers must be able to connect the plant traits of interest to the spatial and temporal expression patterns of genes, and elucidate their roles in biological processes in plants.

====C. Goals of the Planteome Project====

1. A suite of interrelated reference ontologies to describe major knowledge domains of plant biology, comprising plant phenotype and traits, environments, and biotic and abiotic stresses.

2. Standards, workflows and tools for annotation of plant genomics data, and metadata for curation and improved annotation of genes, genomes, phenotype and germplasm.

3. The Planteome browser and database, a centralized, online informatics portal and repository where reference ontologies for plants are used to access data resources for plant traits, phenotypes, diseases, gene expression and genetic diversity data across a wide range of plant species.

4. Outreach involving the plant research community and K-12 and undergraduate students.

===II. THE SCOPE OF THE PLANTEOME===

The scope of the ontologies in the Planteome project ranges from a broad overview of plant environments and taxonomy to the cellular and molecular level of expressed genes and their biological functions. The Planteome ontologies, described in more detail below, consist of the Plant Ontology (PO) [1-6], the Plant Trait Ontology (TO) [7, 8], the Plant Environment Ontology (EO) [7] and the Plant Stress Ontology (PSO). The Planteome project imports and integrates with relevant reference ontologies developed by collaborating groups: the Gene Ontology (GO) [9, 10], the Phenotypic Qualities Ontology (PATO) [11], the Environment Ontology (ENVO) [12], and the Chemical Entities of Biological Interest (ChEBI) [13]. In addition, the Planteome integrates and maps species- or clade-specific application ontologies developed by the Crop Ontology (CO) project [14]. Together, this suite of reference ontologies can be used to fully annotate and link together the vital plant knowledge domain.

The central reference ontology for plant anatomy and plant developmental stages, the Plant Ontology (PO) [1-6], grew out of the need to create associations between standardized terminology for plants and genomics data, and was based on the work done to develop the Gene Ontology in the late 1990s [9, 10]. The PO is recognized worldwide as the reference ontology for plant structures and developmental stages, and is linked to data from a wide variety of plants, from traditional model species to the crop plants that feed the world's growing population.

Plant improvement relies on analyses of plant traits and phenotypes. For these purposes, the Plant Trait Ontology (TO) [7, 8] describes a wide range of precomposed plant traits consistent with Entity (E) - Quality (Q) statements and leads to an understanding of the molecular processes that underlie them. Each trait is a measurable or observable characteristic of a plant structure (PO:0009011), a plant cellular component (GO:0005575), or a plant structure development stage (PO:0009012), as well as plant biological processes (GO:0008150) and molecular functions (GO:0003674). The TO encompasses nine broad, upper-level categories of plant traits: biochemical trait (TO:0000277), biological process trait (TO:0000283), plant growth and development trait (TO:0000357), plant morphology trait (TO:0000017), quality trait (TO:0000597), stature or vigor trait (TO:0000133), sterility or fertility trait (TO:0000392), stress trait (TO:0000164) and yield trait (TO:0000371).

The Plant Environment Ontology (EO) is used to describe plant growth conditions and study types, and can be combined with terms from the other reference ontologies to fully annotate a plant phenotype description.

In addition to the reference ontologies, the Planteome works closely with developers of species-specific vocabularies such as the Crop Ontology [14] to integrate their terms, create mappings to the reference ontologies and link phenotypes and germplasm to genomics resources.

===III. DEVELOPMENT OF THE PLANTEOME ONTOLOGY NETWORK===

The development of the Planteome Project ontology network is a fundamental change in the way of thinking about ontologies for plants. In the previous project, the Plant Ontology (http://www.plantontology.org/), a single reference ontology was developed and used to annotate plant genomic data to ontology terms describing plant structures and plant developmental stages. The addition of the other reference and species-specific ontologies for plants enriches the annotation environment so that a more complete picture of the metadata of plant phenotypes can be expressed.

In order to create the network, ontology terms in the TO and the species-specific crop trait ontologies have been ‘decomposed’ into the corresponding Entity (E) - Quality (Q) statements, which utilize terms from the other reference ontologies, such as PO and GO for the entities and PATO for the qualities. In this way, a network is formed which links all the various ontologies together.

One of the lessons learned in developing this network is that some of the reference ontologies and vocabularies developed by our collaborators (such as ChEBI and the NCBI Taxonomy) are so large that they are cumbersome to display in our browser. For these, we have developed a script to extract a relevant “slim” version which contains only the needed terms.

===IV. PLANTEOME ANNOTATION DATABASE===

The Planteome database provides ontology terms and definitions along with the associated ‘annotations’ [15] between the ontology terms and data sourced from numerous plant genomics data sets. The Planteome 1.0 Beta Release (Nov. 2015) contains about 47 million annotations linking reference ontology terms to data objects representing genes, gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 87 different plant species. These data are currently contributed by 29 different data sources. Planteome curators and researchers at various collaborating database groups work closely to develop the annotation files in the standardized data format used by the database. The database is accessible online (http://planteome.org/) and also available for bulk download (http://palea.cgrb.oregonstate.edu/viewsvn/associations/).

The annotation database includes functional Gene Ontology annotations for 60 species. These predictions were made using two methods. The first method used InterProScan [16] to identify protein domains. The resulting analysis files were then parsed to associate the protein domains with GO terms. The second method was to project ontology annotations based on orthology to Arabidopsis thaliana genes. Orthology was predicted with InParanoid [17], a program that takes reciprocal BLAST output and uses pairwise similarity scores to determine orthologous clusters of genes. This is followed by creating gene super clusters by pooling species-pair clusters with common genes. The orthologous super clusters of the 60 species were compared with the known GO annotation files for Arabidopsis thaliana, and new annotation files were generated. Planteome is the only online source providing GO functional annotation of genes identified for many of these species.

===V. CASE STUDY EXAMPLE: PHENOTYPE ANNOTATION OF RICE BRASSINOSTEROID (BR)-DEFICIENT DWARF MUTANT===

Brassinosteroid (BR)-deficient (brd1) dwarf mutants of rice were characterized to determine the roles that BRs play in normal plant growth and development in a monocot plant [19]. Fig. 1 shows an example of how the reference ontologies can be used to annotate the phenotype of a BR-deficient dwarf mutant rice, brd1-1. This image is a compilation of ontology terms from various Planteome reference ontologies that have been used to annotate the expression of brd1 (Os03g0602300) in the Planteome database. These annotations were contributed from a variety of sources, such as Gramene (http://www.gramene.org/), EnsemblPlants (http://plants.ensembl.org/index.html), and The Rice Annotation Project (RAP) (http://rapdb.dna.affrc.go.jp/), and can be used to describe all aspects of the brd1 mutant phenotype.

Fig. 1. Annotation of Rice brd1 mutant with reference ontology terms to capture the phenotype. The rice plant image is adapted with permission from [19] © John Wiley and Sons.

Gathering the annotations together in a unified platform such as the Planteome allows the data to be made accessible and facilitates gene discovery through inter- and intra-species comparisons.

===VI. PLANTEOME TOOLS FOR COLLABORATION AND ONTOLOGY INTEGRATION===

The Planteome project is developing a number of tools to increase access to the ontology terms and to increase the interoperability of the annotated data.

All the Planteome ontologies are publicly available and are maintained at the Planteome GitHub site (https://github.com/Planteome) for sharing and tracking revisions. This site facilitates community feedback; users can make comments, request terms and suggest changes to the Planteome ontologies. In addition, the Planteome GitHub site features species-specific vocabularies such as those from the Crop Ontology (http://www.cropontology.org/).

Another new tool under development is a Trait Ontology-specific (http://to.termgenie.org/) instance of the TermGenie tool [20]. TermGenie uses a pattern-based approach to rapidly generate new terms and place them appropriately within the ontology structure. All terms are reviewed by a Planteome curator before the final commit to the ontology. TermGenie can be used to quickly obtain a TO term for annotation if an appropriate one does not already exist.

Planteome is developing an application programming interface (API) that will allow collaborators to access and use the hosted data in their web sites and applications. The first two API methods (currently accessible from the Planteome development environment) query Planteome-hosted ontologies for terms, term definitions, and other attributes, returning them in JSON format. The “search” method is fast enough to be used in an autocomplete search box. All the Planteome reference and species-specific ontologies are available through the API service. Currently, the API only serves the term information, but the Planteome project plans to add API methods to access annotation data as well.
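As a rough illustration of how a client might consume a JSON-returning "search" method for autocomplete, the following Python sketch builds a query URL and converts a response into a suggestion list. The paper does not give the endpoint, the parameter names or the response schema, so `API_BASE`, the `q`, `limit` and `ontology` parameters, and the `results`, `id` and `name` fields are all assumptions made for this example, not the actual Planteome API. Only the two TO identifiers in the sample payload come from the text above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# HYPOTHETICAL: the paper does not specify the endpoint, parameter names,
# or response schema. Every name below is a placeholder for illustration.
API_BASE = "https://example.org/planteome-api"


def build_search_url(prefix, ontology=None, limit=10):
    """Build the URL for an assumed 'search' call on a term-name prefix."""
    params = {"q": prefix, "limit": limit}
    if ontology is not None:
        params["ontology"] = ontology
    return f"{API_BASE}/search?{urlencode(params)}"


def parse_terms(payload):
    """Turn a JSON payload into (term id, label) pairs for a suggestion list."""
    data = json.loads(payload)
    return [(hit["id"], hit["name"]) for hit in data.get("results", [])]


def search_terms(prefix, ontology=None, limit=10, timeout=5.0):
    """Fetch suggestions over HTTP; requires a real endpoint in API_BASE."""
    url = build_search_url(prefix, ontology, limit)
    with urlopen(url, timeout=timeout) as resp:
        return parse_terms(resp.read().decode("utf-8"))


# Offline demonstration with a made-up response body in the assumed shape.
SAMPLE_PAYLOAD = json.dumps({
    "results": [
        {"id": "TO:0000371", "name": "yield trait"},
        {"id": "TO:0000164", "name": "stress trait"},
    ]
})
suggestions = parse_terms(SAMPLE_PAYLOAD)
print(suggestions)
```

Keeping URL construction and response parsing separate from the network call lets the parsing logic be exercised without a live service; an autocomplete widget would call something like `search_terms` on each keystroke, typically after a short debounce delay.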
The Planteome project is collaborating with the Bisque Image Analysis Environment (Center for Bio-Image Informatics, UCSB; http://www.cyverse.org/bisque) on integrated image segmentation and ontology annotation features. The Planteome project already hosts such a tool as a desktop application, Annotation of Image Segments with Ontologies (AISO; http://planteome.org/node/3), but we wish to move its functionality online as a module within Bisque, taking advantage of its shared CyVerse authentication, data store, and computation infrastructure. The ontology data itself will be served from external services, such as the Planteome API.

===VII. CONCLUSIONS===

The Planteome project is a centralized online plant informatics portal which integrates reference ontologies for plants and species-specific controlled vocabularies with a large and growing corpus of plant genomics data. This platform provides semantic integration of widely diverse datasets with the goal of plant improvement.

===ACKNOWLEDGMENT===

Funding for the Planteome project is provided by the National Science Foundation award IOS #1340112.

===REFERENCES===

[1] Jaiswal, P, S Avraham, K Ilic, EA Kellogg, S McCouch, A Pujar, et al., 2005. Plant Ontology (PO): A Controlled Vocabulary of Plant Structures and Growth Stages. Comp Funct Genomics, 6(7-8): p. 388-97.

[2] Pujar, A, P Jaiswal, EA Kellogg, K Ilic, L Vincent, S Avraham, et al. 2006. Whole-plant growth stage ontology for angiosperms and its application in plant biology. Plant Physiol, 142(2): p. 414-28.

[3] Ilic, K, EA Kellogg, P Jaiswal, F Zapata, PF Stevens, LP Vincent, et al., 2007. The plant structure ontology, a unified vocabulary of anatomy and morphology of a flowering plant. Plant Physiol. 143(2): p. 587-599.

[4] Avraham, S, CW Tung, K Ilic, P Jaiswal, EA Kellogg, S McCouch, et al., 2008. The Plant Ontology Database: a community resource for plant structure and developmental stages controlled vocabulary and annotations. Nucleic Acids Res., 36(Database issue): p. D449-54.

[5] Cooper L, Walls RL, Elser J, Gandolfo MA, Stevenson DW, Smith B, et al. (2013) The Plant Ontology as a tool for comparative plant anatomy and genomic analyses. Plant and Cell Physiology 54: e1–e1.

[6] Cooper L and Jaiswal P (2016) The Plant Ontology: A Tool for Plant Genomics. In D Edwards, ed, Plant Bioinformatics. Springer New York, pp 89–114.

[7] Jaiswal P, Ware D, Ni J, Chang K, Zhao W, Schmidt S, et al. (2002) Gramene: development and integration of trait and gene ontologies for rice. Comparative and Functional Genomics 3: 132–136.

[8] Arnaud E, Cooper L, Shrestha R, Menda N, Nelson RT, Matteis L, et al. (2012) Towards a reference Plant Trait Ontology for modeling knowledge of plant traits and phenotypes. Proceedings of the International Conference on Knowledge Engineering and Ontology Development. Barcelona, Spain, pp 220–225.

[9] Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. (2000) Gene Ontology: tool for the unification of biology. Nat Genet 25: 25–29.

[10] The Gene Ontology Consortium (2014) Gene Ontology Consortium: going forward. Nucleic Acids Research. doi: 10.1093/nar/gku1179.

[11] Gkoutos G, Green E, Mallon A-M, Hancock J, Davidson D (2004) Using ontologies to describe mouse phenotypes. Genome Biol 6: R8.

[12] Buttigieg P, Morrison N, Smith B, Mungall C, Lewis S (2013) The environment ontology: contextualising biological and biomedical entities. Journal of Biomedical Semantics 4: 43.

[13] Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, et al. (2016) ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Research 44: D1214–D1219.

[14] Shrestha, R, Davenport, GF, Bruskiewich, R, Arnaud, E. (2011) Development of crop ontology for sharing crop phenotypic information. Drought phenotyping in crops: from theory to practice. pp 171–179.

[15] Hill DP, Smith B, McAndrews-Hill MS, Blake J (2008) Gene Ontology annotations: what they mean and where they come from. BMC Bioinformatics 9: S2.

[16] Quevillon E, Silventoinen V, Pillai S, et al. 2005. InterProScan: protein domains identifier. Nucleic Acids Research. 33(Web Server issue): W116-W120. doi:10.1093/nar/gki442.

[17] Remm M, Storm CEV and Sonnhammer ELL (2001). Automatic Clustering of Orthologs and In-paralogs from Pairwise Species Comparisons. JMB, 314: 1041-1052.

[18] Altschul, SF, Madden, TL, Schäffer, AA, Zhang, J, Zhang, Z, Miller, W, et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.

[19] Hong, Z, Ueguchi-Tanaka, M, Shimizu-Sato, S, Inukai, Y, Fujioka, S, Shimada, Y, et al. (2002) Loss-of-function of a rice brassinosteroid biosynthetic enzyme, C-6 oxidase, prevents the organized arrangement and polar elongation of cells in the leaves and stem. The Plant Journal 32: 495–508.

[20] Dietze, H, Berardini, T, Foulger, R, Hill, D, Lomax, J, Osumi-Sutherland, D, Roncaglia P, Mungall C (2014) TermGenie - A web application for pattern-based ontology class generation. Journal of Biomedical Semantics 5: 48.

[21] Lingutla N, Preece J, Todorovic S, Cooper L, Moore L, Jaiswal P (2014) AISO: Annotation of Image Segments with Ontologies. Journal of Biomedical Semantics 5: 50.