=Paper= {{Paper |id=Vol-1747/IT406-IP35_ICBO2016 |storemode=property |title=The Planteome Project |pdfUrl=https://ceur-ws.org/Vol-1747/IT406-IP35_ICBO2016.pdf |volume=Vol-1747 |authors=Laurel Cooper,Austin Meier,Justin Elser,Justin Preece,Xu Xu,Ryan Kitchen,Botong Qu,Eugene Zhang,Sinisa Todorovic,Pankaj Jaiswal,Marie-Angélique Laporte,Elizabeth Arnaud,Seth Carbon,Chris Mungall,Barry Smith,Georgios Gkoutos,John Doonan |dblpUrl=https://dblp.org/rec/conf/icbo/CooperMEPXKQZTJ16 }} ==The Planteome Project == https://ceur-ws.org/Vol-1747/IT406-IP35_ICBO2016.pdf
The Planteome Project

Laurel Cooper, Austin Meier, Justin L. Elser, Justin Preece, Xu Xu, Ryan S. Kitchen, Botong Qu, Eugene Zhang, Sinisa Todorovic, Pankaj Jaiswal
Oregon State University, Corvallis, OR, USA

Marie-Angélique Laporte, Elizabeth Arnaud
Bioversity International, Montpellier, France

Seth Carbon, Chris Mungall
Lawrence Berkeley National Laboratory, Berkeley, CA, USA

Barry Smith
University at Buffalo, Buffalo, NY, USA

Georgios Gkoutos
University of Birmingham, UK and University of Aberystwyth, UK

John Doonan
University of Aberystwyth, UK



Abstract— The Planteome project is a centralized online plant informatics portal that provides semantic integration of widely diverse datasets with the goal of plant improvement. Traditional plant breeding methods for crop improvement may be combined with next-generation analysis methods and automated scoring of traits and phenotypes to develop improved varieties. The Planteome project (www.planteome.org) develops and hosts a suite of reference ontologies for plants, associated with a growing corpus of genomics data. Data annotations linking phenotypes and germplasm to genomics resources are produced by transforming data and mapping species-specific controlled vocabularies to the reference ontologies. Analysis and annotation tools are being developed to facilitate studies of plant traits, phenotypes, diseases, gene function and expression, and genetic diversity across a wide range of plant species. The project database and online resources give researchers tools to search and browse the data, as well as APIs for remote access that support semantic integration into annotation tools and data repositories serving plant biology, breeding, genomics and genetics.

Keywords—ontology; traits; phenotype; semantic; data integration; plants

I. INTRODUCTION

A. Rationale

The world population is projected to reach 9.6 billion people in the next few decades (http://www.wri.org/blog/2013/12/global-food-challenge-explained-18-graphics). The challenge, therefore, is how to feed this growing population while protecting the earth’s environment. Traditional plant breeding methods for plant improvement may be combined with next-generation analysis methods, including high-throughput and automated scoring of traits and phenotypes, to develop improved varieties. Data from high-throughput sequencing, transcriptomic, proteomic, phenomic and genome annotation projects can be linked to germplasm resources through the use of interoperable reference vocabularies (ontologies). In this way, the knowledge gained from next-generation data can be utilized for crop improvement.

B. What is the Planteome?

The Planteome Project (www.planteome.org) is a centralized online informatics portal and database, consisting of a suite of reference ontologies for plants, an associated corpus of plant genomics and phenomics data, and tools for data analysis and annotation. Analyses of these data sets from genetic and genomic studies have the potential to improve our understanding of the molecular basis of economically relevant traits. To utilize these data, researchers must be able to connect the plant traits of interest to the spatial and temporal expression patterns of genes, and elucidate the roles of those genes in plant biological processes.

C. Goals of the Planteome Project:

1. A suite of interrelated reference ontologies describing the major knowledge domains of plant biology, comprising plant phenotypes and traits, environments, and biotic and abiotic stresses.

2. Standards, workflows and tools for the annotation of plant genomics data, and metadata for the curation and improved annotation of genes, genomes, phenotypes and germplasm.

3. The Planteome browser and database, a centralized online informatics portal and repository where reference ontologies for plants are used to access data resources for plant traits, phenotypes, diseases, gene expression and genetic diversity across a wide range of plant species.

4. Outreach involving the plant research community and K-12 and undergraduate students.
II. THE SCOPE OF THE PLANTEOME

The scope of the ontologies in the Planteome project ranges from a broad overview of plant environments and taxonomy to the cellular and molecular level of expressed genes and their biological functions. The Planteome ontologies, described in more detail below, consist of the Plant Ontology (PO) [1-6], the Plant Trait Ontology (TO) [7, 8], the Plant Environment Ontology (EO) [7] and the Plant Stress Ontology (PSO). The Planteome project imports and integrates relevant reference ontologies developed by collaborating groups: the Gene Ontology (GO) [9, 10], the Phenotypic Qualities Ontology (PATO) [11], the Environment Ontology (ENVO) [12], and Chemical Entities of Biological Interest (ChEBI) [13]. In addition, the Planteome integrates and maps species- or clade-specific application ontologies developed by the Crop Ontology (CO) project [14]. Together, this suite of reference ontologies can be used to fully annotate and link together the vital domains of plant knowledge.

The central reference ontology for plant anatomy and plant developmental stages, the Plant Ontology (PO) [1-6], grew out of the need to associate standardized terminology for plants with genomics data, and was based on the work done to develop the Gene Ontology in the late 1990s [9, 10]. The PO is recognized worldwide as the reference ontology for plant structures and developmental stages, and is linked to data from a wide variety of plants, from traditional model species to the crop plants that feed the world's growing population.

Plant improvement relies on analyses of plant traits and phenotypes. For these purposes, the Plant Trait Ontology (TO) [7, 8] describes a wide range of precomposed plant traits that are consistent with Entity (E)-Quality (Q) statements, and supports an understanding of the molecular processes that underlie them. Each trait is a measurable or observable characteristic of a plant structure (PO:000901), a plant cellular component (GO:0005575) or a plant structure development stage (PO:0009012), as well as of plant biological processes (GO:0008150) and molecular functions (GO:0003674). The TO encompasses nine broad, upper-level categories of plant traits: biochemical trait (TO:0000277), biological process trait (TO:0000283), plant growth and development trait (TO:0000357), plant morphology trait (TO:0000017), quality trait (TO:0000597), stature or vigor trait (TO:0000133), sterility or fertility trait (TO:0000392), stress trait (TO:0000164) and yield trait (TO:0000371).

The Plant Environment Ontology (EO) is used to describe plant growth conditions and study types, and can be combined with terms from the other reference ontologies to fully annotate a plant phenotype description.

In addition to the reference ontologies, the Planteome works closely with developers of species-specific vocabularies such as the Crop Ontology [14] to integrate their terms, create mappings to the reference ontologies and link phenotypes and germplasm to genomics resources.

III. DEVELOPMENT OF THE PLANTEOME ONTOLOGY NETWORK

The development of the Planteome Project ontology network is a fundamental change in the way of thinking about ontologies for plants. In the previous project, the Plant Ontology (http://www.plantontology.org/), a single reference ontology was developed and used to annotate plant genomic data to ontology terms describing plant structures and plant developmental stages. Adding the other reference and species-specific ontologies for plants enriches the annotation environment, so that a more complete picture of the metadata of plant phenotypes can be expressed.

To create the network, ontology terms in the TO and the species-specific crop trait ontologies have been ‘decomposed’ into the corresponding Entity (E)-Quality (Q) statements, which use terms from the other reference ontologies, such as PO and GO for the entities and PATO for the qualities. In this way, a network is formed that links all the various ontologies together.

One of the lessons learned in developing this network is that some of the reference ontologies and vocabularies developed by our collaborators (such as ChEBI and the NCBI Taxonomy) are so large that they are cumbersome to display in our browser. For these, we have developed a script to extract a relevant “slim” version containing only the needed terms.

IV. PLANTEOME ANNOTATION DATABASE

The Planteome database provides ontology terms and definitions, along with the associated ‘annotations’ [15] that link those terms to data sourced from numerous plant genomics data sets. The Planteome 1.0 Beta Release (Nov. 2015) contains about 47 million annotations linking reference ontology terms to data objects representing genes, gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 87 different plant species. These data are currently contributed by 29 different data sources. Planteome curators and researchers at various collaborating database groups work closely together to develop the annotation files in a standardized data format. The database is accessible online (http://planteome.org/) and is also available for bulk download (http://palea.cgrb.oregonstate.edu/viewsvn/associations/).

The annotation database includes functional Gene Ontology annotations for 60 species. These predictions were made using two methods. The first method used InterProScan [16] to identify protein domains; the resulting analysis files were then parsed to associate the protein domains with GO terms. The second method was to project ontology annotations based on
orthology to Arabidopsis thaliana genes. Orthology was predicted with InParanoid [17], a program that takes reciprocal BLAST output and uses pairwise similarity scores to determine orthologous clusters of genes. Gene super clusters were then created by pooling species-pair clusters that share common genes. The orthologous super clusters of the 60 species were compared with the known GO annotation files for Arabidopsis thaliana, and new annotation files were generated. Planteome is the only online source providing GO functional annotation of genes identified for many of these species.

V. CASE STUDY EXAMPLE: PHENOTYPE ANNOTATION OF RICE BRASSINOSTEROID (BR)-DEFICIENT DWARF MUTANT

Brassinosteroid (BR)-deficient (brd1) dwarf mutants of rice were characterized to determine the roles that BRs play in normal plant growth and development in a monocot plant [19]. Fig. 1 shows an example of how the reference ontologies can be used to annotate the phenotype of a BR-deficient dwarf rice mutant, brd1-1. The figure is a compilation of terms from various Planteome reference ontologies that have been used to annotate the expression of brd1 (Os03g0602300) in the Planteome database. These annotations were contributed by a variety of sources, such as Gramene (http://www.gramene.org/), EnsemblPlants (http://plants.ensembl.org/index.html) and The Rice Annotation Project (RAP) (http://rapdb.dna.affrc.go.jp/), and can be used to describe all aspects of the brd1 mutant phenotype.

Fig. 1. Annotation of Rice brd1 mutant with reference ontology terms to capture the phenotype. The rice plant image is adapted with permission from [19] © John Wiley and Sons.

Gathering the annotations together in a unified platform such as the Planteome makes the data accessible and facilitates gene discovery through inter- and intra-species comparisons.

VI. PLANTEOME TOOLS FOR COLLABORATION AND ONTOLOGY INTEGRATION

The Planteome project is developing a number of tools to increase access to the ontology terms and to increase the interoperability of the annotated data.

All the Planteome ontologies are publicly available and are maintained at the Planteome GitHub site (https://github.com/Planteome) for sharing and tracking revisions. This site facilitates community feedback; users can make comments, request terms and suggest changes to the Planteome ontologies. The Planteome GitHub site also features species-specific vocabularies such as those from Crop Ontology (http://www.cropontology.org/).

Another tool under development is a Trait Ontology-specific instance (http://to.termgenie.org/) of the TermGenie tool [20]. TermGenie uses a pattern-based approach to rapidly generate new terms and place them appropriately within the ontology structure. All terms are reviewed by a Planteome curator before the final commit to the ontology. TermGenie can be used to quickly obtain a TO term for annotation if an appropriate one does not already exist.

Planteome is developing an application programming interface (API) that will allow collaborators to access and use the hosted data in their web sites and applications. The first two API methods, currently accessible from the Planteome development environment, query Planteome-hosted ontologies for terms, term definitions and other attributes, and return them in JSON format. The “search” method is fast enough to be used in an autocomplete search box.

All the Planteome reference and species-specific ontologies are available through the API service. Currently, the API serves only term information, but the Planteome project plans to add API methods for accessing annotation data as well.
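The paragraph above notes that the “search” method returns JSON and is fast enough for an autocomplete box. One common way to make a prefix search that fast is to keep term names sorted and binary-search for the first match, so every name sharing the typed prefix sits in one contiguous run. The sketch below illustrates that idea under stated assumptions: it is a hypothetical illustration, not the Planteome API implementation, and the `TermIndex` class, its `search` method, the `limit` parameter and the JSON field names `id` and `name` are all invented for this example. The sample terms are the nine upper-level TO categories listed in Section II.

```python
import bisect
import json


class TermIndex:
    """Hypothetical in-memory index for prefix ("autocomplete") term search.

    Assumption: this is an illustrative sketch, not the Planteome API;
    the JSON field names "id" and "name" are invented for the example.
    """

    def __init__(self, terms):
        # terms: mapping of ontology term ID -> term name.
        # Sort (lowercased name, id, original name) tuples by name so that
        # all names sharing a prefix form one contiguous run.
        self._entries = sorted(
            (name.lower(), term_id, name) for term_id, name in terms.items()
        )
        self._keys = [entry[0] for entry in self._entries]

    def search(self, prefix, limit=10):
        """Return a JSON array of up to `limit` terms whose name starts with `prefix`."""
        key = prefix.lower()
        # Binary search for the first name >= the prefix: O(log n).
        start = bisect.bisect_left(self._keys, key)
        hits = []
        # Walk forward only while names still match and the limit is not reached.
        for i in range(start, len(self._entries)):
            name_lc, term_id, name = self._entries[i]
            if len(hits) >= limit or not name_lc.startswith(key):
                break
            hits.append({"id": term_id, "name": name})
        return json.dumps(hits)


# Usage: the nine upper-level Plant Trait Ontology categories from Section II.
trait_terms = {
    "TO:0000277": "biochemical trait",
    "TO:0000283": "biological process trait",
    "TO:0000357": "plant growth and development trait",
    "TO:0000017": "plant morphology trait",
    "TO:0000597": "quality trait",
    "TO:0000133": "stature or vigor trait",
    "TO:0000392": "sterility or fertility trait",
    "TO:0000164": "stress trait",
    "TO:0000371": "yield trait",
}
index = TermIndex(trait_terms)
# Typing "st" matches the stature/vigor, sterility/fertility and stress traits.
print(index.search("st"))
```

Because matching names are contiguous in sorted order, each keystroke costs one binary search plus at most `limit` steps, which keeps responses quick as a user types. A production service might instead use a dedicated search index that also matches synonyms; that choice is likewise an assumption here.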
The Planteome project is collaborating with the Bisque Image Analysis Environment (Center for Bio-Image Informatics, UCSB; http://www.cyverse.org/bisque) on integrated image segmentation and ontology annotation features. The Planteome project already hosts such a tool as a desktop application, Annotation of Image Segments with Ontologies (AISO; http://planteome.org/node/3), but we wish to move its functionality online as a module within Bisque, taking advantage of its shared CyVerse authentication, data store and computation infrastructure. The ontology data itself will be served from external services, such as the Planteome API.

VII. CONCLUSIONS

The Planteome project is a centralized online plant informatics portal that integrates reference ontologies for plants and species-specific controlled vocabularies with a large and growing corpus of plant genomics data. This platform provides semantic integration of widely diverse datasets with the goal of plant improvement.

ACKNOWLEDGMENT

Funding for the Planteome project is provided by the National Science Foundation award IOS #1340112.

REFERENCES

[1] Jaiswal P, Avraham S, Ilic K, Kellogg EA, McCouch S, Pujar A, et al. (2005) Plant Ontology (PO): a controlled vocabulary of plant structures and growth stages. Comparative and Functional Genomics 6(7-8): 388–397.
[2] Pujar A, Jaiswal P, Kellogg EA, Ilic K, Vincent L, Avraham S, et al. (2006) Whole-plant growth stage ontology for angiosperms and its application in plant biology. Plant Physiology 142(2): 414–428.
[3] Ilic K, Kellogg EA, Jaiswal P, Zapata F, Stevens PF, Vincent LP, et al. (2007) The plant structure ontology, a unified vocabulary of anatomy and morphology of a flowering plant. Plant Physiology 143(2): 587–599.
[4] Avraham S, Tung CW, Ilic K, Jaiswal P, Kellogg EA, McCouch S, et al. (2008) The Plant Ontology Database: a community resource for plant structure and developmental stages controlled vocabulary and annotations. Nucleic Acids Research 36(Database issue): D449–D454.
[5] Cooper L, Walls RL, Elser J, Gandolfo MA, Stevenson DW, Smith B, et al. (2013) The Plant Ontology as a tool for comparative plant anatomy and genomic analyses. Plant and Cell Physiology 54: e1.
[6] Cooper L, Jaiswal P (2016) The Plant Ontology: a tool for plant genomics. In D Edwards, ed, Plant Bioinformatics. Springer New York, pp 89–114.
[7] Jaiswal P, Ware D, Ni J, Chang K, Zhao W, Schmidt S, et al. (2002) Gramene: development and integration of trait and gene ontologies for rice. Comparative and Functional Genomics 3: 132–136.
[8] Arnaud E, Cooper L, Shrestha R, Menda N, Nelson RT, Matteis L, et al. (2012) Towards a reference Plant Trait Ontology for modeling knowledge of plant traits and phenotypes. Proceedings of the International Conference on Knowledge Engineering and Ontology Development. Barcelona, Spain, pp 220–225.
[9] Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. (2000) Gene Ontology: tool for the unification of biology. Nature Genetics 25: 25–29.
[10] The Gene Ontology Consortium (2014) Gene Ontology Consortium: going forward. Nucleic Acids Research. doi: 10.1093/nar/gku1179.
[11] Gkoutos G, Green E, Mallon A-M, Hancock J, Davidson D (2004) Using ontologies to describe mouse phenotypes. Genome Biology 6: R8.
[12] Buttigieg P, Morrison N, Smith B, Mungall C, Lewis S (2013) The environment ontology: contextualising biological and biomedical entities. Journal of Biomedical Semantics 4: 43.
[13] Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, et al. (2016) ChEBI in 2016: improved services and an expanding collection of metabolites. Nucleic Acids Research 44: D1214–D1219.
[14] Shrestha R, Davenport GF, Bruskiewich R, Arnaud E (2011) Development of crop ontology for sharing crop phenotypic information. In: Drought phenotyping in crops: from theory to practice, pp 171–179.
[15] Hill DP, Smith B, McAndrews-Hill MS, Blake J (2008) Gene Ontology annotations: what they mean and where they come from. BMC Bioinformatics 9: S2.
[16] Quevillon E, Silventoinen V, Pillai S, et al. (2005) InterProScan: protein domains identifier. Nucleic Acids Research 33(Web Server issue): W116–W120. doi: 10.1093/nar/gki442.
[17] Remm M, Storm CEV, Sonnhammer ELL (2001) Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of Molecular Biology 314: 1041–1052.
[18] Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389–3402.
[19] Hong Z, Ueguchi-Tanaka M, Shimizu-Sato S, Inukai Y, Fujioka S, Shimada Y, et al. (2002) Loss-of-function of a rice brassinosteroid biosynthetic enzyme, C-6 oxidase, prevents the organized arrangement and polar elongation of cells in the leaves and stem. The Plant Journal 32: 495–508.
[20] Dietze H, Berardini T, Foulger R, Hill D, Lomax J, Osumi-Sutherland D, Roncaglia P, Mungall C (2014) TermGenie: a web application for pattern-based ontology class generation. Journal of Biomedical Semantics 5: 48.
[21] Lingutla N, Preece J, Todorovic S, Cooper L, Moore L, Jaiswal P (2014) AISO: Annotation of Image Segments with Ontologies. Journal of Biomedical Semantics 5: 50.