=Paper= {{Paper |id=Vol-2285/ICBO_2018_paper_25 |storemode=property |title=OOPS: The Ontology of Plant Stress, A Semi-Automated Standardization Methodology |pdfUrl=https://ceur-ws.org/Vol-2285/ICBO_2018_paper_25.pdf |volume=Vol-2285 |authors=Austin Meier,Marie-Angélique Laporte,Justin Elser,Laurel Cooper, Justin Preece,Pankaj Jaiswal,Jorrit Poelen |dblpUrl=https://dblp.org/rec/conf/icbo/MeierLECPJP18 }} ==OOPS: The Ontology of Plant Stress, A Semi-Automated Standardization Methodology== https://ceur-ws.org/Vol-2285/ICBO_2018_paper_25.pdf
       Proceedings of the 9th International Conference on Biological Ontology (ICBO 2018), Corvallis, Oregon, USA                          1




OOPS: The Ontology Of Plant Stress
A semi-automated standardization methodology

Austin Meier, Laurel Cooper, Justin Elser, Pankaj Jaiswal
Oregon State University
Corvallis, OR, United States
meiera@oregonstate.edu

Marie-Angélique Laporte
Bioversity
Montpellier, France

Jorrit H Poelen
400 Perkins Street, Apt. 104
Oakland, CA 94610, USA


    Abstract: Plant stress traits are important breeding targets for all crop
species. Massive amounts of research dollars are spent generating data to
combat plant diseases and environmental stress. Often this data is used to
achieve a single goal, and then left in a repository, never to be used again.
As a scientific community, we should be striving to make all publicly funded
data reusable and interoperable. This goal is achievable only through careful
annotation using universal data and metadata standards. One such standard is
the use of a standardized vocabulary, or ontology. This paper presents a
semi-automated method to define and label plant stresses using a combination
of web scraping and ontology design patterns. Standardizing the definitions
and linking plant stress with established hierarchies leverages previous work
of developed knowledge bases such as taxonomic classifications and other
ontologies.

    Keywords: ontology; plant pathology; nutrient deficiency; data standards;
Planteome; automation; web scraping.

I. INTRODUCTION

    Global climate change and international travel have introduced more and
more diseases to previously unaffected regions. The varieties of crops grown
in these regions are typically very susceptible, and yield losses are massive.
Spraying pesticides is costly and damaging to the environment. It takes too
long to identify resistance genes and integrate them into existing elite
varieties using traditional breeding methods.
    Many diseases already have a substantial amount of research and data
available related to resistance genes, pathways, and quantitative trait loci
(QTLs). However, this data is not easily accessible, and even when it is, it
can often be difficult to interpret.
    By standardizing the naming of plant diseases, their host and pathogen
from an ordered taxonomy (e.g. NCBI Taxonomy [1]), and the datasets on genes,
QTLs, genetic markers and gene expression, we can ask semantic questions such
as: “What genes overlap the resistance QTL, and how are they expressed in
response to a pathogen in a given species?”, “If the same pathogen affects
closely related plant hosts, does it trigger the expression of gene
homologs?” or “Is there a common resistance gene motif that is shown to be
effective against this pathogen?” Being able to leverage existing datasets
will expedite identification of resistance sources and reduce breeding
integration times, producing more food while using fewer resources. However,
from the pathology side, using the metadata we can also build a network of
ontologies from different knowledge domains to suggest how a stress/disease
is manifested. This can help not just researchers: it can also be integrated
into online digital tools that support farmers, agriculture extension
specialists, education, and machine learning-based data processors for active
learning.

II. METHODS

A. Overview
    The hierarchy of the Ontology Of Plant Stress (OOPS) separates plant
stress into two general subclasses: biotic stress and abiotic stress
(Fig. 1). The abiotic stress class has two subclasses: plant stress caused by
an excess of some element, and plant stress caused by a deficiency of some
element. The biotic stress class has two child terms, herbivory stress and
plant disease. These upper level hierarchy terms are manually curated, and
can be adjusted or added to if the need arises. Initial abiotic stress terms
were populated using existing abiotic stress traits found in the Plant Trait
Ontology (TO [2]), and initial plant disease terms were identified by
scraping the American Phytopathological Society website (www.apsnet.org)
using the Samara webscraping application [3].

Fig. 1. A top level view of the Ontology of Plant Stress (OOPS). All classes
fall under the parent class plant stress. The two child terms under the top
level divide plant stress processes into either biotic stress or abiotic
stress. Classes highlighted in blue represent classes in which there is no
specificity to the host plant experiencing the stress process. Classes
highlighted in yellow indicate stresses in which a specific interaction is
occurring between the host plant and the stressor. Example stress classes
from Tables 1 and 2 are displayed in grey.
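
The upper-level hierarchy described in the Overview can be pictured as a small tree. The sketch below is illustrative only: the nested-dictionary layout and the labels "excess stress" and "deficiency stress" are assumptions made for this example, not the format or term labels used in the OOPS repository.

```python
# Upper-level OOPS hierarchy as described in the Overview.
# ASSUMPTION: the nested-dict layout and the labels "excess stress" and
# "deficiency stress" are hypothetical; only the class structure is from the text.
OOPS_UPPER_HIERARCHY = {
    "plant stress": {
        "abiotic stress": {
            "excess stress": {},
            "deficiency stress": {},
        },
        "biotic stress": {
            "herbivory stress": {},
            "plant disease": {},
        },
    }
}


def iter_paths(tree, prefix=()):
    """Yield every root-to-class path in the hierarchy, parents first."""
    for label, children in tree.items():
        path = prefix + (label,)
        yield path
        yield from iter_paths(children, path)


def parent_of(tree, target):
    """Return the parent label of `target`, or None for the root or an unknown label."""
    for path in iter_paths(tree):
        if path[-1] == target:
            return path[-2] if len(path) > 1 else None
    return None
```

Because the upper levels are curated by hand, adding a new grouping class in such a layout amounts to adding one key; the pattern-generated child terms are placed beneath these classes by the reasoner rather than listed here.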




       ICBO 2018                                                   August 7-10, 2018                                                       1



B. Design patterns
    In order to increase automation in development of the Ontology of Plant
Stress, we are using a set of design patterns that describe different plant
stresses, compliant with the Dead Simple OWL Design Patterns (DOS-DPs)
format [4]. Using design patterns allows term lists to be maintained in flat
tables that can be automatically converted into Web Ontology Language (OWL).
In its current pre-release state, OOPS uses three distinct patterns to define
plant stress ontology terms: a deficiency pattern and an excess pattern for
abiotic stress processes, and a single ‘disease pattern’ for biotic stresses.

C. Abiotic stress patterns
    Plants can experience stress from exposure to a multitude of different
chemical elements, and the process of experiencing stress is dependent on the
concentration of said element for a given species or variety of plant in
contrast to a reference entity. Abiotic stresses are divided into subclasses
based on the excess and deficient states of the stressor element. Stresses
caused by exposure to an experimental condition containing too much of an
element fall under the “excess” pattern, whereas stresses caused by exposure
to an experimental condition that is deficient in or lacking a particular
element are said to be “deficient”. The pattern returns an ontology term with
the axioms in Manchester syntax [5] as follows:

   Excess pattern:
  "'abiotic plant stress' and 'causally
downstream of' some ('plant treatment' and
'has exposure stimulus' some (ELEMENT and
'has quality' some 'increased amount')) and
'occurs in' some PLANT STRUCTURE"

   Deficiency pattern:
  "'abiotic plant stress' and 'causally
downstream of' some ('plant treatment' and
'has exposure stimulus' some (ELEMENT and
'has quality' some 'decreased amount')) and
'occurs in' some PLANT STRUCTURE"

    In the above axioms, the ‘ELEMENT’ is the entity that acts as the agent
responsible for the stress. This element can be anything, but is typically
some chemical entity, defined using Chemical Entities of Biological Interest
(ChEBI [6]). The ‘PLANT STRUCTURE’ is where the stress occurs or is observed,
typically defined by a plant anatomy term from the Plant Ontology (PO [2]),
which can be a specific plant part (e.g., root (PO:0009005) or vascular leaf
(PO:0009025)), but is often more generally defined as the whole plant
(PO:0000003). Examples of the tabular list needed to generate both excess
stress terms and deficiency stress terms can be seen in Table 1.

  Element                         Plant Structure

  Nitrogen atom (CHEBI:29352)     whole plant (PO:0000003)

  Phosphorus (CHEBI:28659)        whole plant (PO:0000003)

  Nitrogen atom (CHEBI:29352)     leaf (PO:0025034)

Table 1: Flat list describing entities used to construct excess or
deficiency plant stress terms in OOPS. Example terms identified from the
Plant Trait Ontology terms nitrogen sensitivity (TO:0000011) and phosphorus
sensitivity (TO:0000102). The first column contains the stressor agent, often
a chemical entity. The second column contains the anatomical plant structure
(from the Plant Ontology) affected by the stress process.

D. Biotic stress patterns
    The biotic stress class has two subclasses: herbivory and plant disease.
The herbivory stress pattern is under development, and the plant disease
stress pattern results in the following axiom.

   Disease pattern:
  "'plant disease process' and ('has
participant' some HOST) and 'causally
downstream of' some ('plant treatment' and
'has exposure stimulus' some PATHOGEN) and
'occurs in' some PLANT STRUCTURE"

    Defining diseases as processes allows the annotation of stage-specific
disease symptoms as infection occurs. Plant diseases are defined by three
object classes: host, pathogen, and the plant structure where infection
occurs. This pattern defines a host as some participant in the process,
whereas the pathogen is said to be an exposure stimulus in an environment
containing the pathogen. The disease process is said to occur in some plant
structure (PO:0009011). This additional requirement allows root diseases to
be defined separately from shoot diseases in the case that both are caused by
the same pathogen (Table 2). Identification and treatment of diseases depend
on the location of the infection. In the cases where the pathogen infection
is systemic, whole plant (PO:0000003) is used as the plant structure.
    Unlike abiotic stresses, plant diseases are processes that are specific
to their host plant. It is understood that certain plant pathogens are
capable of infecting multiple hosts [9], and this can cause some term
inflation within the ontology. This is an acceptable side effect of
describing plant stress in as unambiguous terms as possible. Currently, both
hosts and pathogens (including pests) are defined by their NCBI taxon ID and
are grouped by their taxonomic clade. This allows filtering of diseases based
on host or causal agent (e.g., viral diseases vs. bacterial diseases, or
potato diseases vs. Solanaceae diseases). This will allow potato breeders to
filter out all diseases that do not affect potato, or potentially gain
insight into resistance mechanisms by expanding the filters to include
diseases
affecting all solanaceous crops. Examples of the tabular format needed to
generate plant disease terms can be seen in Table 2.

  Host                  Pathogen                  Plant Structure

  Oryza sativa          Xanthomonas oryzae        whole plant
  (NCBITaxon:4530)      pv. oryzicola             (PO:0000003)
                        (NCBITaxon:1080340)

  Oryza sativa          Xanthomonas oryzae        vascular leaf
  (NCBITaxon:4530)      pv. oryzicola             (PO:0009025)
                        (NCBITaxon:1080340)

Table 2: Example rows from the flat list of entities used to generate plant
disease terms in OOPS. Three entities are needed: host, pathogen, and plant
structure. Both host and pathogen come from the NCBI Taxonomy hierarchy, and
the plant structure entity affected by the plant disease is from the Plant
Ontology.

E. Initial term population
    The initial set of abiotic stresses was determined by extracting all of
the abiotic plant traits from the Plant Trait Ontology. Any time a plant
trait was defined as the response to a chemical entity (ChEBI), two stress
terms were created: one each for the excess and deficient state of the said
chemical entity.

F. Samara’s APS web scrape
    To collect plant disease names, the American Phytopathological Society
(APS) web publication "Common Names of Plant Diseases" [7] was scraped by the
Samara tool [3]. Samara is a command-line tool implemented in Scala
(https://scala-lang.org) that extracts plant trait data from open data
sources like APS and USDA-GRIN (www.apsnet.org, www.grin-global.org).

    To convert human-readable pages from APS’s "Common Names of Plant
Diseases" resource into machine-readable data, an automated process was
implemented. The first step of this process is to extract all disease names,
source citations, host plants and pathogens from individual host disease
pages. The second step corrects troublesome names using a version controlled
name map (i.e., nameMap.tsv). The third step links host and pathogen names to
NCBI Taxonomy, OBO Relations Ontology (e.g., pathogen of,
http://purl.obolibrary.org/obo/RO_0002556) and Plant Ontology for other
entities such as host parts (e.g., leaf or root). The relationship, or
interaction type, is inferred from the context of the resource, and the host
parts are extracted from the common name for the disease using a word
matching algorithm. The final step exports the results into a
tab-separated-value file to make the results available for downstream
processing. This process is then repeated to optimize the quality of the name
mapping and linking methods.

    Given that the APS pages used to extract information were designed for
consumption by humans, the structure of the information is not consistent. By
providing a rapid, automated process to extract, correct and publish
machine-readable datasets, we put in place a repeatable process in which
corrections can be made relatively quickly by avoiding unnecessary manual
inputs. For instance, a change in a name mapping file in Samara will
automatically trigger a new scrape of the APS resource using a Jenkins job
running on a server provided by the Berkeley BBOP [8]. A new dataset will
become available less than 20 minutes after that name mapping change is made.
Also, dataset archives produced by this automated process are regularly
ingested by Global Biotic Interactions (GloBI,
https://globalbioticinteractions.org) to further increase the visibility of
the APS dataset and OOPS, to stimulate re-use, and to make it easier to
detect suspicious data records.

Figure 2: Example differentiation of three plant diseases that previously
would be indistinguishable by using only those diseases’ common names. By
combining the taxonomy of both host and pathogen, we can create unique labels
to differentiate between similarly named diseases with completely different
causal species.

III. DISCUSSION

    The constant arms race between plant hosts and the pathogens that infect
them is guided by evolution; the resulting inference is that genes that share
similar sequence or domains often share similar functions. OOPS utilizes the
relatedness of plant stress participants (host and pathogen in the case of
disease, and chemical entity in abiotic stress), and will give scientists
improved accuracy when forming hypotheses about gene function, or candidate
genes that may be linked to plant traits of interest. Standardizing the
definition of plant stresses,
and using this standard vocabulary in the annotation of genes, genomes, QTL,
mutants, and the data gathered via field books from plant breeding or field
trial experiments can help in building common semantic queries for hypothesis
generation, and provide accuracy in the annotation process. Using existing
taxonomic hierarchies and ontologies, researchers can leverage relatedness
between plant hosts, causative pathogens, and even chemical entities to more
accurately predict targets for molecular markers and identify candidate
stress responsive gene functions. These standards will also help aggregate
existing data, and assist in future-proofing new data to ensure that the
massive amounts of both phenotypic and genotypic data being generated can be
interoperable instead of being used for a single task and dumped into a
repository to collect dust.

    The real innovation and advancement of this work is the emphasis on
automation. Much of the accuracy of the disease terms requires information
from a subject matter expert. These experts are often not familiar with
ontologies, formats like OWL, or ontology editing tools, and would require
extensive training and guidance in order to contribute. Therefore, the use of
design patterns to automate ontology development, term addition, and edits
allows curators and contributors to maintain OOPS using just a flat list.
This lowered bar for ontology curation reduces the effort of training new
contributors and additional curators, and the overall overhead for
maintenance. Efforts to simplify the construction and maintenance will also
improve community involvement and adoption.

    Construction of an ontology requires expert domain knowledge to ensure
accuracy of the resulting hierarchy. OOPS is no exception. Plant stress spans
the entirety of the plant science field, and a single person cannot hope to
understand and capture all of the instances of plant stress. That is one of
the benefits of using these automated tools for developing an ontology: when
issues arise, or additional parental classes are needed to further group
stress, they can simply be added to the upper level hierarchy list, and the
reasoner can place child terms using the appropriate pattern.
    As it currently stands, OOPS is available on GitHub
(https://github.com/Planteome/ontology-of-plant-stress). However, it is under
construction, and no stable release is available at this time.

IV. FUTURE DIRECTION

    Community involvement is key to ontology utility. To make OOPS more
robust and functional, we are planning to implement a table editing tool that
will be accessible to the public. Some form of version control (likely
GitHub) will be used to produce robust versioning of stress term edits.
Reaching out to subject matter experts, such as CGIAR Research Centers, will
be key to accurate plant disease descriptions. Reaching out to APS will be
important for widespread adoption, and for the community efforts needed to
stay up to date on plant disease nomenclature and identification. For
instance, we imagine a collaboration in which APS updates the Common Names of
Plant Diseases [7] pages such that taxonomic terms (host, pathogen) and
diseases are linked to NCBI Taxonomy and OOPS respectively, and makes them
available in formats that are friendly to humans (e.g., html) and machines
(e.g., tsv, rdf). In addition, after the release of a stable OOPS, the intent
is to link it to the Plant Trait Ontology by using OOPS terms within TO
stress responsivity traits. This way, TO, PO, NCBITaxonomy, and ChEBI can all
be linked together to form a more robust knowledge graph within Planteome.

ACKNOWLEDGMENT

    This work was supported by IOS:1340112 from the National Science
Foundation.

REFERENCES

[1]  Federhen S. The NCBI Taxonomy database. Nucleic Acids Research.
     2012;40(Database issue):D136-D143. doi:10.1093/nar/gkr1178.
[2]  Cooper L, Meier A, Laporte M-A, Elser JL, Mungall C, Sinn BT,
     Cavaliere D, Dunn NA, Smith B, Qu B, et al. The Planteome database: an
     integrated resource for reference ontologies, plant genomics and
     phenomics. Nucleic Acids Research. 2018;46:D1168-D1180.
     doi:10.1093/nar/gkx1152.
[3]  Poelen J, Laporte M-A. jhpoelen/samara v0.2.0 (Version v0.2.0). Zenodo.
     2018, May 7. http://doi.org/10.5281/zenodo.1243234
[4]  Osumi-Sutherland D, Courtot M, Balhoff JP, Mungall C. Dead simple OWL
     design patterns. Journal of Biomedical Semantics. 2017;8:18.
     https://doi.org/10.1186/s13326-017-0126-0
[5]  Hitzler P, Krötzsch M, Parsia B, Patel-Schneider PF, Rudolph S (eds).
     OWL 2 Web Ontology Language: Primer. W3C Recommendation; 2009.
     Available at http://www.w3.org/TR/owl2-primer/.
[6]  Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V,
     Turner S, Swainston N, Mendes P, Steinbeck C. ChEBI in 2016: Improved
     services and an expanding collection of metabolites. Nucleic Acids
     Research. 2016.
[7]  Common Names of Plant Diseases. American Phytopathological Society.
     http://www.apsnet.org/publications/commonnames/Pages/default.aspx
[8]  http://build.berkeleybop.org/view/Planteome/job/extract-apsnet-diseases
[9]  Gilbert and Webb. Phylogenetic signal in plant pathogen–host range.
     PNAS. 2007;104(12):4979-4983.
[10] Cooper and Jaiswal. The Plant Ontology: A Tool for Plant Genomics.
     Methods in Molecular Biology. Vol 1373.
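
As a supplementary illustration of the design patterns in Sections II.C and II.D, the sketch below fills the excess, deficiency, and disease axioms from rows of a flat table such as Tables 1 and 2. It is plain string templating written for this example: the function and variable names are hypothetical, and it is not the DOS-DP tooling or the OOPS build itself, which generate OWL rather than strings.

```python
# Manchester-syntax templates copied from the excess/deficiency and disease
# patterns in the text. ASSUMPTION: filling them by string substitution is a
# simplification for illustration; the real patterns are DOS-DP files.
ABIOTIC_TEMPLATE = (
    "'abiotic plant stress' and 'causally downstream of' some "
    "('plant treatment' and 'has exposure stimulus' some "
    "({element} and 'has quality' some '{amount}')) and "
    "'occurs in' some {structure}"
)

DISEASE_TEMPLATE = (
    "'plant disease process' and ('has participant' some {host}) and "
    "'causally downstream of' some ('plant treatment' and "
    "'has exposure stimulus' some {pathogen}) and "
    "'occurs in' some {structure}"
)


def abiotic_axioms(element, structure):
    """Return the excess and deficiency axioms for one (element, structure) row."""
    return {
        "excess": ABIOTIC_TEMPLATE.format(
            element=element, amount="increased amount", structure=structure
        ),
        "deficiency": ABIOTIC_TEMPLATE.format(
            element=element, amount="decreased amount", structure=structure
        ),
    }


def disease_axiom(host, pathogen, structure):
    """Return the disease axiom for one (host, pathogen, structure) row."""
    return DISEASE_TEMPLATE.format(host=host, pathogen=pathogen, structure=structure)


# Rows taken from Table 1 (phosphorus, whole plant) and Table 2 (rice, vascular leaf).
phosphorus = abiotic_axioms("CHEBI:28659", "PO:0000003")
leaf_disease = disease_axiom("NCBITaxon:4530", "NCBITaxon:1080340", "PO:0009025")
```

Each abiotic row yields two terms, matching the rule in Section II.E that every chemical entity gets both an excess and a deficiency stress term, while each disease row yields exactly one term.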




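
The name correction, host-part matching, and export steps of the scrape described in Section II.F can be sketched in a few lines. This is a hypothetical Python illustration, not Samara's Scala implementation: the record fields, the example name mapping, the word list, and the fallback to whole plant are all assumptions, and the actual layout of nameMap.tsv is not described in the paper.

```python
import csv
import io

# ASSUMPTION: a tiny word list for illustration; the PO IDs are those quoted
# in the paper for leaf, root, and whole plant.
HOST_PARTS = {"leaf": "PO:0025034", "root": "PO:0009005"}
WHOLE_PLANT = "PO:0000003"


def apply_name_map(records, name_map):
    """Correct troublesome host and pathogen names using a name map (step 2)."""
    return [
        {
            **rec,
            "host": name_map.get(rec["host"], rec["host"]),
            "pathogen": name_map.get(rec["pathogen"], rec["pathogen"]),
        }
        for rec in records
    ]


def match_host_part(disease_name):
    """Word-match a plant part in a disease common name.

    ASSUMPTION: falls back to whole plant when no part word is found.
    """
    for word in disease_name.lower().split():
        if word in HOST_PARTS:
            return HOST_PARTS[word]
    return WHOLE_PLANT


def to_tsv(records, fields):
    """Export records as tab-separated text with a header row (final step)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical scraped record and name map entry, for illustration only.
scraped = [{"disease": "Bacterial leaf streak", "host": "rice",
            "pathogen": "Xanthomonas oryzae pv. oryzicola"}]
fixed = apply_name_map(scraped, {"rice": "Oryza sativa"})
for rec in fixed:
    rec["structure"] = match_host_part(rec["disease"])
tsv_text = to_tsv(fixed, ["disease", "host", "pathogen", "structure"])
```

Keeping the corrections in a separate mapping, as the paper does with its version controlled name map, means a fix is a one-line data change that can be re-applied on every scrape rather than an edit to the extraction code.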