=Paper=
{{Paper
|id=Vol-2285/ICBO_2018_paper_25
|storemode=property
|title=OOPS: The Ontology of Plant Stress, A Semi-Automated Standardization Methodology
|pdfUrl=https://ceur-ws.org/Vol-2285/ICBO_2018_paper_25.pdf
|volume=Vol-2285
|authors=Austin Meier,Marie-Angélique Laporte,Justin Elser,Laurel Cooper, Justin Preece,Pankaj Jaiswal,Jorrit Poelen
|dblpUrl=https://dblp.org/rec/conf/icbo/MeierLECPJP18
}}
==OOPS: The Ontology of Plant Stress, A Semi-Automated Standardization Methodology==
Proceedings of the 9th International Conference on Biological Ontology (ICBO 2018), Corvallis, Oregon, USA
OOPS: The Ontology Of Plant Stress
A semi-automated standardization methodology

Austin Meier, Laurel Cooper, Justin Elser, Pankaj Jaiswal. Oregon State University, Corvallis, OR, United States. meiera@oregonstate.edu
Marie-Angélique Laporte. Bioversity, Montpellier, France
Jorrit H Poelen. 400 Perkins Street, Apt. 104, Oakland, CA 94610, USA

Abstract: Plant stress traits are important breeding targets for all crop species. Massive amounts of research dollars are spent generating data to combat plant diseases and environmental stress. Often this data is used to achieve a single goal, and then left in a repository, never to be used again. As a scientific community, we should be striving to make all publicly funded data reusable and interoperable. This goal is achievable only through careful annotation using universal data and metadata standards. One such standard is the use of a standardized vocabulary, or ontology. This paper presents a semi-automated method to define and label plant stresses using a combination of web scraping and ontology design patterns. Standardizing the definitions and linking plant stress with established hierarchies leverages the previous work of developed knowledge bases such as taxonomic classifications and other ontologies.

Keywords: ontology; plant pathology; nutrient deficiency; data standards; Planteome; automation; web scraping.

I. INTRODUCTION

Global climate change and international travel have introduced more and more diseases to previously unaffected regions. The varieties of crops grown in these regions are typically very susceptible, and yield losses are massive. Spraying pesticides is costly and damaging to the environment. It takes too long to identify resistance genes and integrate them into existing elite varieties using traditional breeding methods.

Many diseases already have a substantial amount of research and data available related to resistance genes, pathways, and quantitative trait loci (QTLs). However, this data is not easily accessible, and even when it is, it can often be difficult to interpret.

By standardizing the naming of plant diseases, their host and pathogen from an ordered taxonomy (e.g., NCBI Taxonomy [1]), and the datasets on genes, QTLs, genetic markers and gene expression, we can ask semantic questions such as: “What genes overlap the resistance QTL, and how are they expressed in response to a pathogen in a given species?”, “If the same pathogen affects closely related plant hosts, does it trigger the expression of gene homologs?” or “Is there a common resistance gene motif that is shown to be effective against this pathogen?” Being able to leverage existing datasets will expedite identification of resistance sources and reduce breeding integration times, producing more food while using fewer resources. On the pathology side, the metadata can also be used to build a network of ontologies from different knowledge domains to suggest how a stress or disease is manifested. This can be helpful not just for researchers; it can also be integrated into online digital tools to help farmers, agriculture extension specialists, educators, and machine learning-based data processors for active learning.

II. METHODS

A. Overview

The hierarchy of the Ontology Of Plant Stress (OOPS) separates plant stress into two general subclasses: biotic stress and abiotic stress (Fig. 1). The abiotic stress class has two subclasses: plant stress caused by an excess of some element, and plant stress caused by a deficiency of some element. The biotic stress class has two child terms, herbivory stress and plant disease. These upper-level hierarchy terms are manually curated, and can be adjusted or added to if the need arises. Initial abiotic stress terms were populated using existing abiotic stress traits found in the Plant Trait Ontology (TO [2]), and initial plant disease terms were identified by scraping the American Phytopathological Society website (www.apsnet.org) using the Samara web scraping application [3].

Fig. 1: A top-level view of the Ontology of Plant Stress (OOPS). All classes fall under the parent class plant stress. The two child terms under the top level divide plant stress processes into either biotic stress or abiotic stress. Classes highlighted in blue represent classes in which there is no specificity to the host plant experiencing the stress process. Classes highlighted in yellow indicate stresses in which a specific interaction is occurring between the host plant and the stressor. Example stress classes from Tables 1 and 2 are displayed in grey.
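The upper-level hierarchy described in the Overview can be pictured as a flat child-to-parent list. The Python sketch below is only an illustration: the list layout, the helper names, and the exact class labels (in particular "excess stress" and "deficiency stress") are assumptions, not the contents of the OOPS source files.

```python
# Hypothetical flat list of (child, parent) pairs mirroring the upper-level
# hierarchy described in the Overview. Labels are illustrative only.
UPPER_LEVEL = [
    ("biotic stress", "plant stress"),
    ("abiotic stress", "plant stress"),
    ("excess stress", "abiotic stress"),
    ("deficiency stress", "abiotic stress"),
    ("herbivory stress", "biotic stress"),
    ("plant disease", "biotic stress"),
]


def ancestors(term, pairs=UPPER_LEVEL):
    """Walk from a term up to the root, returning the chain of parents."""
    parent_of = dict(pairs)
    chain = []
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain


def children(term, pairs=UPPER_LEVEL):
    """Return the direct child terms of a class, sorted by label."""
    return sorted(child for child, parent in pairs if parent == term)


if __name__ == "__main__":
    print(ancestors("plant disease"))
    print(children("abiotic stress"))
```

Keeping the hierarchy as a flat list means a curator can add a grouping class by appending one row, without touching an ontology editor.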
ICBO 2018 August 7-10, 2018
B. Design patterns

In order to increase automation in the development of the Ontology of Plant Stress, we are using a set of design patterns that describe different plant stresses, compliant with the Dead Simple OWL Design Patterns (DOS-DPs) format [4]. Using design patterns allows term lists to be maintained in flat tables that can be automatically converted into Web Ontology Language (OWL). In its current pre-release state, OOPS uses three distinct patterns to define plant stress ontology terms: a deficiency pattern and an excess pattern for abiotic stress processes, and a single ‘disease pattern’ for biotic stresses.

C. Abiotic stress patterns

Plants can experience stress from exposure to a multitude of different chemical elements, and the process of experiencing stress is dependent on the concentration of said element for a given species or variety of plant in contrast to a reference entity. Abiotic stresses are divided into subclasses based on the excess and deficient states of the stressor element. Stresses caused by exposure to an experimental condition containing too much of an element fall under the “excess” pattern, whereas stresses caused by exposure to an experimental condition that is deficient in, or lacking, a particular element fall under the “deficiency” pattern. Each pattern returns an ontology term with the axioms in Manchester syntax [5] as follows:

Excess pattern:
"'abiotic plant stress' and 'causally
downstream of' some ('plant treatment' and
'has exposure stimulus' some (ELEMENT and
'has quality' some 'increased amount')) and
'occurs in' some PLANT STRUCTURE"

Deficiency pattern:
"'abiotic plant stress' and 'causally
downstream of' some ('plant treatment' and
'has exposure stimulus' some (ELEMENT and
'has quality' some 'decreased amount')) and
'occurs in' some PLANT STRUCTURE"

In the above axioms, the ‘ELEMENT’ is defined by some entity which is the agent responsible for the stress. This element can be anything, but is typically some chemical entity, defined using Chemical Entities of Biological Interest (ChEBI [6]). The ‘PLANT STRUCTURE’ is where the stress occurs or is observed, typically defined by a plant anatomy term from the Plant Ontology (PO [2]), which can be a specific plant part (e.g., root (PO:0009005) or vascular leaf (PO:0009025)), but is often more generally defined as the whole plant (PO:0000003). Examples of the tabular list needed to generate both excess stress terms and deficiency stress terms can be seen in Table 1.

Element | Plant Structure
Nitrogen atom (CHEBI:29352) | whole plant (PO:0000003)
Phosphorus (CHEBI:28659) | whole plant (PO:0000003)
Nitrogen atom (CHEBI:29352) | leaf (PO:0025034)

Table 1: Flat list describing entities used to construct excess or deficiency plant stress terms in OOPS. Example terms were identified from the Plant Trait Ontology terms nitrogen sensitivity (TO:0000011) and phosphorus sensitivity (TO:0000102). The first column contains the stressor agent, often a chemical entity. The second column contains the anatomical plant structure (from the Plant Ontology) affected by the stress process.

D. Biotic stress patterns

The biotic stress class has two subclasses: herbivory and plant disease. The herbivory stress pattern is under development; the plant disease stress pattern results in the following axiom.

Disease pattern:
"'plant disease process' and ('has
participant' some HOST) and 'causally
downstream of' some ('plant treatment' and
'has exposure stimulus' some PATHOGEN) and
'occurs in' some PLANT STRUCTURE"

Defining diseases as processes allows the annotation of stage-specific disease symptoms as infection occurs. Plant diseases are defined by three object classes: host, pathogen, and the plant structure where infection occurs. This pattern defines a host as some participant in the process, whereas the pathogen is said to be an exposure stimulus in an environment containing the pathogen. The disease process is said to occur in some plant structure (PO:0009011). This additional requirement allows root diseases to be defined separately from shoot diseases in the case that both are caused by the same pathogen (Table 2). Identification and treatment of diseases depend on the location of the infection. In cases where the pathogen infection is systemic, whole plant (PO:0000003) is used as the plant structure.

Unlike abiotic stresses, plant diseases are processes that are specific to their host plant. It is understood that certain plant pathogens are capable of infecting multiple hosts [9], and this can cause some term inflation within the ontology. This is an acceptable side effect of describing plant stress in terms that are as unambiguous as possible. Currently, both hosts and pathogens (including pests) are defined by their NCBI taxon ID and are grouped by their taxonomic clade. This allows filtering of diseases based on host or causal agent (e.g., viral diseases vs. bacterial diseases, or potato diseases vs. Solanaceae diseases). This will allow potato breeders to filter out all diseases that do not affect potato, or potentially gain insight into resistance mechanisms by expanding the filters to include diseases
affecting all solanaceous crops. Examples of the tabular format needed to generate plant disease terms can be seen in Table 2.

Host | Pathogen | Plant Structure
Oryza sativa (NCBITaxon:4530) | Xanthomonas oryzae pv. oryzicola (NCBITaxon:1080340) | whole plant (PO:0000003)
Oryza sativa (NCBITaxon:4530) | Xanthomonas oryzae pv. oryzicola (NCBITaxon:1080340) | vascular leaf (PO:0009025)

Table 2: Example rows from the flat list of entities used to generate plant disease terms in OOPS. Three entities are needed: host, pathogen, and plant structure. Both host and pathogen come from the NCBI Taxonomy hierarchy, and the plant structure entity affected by the plant disease is from the Plant Ontology.

E. Initial term population

The initial set of abiotic stresses was determined by extracting all of the abiotic plant traits from the Plant Trait Ontology. Any time a plant trait was defined as the response to a chemical entity (ChEBI), two stress terms were created: one each for the excess and deficient states of that chemical entity.

F. Samara’s APS web scrape

To collect plant disease names, the American Phytopathological Society (APS) web publication "Common Names of Plant Diseases" [7] was scraped by the Samara tool [3]. Samara is a command-line tool implemented in Scala (https://scala-lang.org) that extracts plant trait data from open data sources like APS and USDA-GRIN (www.apsnet.org, www.grin-global.org).

To convert human-readable pages from APS’s "Common Names of Plant Diseases" resource, an automated process was implemented. The first step of this process is to extract all disease names, source citations, host plants and pathogens from individual host disease pages. The second step corrects troublesome names using a version-controlled name map (i.e., nameMap.tsv). The third step links host and pathogen names to NCBI Taxonomy, the OBO Relations Ontology (e.g., pathogen of, http://purl.obolibrary.org/obo/RO_0002556) and the Plant Ontology for other entities such as host parts (e.g., leaf or root). The relationship, or interaction type, is inferred from the context of the resource, and the host parts are extracted from the common name of the disease using a word matching algorithm. The final step exports the results into a tab-separated-value file to make the results available for downstream processing. This process is then repeated to optimize the quality of the name mapping and linking methods.

Given that the APS pages used to extract information were designed for consumption by humans, the structure of the information is not consistent. By providing a rapid, automated process to extract, correct and publish machine-readable datasets, we put in place a repeatable process in which corrections can be made relatively quickly by avoiding unnecessary manual inputs. For instance, a change in a name mapping file in Samara will automatically trigger a new scrape of the APS resource using a Jenkins job running on a server provided by the Berkeley BBOP [8]. A new dataset will become available less than 20 minutes after that name mapping change is made. Also, dataset archives produced by this automated process are regularly ingested by Global Biotic Interactions (GloBI, https://globalbioticinteractions.org) to further increase the visibility of the APS dataset and OOPS, to stimulate re-use, and to make it easier to detect suspicious data records.

Figure 2: Example differentiation of three plant diseases that previously would be indistinguishable by using only those diseases’ common names. By combining the taxonomy of both host and pathogen, we can create unique labels to differentiate between similarly named diseases with completely different causal species.

III. DISCUSSION

The constant arms race between plant hosts and the pathogens that infect them is guided by evolution; the resulting inference is that genes that share similar sequences or domains often share similar functions. OOPS utilizes the relatedness of plant stress participants (host and pathogen in the case of disease, and chemical entity in abiotic stress), and will give scientists improved accuracy when forming hypotheses about gene function, or candidate genes that may be linked to plant traits of interest. Standardizing the definition of plant stresses,
and using this standard vocabulary in the annotation of genes, genomes, QTL, mutants, and the data gathered via field books from plant breeding or field trial experiments can help in building common semantic queries for hypothesis generation, and provide accuracy in the annotation process. Using existing taxonomic hierarchies and ontologies, researchers can leverage relatedness between plant hosts, causative pathogens, and even chemical entities to more accurately predict targets for molecular markers, and identify candidate stress-responsive gene functions. These standards will also help aggregate existing data, and assist in future-proofing new data to ensure that the massive amounts of both phenotypic and genotypic data being generated can be interoperable instead of being used for a singular task and dumped into a repository to collect dust.

The real innovation and advancement of this work is the emphasis on automation. Much of the accuracy of the disease terms requires information from a subject matter expert. These experts are often not familiar with ontologies, formats like OWL, and ontology editing tools, and would require extensive training and guidance in order to contribute. Therefore, the use of design patterns to automate ontology development, term addition, and edits allows curators and contributors to maintain OOPS using just a flat list. This lowered bar for ontology curation reduces the effort of training new contributors and additional curators, and the overall overhead for maintenance. Efforts to simplify construction and maintenance will also improve community involvement and adoption.

Construction of an ontology requires expert domain knowledge to ensure accuracy of the resulting hierarchy. OOPS is no exception. Plant stress spans the entirety of the plant science field, and a single person cannot hope to understand and capture all of the instances of plant stress. That is part of the benefit of using these automated tools for developing an ontology: when issues arise, or additional parent classes are needed to further group stresses, they can simply be added to the upper-level hierarchy list, and the reasoner can place child terms using the appropriate pattern.

As it currently stands, OOPS is available on GitHub (https://github.com/Planteome/ontology-of-plant-stress). However, it is under construction, and no stable release is available at this time.

IV. FUTURE DIRECTION

Community involvement is key to ontology utility. To make OOPS more robust and functional, we are planning to implement a table editing tool that will be accessible to the public. Some form of version control (likely GitHub) will be used to produce robust versioning of stress term edits. Reaching out to subject matter experts, such as CGIAR Research Centers, will be key to accurate plant disease descriptions. Reaching out to APS will be important for widespread adoption, and for the community efforts needed to stay up to date on plant disease nomenclature and identification. For instance, we imagine a collaboration in which APS updates the Common Names of Plant Diseases [7] pages such that taxonomic terms (host, pathogen) and diseases are linked to NCBI Taxonomy and OOPS respectively, and makes them available in formats that are friendly to humans (e.g., html) and machines (e.g., tsv, rdf). In addition, after the release of a stable OOPS, the intent is to link it to the Plant Trait Ontology by using OOPS terms within TO stress responsivity traits. This way, TO, PO, NCBI Taxonomy, and ChEBI can all be linked together to form a more robust knowledge graph within Planteome.

ACKNOWLEDGMENT

This work was supported by IOS:1340112 from the National Science Foundation.

REFERENCES

[1] Federhen S. The NCBI Taxonomy database. Nucleic Acids Research. 2012;40(Database issue):D136-D143. doi:10.1093/nar/gkr1178.
[2] Cooper L, Meier A, Laporte M-A, Elser JL, Mungall C, Sinn BT, Cavaliere D, Dunn NA, Smith B, Qu B et al. 2018. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics. Nucleic Acids Research. Vol 46:D1168-1180. doi:10.1093/nar/gkx1152.
[3] Poelen J, Laporte M-A. (2018, May 7). jhpoelen/samara v0.2.0 (Version v0.2.0). Zenodo. http://doi.org/10.5281/zenodo.1243234
[4] Osumi-Sutherland D, Courtot M, Balhoff JP, Mungall C. Dead simple OWL design patterns. Journal of Biomedical Semantics 2017 8:18. https://doi.org/10.1186/s13326-017-0126-0
[5] Hitzler P, Krötzsch M, Parsia B, Patel-Schneider PF, Rudolph S (eds). OWL 2 Web Ontology Language: Primer. W3C Recommendation; 2009. Available at http://www.w3.org/TR/owl2-primer/.
[6] Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, Turner S, Swainston N, Mendes P, Steinbeck C. (2016). ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res.
[7] Common names of plant diseases: American Phytopathological Society. http://www.apsnet.org/publications/commonnames/Pages/default.aspx
[8] http://build.berkeleybop.org/view/Planteome/job/extract-apsnet-diseases
[9] Gilbert and Webb, Phylogenetic signal in plant pathogen–host range. PNAS 2007. 104 (12) 4979-4983
[10] Cooper and Jaiswal, The Plant Ontology: A Tool for Plant Genomics. Methods in Molecular Biology. Vol 1373
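As a supplementary illustration of the design patterns in Sections II.C and II.D, the minimal Python sketch below fills the excess, deficiency, and disease templates from flat-table rows like those in Tables 1 and 2. It is not the DOS-DP tooling the paper relies on: the function names and the plain string substitution are assumptions made for illustration, and the class and relation labels are simply copied from the axioms quoted in the text.

```python
# Manchester-syntax templates copied from the pattern definitions in the text.
# The {placeholders} stand for the ELEMENT, HOST, PATHOGEN and PLANT STRUCTURE
# variables; everything else is fixed by the pattern.
ABIOTIC_TEMPLATE = (
    "'abiotic plant stress' and 'causally downstream of' some "
    "('plant treatment' and 'has exposure stimulus' some "
    "({element} and 'has quality' some '{amount}')) and "
    "'occurs in' some {structure}"
)
DISEASE_TEMPLATE = (
    "'plant disease process' and ('has participant' some {host}) and "
    "'causally downstream of' some ('plant treatment' and "
    "'has exposure stimulus' some {pathogen}) and "
    "'occurs in' some {structure}"
)


def quote(label):
    """Wrap a label in single quotes, as done for multi-word labels."""
    return "'" + label + "'"


def abiotic_axioms(element, structure):
    """Return the (excess, deficiency) axioms for one Table 1 style row."""
    common = {"element": quote(element), "structure": quote(structure)}
    excess = ABIOTIC_TEMPLATE.format(amount="increased amount", **common)
    deficiency = ABIOTIC_TEMPLATE.format(amount="decreased amount", **common)
    return excess, deficiency


def disease_axiom(host, pathogen, structure):
    """Return the disease axiom for one Table 2 style row."""
    return DISEASE_TEMPLATE.format(
        host=quote(host), pathogen=quote(pathogen), structure=quote(structure)
    )


if __name__ == "__main__":
    for axiom in abiotic_axioms("nitrogen atom", "whole plant"):
        print(axiom)
    print(disease_axiom("Oryza sativa",
                        "Xanthomonas oryzae pv. oryzicola",
                        "vascular leaf"))
```

Each abiotic row yields two axioms, matching the rule in Section II.E that one excess and one deficiency term are created per chemical entity.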
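The post-extraction steps of the scrape described in Section II.F (name correction, linking, host-part matching, and TSV export) might look like the sketch below. This is Python, not Samara's Scala code. The column names, the lookup tables, the sample record, and the fallback to whole plant when no host part is matched are all hypothetical; the identifiers are those shown in Table 1, Table 2, and Section II.C.

```python
import csv
import io

# Hypothetical stand-in for the version-controlled nameMap.tsv.
NAME_MAP_TSV = "provided_name\tcorrected_name\nOryza sativa L.\tOryza sativa\n"
# Hypothetical lookup tables; a real run would resolve names against
# NCBI Taxonomy and the Plant Ontology.
TAXON_IDS = {
    "Oryza sativa": "NCBITaxon:4530",
    "Xanthomonas oryzae pv. oryzicola": "NCBITaxon:1080340",
}
HOST_PARTS = {"root": "PO:0009005", "leaf": "PO:0025034"}
WHOLE_PLANT = "PO:0000003"
PATHOGEN_OF = "http://purl.obolibrary.org/obo/RO_0002556"
FIELDS = ["disease", "pathogen_id", "interaction", "host_id", "host_part_id"]


def load_name_map(text):
    """Step 2 input: read provided-name to corrected-name pairs from TSV."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["provided_name"]: row["corrected_name"] for row in reader}


def match_host_part(disease_name):
    """Find a host part by matching words of the disease's common name."""
    for word in disease_name.lower().split():
        if word in HOST_PARTS:
            return HOST_PARTS[word]
    return WHOLE_PLANT  # assumed fallback when no part is named


def link_record(record, name_map):
    """Steps 2 and 3: correct names, then link them to identifiers."""
    host = name_map.get(record["host"], record["host"])
    pathogen = name_map.get(record["pathogen"], record["pathogen"])
    return {
        "disease": record["disease"],
        "pathogen_id": TAXON_IDS.get(pathogen, ""),
        "interaction": PATHOGEN_OF,
        "host_id": TAXON_IDS.get(host, ""),
        "host_part_id": match_host_part(record["disease"]),
    }


def to_tsv(rows):
    """Final step: export linked rows as tab-separated values."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    scraped = {"disease": "Bacterial leaf streak",
               "host": "Oryza sativa L.",
               "pathogen": "Xanthomonas oryzae pv. oryzicola"}
    print(to_tsv([link_record(scraped, load_name_map(NAME_MAP_TSV))]))
```

Because the name map is plain data, correcting a troublesome name is a one-line change to the TSV followed by a rerun, which is the repeatable loop the section describes.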