<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Semantics and NLP in Experimental Protocols</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olga Giraldo</string-name>
          <email>ogiraldo@fi.upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Garcia</string-name>
          <email>alexgarciac@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jose Figueredo</string-name>
          <email>jfigueredofortes@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar Corcho</string-name>
          <email>ocorcho@fi.upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ontology Engineering Group, Universidad Politécnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present SMART Protocols, a semantic and NLP-based infrastructure for processing and enacting experimental protocols. Our contribution is twofold: on the one hand, SMART Protocols delivers a semantic layer that represents the knowledge encoded in experimental protocols; on the other hand, it builds the groundwork for making use of such semantics within an NLP framework. We emphasize the semantic and NLP components, namely the SMART Protocols (SP) Ontology, the Sample Instrument Reagent Objective (SIRO) model and the text mining integrative architecture GATE. The SIRO model defines an extended layer of metadata for experimental protocols; SIRO is also a Minimal Information (MI) model conceived in the same realm as the Patient Intervention Comparison Outcome (PICO) model that supports search, retrieval and classification purposes. By combining comprehensive vocabularies with NLP rules and gazetteers, we identify meaningful parts of speech in experimental protocols. Moreover, in cases for which SIRO is not available, our NLP automatically extracts it; answering queries such as “What bacteria have been used in protocols for persister cells isolation?” also becomes possible.</p>
      </abstract>
      <kwd-group>
        <kwd>semantic web</kwd>
        <kwd>graph theory</kwd>
        <kwd>biomedical ontologies</kwd>
        <kwd>natural language processing</kwd>
        <kwd>knowledge representation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Experimental protocols are fundamental information structures that support the
description of the processes by means of which results are generated in experimental
research [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Experimental protocols describe how the data was produced, the steps
undertaken and conditions under which these steps were carried out. Biomedical
experiments often rely on sophisticated laboratory protocols, comprising hundreds of
individual steps; for instance, the protocol for chromatin immunoprecipitation on a
microarray (Chip-chip) has 90 steps and uses over 30 reagents and 10 different devices
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Protocols are written in natural language; they are often presented in a “recipe”
style; they are meant to make it possible for researchers to reproduce experiments.
      </p>
      <p>
        In this paper, we present the semantic and Natural Language Processing (NLP)
components for SMART Protocols (SP); we want to facilitate the semantic
representation and natural language processing for these documents. Our NLP layer makes use
of various ontologies as well as of the Sample Instrument Reagent Objective (SIRO)
model for minimal information (MI) that we have defined. Our work is based on an
exhaustive analysis of over 400 published experimental protocols<sup>1</sup> (molecular biology,
cell and developmental biology, biochemistry) and guidelines for authors from
repositories and journals, e.g. Nature Protocols<sup>2</sup>, Plant Methods (Methodology)<sup>3</sup> and Cold
Spring Harbor Protocols<sup>4</sup>. Moreover, the SP ontology and SIRO are built upon
experiences such as those under the BioSharing umbrella, e.g. the MIBBI project [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], OBO
foundry [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. We have also considered the ISA-TAB because it is a general framework
with which to collect and communicate complex metadata (i.e. sample characteristics,
technologies used, type of measurements made) from omics-based experiments [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        We have carefully considered and reused several ontologies, for instance: i) The
Ontology for Biomedical Investigations (OBI) aims to provide a representation of
biomedical investigations. OBI builds upon BFO and structures the ontology by
using “occurrences” (processes) and “continuants” (materials, instruments, qualities,
roles, functions) relevant to the biomedical domain [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. ii) The Information Artifact
Ontology (IAO)<sup>5</sup> is an ontology that represents information entities such as
documents, file formats and specifications. iii) The ontology of experiments (EXPO) aims
to formalize domain-independent knowledge about the planning, execution and
analysis of scientific experiments. This ontology includes the class “Experimental Protocol”
and defines some of its properties: “has applicability”, “has goal”, “has plan” [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. iv)
The LABORS ontology (LABoratory Ontology for Robot Scientists) addresses the
problem of representing the information required by robots to carry out experiments;
LABORS is an extension of EXPO and defines concepts such as “investigation”,
“study”, “test”, “trial” and “replicate” [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]. v) The ontology of experimental actions
(EXACT) provides a terminology for the description of protocols in the biomedical
domain. The core of this vocabulary is a hierarchical classification of verbs currently
used in experimental protocols. These verbs are divided into three groups according
to their goal (separation, transformation and combination) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>Unlike other approaches, the SP ontology is an application ontology designed
to support NLP over experimental protocols, publish protocols as Linked Open Data
(LOD) and annotate and classify these documents according to particularities in their
workflows. The SP-document module delivers a structured vocabulary for representing a
specific type of document, the protocol. We extend the IAO to provide a structured
vocabulary of concepts to represent the information that is necessary and sufficient
for describing an experimental protocol as a document. In addition, the SP ontology
also considers the protocol as an executable element to be carried out and maintained
by humans; it may also be transformed into workflow languages used by enactors such
as robots or machines. The SP ontology is not, per se, a workflow language but a model.</p>
      <p>
        For the representation of instructions in the SP-workflow module, we expand the
class ”experiment action” from EXACT and reuse classes from OBI, the BioAssay
Ontology (BAO) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], the Experimental Factor Ontology (EFO) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], eagle-i resource
ontology (ERO) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], NCBI taxonomy [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Chemical Entities of Biological
Interest (ChEBI) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] that are related to “instruments”, “reagents/chemical compounds”,
“organisms” and “sample/specimen”. The property sp:has instruction is used to
define the instructions involved in protocols; instructions have actions and are the
units for the workflow in the SP-workflow module. The order in which these
instructions should be executed is captured by the BFO<sup>6</sup> properties “is preceded by” and
“precedes”.
      </p>
      <p>
        SIRO has been conceived in a way similar to that of the Patient Intervention
Comparison Outcome (PICO) model; it supports information retrieval and provides
an anchor for the records [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. SIRO extends the document metadata and delivers
the semantics for the registry of a protocol. SIRO facilitates classification and
retrieval without exposing the content of the document. In this way, publishers and
laboratories may keep the content private, exposing information that describes the
sample, instruments, reagent and objective of the protocol. The combination of
NLP and semantics in SMART Protocols makes it possible to answer queries such as
“What bacteria have been used in protocols for persister cells isolation?”, “What
imaging analysis software is used for quantitative analysis of locomotor movements, buccal
pumping and cardiac activity on X. tropicalis?” and “How to prepare the stock solutions of
the H2DCF and DHE dyes?”
      </p>
      <sec id="sec-1-1">
        <title>5 https://github.com/information-artifact-ontology/IAO/ 6 http://ifomis.uni-saarland.de/bfo/</title>
        <p>
          For convenience, in this paper we use the protocol “Extraction of total RNA from
fresh/frozen tissue (FT)” [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] as a running example. We model this protocol with the
SP ontology and SIRO; we also use this protocol to illustrate how the NLP component
uses the ontologies and facilitates information retrieval. We are using GATE [
          <xref ref-type="bibr" rid="ref3 ref4">3,
4</xref>
          ] as the NLP engine; the information extraction system is ANNIE (A Nearly-New
Information Extraction), and extraction rules are coded in JAPE (Java Annotation
Patterns Engine). This paper is organized as follows: we start by presenting the
SMART Protocols ontology and the SIRO model for minimal information (section 2).
We then introduce our NLP component (section 3). Discussion and conclusions are
then presented.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Semantics in SMART Protocols</title>
      <p>
        The development of the SP ontology was the first step [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], then the SIRO model
followed. Use cases making use of semantics, NLP and information retrieval guided
the process. Both the ontology and SIRO benefited from the continuous use of NLP
techniques in support of harvesting terminology and identifying meaningful parts of
speech (PoS) such as actions in the narratives. NLP was also used to semantically
enrich the protocols based on the identified terminology. The gazetteers and rules of
extraction were developed iteratively; as terminology and PoS were identified and
validated manually, rules were defined, tested, and validated against the accuracy
of extracted protocols.
      </p>
      <sec id="sec-2-1">
        <title>2.1 The SP Ontology</title>
        <p>
          The SMART Protocols (SP) ontology is an application ontology designed to
facilitate the representation of experimental protocols in two ways, as a document and
as a workflow [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Our ontology reuses the Basic Formal Ontology (BFO). We are
also reusing the ontology of relations (RO) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] to characterize concepts. In
addition, each term from the SP ontology is represented by annotation properties
imported from OBI Minimal metadata<sup>7</sup>. The classes, properties and individuals are
represented by their respective labels to facilitate readability. The prefix
indicates the provenance of each term. The document module of SP (henceforth
SP-document)<sup>8</sup> aims to provide a structured vocabulary of concepts to
represent information for recording and reporting an experimental protocol. The class
iao:information content entity and its subclasses iao:document, iao:document
part, iao:textual entity and iao:data set were imported from IAO. This
module also represents metadata such as sp:title of the protocol, sp:purpose of
the protocol, sp:application of the protocol, sp:reagent list,
sp:equipment and supplies list, sp:manufacturer, sp:catalog number and sp:storage
conditions.
        </p>
        <p>The workflow module<sup>9</sup> represents the “steps/instructions”, “actions” and
experimental inputs such as “reagents”, “instruments”, and “samples/specimens” for
enacting the workflow. The executable elements of a protocol
(instructions or steps) are modeled by reusing and expanding “experimental actions” from
EXACT; in addition, we reused and expanded terminology related to “instruments”,
“reagents/chemical compounds”, “organisms” and “sample/specimen” from
ontologies such as OBI, BAO, EFO, ERO, NCBI Taxonomy and ChEBI. Steps/instructions
are modeled with the class sp:protocol instruction. The
property sp:has instruction is used to define the instructions involved in protocols.
The order in which these instructions should be executed is captured by the properties
“is preceded by” and “precedes” from BFO.</p>
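        <p>The ordering of instructions described above can be illustrated with a minimal Python sketch. It is not part of the SP ontology: the class name, the instruction labels and the function are hypothetical, and the sketch only assumes that each instruction lists the instructions it precedes, with “is preceded by” derived as the inverse relation.</p>
        <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    """One protocol instruction; all names here are illustrative."""
    label: str
    action: str
    precedes: list = field(default_factory=list)  # labels of later instructions


def execution_order(instructions):
    """Return labels in an order that respects every 'precedes' link."""
    by_label = {ins.label: ins for ins in instructions}
    # 'is preceded by' is derived as the inverse of 'precedes'.
    preceded_by = {label: set() for label in by_label}
    for ins in instructions:
        for later in ins.precedes:
            preceded_by[later].add(ins.label)
    ordered = []
    ready = sorted(label for label, deps in preceded_by.items() if not deps)
    while ready:
        current = ready.pop(0)
        ordered.append(current)
        for later in by_label[current].precedes:
            preceded_by[later].discard(current)
            if not preceded_by[later]:
                ready.append(later)
    if len(ordered) != len(by_label):
        raise ValueError("cycle in 'precedes' links")
    return ordered


# Hypothetical instructions, deliberately listed out of order.
steps = [
    Instruction("precipitate", "centrifuge"),
    Instruction("homogenize", "homogenize", precedes=["separate"]),
    Instruction("separate", "centrifuge", precedes=["precipitate"]),
]
print(execution_order(steps))
```
        </preformat>
        <p>Storing only one direction of the relation and deriving the other keeps the two properties consistent; a triple store with inverse-property reasoning would serve the same purpose.</p>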
        <p>
          Our running example, “Extraction of total RNA from fresh/frozen tissue (FT)”
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], is illustrated in figures 1 and 2 as a SMART Protocol. The application of
        </p>
        <sec id="sec-2-1-1">
          <title>7 http://obi-ontology.org/page/OBI Minimal metadata 8 http://vocab.linkeddata.es/SMARTProtocols/sp-documentV2.0.htm 9 http://vocab.linkeddata.es/SMARTProtocols/sp-workflowV2.0.htm</title>
          <p>the SP-Document module is presented in table 1 and Fig. 1; metadata elements are
organized in SP-Document as “textual entities”. The modeling of the workflow aspects with the
SP-Workflow module is presented in table 2 and Fig. 2; the first column in table 2 includes
frequently used instructions that should be executed in protocols for the extraction of
nucleic acids. The second column includes instructions extracted from the protocol.
Olga Giraldo, domain expert and author of this paper, semi-automatically extracted
the information for each metadata element and identified instructions frequently used
in different types of protocols, mainly in molecular biology.</p>
          <table-wrap>
            <label>Table 1</label>
            <table>
              <tbody>
                <tr><td>sp:title of the protocol</td><td>Extraction of total RNA from fresh/frozen tissue (FT)</td></tr>
                <tr><td>sp:author name</td><td>Kim M. Linton, Yvonne Hey, Sian Dibben, Crispin J. Miller, Anthony J. Freemont, John A. Radford, and Stuart D. Pepper</td></tr>
                <tr><td>sp:protocol identifier</td><td>DOI:10.2144/000113260</td></tr>
                <tr><th colspan="2">Descriptive metadata</th></tr>
                <tr><td>sp:application of the protocol</td><td>“Methods comparison for high-resolution transcriptional analysis of archival material on Affymetrix Plus 2.0 and Exon 1.0 microarrays”</td></tr>
                <tr><td>sp:provenance of the protocol</td><td>“The extraction method (steps 2-21) is taken from the method supplied with TRIzol reagent (Invitrogen, Paisley, UK).”</td></tr>
                <tr><th colspan="2">Metadata about the materials used</th></tr>
                <tr><td>sp:specimen name</td><td>“tumor tissue”</td></tr>
                <tr><td>sp:reagent name “TRIzol”</td><td>sp:manufacturer name “Invitrogen”</td></tr>
                <tr><td>sp:reagent name “Chloroform”</td><td>sp:manufacturer name “Sigma-Aldrich”</td></tr>
                <tr><td>sp:reagent name “Ethyl alcohol”</td><td>sp:manufacturer name “Sigma-Aldrich”</td></tr>
                <tr><td>sp:reagent name “Isopropyl alcohol”</td><td>sp:manufacturer name “Sigma-Aldrich”</td></tr>
                <tr><td>sp:equipment or supplies name</td><td>“Tissue storage container”, “Homogenizer blades”, “Forceps”, “Scalpel”, “Scalpel holder”</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 The Sample Instrument Reagent Objective (SIRO) model</title>
        <p>SIRO represents the minimal information for describing an experimental protocol. In
doing so, it serves two purposes. Firstly, it extends and structures available metadata
for experimental protocols, for instance, author, title, date, journal, abstract, and
other properties that are available for published experimental protocols usually as
PDFs. SIRO extends this layer of metadata by aggregating information about Sample,
Instrument, Reagent and Objective, hence the name. If this information is part of the
abstract or the full content, SIRO extracts and structures it as Linked Open Data
(LOD). Secondly, SIRO, in combination with NLP and semantics, provides an anchor
and structure for the minimal common data elements in experimental protocols. This
makes it possible to find specific information about the protocol; if the owner of the
protocol chooses not to expose the full content, as in the case of publishers and/or
laboratories, SIRO may be exposed without compromising the full content of the
document.</p>
        <p>
          SIRO was developed after the SP ontology; Fig. 3 (step 2) illustrates the
development process. The identification of common elements involved the following activities.
Our “kick-off” phase started by redefining the use cases, focusing on the identification
of commonalities; it also entailed preparing the material to be used, e.g., ontologies,
protocols and planning. Our main input was the SP Ontology and our corpus of
documents. We then started to manually identify commonalities across protocols and
to map these to the SP ontology as well as to ontologies in BioPortal<sup>10</sup> and OntoBee<sup>11</sup>.
This Domain Analysis and Knowledge Acquisition (DAKA) phase allowed us to
gather common terminology with a raw classification. Our Linguistic and Semantic
Analysis (LISA) was carried out in parallel with DAKA. LISA allowed us to
automatically classify and identify the terminology we were gathering; LISA was extensively
supported by GATE [
          <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
          ]. The outcome allowed us to determine higher abstractions
to which the terminology thus gathered could be mapped, e.g., “sample”, “reagent”
and “instrument”. It also allowed us to recognize that, although the description of the
objective was a common element, it was scattered throughout the narrative without
an anchor.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Natural Language Processing for SMART Protocols</title>
      <p>The SMART Protocols ontology models the execution of the workflow and relates
reagents and instruments to steps in the workflow. Ontologies provide the gazetteers
with the necessary knowledge for annotating beyond entity recognition. The gazetteers
make it possible to identify SIRO elements in the narrative; they are structured with
information such as definitions, URIs, provenance and synonyms. Gazetteers based on
ontologies have context; rules making use of these gazetteers find meaningful parts
of speech in the text. GATE uses the gazetteers and the rules for annotating the
documents. In this way, it is possible to differentiate between “centrifuge” as an
action and “centrifuge” as an instrument. Furthermore, the rule engine in GATE, in
combination with the gazetteers, makes it possible to find statements related to, for
example, “precipitation instructions” that include words related to an action such as
“centrifuge” and a reagent like “isopropanol” (used to facilitate the precipitation of
DNA); see Fig. 6 for a more complete example.</p>
      <p>10 http://bioportal.bioontology.org/ 11 http://www.ontobee.org/index.php</p>
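      <p>To make the idea of context-aware annotation concrete, the following Python sketch mimics, in a very reduced form, how a gazetteer lookup combined with one context rule could separate “centrifuge” as an action from “centrifuge” as an instrument. It is not the JAPE grammar used in SMART Protocols: the toy gazetteer, the determiner rule and both function names are assumptions made for illustration only.</p>
      <preformat>
```python
import re

# Toy gazetteer; entries and categories are illustrative only.
GAZETTEER = {
    "centrifuge": {"action", "instrument"},
    "isopropanol": {"reagent"},
    "add": {"action"},
}
DETERMINERS = {"a", "an", "the"}


def annotate(sentence):
    """Return (token, category) pairs for gazetteer hits in one sentence."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    hits = []
    for i, tok in enumerate(tokens):
        categories = GAZETTEER.get(tok)
        if not categories:
            continue
        if len(categories) == 1:
            hits.append((tok, next(iter(categories))))
        elif i > 0 and tokens[i - 1] in DETERMINERS:
            # Context rule: a determiner before the term suggests a noun.
            hits.append((tok, "instrument"))
        else:
            hits.append((tok, "action"))
    return hits


def is_precipitation_instruction(sentence):
    """Flag sentences that mention both a centrifuge action and isopropanol."""
    hits = annotate(sentence)
    return ("centrifuge", "action") in hits and ("isopropanol", "reagent") in hits


print(annotate("Place the tubes in the centrifuge."))
print(is_precipitation_instruction("Add isopropanol and centrifuge the sample."))
```
      </preformat>
      <p>A single determiner rule is far too weak for real protocols; it only shows why a term list alone is insufficient and why rules over the surrounding tokens are needed.</p>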
      <sec id="sec-3-1">
        <title>3.1 Gazetteers and rules for NLP</title>
        <p>The development of the SEmantic GAzetteers (SEGA) was a complex and
domain-knowledge-intensive activity. Developing the gazetteers entailed the
identification of ontologies with terminology related to “sample/specimen” (including
organisms), “instruments” and “reagents”. Ontology repositories and their corresponding
Application Programming Interfaces (APIs) were reviewed so that the process could
be automated (see Fig. 3, step 3, “Review of Ontologies”). The ontologies identified
during the development of the SP ontology were then more carefully inspected; overlaps were
identified, and the availability of metadata for each term, object properties and complexity
in the classification were addressed. For “organisms” (related to sample/specimen),
the NCBI Taxonomy was chosen. For “instrument”, the choices included EFO, ERO,
OBI and the SP ontology. For “reagents” and “chemical compounds”, ChEBI and SP
were selected (Fig. 3, step 3, “Selection of ontologies”). During the stage “Extraction
of Terms” (see Fig. 3, step 3), we focused on enriching the terminology; depending on
the limitations of the SPARQL endpoints for OntoBee and BioPortal, we either used
SPARQL queries or parsed the ontologies locally. The terminology was
gathered with the corresponding annotation properties. Axioms and annotation properties
were used to, for instance, determine whether a term is synonymous with another term
because it is an acronym or a common name.</p>
        <p>
          At this point, we had some gazetteers with over half a million terms. For quality
control, we then started cleaning the terminology. We removed the terms
that had comments from the curators about the suitability of the terms in specific
sub-domains. For instance, the class “cell harvester” (OBI_0001119) has a specific
comment: “A device that is used to harvest cells from microplates and deposit samples
on a filter mat. NOT AN INSTRUMENT”. We also removed terminology that was
reused across ontologies. For instance, the OBI class “thermal cycler” is reused by SP
and EFO. In this particular case, we use the term only once and from the original
source, OBI. Classes with the same label represented in several ontologies with
different axioms were conserved. For instance, SP reuses from the Sequence Ontology (SO)
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] the class “forward primer”; OBI also includes a class “forward PCR primer”
(alternative term: forward primer). Once the terminology was cleaned, we started
the Generation of the Gazetteers; the gazetteers are used by GATE and, together
with the rules, they support the NLP. GATE is based on a pipeline architecture
composed of Processing Resources (PR). Each PR has a specific function within the text
processing (e.g. to create tokens and to tag PoS). We used ANNIE (A Nearly-New
Information Extraction) as our information extraction system. We used the default
ANNIE Gazetteer to build the gazetteers with fewer than 1 million terms per ontology
and subdomain (Fig. 4); the gazetteers were configured as case-insensitive. For
terms with synonyms, each synonym was added as an independent term, including
features such as labels and URIs. To facilitate the recognition of terms varying from
the corresponding roots, e.g. singular and plural, the gazetteers were nested into a
Flexible Gazetteer (Fig. 4); this allows matching against the root of each token, as
produced by a Morphological Analyzer. We also used a Large KB Gazetteer to store
sets of over 1 million terms related to organisms (Fig. 4). To facilitate data storage,
we used a non-relational database and connected it to GATE.
        </p>
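        <p>The gazetteer generation steps described above (one entry per synonym, case-insensitive matching, reused classes kept only once, same-label classes from different ontologies conserved, and fallback to the token root) can be sketched in Python as follows. This is a simplified illustration and not the GATE configuration itself: the function names, the input format and the example.org URIs are placeholders, and the plural stripping is only a crude stand-in for a morphological analyser.</p>
        <preformat>
```python
def build_gazetteer(terms):
    """Map each lower-cased label or synonym to a list of (label, URI) entries.

    `terms` is an iterable of dicts with keys 'label', 'uri' and 'synonyms'.
    A class reused by several ontologies keeps its URI, so it is stored once;
    classes that share a label but have different URIs are all kept.
    """
    gazetteer = {}
    for term in terms:
        entry = (term["label"], term["uri"])
        for surface in [term["label"], *term["synonyms"]]:
            entries = gazetteer.setdefault(surface.lower(), [])
            if entry not in entries:
                entries.append(entry)
    return gazetteer


def root(token):
    """Crude stand-in for a morphological analyser: strip a plural 's'."""
    token = token.lower()
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def lookup(gazetteer, token):
    """Try the exact surface form first, then its root."""
    key = token.lower()
    return gazetteer.get(key) or gazetteer.get(root(key)) or []


# Placeholder URIs; real entries would carry the ontology URIs.
terms = [
    {"label": "thermal cycler", "uri": "http://example.org/obi/thermal-cycler", "synonyms": []},
    # Same class as seen through an ontology that reuses it.
    {"label": "thermal cycler", "uri": "http://example.org/obi/thermal-cycler", "synonyms": []},
    {"label": "forward primer", "uri": "http://example.org/so/forward-primer", "synonyms": []},
    {"label": "forward PCR primer", "uri": "http://example.org/obi/forward-pcr-primer",
     "synonyms": ["forward primer"]},
]
gaz = build_gazetteer(terms)
print(lookup(gaz, "Thermal cycler"))
print(lookup(gaz, "Forward primers"))
```
        </preformat>
        <p>Keying the de-duplication on the URI rather than on the label is what lets the reused “thermal cycler” collapse to one entry while both “forward primer” classes survive.</p>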
        <p>For Testing the Gazetteers, we followed a manual process against our corpus of
documents. Documents were loaded into GATE, and words related to SIRO elements
were identified and annotated. We evaluated the following aspects: i) execution time,
ii) correctness in the annotation of the terms and their synonyms, iii) failures in the
recognition of terms in the texts, and iv) identification of terms incorrectly annotated,
namely words with a different meaning. For example, the word “cat” is a term from
NCBI Taxonomy used to represent the common name of “Felis catus”, but cat (or
cat., Cat, CAT) is also short for “catalog”. From the gazetteers,
linguistic patterns were identified so that the Iterative Rule Writing step (see Fig. 5)
could commence.</p>
        <p>We are using JAPE (Java Annotation Patterns Engine) to code the rules. In this
stage, we are designing rules to automate the identification of meaningful elements in
the narrative. This step runs iteratively with previous stages; as linguistic structures
and meaningful PoS, e.g. instructions, are characterized, rules are written, tested
and improved. Ontologies and domain terminology are mapped to the corresponding
vocabularies.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Discussion and Conclusions</title>
      <p>
        We have presented our approach to the semantics for representing experimental
protocols, the SP ontology and the SIRO model. The SP ontology is composed of two
modules, namely SP-document and SP-workflow. In this way, we represent the
workflow, document and domain knowledge implicit in experimental protocols. Actions,
as presented by [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], are important descriptors for biomedical protocols; however, in
order for actions to be meaningful, attributes such as measurement units and
material entities (e.g., sample, instrument, reagents, personnel involved) are also
necessary. Modularization, as it has been implemented in SP, facilitates specializing
the ontology with more specific formalisms; this makes it easier for laboratories to
adapt the ontology to their needs. For instance, reagents, instruments and
experimental steps (actions) could be specialized based on the activities carried out by a
particular laboratory. The document module facilitates archiving; its structure also
makes it possible to have fully identified, reusable components.
      </p>
      <p>The SIRO model for minimal information breaks down the protocol into key
elements that we have found to be common to our corpus of experimental protocols: i)
Sample/Specimen (S), ii) Instruments (I), iii) Reagents (R) and iv) Objective (O).
For the sample, SIRO considers the strain, line or genotype, developmental stage,
organism part, growth conditions, pre-treatment of the sample and volume/mass of
sample. For the instruments, it considers the commercial name, manufacturer and
identification number. For the reagents, it considers the commercial name,
manufacturer and identification number; it is also important to know the storage conditions
for the reagents in the protocol. Identifying the objective or goal of the protocol helps
readers to decide on the suitability of the protocol for their
experimental problem. SIRO and the SP Ontology facilitate the generation of a self-describing
document with structured annotation.</p>
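      <p>As an illustration of the SIRO elements listed above, the following Python sketch models a SIRO record as a small data structure and runs a simple objective-based lookup over it. The class and field names are hypothetical and do not reproduce the SP ontology terms; the values come from the running example, and using the protocol title as the objective is an assumption made only for this sketch.</p>
      <preformat>
```python
from dataclasses import dataclass, asdict, field


@dataclass
class Reagent:
    """Reagent description; field names are illustrative."""
    name: str
    manufacturer: str = ""
    identification_number: str = ""
    storage_conditions: str = ""


@dataclass
class SiroRecord:
    """Minimal information exposed for one protocol."""
    sample: str
    instruments: list = field(default_factory=list)
    reagents: list = field(default_factory=list)
    objective: str = ""


def samples_for_objective(records, keyword):
    """Return the samples of all records whose objective mentions `keyword`."""
    keyword = keyword.lower()
    return [r.sample for r in records if keyword in r.objective.lower()]


record = SiroRecord(
    sample="tumor tissue",
    reagents=[Reagent("TRIzol", "Invitrogen"), Reagent("Chloroform", "Sigma-Aldrich")],
    # Assumption: the protocol title stands in for the objective.
    objective="Extraction of total RNA from fresh/frozen tissue (FT)",
)

# The record can be published on its own, without the protocol text.
public_view = asdict(record)
print(public_view["reagents"][0])
print(samples_for_objective([record], "RNA"))
```
      </preformat>
      <p>Because the record holds only the four SIRO elements, it can be exposed for search and classification while the full content of the protocol stays private.</p>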
      <p>Our NLP layer makes use of the semantics we have defined. We currently have
six gazetteers with over 1,400,000 terms in all; these terms will be further refined and
then added to the SP ontology. The gazetteers currently reuse terminology from
EFO, ERO, OBI, NCBI Taxonomy and ChEBI; we will continue adding terminology
from other ontologies and also adding more documents to our corpus. We are
making use of existing infrastructure provided by BioPortal and OntoBee. For managing
large ontologies, e.g. NCBI Taxonomy and ChEBI, we are not using their respective
SPARQL endpoints but parsing them locally. Our semantics plus NLP infrastructure
makes it possible to retrieve information where specifics from the protocols are used
to construct the query. Our NLP layer is able to extract SIRO automatically; however, we have
encountered issues with the free narrative often used for describing the objectives.</p>
      <p>Experimental protocols are meant to capture a complex and nested set of roles,
actions, derivations of original plans, actions executed by personnel, robots taking care
of some specific steps in the workflow, computational workflows often used in support
of laboratory work, data being produced at every step of the workflow, etc.
Representing and enacting all of these is not a simple task; laboratories require flexibility
in their conceptual models so that parameterizing their own workflows does not become
an overwhelming task. Laboratories only carry out a limited set of actions over a
limited set of samples; high-level abstractions for general process models are needed.
These could be made more concrete as workflow constructs, samples, roles, actions,
reagents, instruments, etc. are aggregated. Representing the execution requires the
confluence of metadata that allows tracking of everything that has occurred, who has
done it, how, where, etc. Our ontology model may easily be extended and adapted
to these realities. The metadata schemata to represent laboratory protocols should
be kept independent from the workflow enactors; robots will surely have their own
procedural languages. The descriptive schemata should interoperate with the
workflow enactors. The SP ontology was conceived considering all of these; our use cases
are incrementally becoming more complex as we move from protocols published
in journals to those registered in laboratory notebooks; needless to say, gaining
access to laboratory notebooks is not easy.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>LG.</given-names>
            <surname>Acevedo</surname>
          </string-name>
          et al. “
          <article-title>Genome-scale ChIP-chip analysis using 10,000 human cells”</article-title>
          .
          <source>In: Biotechniques</source>
          <volume>43</volume>.<issue>6</issue>
          (
          <year>2007</year>
          ), pp.
          <fpage>791</fpage>
          -
          <lpage>797</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Booth</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Brice</surname>
          </string-name>
          . Formulating answerable questions. Ed. by Editor.
          <source>A.B. Booth A (Eds)</source>
          .
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          et al. “
          <article-title>GATE - a General Architecture for Text Engineering”</article-title>
          .
          <source>In: Computers and the Humanities</source>
          .
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          et al. “
          <article-title>Getting more out of biomedical documents with GATE's full lifecycle open source text analytics”</article-title>
          .
          <source>In: PLoS Comput Biol</source>
          <volume>9</volume>.<issue>2</issue>
          (
          <year>2013</year>
          ),
          <elocation-id>e1002854</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Federhen</surname>
          </string-name>
          . “
          <article-title>Type material in the NCBI Taxonomy Database</article-title>
          ”. In:
          <source>Nucleic Acids Res</source>
          <volume>43</volume>
          (
          <year>2015</year>
          ), pp.
          <fpage>D1086</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>O.</given-names>
            <surname>Giraldo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          . “
          <article-title>SMART Protocols: SeMAntic RepresenTation for Experimental Protocols</article-title>
          ”. In:
          <source>4th Workshop on Linked Science 2014 - Making Sense Out of Data (LISC2014)</source>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hastings</surname>
          </string-name>
          et al. “
          <article-title>The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013</article-title>
          ”. In:
          <source>Nucleic Acids Res</source>
          <volume>41</volume>
          (
          <year>2013</year>
          ), pp.
          <fpage>D456</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Malone</surname>
          </string-name>
          et al. “
          <article-title>Modeling sample variables with an Experimental Factor Ontology</article-title>
          ”. In:
          <source>Bioinformatics</source>
          <volume>26</volume>.<issue>8</issue>
          (
          <year>2010</year>
          ), pp.
          <fpage>1112</fpage>
          -
          <lpage>1118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.D.</given-names>
            <surname>King</surname>
          </string-name>
          et al. “
          <article-title>On the formalization and reuse of scientific research</article-title>
          ”. In:
          <source>J R Soc Interface</source>
          <volume>8</volume>.<issue>63</issue>
          (
          <year>2011</year>
          ), p.
          <fpage>1440</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.D.</given-names>
            <surname>King</surname>
          </string-name>
          et al. “
          <article-title>The automation of science</article-title>
          ”. In:
          <source>Science</source>
          <volume>324</volume>.<issue>5923</issue>
          (
          <year>2009</year>
          ), p.
          <fpage>85</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.M.</given-names>
            <surname>Linton</surname>
          </string-name>
          et al. “
          <article-title>Extraction of total RNA from fresh/frozen tissue (FT)</article-title>
          ”. In:
          <source>The International Journal of Life Science Methods</source>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rocca-Serra</surname>
          </string-name>
          et al.
          <source>Release candidate 1, ISA-TAB v1.0 specification document</source>
          , version 24th.
          <year>2008</year>
          , p.
          <fpage>36</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.R.</given-names>
            <surname>Brinkman</surname>
          </string-name>
          et al. “
          <article-title>Modeling biomedical experimental processes with OBI</article-title>
          ”. In:
          <source>Journal of Biomedical Semantics</source>
          <volume>1</volume>.<issue>1</issue>
          (
          <year>2010</year>
          ), p.
          <fpage>11</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          et al. “
          <article-title>Relations in biomedical ontologies</article-title>
          ”. In:
          <source>Genome Biology</source>
          <volume>6</volume>.<issue>5</issue>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          et al. “
          <article-title>The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration</article-title>
          ”. In:
          <source>Nature Biotechnology</source>
          <volume>25</volume>.<issue>11</issue>
          (
          <year>2007</year>
          ), pp.
          <fpage>1251</fpage>
          -
          <lpage>1255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.N.</given-names>
            <surname>Soldatova</surname>
          </string-name>
          et al. “
          <article-title>The EXACT description of biomedical protocols</article-title>
          ”. In:
          <source>Bioinformatics</source>
          <volume>24</volume>.<issue>13</issue>
          (
          <year>2008</year>
          ), pp.
          <fpage>i295</fpage>
          -
          <lpage>303</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.N.</given-names>
            <surname>Soldatova</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.D.</given-names>
            <surname>King</surname>
          </string-name>
          . “
          <article-title>An ontology of scientific experiments</article-title>
          ”. In:
          <source>Journal of the Royal Society Interface</source>
          <volume>3</volume>.<issue>11</issue>
          (
          <year>2006</year>
          ), p.
          <fpage>795</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L.N.</given-names>
            <surname>Soldatova</surname>
          </string-name>
          et al. “
          <article-title>Evolving BioAssay Ontology (BAO): modularization, integration and applications</article-title>
          ”. In:
          <source>J Biomed Semantics</source>
          <volume>5</volume>.<issue>Suppl 1 Proceedings of the BioOntologies Spec Interest G</issue>
          (
          <year>2014</year>
          ),
          <fpage>S5</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.F.</given-names>
            <surname>Taylor</surname>
          </string-name>
          et al. “
          <article-title>Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project</article-title>
          ”. In:
          <source>Nat Biotechnol</source>
          <volume>26</volume>.<issue>8</issue>
          (
          <year>2008</year>
          ), pp.
          <fpage>889</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Torniai</surname>
          </string-name>
          et al. “
          <article-title>Developing an Application Ontology for Biomedical Resource Annotation and Retrieval: Challenges and Lessons Learned</article-title>
          ”. In:
          <source>ICBO: International Conference on Biomedical Ontology</source>
          . Buffalo, NY, USA, pp.
          <fpage>101</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>