<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology representation and ANOVA analysis of vaccine protection investigation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yongqun He</string-name>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zuoshuang Xiang</string-name>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Todd</string-name>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Melanie Courtot</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ryan Brinkman</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jie Zheng</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian J. Stoeckert Jr.</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Malone</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philippe Rocca-Serra</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Susanna-Assunta Sansone</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jennifer Fostel</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Larisa N. Soldatova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bjoern Peters</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aberystwyth University</institution>
          ,
          <addr-line>Wales</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>British Columbia Cancer Agency</institution>
          ,
          <addr-line>Vancouver</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Center for Bioinformatics, Department of Genetics, University of Pennsylvania School of Medicine</institution>
          ,
          <addr-line>Philadelphia, PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Global Health Sector, SRA International, Inc</institution>
          ,
          <addr-line>Durham, NC</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>La Jolla Institute for Allergy and Immunology</institution>
          ,
          <addr-line>La Jolla, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Science Commons</institution>
          ,
          <addr-line>Cambridge, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>The European Bioinformatics Institute</institution>
          ,
          <addr-line>Cambridge</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff7">
          <label>7</label>
          <institution>University of Michigan</institution>
          ,
          <addr-line>Ann Arbor</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Representing the statistical analysis of experimental data in a semantic framework remains challenging. As a first step towards this goal, we propose an ontological representation of the analysis of variance (ANOVA). In a vaccine protection use case, 151 data instances of Brucella vaccine protection investigations were collected from the literature and analyzed using ANOVA. Of 16 independent variables, 10 contributed statistically significantly to protection. Careful study of these instances led to building and validating an OBI-based semantic framework that formally represents ANOVA. Ontology-based representation and statistical analysis of biomedical data allow data consistency checking and data sharing in the Semantic Web. Contact: yongqunh@med.umich.edu</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The Ontology for Biomedical Investigations (OBI) is being developed to address the
need for a common, integrated ontology for the description of biological and clinical
investigations. OBI has been used in experimental investigations in different
communities, for example, Bioinvindex (http://www.ebi.ac.uk/bioinvindex), isa-tools
(http://isatab.sourceforge.net/), and IEDB (http://www.immuneepitope.org/). In our
recent study, we used OBI and other ontologies to represent an investigation of vaccine
protection against influenza viral infection
        <xref ref-type="bibr" rid="ref1">(Brinkman et al., 2010)</xref>
        . A vaccine
protection investigation measures how efficiently a vaccine or vaccine candidate induces
protection against virulent pathogen infection in vivo.
      </p>
      <p>
        While ontology representation of experimental assays in terms of material inputs and
data outputs provides a foundation for further data sharing and Semantic Web studies of
specific domains, it is still challenging to apply semantic frameworks to the statistical
analysis of instance data. OntoDM is a recently proposed ontology of data mining
        <xref ref-type="bibr" rid="ref3">(Panov
et al., 2009)</xref>
        that provides a framework for describing entities from the domain of data
mining and knowledge discovery. OntoDM is aligned with OBI. The updated OBI
includes many statistical terms (e.g., ANOVA, F-test, t-test) and related terms that
facilitate statistical analysis.
      </p>
      <p>
        The community-based Vaccine Ontology (VO;
http://www.violinet.org/vaccineontology/) is a biomedical ontology that covers the
vaccine domain
        <xref ref-type="bibr" rid="ref2">(He et al., 2009)</xref>
        . Development of VO has emphasized classification of
vaccines and vaccine components, vaccination investigation, and host responses to
vaccines. The VO development follows the OBO Foundry principles
        <xref ref-type="bibr" rid="ref6">(Smith et al.,
2007)</xref>
        . VO uses the Basic Formal Ontology (BFO) (Grenon et al., 2004) as its top-level
ontology. OBI is used as another upper-level ontology for vaccine investigation. VO
uses relations defined primarily by the Relation Ontology (RO)
        <xref ref-type="bibr" rid="ref5">(Smith et al., 2005)</xref>
        and
also by OBI and the Information Artifact Ontology (IAO). The close
association with these ontologies facilitates data integration and automated reasoning.
      </p>
      <p>
        In this report, we first introduce our ontology representation of the ANOVA statistical
analysis, and then apply it to investigate Brucella vaccine protection results curated
from the literature. Brucella is an intracellular bacterium that causes brucellosis, the
most common zoonotic disease worldwide. In this study, we hypothesized that some
experimental variables contribute significantly to Brucella vaccine protection efficacy
while others do not. Our study indicates that relying on a semantic framework such as
OBI and OntoDM is a useful approach to supporting biomedical statistical data analyses.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <p>The following methods were applied in this study:</p>
    </sec>
    <sec id="sec-3">
      <title>Ontology representation of ANOVA statistical analysis</title>
      <p>The analysis of variance (ANOVA) was modeled primarily in OBI. A design pattern was
generated. The use case in this study is ANOVA in terms of a linear model.</p>
    </sec>
    <sec id="sec-4">
      <title>Ontology-based representation of vaccine protection investigation</title>
      <p>All variables in this use case are represented using different ontologies as needed.
The main ontologies used include VO, OBI, and IAO.</p>
    </sec>
    <sec id="sec-5">
      <title>Literature curation of individual Brucella vaccine protection data</title>
      <p>Peer-reviewed Brucella vaccine protection research papers were obtained from a
PubMed search. These papers were manually curated to identify variables potentially
important for vaccine protection efficacy investigation and to extract the values
taken by these variables. The data were stored in an OWL file.</p>
    </sec>
    <sec id="sec-6">
      <title>Ontology-based ANOVA analysis of Brucella vaccine protection results</title>
      <p>ANOVA was applied to study the Brucella vaccine protection investigation
instance data. The results were also represented in an ontology.</p>
    </sec>
    <sec id="sec-7">
      <title>3. Results</title>
      <p>We first introduce how ANOVA is modeled in OBI. We then describe the ontology
representation of vaccine protection investigation using VO and OBI. Finally, using
literature-curated data, we show how the vaccine protection results are
analyzed by ANOVA and modeled using ontology.</p>
      <sec id="sec-7-1">
        <title>3.1. Ontology design pattern of ANOVA data analysis</title>
        <p>The analysis of variance (ANOVA) provides a statistical test of whether or not the
means of several groups are all equal. In statistics, ANOVA includes a collection of
statistical models (e.g., linear models) and their associated procedures, in which the
observed variance is partitioned into components due to different explanatory variables.
The ontology-based ANOVA data analysis design pattern is illustrated in Fig. 1.
ANOVA is a subclass of data transformation process in OBI. The F-test is part of the ANOVA
process. ANOVA has data items as its specified input. The individual data items come
from two sources. A data item can be the output of an individual process (e.g., a
CFU reduction assay). Alternatively, a data item can be the output of a discretization
process that discretizes non-measurable data (e.g., mouse age) into categorized
measurement data (e.g., 1 for young mouse, 2 for middle-aged mouse, and 3 for old
mouse). One approach to obtaining the data items necessary for ANOVA is
data item extraction from journal article (IAO_0000443). In this case, the input
is a journal article, and the output is data. The ANOVA output is a p-value data set,
which includes a set of p-value results for a predefined set of independent
variables.</p>
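        <p>As a minimal sketch of the discretization step described above, the following Python fragment maps a mouse age to the category codes 1, 2, and 3. The cut-off ages (8 and 52 weeks) and the function name discretize_age are hypothetical and chosen only for illustration; they are not the thresholds used in this study.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from bisect import bisect_right

# Hypothetical cut-offs in weeks (assumption): below 8 is young, 8 to below 52
# is middle-aged, 52 and above is old.
AGE_CUTOFFS_WEEKS = (8, 52)
AGE_LABELS = {1: "young mouse", 2: "middle-aged mouse", 3: "old mouse"}


def discretize_age(age_weeks: float) -> int:
    """Map a mouse age in weeks to the category code 1, 2, or 3."""
    if age_weeks < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many cut-offs are <= the age (0, 1, or 2).
    return bisect_right(AGE_CUTOFFS_WEEKS, age_weeks) + 1


codes = [discretize_age(age) for age in (6, 10, 60)]
print([(code, AGE_LABELS[code]) for code in codes])
```
]]></preformat>
        <p>The resulting integer codes can then serve as categorized data items that are specified inputs of the ANOVA process.</p>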
        <p>ANOVA is a concretization of an ANOVA protocol. The ANOVA protocol includes a
predictive model that specifies a testable hypothesis model (Fig. 1).</p>
      </sec>
      <sec id="sec-7-2">
        <title>3.2. Ontology representation of Brucella vaccine protection investigation</title>
        <p>
          A vaccine protection investigation includes three processes (or steps): vaccination,
pathogen challenge, and vaccine protection efficacy assessment. For those pathogens
that kill a model animal (e.g., mouse), survival assessment is used for assessing vaccine
protection efficacy
          <xref ref-type="bibr" rid="ref1">(Brinkman et al., 2010)</xref>
          . Since virulent Brucella does not kill mice,
the survival of pathogen challenged mice is not a useful method to assess Brucella
vaccine efficacy. Instead, a colony forming unit (CFU) reduction assay is used to
determine the difference of live bacterial recovery from vaccinated mice and
nonvaccinated mice
          <xref ref-type="bibr" rid="ref4">(Schurig et al., 1991)</xref>
          .
        </p>
        <p>
          To prove vaccine protection efficacy, a vaccine protection investigation using a specific
animal model is often required. In this process, many variables may affect the
outcomes. We identified 17 variables that are described in typical vaccine protection
studies. The ontology terms for these 17 variables are summarized in Table 1.
As an example of a Brucella vaccine protection investigation, the Brucella abortus cattle
vaccine RB51 was used in a typical vaccine protection study
          <xref ref-type="bibr" rid="ref4">(Schurig et al., 1991)</xref>
          . In this typical mouse experiment, live RB51 (1 × 10<sup>8</sup> CFU) was
used to vaccinate BALB/c mice, and the mice were challenged with B. abortus strain
2308 (1 × 10<sup>5</sup> CFU) 8 weeks later. CFU reduction in mouse spleen was then measured to
determine the vaccine protection. An ontology representation of this example is shown
in Fig. 2.
        </p>
        <p>The experimental hypothesis is “Some experimental variables statistically significantly
contribute to Brucella vaccine protection efficacy”. This hypothesis can be represented as
an instance of the hypothesis textual entity.</p>
      </sec>
      <sec id="sec-7-3">
        <title>3.3. ANOVA analysis of Brucella vaccine protection results from literature curation</title>
        <p>Brucella vaccine research is an active research area with more than 1,000
peer-reviewed papers indexed in PubMed. To determine which variables play significant roles
in changing Brucella vaccine protection efficacy, more than 40 papers were
manually curated to obtain instance data that correspond to these variables. In total, 151
data instances were collected from the literature. In this study, we focused only on mice
as the animal model. Different mouse strains were analyzed in our use case
investigation. Each instance of vaccine protection investigation has individual values
for all 17 variables (Table 1).</p>
        <p>To analyze which variables contribute to the vaccine protection, the significance of
vaccine protection (three values: no protection, protection, enhanced protection) was set
as the dependent variable, and the other 16 variables were treated as independent variables. An
ANOVA analysis indicated that six variables do not statistically
significantly contribute to the protection (p-value &gt; 0.05). These six variables are
IL-12 vaccine adjuvant, mouse sex, vaccination route, mouse age at vaccination,
vaccination-challenge interval, and challenge dose. The other 10 variables
statistically significantly contribute to the vaccine protection (p-value &lt; 0.05).
The predictive model is “Protection_Significance ~ .”, indicating that we are testing how
each of the other variables affects the protection significance. This linear model representation
can be understood and processed by statistical software such as R.</p>
        <p>Note (Table 1): The first variable is the dependent variable, and the others are independent
variables. The last six variables did not contribute significantly to the vaccine protection (p-value &gt;
0.05).</p>
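        <p>The per-variable significance test can be sketched in plain Python. The fragment below is an illustrative assumption, not the analysis performed in this study: it codes the three protection values as 0, 1, and 2, tests each independent variable separately with a one-way ANOVA F-test, and runs on a small invented data set whose p-values carry no biological meaning. The function names, the numeric coding, and the toy records are all hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from math import exp, lgamma, log
from statistics import fmean


def one_way_anova_f(values, groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    n, k = len(values), len(by_group)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean(values)
    # Partition the total sum of squares into between- and within-group parts.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in by_group.values())
    ss_within = sum((v - fmean(g)) ** 2 for g in by_group.values() for v in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def _beta_continued_fraction(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_test_p_value(f_stat, df_between, df_within):
    """Upper-tail probability of the F distribution, P(F >= f_stat)."""
    x = df_within / (df_within + df_between * f_stat)
    return regularized_incomplete_beta(df_within / 2.0, df_between / 2.0, x)


# Hypothetical numeric coding of the dependent variable (assumption).
CODES = {"no protection": 0, "protection": 1, "enhanced protection": 2}
# Invented toy records: (protection, mouse sex, vaccination route).
records = [
    ("enhanced protection", "female", "ip"), ("enhanced protection", "male", "ip"),
    ("protection", "female", "ip"), ("enhanced protection", "male", "ip"),
    ("no protection", "female", "sc"), ("protection", "male", "sc"),
    ("no protection", "female", "sc"), ("protection", "male", "sc"),
]
protection = [CODES[r[0]] for r in records]
p_values = {}
for column, variable in ((1, "mouse_sex"), (2, "vaccination_route")):
    f_stat, df_b, df_w = one_way_anova_f(protection, [r[column] for r in records])
    p_values[variable] = f_test_p_value(f_stat, df_b, df_w)
print(p_values)
```
]]></preformat>
        <p>The dictionary of p-values, one per independent variable, plays the role of the p-value data set that the ANOVA process outputs. A joint linear model over all variables, as in the formula above, may yield different p-values from these separate one-way tests.</p>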
        <p>This use case was used to derive an instance-level representation based on the formal
semantic representation of ANOVA analysis (Fig. 1 and 2, Table 1). Specifically, to
represent the ANOVA data analysis of this use case using ontology, we defined a ‘vaccine
protection ANOVA’ (VO_0000572) under ‘ANOVA’. This ANOVA has vaccine
protection efficacy as the dependent variable and the 16 other variables as independent variables (Table 1).
All values for individual variables were obtained from literature curation. A hypothesis
was also generated as an instance of the ‘hypothesis textual entity’. The 151 data instances
of this use case study were represented in OWL format. Each set of instance data is
defined under an instance of ‘vaccine protection investigation’. The ANOVA output is
a p-value data set that corresponds to a list of p-values for the different independent
variables.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>4. Discussion</title>
      <p>The advantage of ontology-based statistical analysis is that the results can
potentially be shared and used worldwide through semantically explicit representation. In addition,
an ontology-based approach facilitates data consistency checking. For a specific variable
(e.g., vaccine strain) from a biomedical investigation, specific instances are generated
and matched to the variable (e.g., RB51 as an instance of vaccine strain). In our use case,
many subclasses also act as instances for parent class variables. For example, RB51 is a
subclass of vaccine strain. If a value recorded for the vaccine strain variable is not
classified under vaccine strain, the data are inconsistent. Existing OWL reasoners, e.g., Pellet
(http://clarkparsia.com/pellet) and FaCT++ (http://owl.man.ac.uk/factplusplus/), can
be leveraged to detect such inconsistencies in the representation of a statistical analysis.
There are still many challenges in modeling statistical analyses using ontology. For
example, there is, so far, no consistent representation of the null hypothesis in
statistical analysis. Nevertheless, the example described in this report provides a first
demonstration that such modeling is feasible and that it adds features, such as data
consistency checking and data sharing, that traditional
statistical analysis without ontology and semantic support does not offer. ANOVA was
chosen first because it is such an important tool in life science. ANOVA is a
special case of linear model analysis, so experience gained from applying formal
semantics to ANOVA could be beneficial for more advanced representations of
such linear models.</p>
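      <p>The consistency check on variable values can be illustrated with a plain-Python stand-in for what an OWL reasoner does over the full ontology. This is only a sketch under stated assumptions: it does not call Pellet or FaCT++, it handles asserted subclass links only, and apart from “RB51 is a subclass of vaccine strain” the class labels, variable names, and records are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Asserted subclass links as child -> parent. Only the RB51 entry is taken from
# the text; the "challenge strain" label is a hypothetical class name.
SUBCLASS_OF = {
    "RB51": "vaccine strain",
    "strain 2308": "challenge strain",
}


def is_subclass_of(term, ancestor):
    """Walk up the asserted hierarchy and report whether term reaches ancestor."""
    seen = set()
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)  # guards against cycles in the asserted links
        term = SUBCLASS_OF.get(term)
    return False


def find_inconsistencies(records, expected_class):
    """Return (record index, variable, value) for every misclassified value."""
    problems = []
    for index, record in enumerate(records):
        for variable, value in record.items():
            if not is_subclass_of(value, expected_class[variable]):
                problems.append((index, variable, value))
    return problems


# Hypothetical mapping from a variable name to the class its values must fall under.
EXPECTED_CLASS = {"vaccine_strain": "vaccine strain"}
records = [
    {"vaccine_strain": "RB51"},         # consistent
    {"vaccine_strain": "strain 2308"},  # a challenge strain recorded by mistake
]
problems = find_inconsistencies(records, EXPECTED_CLASS)
print(problems)  # prints [(1, 'vaccine_strain', 'strain 2308')]
```
]]></preformat>
      <p>A reasoner additionally draws on inferred class membership and logical definitions, which this lookup does not attempt.</p>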
      <p>Besides generating the null hypothesis using ontology, we also plan to model
different types of ANOVA (e.g., one-way ANOVA and factorial ANOVA) and
different models (e.g., linear model and randomization-based model) in OBI. Many
free and commercial software packages supporting ANOVA are available in the
Software Ontology (www.ebi.ac.uk/efo/swo). We would like to include the ANOVA
software programs as part of the proposed ontology. OBI inherently provides
provenance, and therefore linkage to an external provenance ontology is not required.</p>
      <p>Ontology representation of vaccine protection studies provides an advanced approach to
representing and mining vaccine-induced protection experimental processes. More than 400
vaccines and the data of protection studies with these vaccines have been manually
curated and stored in the VIOLIN vaccine database system (Xiang et al., 2008). To
make full use of the VIOLIN vaccine data for advanced query and integration with data
from other data sources, we plan to apply the ontology-based approach learned from
this Brucella study to other vaccine protection data in VIOLIN.</p>
      <p>Our method of ontology-based representation and statistical analysis is applicable to
other ontology-based statistical studies. The logical definitions of the ontology entities
involved allow computers to unambiguously understand and integrate different
biological data with the help of an OWL reasoner. We anticipate that more statistical
analyses will be represented in ontology, and ontology-based statistical methods will be
applied for shared data analysis, data exchange, and automatic reasoning. Various new
software programs will most likely be developed in the future to take advantage of this
novel semantic framework.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
      <p>This research is supported by NIH grants R01AI081062 and U54-DA-021519.</p>
      <p>Xiang Z, Todd T, Ku KP, Kovacic BL, Larson CB, et al. (2008) VIOLIN: vaccine
investigation and online information network. Nucleic Acids Res. 36 (Database
issue): D923-8.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Brinkman</surname>
            <given-names>RR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courtot</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Derom</surname>
            <given-names>D</given-names>
          </string-name>
          , et al. (
          <year>2010</year>
          )
          <article-title>Modeling biomedical experimental processes with OBI</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          . In press.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>He</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cowell</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diehl</surname>
            <given-names>AD</given-names>
          </string-name>
          , et al. (
          <year>2009</year>
          )
          <article-title>VO: Vaccine Ontology</article-title>
          .
          <source>International Conference on Biomedical Ontology (ICBO)</source>
          , 24 July 2009. Nature Precedings. Available at: http://precedings.nature.com/documents/3552/version/1.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Panov</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soldatova</surname>
            <given-names>LN</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dzeroski</surname>
            <given-names>S</given-names>
          </string-name>
          (
          <year>2009</year>
          )
          <article-title>Towards an Ontology of Data Mining Investigations</article-title>
          .
          <source>Proceedings of the 12th International Conference on Discovery Science</source>
          , Porto, Portugal.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Schurig</surname>
            <given-names>GG</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roop</surname>
            <given-names>RM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bagchi</surname>
            <given-names>T</given-names>
          </string-name>
          , et al. (
          <year>1991</year>
          )
          <article-title>Biological properties of RB51; a stable rough strain of Brucella abortus</article-title>
          .
          <source>Vet Microbiol</source>
          ,
          <volume>28</volume>
          (
          <issue>2</issue>
          ) :
          <fpage>171</fpage>
          -
          <lpage>188</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Smith</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ceusters</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klagges</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kohler</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lomax</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mungall</surname>
            <given-names>CJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neuhaus</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rector</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosse</surname>
            <given-names>C</given-names>
          </string-name>
          (
          <year>2005</year>
          )
          <article-title>Relations in Biomedical Ontologies</article-title>
          .
          <source>Genome Biology</source>
          ,
          <volume>6</volume>
          :
          <fpage>R46</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Smith</surname>
          </string-name>
          et al. (
          <year>2007</year>
          )
          <article-title>The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration</article-title>
          ,
          <source>Nature Biotechnology</source>
          ,
          <volume>25</volume>
          :
          <fpage>1251</fpage>
          -
          <lpage>1255</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>