<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Formalizing the Representation of Immune Exposures for Human Immunology Studies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Randi Vita</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James A. Overton</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kei-Hoi Cheung</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steven H. Kleinstein</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bjoern Peters</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Emergency Medicine and Yale Center for Medical Informatics, Yale School of Medicine</institution>
          ,
          <addr-line>New Haven, CT</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>La Jolla Institute for Allergy and Immunology</institution>
          ,
          <addr-line>La Jolla, California</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>7</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>Human immunology studies typically examine how immune exposures associated with vaccinations, infectious, allergic or autoimmune diseases, or transplantations perturb the immune system, with the goal of developing diagnostic tools and therapeutic interventions. While there are established approaches to formally represent the experimental data generated in such studies, which often comprise gene expression data, flow cytometry data, or serology data, the description of the immune exposures themselves is not well standardized. Here we present a formal approach to represent immune exposures at a high level of granularity. We capture the exposure process (e.g. 'vaccination' or 'occurrence of allergic disease'), exposure material (e.g. 'Tdap vaccine' or 'House dust mite'), and the associated disease name and stage (e.g. 'allergic rhinitis' and 'chronic'). This representation scheme has been used successfully in the IEDB, and an extended version has been adopted by HIPC to capture studies in ImmPort. We report here on this scheme, our ongoing attempts to map the terms used to existing ontologies, and the challenges encountered.</p>
      </abstract>
      <kwd-group>
        <kwd>immune exposure</kwd>
        <kwd>modeling</kwd>
        <kwd>HIPC</kwd>
        <kwd>ontology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        The Immunology Database and Analysis Portal (ImmPort)
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is the primary resource to capture human immunology
studies funded by the National Institutes of Health, Division of
Allergy, Immunology and Transplantation. ImmPort provides
structured data fields to capture a variety of experimental
data and free-text fields to store metadata on the cohorts
from which subjects were recruited. This free-text
cohort description typically contains a description of the
immune exposures that are expected to perturb the immune
system. While free text allows for a detailed account of how a
given study is conducted and a cohort is defined, without
standardization such descriptions are difficult to query and
compare across many studies in a large database such as
ImmPort.
      </p>
      <p>
        In particular, ImmPort is the designated repository for data
from studies performed by the Human Immunology Project
Consortium (HIPC) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a collaboration between a number
of centers aimed at performing large-scale human immunology
studies with a focus on profiling the human immune response
to natural infection and vaccination. A key goal of HIPC
is to cross-compare results from different centers.
To facilitate this, we set out to develop a standardized
representation of immune exposures for HIPC studies that can
be stored in ImmPort to represent their central elements in a
structured format.
        <fn id="fn3"><label>3</label><p>Department of Pathology, Yale School of Medicine, New Haven, Connecticut, U.S.A. and Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA</p></fn>
      </p>
      <p>
        The need to represent immune exposures extends beyond
the HIPC program. Most human immunology studies examine
how the immune system responds to perturbations. Subjects are
compared across cohorts and/or at defined time points that are
intended to isolate the effect of immune exposures. The
Immune Epitope Database (IEDB) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] implemented a
structured representation of immune exposures that has been
applied to model over one million experiments in which human
samples were tested for T cell or B cell reactivity to specific
epitopes. The IEDB representation of exposures is decoupled
from the epitope mapping experiments, so we decided to test whether
it could serve as a basis for describing immune exposures for
the HIPC program. By adapting the IEDB model for HIPC, we
have developed a more general representation of
immune exposures that can be used by the wider scientific
community.
      </p>
    </sec>
    <sec id="sec-2">
      <title>II. APPROACH</title>
      <sec id="sec-2-1">
        <title>A. Semi-formal Immune Exposure Representation</title>
        <p>All HIPC centers funded by the middle of 2017 were asked
to supply textual descriptions of the study designs that they
planned to submit to ImmPort. We then examined the
immune exposures that were part of these study designs and
how they would be entered in the IEDB format. As a result of
this process, we found that the broader scope of HIPC
compared to the IEDB required an extension of the IEDB
structured representation. Below, we present the
resulting expanded schema to represent immune exposures for
HIPC, of which the IEDB immune exposures are a subset. This
schema has been implemented by adding columns to the
‘Human Subject Template’ spreadsheet that is used to submit
information to ImmPort.</p>
        <p>We consider four elements critical to the description of an
immune exposure, listed as the column headers in Table I.
The ‘Exposure process’ identifies the type of process through
which a host was exposed and the type of evidence that the
exposure happened, which are tightly intertwined. This
is the only one of the four elements that was deemed mandatory.
Depending on the choice made for ‘Exposure process’, the other
elements are either required or not applicable, as listed in Table I. The
‘Exposure material’ describes the substance(s) that the host was
exposed to and/or developed immune reactions to as part of the
exposure process. The ‘Disease name’ indicates the specific
disease of the host associated with the exposure being
described. Lastly, the ‘Disease stage’ provides a broad
classification of how the disease had progressed at the time of the
study.</p>
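        <p>The four elements can be pictured as a small record type. The following is a minimal sketch, not the ImmPort or IEDB implementation: the class name ImmuneExposure and its field names are hypothetical, the example values are taken from the abstract (pairing them into records is our assumption), and only the rule that ‘Exposure process’ is mandatory is enforced, because the per-process requirements of Table I are not encoded here.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImmuneExposure:
    """One immune exposure described by the four elements of the scheme.

    Hypothetical illustration; field names are not ImmPort column names.
    """

    exposure_process: str                    # the only mandatory element
    exposure_material: Optional[str] = None  # required or not applicable per Table I
    disease_name: Optional[str] = None
    disease_stage: Optional[str] = None

    def __post_init__(self):
        # Reject records that leave the mandatory element empty.
        if not self.exposure_process or not self.exposure_process.strip():
            raise ValueError("'Exposure process' is mandatory")


# Example values from the abstract; the grouping into records is assumed.
allergy = ImmuneExposure(
    exposure_process="occurrence of allergic disease",
    exposure_material="House dust mite",
    disease_name="allergic rhinitis",
    disease_stage="chronic",
)

# A vaccination record carries a material but no disease name or stage.
vaccination = ImmuneExposure(
    exposure_process="vaccination",
    exposure_material="Tdap vaccine",
)
```
]]></preformat>
        <p>A fuller version would also check, for each ‘Exposure process’ value, which of the remaining three elements Table I marks as required or not applicable.</p>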
        <p>To illustrate how this representation was used in practice,
Table II shows three examples of studies by actual HIPC
centers that involved immune exposures, described in free text
(leftmost column), and how these were modeled using
the four elements of the exposure scheme (remaining
columns). These examples illustrate the three main types of
exposure processes, namely ‘administration’, ‘disease’, and
‘exposure without disease’.</p>
        <p>Thus, “Adults receiving a Varicella-zoster shot” would be
modeled with the ‘Exposure process’ vaccination, which delivered
the ‘Exposure material’ Varicella-zoster virus
vaccine. No disease resulted from this immune exposure.</p>
      </sec>
      <sec id="sec-2-2">
        <title>B. Ontology Mapping</title>
        <p>
          Our intent is to map each of the four data elements
described above to ontology terms with textual and logical
definitions, ideally derived from established ontologies
covering the various domains. For ‘Exposure process’, all
allowed values are listed in the first column of Table I. This
collection of options has been assembled by the IEDB team
over the past 13 years and has proven to be robust and
stable, with minimal modifications occurring in the last 5
years. Each of the options comes with a definition and rules
for when it should be applied. These terms will be mapped to
formal external ontology terms, as initiated in Supplementary
Table S1 (https://doi.org/10.6084/m9.figshare.6741791.v1).
The main challenge in this process is that terms for, e.g.,
‘vaccination’, ‘infectious disease’ and ‘transplantation’ come
from different external ontologies, and presenting their
definitions to users side by side is not helpful. We are planning to
engage representatives of the different ontology communities and
harmonize their definitions. Until this is done, we have proceeded
with the implementation of temporary terms for this immune
exposure model in ONTIE [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], which we intend to
replace or merge with new or edited terms in the appropriate
external ontologies.
        </p>
        <p>In addition to the three main categories of immune
exposure (administration, disease, exposure without disease)
and their subtypes, there are two options (no exposure and
unknown) that are not actual types of exposure but rather
values that signify two different reasons why it is not possible or
meaningful to fill out the exposure type for a given study
subject. The value ‘no exposure’ is intended for
subjects who are enrolled as negative controls, and indicates
specifically that these subjects have <italic>not</italic> been exposed to
something. The value ‘unknown’ is used when samples are
from subjects for whom no relevant exposure information is
available. This is applicable when, for example, a study uses
samples from anonymous blood bank donors in order to
establish a ‘normal range’.</p>
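        <p>Because ‘no exposure’ and ‘unknown’ are placeholders rather than exposure types, downstream queries will usually need to separate them from actual exposures. The sketch below illustrates one way to do this under stated assumptions: the function names and the subject identifiers are hypothetical, and matching is done on the lower-cased value purely for illustration.</p>
        <preformat><![CDATA[
```python
# The two values that signal that no exposure type can be filled in.
NON_EXPOSURE_VALUES = frozenset({"no exposure", "unknown"})


def describes_exposure(exposure_process):
    """Return True when the value names an actual type of exposure."""
    return exposure_process.strip().lower() not in NON_EXPOSURE_VALUES


def exposed_subjects(subjects):
    """Keep the (subject_id, exposure_process) pairs with an actual exposure."""
    return [(sid, proc) for sid, proc in subjects if describes_exposure(proc)]


# Hypothetical subject identifiers, for illustration only.
cohort = [
    ("SUBJ_1", "vaccination"),
    ("SUBJ_2", "no exposure"),  # enrolled as a negative control
    ("SUBJ_3", "unknown"),      # e.g. an anonymous blood bank donor
]
```
]]></preformat>
        <p>Here exposed_subjects(cohort) keeps only the vaccinated subject, while the negative control and the anonymous donor remain available for separate handling.</p>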
        <p>
          For ‘Exposure material’, the vast majority of HIPC studies
submitted to us required specifying an organism that was
the causative agent of an infection, the agent of an exposure without infection,
or used in a vaccine to protect against future infection.
Organisms can be specified using the widely used NCBI
Taxonomy [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], which has the key advantage of linking
organism specifications to sequence information in NCBI. All
taxa from the NCBI Taxonomy are valid entries for ‘Exposure
material’, and can be looked up at
https://www.ncbi.nlm.nih.gov/taxonomy. One potential
concern with this choice is that NCBI does not assign a new taxon
to every organism isolate identified, which in some cases would be
desirable, such as for drug-resistant M. tuberculosis
isolates, where it is of interest to relate even single-nucleotide
differences to the efficacy of drug treatments. We expect that,
going forward, a community consensus will develop
on how to handle this, along the lines of grouping
different isolates based on their NCBI GenBank ID under their
closest parent taxon.
        </p>
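        <p>The grouping of isolates under their closest parent taxon can be sketched as a simple mapping. This is only an illustration of the idea, not an agreed community convention: the function name is hypothetical and the GenBank identifiers are placeholders, while 1773 is the NCBI Taxonomy ID of Mycobacterium tuberculosis.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def group_isolates_by_taxon(isolates):
    """Map each parent NCBI taxon ID to the sorted GenBank IDs of its isolates.

    `isolates` is an iterable of (genbank_id, parent_taxon_id) pairs.
    """
    grouped = defaultdict(list)
    for genbank_id, parent_taxon_id in isolates:
        grouped[parent_taxon_id].append(genbank_id)
    return {taxon: sorted(ids) for taxon, ids in grouped.items()}


# Placeholder GenBank identifiers (not real accessions); 1773 is the
# NCBI Taxonomy ID of Mycobacterium tuberculosis.
isolates = [
    ("GENBANK_ID_B", 1773),
    ("GENBANK_ID_A", 1773),
]
by_taxon = group_isolates_by_taxon(isolates)
```
]]></preformat>
        <p>With this layout, a record can cite the parent taxon as its ‘Exposure material’ while the isolate-level identifiers remain available for finer comparisons.</p>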
        <p>
          Not all ‘Exposure materials’ in HIPC studies submitted to
us were whole organisms. In vaccinations, specific
antigens are often used instead of whole organisms, as in
subunit vaccines. Also, in multivalent
vaccines, multiple organisms or antigens of organisms are
combined into one vaccine. We plan to specify vaccines
through the Vaccine Ontology
(http://www.violinet.org/vaccineontology/) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. It may be
necessary to add new entries to the Vaccine Ontology to
capture new experimental vaccines, but because vaccines
administered to humans have to go through a stringent
approval process, we do not expect this to overwhelm the Vaccine
Ontology development team.
        </p>
        <p>
          To specify the ‘Disease name’, the IEDB utilizes values
from the Disease Ontology (DO) (http://disease-ontology.org/)
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], which has the advantage of providing mappings to most of
the other vocabularies that could be considered, such as ICD-10,
SNOMED CT, MeSH and UMLS. The IEDB has been
successful in mapping the disease terms encountered in the
literature to DO terms. In addition, the Disease Ontology is
part of the OBO Foundry [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and thus more compatible with
other basic research ontologies, providing explicit definitions
and links to basic research domains, such as clarifying which
infectious agent is causative for a given disease. Thus, our
immune exposure model will continue to use DO, which has been
incorporated into ImmPort submission templates by requiring
submitters to enter DO terms to describe the diseases of the
study subjects.
        </p>
        <p>In terms of ‘Disease stage’, the IEDB has defined three
values that, in combination with the disease name, clarify some
typical major distinctions in how a disease manifests in different
study subjects: (1) ‘acute/recent onset’ is used for subjects
who currently have symptomatic disease and may or may not
clear it; (2) ‘chronic’ is used for subjects who persistently
have a disease that is not considered highly likely to
clear soon without intervention; and (3) ‘post’ is
used for subjects who have cleared a disease that they had
in the past. So far, these broad categories have proven
sufficient to also describe HIPC needs, although more detailed
descriptions of disease-specific stages could be desirable in the
future, and we are open to further discussion.</p>
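        <p>Since ‘Disease stage’ is a closed list of three values, it lends itself to an enumeration. The following is a minimal sketch under stated assumptions: the names DiseaseStage and parse_disease_stage are hypothetical, and treating an empty value as "not applicable" is our assumption rather than a rule stated for the scheme.</p>
        <preformat><![CDATA[
```python
from enum import Enum


class DiseaseStage(Enum):
    """The three 'Disease stage' values defined by the IEDB."""

    ACUTE_RECENT_ONSET = "acute/recent onset"  # currently symptomatic
    CHRONIC = "chronic"                        # persistent disease
    POST = "post"                              # disease cleared in the past


def parse_disease_stage(value):
    """Return the matching DiseaseStage, or None for an empty value.

    Raises ValueError for any text that is not one of the three values.
    """
    if value is None or not value.strip():
        return None  # assumed to mean 'not applicable'
    return DiseaseStage(value.strip().lower())
```
]]></preformat>
        <p>Rejecting unrecognized text at entry time keeps the field comparable across studies, which is the purpose of restricting it to broad categories.</p>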
      </sec>
    </sec>
    <sec id="sec-3">
      <title>III. CHALLENGES AND CONCLUSIONS</title>
      <p>
        The ability to formalize what would otherwise be free text
is a significant step toward improving the integration of
data across HIPC studies. More importantly, because HIPC adopted
this model by adding columns to the Human Subject
data submission template, all studies submitted to ImmPort can
now include the same fields to describe immune exposures, so the
HIPC studies will be better connected to other studies in
ImmPort. To ease data entry for these fields and others into
ImmPort spreadsheet templates, work is ongoing through the
CEDAR [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] effort and others to create interactive forms that
will ensure that only valid terms are entered.
      </p>
      <p>Now that newly entered data will be formalized, improved
queries and comparisons will be possible thanks to standardized
terminology. We fully expect that as more data are submitted
to ImmPort using this scheme for HIPC, questions will
continue to arise, and based on our experience with the IEDB,
we expect to handle them by consulting domain experts for the
disease of interest. Controversial cases will be presented to the
Clinical Subcommittee to ensure that decisions are made
uniformly across the HIPC program. Overall, it must be
stressed that the structured representation of immune exposures
is not intended to fully represent every nuance of each study,
but rather to enable a computable,
high-level comparison of immune exposures across studies.
It would be beneficial to reassess, after several months of use,
how well this model meets the needs of the
community and how it improves the quality of the data.</p>
    </sec>
    <sec id="sec-4">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This work was supported by the National Institute of Allergy
and Infectious Diseases of the National Institutes of Health
under Award Numbers NIH U19 AI118610 and U19AI089992.
It would not have been possible without strong support by the
ImmPort team, and Patrick Dunn in particular.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Andorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gomes</surname>
          </string-name>
          , et al, “
          <article-title>ImmPort: disseminating data to the public for the future of immunology,”</article-title>
          <source>Immunol Res</source>
          .
          <volume>58</volume>
          (
          <issue>2-3</issue>
          ), pp.
          <fpage>234</fpage>
          -
          <lpage>239</lpage>
          , May
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] https://www.immuneprofiling.org/hipc/page/show (accessed 6/1/2018).</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Vita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.A.</given-names>
            <surname>Overton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.A.</given-names>
            <surname>Greenbaum</surname>
          </string-name>
          , et al, “
          <article-title>The immune epitope database (IEDB) 3.0,”</article-title>
          <source>Nucleic Acids Res</source>
          .
          <volume>43</volume>
          (Database issue), pp.
          <fpage>D405</fpage>
          -
          <lpage>D412</lpage>
          , October
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandrowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brinkman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brochhausen</surname>
          </string-name>
          , et al, “
          <article-title>The Ontology for Biomedical Investigations</article-title>
          ,”
          <source>PLoS One</source>
          .
          <volume>11</volume>
          (
          <issue>4</issue>
          ), April 29,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.A.</given-names>
            <surname>Greenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zarebski</surname>
          </string-name>
          , et al, “
          <article-title>ONTology of Immune Epitopes (ONTIE) Representing the Immune Epitope Database in OWL,”</article-title>
          <source>The 12th Annual BioOntologies Meeting, ISMB</source>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>48</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.W.</given-names>
            <surname>Sayers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.A.</given-names>
            <surname>Benson</surname>
          </string-name>
          , et al, “
          <article-title>Database resources of the National Center for Biotechnology Information</article-title>
          ,”
          <source>Nucleic Acids Res</source>
          .
          <volume>37</volume>
          , pp.
          <fpage>D5</fpage>
          -
          <lpage>D15</lpage>
          , May
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cowell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.D.</given-names>
            <surname>Diehl</surname>
          </string-name>
          , “
          <article-title>VO: Vaccine Ontology,”</article-title>
          <source>The 1st International Conference on Biomedical Ontology (ICBO 2009)</source>
          , Buffalo, NY, USA.
          <source>Nature Precedings</source>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.A.</given-names>
            <surname>Kibbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Arze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Felix</surname>
          </string-name>
          , et al, “
          <article-title>Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data</article-title>
          ,”
          <source>Nucleic Acids Res</source>
          .
          <volume>43</volume>
          (Database issue), pp.
          <fpage>D1071</fpage>
          -
          <lpage>D1078</lpage>
          , January
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ashburner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rosse</surname>
          </string-name>
          , et al, “
          <article-title>The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration</article-title>
          ,”
          <source>Nat Biotechnol</source>
          .
          <volume>25</volume>
          (
          <issue>11</issue>
          ), pp.
          <fpage>1251</fpage>
          -
          <lpage>1255</lpage>
          , November
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Musen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.A.</given-names>
            <surname>Bean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.H.</given-names>
            <surname>Cheung</surname>
          </string-name>
          , et al, “
          <article-title>The center for expanded data annotation and retrieval</article-title>
          ,”
          <source>J Am Med Inform Assoc</source>
          .
          <volume>22</volume>
          (
          <issue>6</issue>
          ), pp.
          <fpage>1148</fpage>
          -
          <lpage>52</lpage>
          , November
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>