=Paper= {{Paper |id=Vol-2285/ICBO_2018_paper_41 |storemode=property |title=Formalizing the Representation of Immune Exposures for Human Immunology Studies |pdfUrl=https://ceur-ws.org/Vol-2285/ICBO_2018_paper_41.pdf |volume=Vol-2285 |authors=Randi Vita,Bjoern Peters,James Overton,Kei-Hoi Cheung,Steven Kleinstein |dblpUrl=https://dblp.org/rec/conf/icbo/VitaPOCK18 }} ==Formalizing the Representation of Immune Exposures for Human Immunology Studies== https://ceur-ws.org/Vol-2285/ICBO_2018_paper_41.pdf
       Proceedings of the 9th International Conference on Biological Ontology (ICBO 2018), Corvallis, Oregon, USA                     1




Formalizing the Representation of Immune Exposures
for Human Immunology Studies

Randi Vita1, James A. Overton1, Kei-Hoi Cheung2,
Steven H. Kleinstein3, Bjoern Peters1

1 Division for Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, California, U.S.A.
2 Department of Emergency Medicine and Yale Center for Medical Informatics, Yale School of Medicine, New Haven, CT, USA.
3 Department of Pathology, Yale School of Medicine, New Haven, Connecticut, U.S.A. and Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA


    Abstract—Human immunology studies typically examine how                    of centers aimed at performing large scale human immunology
immune exposures associated with vaccinations, infectious,                     studies with a focus on profiling the human immune response
allergic or autoimmune diseases, or transplantations perturb the               to natural infection and vaccination. A key goal of the HIPC
immune system with the goal to develop diagnostic tools and                    consortium is to cross-compare results from different centers.
therapeutic interventions. While there are established                         To facilitate this, we set out to develop a standardized
approaches to formally represent the experimental data                         representation of immune exposures for HIPC studies that can
generated in such studies, which often comprises gene expression               be stored in ImmPort to represent their central elements in a
data, flow cytometry data, or serology data, the description of the            structured format.
immune exposures themselves is not well standardized. We here
present a formal approach to represent immune exposures at a                       The need to represent immune exposures extends beyond
high level of granularity. We capture the exposure process (e.g.               the HIPC program. Most human immunology studies examine
‘vaccination’ or ‘occurrence of allergic disease’), exposure                   how the immune system responds to perturbations. Subjects are
material (e.g. ‘Tdap vaccine’ or ‘House dust mite’), and the                   compared across cohorts and/or at defined time points that are
associated disease name and stage (e.g. ‘allergic rhinitis’ and                intended to isolate the effect of immune exposures. The
‘chronic’). This representation scheme has been used successfully              Immune Epitope Database (IEDB) [3] implemented a
in the IEDB and an extended version has been adopted by HIPC                   structured representation of immune exposures that has been
to capture studies in ImmPort. We are reporting here on this
                                                                               applied to model over one million experiments in which human
scheme, our ongoing attempts to map the terms used to existing
                                                                               samples were tested for T cell or B cell reactivity to specific
ontologies, and the challenges encountered.
                                                                               epitopes. The IEDB representation of exposures is decoupled
   Keywords—immune exposure; modeling; HIPC; ontology                          from the epitope mapping experiments, so we decided to test if
                                                                               it could be utilized as a basis to describe immune exposures for
                        I. INTRODUCTION                                        the HIPC program. By adapting the IEDB model for HIPC, we
                                                                               have developed an even more general representation of
    The Immunology Database and Analysis Portal (ImmPort)
                                                                               immune exposures that can be used by the wider scientific
[1] is the primary resource to capture human immunology
                                                                               community.
studies funded by the National Institutes of Health, Division of
Allergy, Immunology and Transplantation. ImmPort provides                                                II. APPROACH
structured data fields to capture a variety of different
experimental data and free-text fields to store meta-data on                   A. Semi-formal Immune Exposure Representation
cohorts from which subjects were recruited. This free-text                         All HIPC centers funded by the middle of 2017 were asked
cohort description data typically contains a description of                    to supply textual descriptions of study designs that they
immune exposures that are expected to perturb the immune                       planned on submitting to ImmPort. We then examined the
system. While free-text allows for a detailed account of how a                 immune exposures that were part of these study designs and
given study is conducted and a cohort is defined, without                      how they would be entered into the IEDB format. As a result of
standardization, such descriptions are difficult to query and                  this process, we found that the broader scope of HIPC
compare across many studies in a large database such as                        compared to the IEDB required extension of the IEDB
ImmPort.                                                                       structured representation. In the following, we present the
   In particular, ImmPort is the designated repository for data                resulting expanded schema to represent immune exposures for
from studies performed by the Human Immunology Project                         HIPC, of which the IEDB immune exposures are a subset. This
Consortium (HIPC) [2], a collaboration between a number of                     schema has been implemented by adding columns to the



       ICBO 2018                                                   August 7-10, 2018                                                  1


‘Human Subject Template’ spreadsheet that is used to submit                                                                    Thus, “Adults receiving a Varicella-zoster shot” would be
information to ImmPort.                                                                                                    the result of a vaccination ‘Exposure process’ which delivered
                                                                                                                           the ‘Exposure material’ that was the Varicella-zoster virus
    We consider four elements critical to the description of an                                                            vaccine. No disease resulted from this immune exposure.
immune exposure, as listed as the column headers in Table I.
The ‘Exposure process’ identifies the type of process through                                                              B. Ontology Mapping
which a host was exposed and the type of evidence for that                                                                     Our intent is to map each of the four data elements
exposure to have happened, which are tightly intertwined. This                                                             described above to ontology terms with textual and logical
is the only element of the four that was deemed mandatory.                                                                 definitions, ideally derived from established ontologies
Based on the choice made for ‘Exposure process’, other                                                                     covering the various domains. For ‘Exposure process’, all
elements are required or not applicable as listed in Table I. The                                                          allowed values are listed in the first column of Table I. This
‘Exposure material’ describes what substance(s) the host was                                                               collection of options has been assembled by the IEDB team
exposed to and/or developed immune reactions to as part of the                                                             over the past 13 years and has been proven to be robust and
exposure process. The ‘Disease name’ indicates the specific                                                                stable, with minimal modifications occurring in the last 5
disease of the host associated with the exposure being                                                                     years. Each of the options comes with a definition and rules
described and lastly, the ‘Disease stage’ provides a broad                                                                 for when it should be applied. These terms will be mapped to
classification of how the disease progressed at the time of the                                                            formal external ontology terms, as initiated in Supplementary
study.                                                                                                                     Table S1 (https://doi.org/10.6084/m9.figshare.6741791.v1).
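As a minimal sketch of the four elements described here, an exposure can be held in a small record type in which only ‘Exposure process’ is mandatory. The class name `ImmuneExposure`, its field names, and the `to_row` helper are illustrative assumptions; the column headers mirror the element names used in this paper, not necessarily the exact headers of the ImmPort ‘Human Subject Template’.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImmuneExposure:
    """One immune exposure, described by the four elements named in the text.

    Hypothetical container for illustration; not part of ImmPort or the IEDB.
    """
    exposure_process: str                     # the only mandatory element
    exposure_material: Optional[str] = None
    disease_name: Optional[str] = None
    disease_stage: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject records that leave the mandatory element blank.
        if not self.exposure_process.strip():
            raise ValueError("'Exposure process' is mandatory")

    def to_row(self) -> dict:
        """Flatten to spreadsheet-style columns; '' means 'not filled in'."""
        return {
            "Exposure process": self.exposure_process,
            "Exposure material": self.exposure_material or "",
            "Disease name": self.disease_name or "",
            "Disease stage": self.disease_stage or "",
        }


# "Adults receiving a Varicella-zoster shot": a vaccination, no resulting disease.
shot = ImmuneExposure(
    exposure_process="vaccination",
    exposure_material="Varicella-zoster virus vaccine",
)
row = shot.to_row()
```

Leaving the three dependent elements optional at the type level keeps the record usable for every exposure process; which of them must actually be filled in depends on the process chosen.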
    Exposure process                          Exposure material    Disease name    Disease stage
    administration                            required             X               X
      vaccination                             required             X               X
      infectious challenge                    required             optional        X
      transplant/transfusion                  required             X               X
    disease                                   X                    required        required
      infectious disease                      required             required        required
      allergic disease                        required             required        required
      autoimmune disease                      X                    required        required
      cancer                                  X                    required        required
    exposure (without disease)                required             X               X
      asymptomatic infection/colonization     required             X               X
      exposure with immune reactivity         required             X               X
      exposure with documentation             required             X               X
      exposure to endemic/ubiquitous agent    required             X               X
    no exposure                               X                    X               X
    unknown                                   X                    X               X

   TABLE I. Four structured elements to describe immune exposures.

The main challenge in this process is that terms for e.g.
‘vaccination’, ‘infectious disease’ and ‘transplantation’ come
from different external ontologies, and presenting users their
definitions side-by-side is not helpful. We are planning to
engage representatives of different ontology communities, and
harmonize their definitions. Until this is done, we have proceeded
with implementation of temporary terms for this immune
exposure model in ONTIE [5], which we intend on
replacing/merging with new or edited terms in the appropriate
external ontologies.

    In addition to the main three categories of immune
exposure (administration, disease, exposure without disease)
and their subtypes, there are two options (no exposure and
unknown) which are not actual types of exposures but rather
values to signify two different reasons why it is not possible or
meaningful to fill out the exposure type for a given study
subject. The value ‘no exposure’ is intended to be used for
subjects that are enrolled as negative controls, and indicates
specifically that these subjects have *not* been exposed to
something. The value ‘unknown’ is used when samples are
                                                                                                                           from subjects for which no relevant exposure information is
    To illustrate how this representation was used in practice,                                                            available. This is applicable when, for example, a study utilizes
Table II shows three examples of studies by actual HIPC                                                                    samples from anonymous blood bank donors in order to
centers that involved immune exposures, described in free text                                                             establish a ‘normal range’.
(first column to the left), and how these were modeled using
the four elements of the exposure scheme (columns to the                                                                       For ‘Exposure material’, the vast majority of HIPC studies
right). These examples illustrate the three main types of                                                                  submitted to us required specifying an organism that was either
exposure processes, namely ‘administration’, ‘disease’, and                                                                the causative agent of an infection, exposure without infection,
‘exposure without disease’.                                                                                                or utilized to vaccinate to protect against future infection.
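The dependency between ‘Exposure process’ and the other three elements (Table I) can be sketched as a lookup table plus a small check. This is an illustration only, not the validation used by ImmPort or the IEDB: the function name `check_exposure` and the dictionary layout are assumptions, and the ‘X’ entries of Table I are read here as ‘not applicable’.

```python
# Requirement levels per 'Exposure process', transcribed from Table I.
# Tuple order: (Exposure material, Disease name, Disease stage).
# Assumption: an 'X' in Table I is interpreted as "not applicable".
REQ, OPT, NA = "required", "optional", "not applicable"

TABLE_I = {
    "administration": (REQ, NA, NA),
    "vaccination": (REQ, NA, NA),
    "infectious challenge": (REQ, OPT, NA),
    "transplant/transfusion": (REQ, NA, NA),
    "disease": (NA, REQ, REQ),
    "infectious disease": (REQ, REQ, REQ),
    "allergic disease": (REQ, REQ, REQ),
    "autoimmune disease": (NA, REQ, REQ),
    "cancer": (NA, REQ, REQ),
    "exposure (without disease)": (REQ, NA, NA),
    "asymptomatic infection/colonization": (REQ, NA, NA),
    "exposure with immune reactivity": (REQ, NA, NA),
    "exposure with documentation": (REQ, NA, NA),
    "exposure to endemic/ubiquitous agent": (REQ, NA, NA),
    "no exposure": (NA, NA, NA),
    "unknown": (NA, NA, NA),
}
FIELDS = ("Exposure material", "Disease name", "Disease stage")


def check_exposure(record: dict) -> list:
    """Return a list of problems; an empty list means the record fits Table I."""
    process = record.get("Exposure process", "")
    if process not in TABLE_I:
        return [f"unknown Exposure process: {process!r}"]
    problems = []
    for field, level in zip(FIELDS, TABLE_I[process]):
        filled = bool((record.get(field) or "").strip())
        if level == REQ and not filled:
            problems.append(f"{field} is required for {process}")
        elif level == NA and filled:
            problems.append(f"{field} is not applicable for {process}")
    return problems


# The hospitalized dengue patients example, entered as an 'infectious disease'.
dengue = {
    "Exposure process": "infectious disease",
    "Exposure material": "Dengue virus (NCBITaxon:12637)",
    "Disease name": "Dengue hemorrhagic fever (DOID:12206)",
    "Disease stage": "acute / recent onset",
}
problems = check_exposure(dengue)  # no problems expected for this record
```

Keeping the rules in a single table rather than in branching code means that a change to the allowed values only requires editing one row.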
    Free-text description of immune exposure                                                               Exposure process                   Exposure material                              Disease name                             Disease stage
    “Adults receiving a Varicella-zoster shot”                                                             vaccination                        Varicella-zoster virus vaccine (VO:0000669)
    “Hospitalized patients with Hemorraghic Dengue fever”                                                  infectious disease                 Dengue virus (NCBITaxon:12637)                 Dengue hemorrhagic fever (DOID:12206)    acute / recent onset
    “Subjects from endemic area that tested positive for antibodies against Dengue 2 based on serology”    exposure with immune reactivity    Dengue virus 2 (NCBITaxon:11060)

    TABLE II. Three examples of immune exposures modeled in this schema.

Organisms can be specified by the broadly utilized NCBI
Taxonomy [6], which has the key advantage of linking
organism specifications to sequence information in NCBI. All
taxa from the NCBI Taxonomy are valid entries for Exposure
material, and can be looked up at
https://www.ncbi.nlm.nih.gov/taxonomy. One potential
concern with this choice is that NCBI does not assign new taxa
to every organism isolate identified, which in some cases is
desirable, such as in the case of drug resistant M. tuberculosis
isolates, where it is of interest to relate even single nucleotide
differences to efficacy of drug treatments. We expect that
going forward, there will be a developing community
consensus on how to handle this, along the lines of grouping






different isolates based on their NCBI GenBank ID under their                     Now that newly entered data will be formalized, improved
closest parent taxon.                                                         querying and comparison will be possible due to standardized
                                                                              terminology. We fully expect that as more data gets submitted
    Not all ‘Exposure materials’ in HIPC studies submitted to                 to ImmPort using this scheme for HIPC, questions will
us were whole organisms. In the case of vaccinations, specific                continue to arise, and based on our experience with the IEDB,
antigens are often utilized over whole organisms such as in the               we expect to handle them by consulting domain experts for the
case of subunit vaccines. Also, in the case of multi-valent                   disease of interest. Controversial cases will be presented to the
vaccines, multiple organisms or antigens of organisms are                     Clinical Subcommittee, to ensure that decisions are made
combined into one vaccine. We plan to specify vaccines                        uniformly across the HIPC program. Overall, it has to be
through             the           Vaccine            Ontology                 stressed that the structured representation of immune exposures
(http://www.violinet.org/vaccineontology/) [7]. It may be                     is not intended to fully represent every nuance of each study,
necessary to add new entries to the Vaccine Ontology to                       but rather achieve its intended function to enable a computable
capture new experimental vaccines, but as vaccines                            high level comparison of immune exposures across studies.
administered to humans have to go through a stringent                         Reassessment of how well this model meets the needs of the
approval process, this will not overwhelm the Vaccine                         community and how it improves the quality of the data after
Ontology development team.
                                                                              several months of use would be beneficial.
     To specify the ‘Disease name’, the IEDB utilizes values
from the Disease Ontology (DO) (http://disease-ontology.org/)                                        ACKNOWLEDGEMENTS
[8], which has the advantage of providing mappings to most of                 This work was supported by the National Institute of Allergy
the other vocabularies that could be considered such as ICD10,                and Infectious Diseases of the National Institutes of Health
SNOMED CT, MeSH and UMLS. The IEDB has been                                   under Award Numbers NIH U19 AI118610 and U19AI089992.
successful in mapping the disease terms encountered in the                    It would not have been possible without strong support by the
literature to DO terms. In addition, the Disease Ontology is                  ImmPort team, and Patrick Dunn in particular.
part of the OBO Foundry [9] and thus more compatible with
other basic research ontologies, providing explicit definitions                                            REFERENCES
and links to basic research domains, such as clarifying which
infectious agent is causative for a given disease. Thus, our
immune exposure model will continue to use DO, which was                      [1]  S. Bhattacharya, S. Andorf, L. Gomes, et al, “ImmPort:
incorporated into ImmPort submission templates via requiring                       disseminating data to the public for the future of
                                                                                   immunology,” Immunol Res. 58(2-3), pp. 234-239, May
submitters to enter DO terms to describe the diseases of the                       2014.
study subjects.                                                               [2] https://www.immuneprofiling.org/hipc/page/show
    In terms of ‘Disease stage’, the IEDB has defined three                        (accessed 6/1/2018).
values that in combination with disease name clarify some                     [3] R. Vita, J.A. Overton, J.A. Greenbaum, et al, “The
typical major distinctions in how a disease manifests in different                 immune epitope database (IEDB) 3.0,” Nucleic Acids
                                                                                   Res. 43(Database issue):D, pp. 405-412, October 2014.
study subjects: (1) ‘acute/recent onset’ is utilized for subjects             [4] A. Bandrowski, R. Brinkman, M. Brochhausen, et al,
that currently have symptomatic disease and may or may not                         “The Ontology for Biomedical Investigations,” PLoS One
clear it. (2) ‘chronic’ is utilized for subjects that persistently                 29;11(4). Apr 2016.
have a disease and it is not considered highly likely that they               [5] J.A. Greenbaum, R. Vita, L. Zarebski, et al, “ONTology
will soon clear the disease without intervention. (3) ‘post’ is                    of Immune Epitopes (ONTIE) Representing the Immune
utilized for subjects that have cleared a disease which they had                   Epitope Database in OWL,” The 12th Annual Bio-
in the past. So far, these broad categories have proven                            Ontologies Meeting, ISMB, pp. 45–48, 2009.
sufficient to also describe HIPC needs, although more detailed                [6] E.W. Sayers, T. Barrett, D.A. Benson, et al, “Database
                                                                                   resources of the National Center for Biotechnology
description of disease specific stages could be desirable in the                   Information,” Nucleic Acids Res. 37, pp. D5–D15, May
future and we are open to further discussion.                                      2009.
                                                                              [7] Y. He, L. Cowell, A.D. Diehl, “VO: Vaccine Ontology,”
            III. CHALLENGES AND CONCLUSIONS                                        The 1st International Conference on Biomedical Ontology
    The ability to formalize what otherwise would be free-text                     (ICBO 2009), Buffalo, NY, USA. Nature Precedings.
                                                                                   2009.
is a significant accomplishment to improve the integration of
                                                                              [8] W.A. Kibbe, C. Arze, V. Felix, et al, “Disease Ontology
data across HIPC studies. More importantly, as this model was                      2015 update: an expanded and updated database of human
adopted by HIPC by adding columns to the Human Subject                             diseases for linking biomedical knowledge through
data submission template, all studies submitted to ImmPort can                     disease data,” Nucleic Acids Res. 43(Database issue):D,
now include the same fields to describe immune exposures, and the                  pp. 1071-1078, January 2015.
HIPC studies will be better connected to other studies in                     [9] B. Smith, M. Ashburner, C. Rosse, et al, “The OBO
ImmPort. To ease data entry for these fields and others into                       Foundry: coordinated evolution of ontologies to support
ImmPort spreadsheet templates, work is ongoing through the                         biomedical data integration,” Nat Biotechnol. 25(11), pp.
                                                                                   1251-1255, November 2007.
CEDAR [10] effort and others to create interactive forms that                 [10] M.A. Musen, C.A. Bean, K.H. Cheung, et al, “The center
will ensure that only valid terms are entered.                                     for expanded data annotation and retrieval,” J Am Med
                                                                                   Inform Assoc. 22(6), pp. 1148-52, November 2015.



