=Paper=
{{Paper
|id=Vol-2285/ICBO_2018_paper_41
|storemode=property
|title=Formalizing the Representation of Immune Exposures for Human Immunology Studies
|pdfUrl=https://ceur-ws.org/Vol-2285/ICBO_2018_paper_41.pdf
|volume=Vol-2285
|authors=Randi Vita,Bjoern Peters,James Overton,Kei-Hoi Cheung,Steven Kleinstein
|dblpUrl=https://dblp.org/rec/conf/icbo/VitaPOCK18
}}
==Formalizing the Representation of Immune Exposures for Human Immunology Studies==
Proceedings of the 9th International Conference on Biological Ontology (ICBO 2018), Corvallis, Oregon, USA

Randi Vita (1), James A. Overton (1), Kei-Hoi Cheung (2), Steven H. Kleinstein (3), Bjoern Peters (1)

(1) Division for Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, California, U.S.A.

(2) Department of Emergency Medicine and Yale Center for Medical Informatics, Yale School of Medicine, New Haven, CT, USA.

(3) Department of Pathology, Yale School of Medicine, New Haven, Connecticut, U.S.A. and Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA

Abstract: Human immunology studies typically examine how immune exposures associated with vaccinations, infectious, allergic or autoimmune diseases, or transplantations perturb the immune system, with the goal of developing diagnostic tools and therapeutic interventions. While there are established approaches to formally represent the experimental data generated in such studies, which often comprise gene expression data, flow cytometry data, or serology data, the description of the immune exposures themselves is not well standardized. Here we present a formal approach to represent immune exposures at a high level of granularity. We capture the exposure process (e.g. ‘vaccination’ or ‘occurrence of allergic disease’), exposure material (e.g. ‘Tdap vaccine’ or ‘House dust mite’), and the associated disease name and stage (e.g. ‘allergic rhinitis’ and ‘chronic’). This representation scheme has been used successfully in the IEDB and an extended version has been adopted by HIPC to capture studies in ImmPort. We report here on this scheme, our ongoing attempts to map the terms used to existing ontologies, and the challenges encountered.

Keywords: immune exposure; modeling; HIPC; ontology

I. INTRODUCTION

The Immunology Database and Analysis Portal (ImmPort) [1] is the primary resource to capture human immunology studies funded by the National Institutes of Health, Division of Allergy, Immunology and Transplantation. ImmPort provides structured data fields to capture a variety of different experimental data and free-text fields to store meta-data on cohorts from which subjects were recruited. This free-text cohort description data typically contains a description of immune exposures that are expected to perturb the immune system. While free text allows for a detailed account of how a given study is conducted and a cohort is defined, without standardization, such descriptions are difficult to query and compare across many studies in a large database such as ImmPort.

In particular, ImmPort is the designated repository for data from studies performed by the Human Immunology Project Consortium (HIPC) [2], a collaboration between a number of centers aimed at performing large-scale human immunology studies with a focus on profiling the human immune response to natural infection and vaccination. A key goal of the HIPC consortium is to cross-compare results from different centers. To facilitate this, we set out to develop a standardized representation of immune exposures for HIPC studies that can be stored in ImmPort to represent their central elements in a structured format.

The need to represent immune exposures extends beyond the HIPC program. Most human immunology studies examine how the immune system responds to perturbations. Subjects are compared across cohorts and/or at defined time points that are intended to isolate the effect of immune exposures. The Immune Epitope Database (IEDB) [3] implemented a structured representation of immune exposures that has been applied to model over one million experiments in which human samples were tested for T cell or B cell reactivity to specific epitopes. The IEDB representation of exposures is decoupled from the epitope mapping experiments, so we decided to test whether it could be used as a basis to describe immune exposures for the HIPC program. By adapting the IEDB model for HIPC, we have developed an even more general representation of immune exposures that can be used by the wider scientific community.

II. APPROACH

A. Semi-formal Immune Exposure Representation

All HIPC centers funded by the middle of 2017 were asked to supply textual descriptions of study designs that they planned on submitting to ImmPort. We then examined the immune exposures that were part of these study designs and how they would be entered into the IEDB format. As a result of this process, we found that the broader scope of HIPC compared to the IEDB required extension of the IEDB structured representation. In the following, we present the resulting expanded schema to represent immune exposures for HIPC, of which the IEDB immune exposures are a subset. This schema has been implemented by adding columns to the
‘Human Subject Template’ spreadsheet that is used to submit information to ImmPort.

We consider four elements critical to the description of an immune exposure, listed as the column headers in Table I. The ‘Exposure process’ identifies the type of process through which a host was exposed and the type of evidence that the exposure happened, which are tightly intertwined. This is the only element of the four that was deemed mandatory. Based on the choice made for ‘Exposure process’, other elements are required or not applicable as listed in Table I. The ‘Exposure material’ describes what substance(s) the host was exposed to and/or developed immune reactions to as part of the exposure process. The ‘Disease name’ indicates the specific disease of the host associated with the exposure being described and lastly, the ‘Disease stage’ provides a broad classification of how the disease had progressed at the time of the study.

{| class="wikitable"
|+ TABLE I. Four structured elements to describe immune exposures.
! Exposure process !! Exposure material !! Disease name !! Disease stage
|-
| administration || required || X || X
|-
| vaccination || required || X || X
|-
| infectious challenge || required || optional || X
|-
| transplant/transfusion || required || X || X
|-
| disease || X || required || required
|-
| infectious disease || required || required || required
|-
| allergic disease || required || required || required
|-
| autoimmune disease || X || required || required
|-
| cancer || X || required || required
|-
| exposure (without disease) || required || X || X
|-
| asymptomatic infection/colonization || required || X || X
|-
| exposure with immune reactivity || required || X || X
|-
| exposure with documentation || required || X || X
|-
| exposure to endemic/ubiquitous agent || required || X || X
|-
| no exposure || X || X || X
|-
| unknown || X || X || X
|}

To illustrate how this representation was used in practice, Table II shows three examples of studies by actual HIPC centers that involved immune exposures, described in free text (first column to the left), and how these were modeled using the four elements of the exposure scheme (columns to the right). These examples illustrate the three main types of exposure processes, namely ‘administration’, ‘disease’, and ‘exposure without disease’.

{| class="wikitable"
|+ TABLE II. Three examples of immune exposures modeled in this schema.
! Free-text description of immune exposure !! Exposure process !! Exposure material !! Disease name !! Disease stage
|-
| “Adults receiving a Varicella-zoster shot” || vaccination || Varicella-zoster virus vaccine (VO:0000669) || ||
|-
| “Hospitalized patients with Hemorrhagic Dengue fever” || infectious disease || Dengue virus (NCBITaxon:12637) || Dengue hemorrhagic fever (DOID:12206) || acute / recent onset
|-
| “Subjects from endemic area that tested positive for antibodies against Dengue 2 based on serology” || exposure with immune reactivity || Dengue virus 2 (NCBITaxon:11060) || ||
|}

Thus, “Adults receiving a Varicella-zoster shot” would be the result of a vaccination ‘Exposure process’ which delivered the ‘Exposure material’ that was the Varicella-zoster virus vaccine. No disease resulted from this immune exposure.

B. Ontology Mapping

Our intent is to map each of the four data elements described above to ontology terms with textual and logical definitions, ideally derived from established ontologies covering the various domains. For ‘Exposure process’, all allowed values are listed in the first column of Table I. This collection of options has been assembled by the IEDB team over the past 13 years and has proven to be robust and stable, with minimal modifications occurring in the last 5 years. Each of the options comes with a definition and rules for when it should be applied. These terms will be mapped to formal external ontology terms, as initiated in Supplementary Table S1 (https://doi.org/10.6084/m9.figshare.6741791.v1).

The main challenge in this process is that terms for e.g. ‘vaccination’, ‘infectious disease’ and ‘transplantation’ come from different external ontologies, and presenting users with their definitions side by side is not helpful. We are planning to engage representatives of different ontology communities and harmonize their definitions. In the meantime, we have proceeded with implementing temporary terms for this immune exposure model in ONTIE [5], which we intend to replace or merge with new or edited terms in the appropriate external ontologies.

In addition to the main three categories of immune exposure (administration, disease, exposure without disease) and their subtypes, there are two options (no exposure and unknown) which are not actual types of exposures but rather values to signify two different reasons why it is not possible or meaningful to fill out the exposure type for a given study subject. The value ‘no exposure’ is intended to be used for subjects that are enrolled as negative controls, and indicates specifically that these subjects were *not* exposed to something. The value ‘unknown’ is used when samples are from subjects for which no relevant exposure information is available. This is applicable when, for example, a study utilizes samples from anonymous blood bank donors in order to establish a ‘normal range’.

For ‘Exposure material’, the vast majority of HIPC studies submitted to us required specifying an organism that was either the causative agent of an infection, the agent of an exposure without infection, or used for vaccination to protect against future infection. Organisms can be specified by the broadly utilized NCBI Taxonomy [6], which has the key advantage of linking organism specifications to sequence information in NCBI. All taxa from the NCBI Taxonomy are valid entries for Exposure material, and can be looked up at https://www.ncbi.nlm.nih.gov/taxonomy. One potential concern with this choice is that NCBI does not assign new taxa to every organism isolate identified, which in some cases is desirable, such as in the case of drug-resistant M. tuberculosis isolates, where it is of interest to relate even single-nucleotide differences to efficacy of drug treatments. We expect that going forward, there will be a developing community consensus on how to handle this, along the lines of grouping
different isolates based on their NCBI GenBank ID under their closest parent taxon.

Not all ‘Exposure materials’ in HIPC studies submitted to us were whole organisms. In the case of vaccinations, specific antigens are often utilized over whole organisms, such as in the case of subunit vaccines. Also, in the case of multi-valent vaccines, multiple organisms or antigens of organisms are combined into one vaccine. We plan to specify vaccines through the Vaccine Ontology (http://www.violinet.org/vaccineontology/) [7]. It may be necessary to add new entries to the Vaccine Ontology to capture new experimental vaccines, but as vaccines administered to humans have to go through a stringent approval process, this will not overwhelm the Vaccine Ontology development team.

To specify the ‘Disease name’, the IEDB utilizes values from the Disease Ontology (DO) (http://disease-ontology.org/) [8], which has the advantage of providing mappings to most of the other vocabularies that could be considered, such as ICD-10, SNOMED CT, MeSH and UMLS. The IEDB has been successful in mapping the disease terms encountered in the literature to DO terms. In addition, the Disease Ontology is part of the OBO Foundry [9] and thus more compatible with other basic research ontologies, providing explicit definitions and links to basic research domains, such as clarifying which infectious agent is causative for a given disease. Thus, our immune exposure model will continue to use DO, which was incorporated into ImmPort submission templates by requiring submitters to enter DO terms to describe the diseases of the study subjects.

In terms of ‘Disease stage’, the IEDB has defined three values that in combination with disease name clarify some typical major distinctions in how a disease manifests in different study subjects: (1) ‘acute/recent onset’ is utilized for subjects that currently have symptomatic disease and may or may not clear it. (2) ‘chronic’ is utilized for subjects that persistently have a disease and it is not considered highly likely that they will soon clear the disease without intervention. (3) ‘post’ is utilized for subjects that have cleared a disease which they had in the past. So far, these broad categories have proven sufficient to also describe HIPC needs, although more detailed description of disease-specific stages could be desirable in the future and we are open to further discussion.

III. CHALLENGES AND CONCLUSIONS

The ability to formalize what otherwise would be free text is a significant accomplishment to improve the integration of data across HIPC studies. More importantly, as this model was adopted by HIPC by adding columns to the Human Subject data submission template, all studies submitted to ImmPort can now include the same fields to describe immune exposures, and the HIPC studies will be better connected to other studies in ImmPort. To ease data entry for these fields and others into ImmPort spreadsheet templates, work is ongoing through the CEDAR [10] effort and others to create interactive forms that will ensure that only valid terms are entered.

Now that newly entered data will be formalized, improved querying and comparison will be possible due to standardized terminology. We fully expect that as more data gets submitted to ImmPort using this scheme for HIPC, questions will continue to arise, and based on our experience with the IEDB, we expect to handle them by consulting domain experts for the disease of interest. Controversial cases will be presented to the Clinical Subcommittee, to ensure that decisions are made uniformly across the HIPC program. Overall, it has to be stressed that the structured representation of immune exposures is not intended to fully represent every nuance of each study, but rather to achieve its intended function of enabling a computable high-level comparison of immune exposures across studies. Reassessment of how well this model meets the needs of the community and how it improves the quality of the data after several months of use would be beneficial.

ACKNOWLEDGEMENTS

This work was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number NIH U19 AI118610 and U19AI089992. It would not have been possible without strong support by the ImmPort team, and Patrick Dunn in particular.

REFERENCES

[1] S. Bhattacharya, S. Andorf, L. Gomes, et al., “ImmPort: disseminating data to the public for the future of immunology,” Immunol Res. 58(2-3), pp. 234-239, May 2014.

[2] https://www.immuneprofiling.org/hipc/page/show (accessed 6/1/2018).

[3] R. Vita, J.A. Overton, J.A. Greenbaum, et al., “The immune epitope database (IEDB) 3.0,” Nucleic Acids Res. 43(Database issue), pp. D405-D412, October 2014.

[4] A. Bandrowski, R. Brinkman, M. Brochhausen, et al., “The Ontology for Biomedical Investigations,” PLoS One 11(4), April 29, 2016.

[5] J.A. Greenbaum, R. Vita, L. Zarebski, et al., “ONTology of Immune Epitopes (ONTIE) Representing the Immune Epitope Database in OWL,” The 12th Annual Bio-Ontologies Meeting, ISMB, pp. 45–48, 2009.

[6] E.W. Sayers, T. Barrett, D.A. Benson, et al., “Database resources of the National Center for Biotechnology Information,” Nucleic Acids Res. 37, pp. D5–D15, May 2009.

[7] Y. He, L. Cowell, A.D. Diehl, “VO: Vaccine Ontology,” The 1st International Conference on Biomedical Ontology (ICBO 2009), Buffalo, NY, USA. Nature Precedings. 2009.

[8] W.A. Kibbe, C. Arze, V. Felix, et al., “Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data,” Nucleic Acids Res. 43(Database issue), pp. D1071-D1078, January 2015.

[9] B. Smith, M. Ashburner, C. Rosse, et al., “The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration,” Nat Biotechnol. 25(11), pp. 1251-1255, November 2007.

[10] M.A. Musen, C.A. Bean, K.H. Cheung, et al., “The center for expanded data annotation and retrieval,” J Am Med Inform Assoc. 22(6), pp. 1148-52, November 2015.

ICBO 2018, August 7-10, 2018
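
As an illustration of the requirement rules in Table I, the following minimal Python sketch checks a four-element exposure record against those rules. It is not the ImmPort template logic or the IEDB implementation. The names `ImmuneExposure`, `RULES`, and `validate` are hypothetical. Reading ‘X’ as "not applicable" follows the text of Section II.A, while rejecting a value entered in a not-applicable field is an assumption made here for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

REQUIRED, OPTIONAL, NOT_APPLICABLE = "required", "optional", "X"

# Transcribed from Table I:
# exposure process -> (exposure material, disease name, disease stage)
RULES = {
    "administration": (REQUIRED, NOT_APPLICABLE, NOT_APPLICABLE),
    "vaccination": (REQUIRED, NOT_APPLICABLE, NOT_APPLICABLE),
    "infectious challenge": (REQUIRED, OPTIONAL, NOT_APPLICABLE),
    "transplant/transfusion": (REQUIRED, NOT_APPLICABLE, NOT_APPLICABLE),
    "disease": (NOT_APPLICABLE, REQUIRED, REQUIRED),
    "infectious disease": (REQUIRED, REQUIRED, REQUIRED),
    "allergic disease": (REQUIRED, REQUIRED, REQUIRED),
    "autoimmune disease": (NOT_APPLICABLE, REQUIRED, REQUIRED),
    "cancer": (NOT_APPLICABLE, REQUIRED, REQUIRED),
    "exposure (without disease)": (REQUIRED, NOT_APPLICABLE, NOT_APPLICABLE),
    "asymptomatic infection/colonization": (REQUIRED, NOT_APPLICABLE, NOT_APPLICABLE),
    "exposure with immune reactivity": (REQUIRED, NOT_APPLICABLE, NOT_APPLICABLE),
    "exposure with documentation": (REQUIRED, NOT_APPLICABLE, NOT_APPLICABLE),
    "exposure to endemic/ubiquitous agent": (REQUIRED, NOT_APPLICABLE, NOT_APPLICABLE),
    "no exposure": (NOT_APPLICABLE, NOT_APPLICABLE, NOT_APPLICABLE),
    "unknown": (NOT_APPLICABLE, NOT_APPLICABLE, NOT_APPLICABLE),
}

# The three disease stage values defined by the IEDB (Section II.B).
DISEASE_STAGES = {"acute/recent onset", "chronic", "post"}


@dataclass
class ImmuneExposure:
    """One exposure record; only the exposure process is mandatory."""
    exposure_process: str
    exposure_material: Optional[str] = None  # e.g. "NCBITaxon:12637"
    disease_name: Optional[str] = None       # e.g. "DOID:12206"
    disease_stage: Optional[str] = None      # e.g. "chronic"


def validate(record: ImmuneExposure) -> List[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    rule = RULES.get(record.exposure_process)
    if rule is None:
        return [f"unrecognized exposure process: {record.exposure_process!r}"]

    errors = []
    fields = ("exposure_material", "disease_name", "disease_stage")
    for field_name, requirement in zip(fields, rule):
        value = getattr(record, field_name)
        if requirement == REQUIRED and not value:
            errors.append(f"{field_name} is required for {record.exposure_process!r}")
        elif requirement == NOT_APPLICABLE and value:
            # Assumption: a value in a not-applicable field is treated as an error.
            errors.append(f"{field_name} is not applicable for {record.exposure_process!r}")

    if record.disease_stage and record.disease_stage not in DISEASE_STAGES:
        errors.append(f"unrecognized disease stage: {record.disease_stage!r}")
    return errors


if __name__ == "__main__":
    # The second example from Table II.
    dengue = ImmuneExposure("infectious disease", "NCBITaxon:12637",
                            "DOID:12206", "acute/recent onset")
    print(validate(dengue))
    # A vaccination record that omits the required exposure material.
    print(validate(ImmuneExposure("vaccination")))
```

Keeping the rules in a single lookup table mirrors the way Table I is laid out, so an added exposure process would only need one new entry.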