Proceedings of the 9th International Conference on Biological Ontology (ICBO 2018), Corvallis, Oregon, USA
ICBO 2018, August 7-10, 2018

Formalizing the Representation of Immune Exposures for Human Immunology Studies

Randi Vita1, James A. Overton1, Kei-Hoi Cheung2, Steven H. Kleinstein3, Bjoern Peters1

1 Division for Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, California, U.S.A.
2 Department of Emergency Medicine and Yale Center for Medical Informatics, Yale School of Medicine, New Haven, CT, USA.
3 Department of Pathology, Yale School of Medicine, New Haven, Connecticut, U.S.A. and Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA

Abstract: Human immunology studies typically examine how immune exposures associated with vaccinations, infectious, allergic or autoimmune diseases, or transplantations perturb the immune system, with the goal of developing diagnostic tools and therapeutic interventions. While there are established approaches to formally represent the experimental data generated in such studies, which often comprise gene expression data, flow cytometry data, or serology data, the description of the immune exposures themselves is not well standardized. Here we present a formal approach to represent immune exposures at a high level of granularity. We capture the exposure process (e.g. ‘vaccination’ or ‘occurrence of allergic disease’), exposure material (e.g. ‘Tdap vaccine’ or ‘House dust mite’), and the associated disease name and stage (e.g. ‘allergic rhinitis’ and ‘chronic’). This representation scheme has been used successfully in the IEDB and an extended version has been adopted by HIPC to capture studies in ImmPort. We report here on this scheme, our ongoing attempts to map the terms used to existing ontologies, and the challenges encountered.

Keywords: immune exposure; modeling; HIPC; ontology

I. INTRODUCTION

The Immunology Database and Analysis Portal (ImmPort) [1] is the primary resource to capture human immunology studies funded by the National Institutes of Health, Division of Allergy, Immunology and Transplantation. ImmPort provides structured data fields to capture a variety of different experimental data and free-text fields to store metadata on the cohorts from which subjects were recruited. This free-text cohort description typically contains a description of immune exposures that are expected to perturb the immune system. While free text allows for a detailed account of how a given study is conducted and how a cohort is defined, without standardization such descriptions are difficult to query and compare across many studies in a large database such as ImmPort.

In particular, ImmPort is the designated repository for data from studies performed by the Human Immunology Project Consortium (HIPC) [2], a collaboration between a number of centers aimed at performing large-scale human immunology studies with a focus on profiling the human immune response to natural infection and vaccination. A key goal of the HIPC consortium is to cross-compare results from different centers. To facilitate this, we set out to develop a standardized representation of immune exposures for HIPC studies that can be stored in ImmPort to represent their central elements in a structured format.

The need to represent immune exposures extends beyond the HIPC program. Most human immunology studies examine how the immune system responds to perturbations. Subjects are compared across cohorts and/or at defined time points that are intended to isolate the effect of immune exposures. The Immune Epitope Database (IEDB) [3] implemented a structured representation of immune exposures that has been applied to model over one million experiments in which human samples were tested for T cell or B cell reactivity to specific epitopes. The IEDB representation of exposures is decoupled from the epitope mapping experiments, so we decided to test whether it could be used as a basis to describe immune exposures for the HIPC program. By adapting the IEDB model for HIPC, we have developed an even more general representation of immune exposures that can be used by the wider scientific community.

II. APPROACH

A. Semi-formal Immune Exposure Representation

All HIPC centers funded by the middle of 2017 were asked to supply textual descriptions of study designs that they planned on submitting to ImmPort. We then examined the immune exposures that were part of these study designs and how they would be entered into the IEDB format. As a result of this process, we found that the broader scope of HIPC compared to the IEDB required extension of the IEDB structured representation. In the following, we present the resulting expanded schema to represent immune exposures for HIPC, of which the IEDB immune exposures are a subset. This schema has been implemented by adding columns to the ‘Human Subject Template’ spreadsheet that is used to submit information to ImmPort.

We consider four elements critical to the description of an immune exposure, listed as the column headers in Table I. The ‘Exposure process’ identifies the type of process through which a host was exposed and the type of evidence that the exposure happened, which are tightly intertwined. This is the only element of the four that was deemed mandatory. Based on the choice made for ‘Exposure process’, the other elements are required, optional, or not applicable (X), as listed in Table I. The ‘Exposure material’ describes what substance(s) the host was exposed to and/or developed immune reactions to as part of the exposure process. The ‘Disease name’ indicates the specific disease of the host associated with the exposure being described and, lastly, the ‘Disease stage’ provides a broad classification of how the disease had progressed at the time of the study.

Exposure process                            Exposure material   Disease name   Disease stage
administration                              required            X              X
  vaccination                               required            X              X
  infectious challenge                      required            optional       X
  transplant/transfusion                    required            X              X
disease                                     X                   required       required
  infectious disease                        required            required       required
  allergic disease                          required            required       required
  autoimmune disease                        X                   required       required
  cancer                                    X                   required       required
exposure (without disease)                  required            X              X
  asymptomatic infection/colonization       required            X              X
  exposure with immune reactivity           required            X              X
  exposure with documentation               required            X              X
  exposure to endemic/ubiquitous agent      required            X              X
no exposure                                 X                   X              X
unknown                                     X                   X              X

TABLE I. Four structured elements to describe immune exposures.

To illustrate how this representation was used in practice, Table II shows three examples of studies by actual HIPC centers that involved immune exposures, described in free text (first column to the left), and how these were modeled using the four elements of the exposure scheme (columns to the right). These examples illustrate the three main types of exposure processes, namely ‘administration’, ‘disease’, and ‘exposure without disease’.

Free-text description of immune exposure: “Adults receiving a Varicella-zoster shot”
  Exposure process:  vaccination
  Exposure material: Varicella-zoster virus vaccine (VO:0000669)
  Disease name:      (none)
  Disease stage:     (none)

Free-text description of immune exposure: “Hospitalized patients with Hemorrhagic Dengue fever”
  Exposure process:  infectious disease
  Exposure material: Dengue virus (NCBITaxon:12637)
  Disease name:      Dengue hemorrhagic fever (DOID:12206)
  Disease stage:     acute / recent onset

Free-text description of immune exposure: “Subjects from endemic area that tested positive for antibodies against Dengue 2 based on serology”
  Exposure process:  exposure with immune reactivity
  Exposure material: Dengue virus 2 (NCBITaxon:11060)
  Disease name:      (none)
  Disease stage:     (none)

TABLE II. Three examples of immune exposures modeled in this schema.

Thus, “Adults receiving a Varicella-zoster shot” would be the result of a vaccination ‘Exposure process’ which delivered the ‘Exposure material’ that was the Varicella-zoster virus vaccine. No disease resulted from this immune exposure.

B. Ontology Mapping

Our intent is to map each of the four data elements described above to ontology terms with textual and logical definitions, ideally derived from established ontologies covering the various domains. For ‘Exposure process’, all allowed values are listed in the first column of Table I. This collection of options has been assembled by the IEDB team over the past 13 years and has proven to be robust and stable, with minimal modifications occurring in the last 5 years. Each of the options comes with a definition and rules for when it should be applied. These terms will be mapped to formal external ontology terms, as initiated in Supplementary Table S1 (https://doi.org/10.6084/m9.figshare.6741791.v1). The main challenge in this process is that terms such as ‘vaccination’, ‘infectious disease’ and ‘transplantation’ come from different external ontologies, and presenting their definitions to users side by side is not helpful. We are planning to engage representatives of the different ontology communities and harmonize their definitions. Until this is done, we have proceeded with the implementation of temporary terms for this immune exposure model in ONTIE [5], which we intend to replace or merge with new or edited terms in the appropriate external ontologies.

In addition to the three main categories of immune exposure (administration, disease, exposure without disease) and their subtypes, there are two options (no exposure and unknown) which are not actual types of exposures but rather values that signify two different reasons why it is not possible or meaningful to fill out the exposure type for a given study subject. The value ‘no exposure’ is intended for subjects that are enrolled as negative controls, and indicates specifically that these subjects were *not* exposed to something. The value ‘unknown’ is used when samples are from subjects for which no relevant exposure information is available. This is applicable when, for example, a study uses samples from anonymous blood bank donors in order to establish a ‘normal range’.

For ‘Exposure material’, the vast majority of HIPC studies submitted to us required specifying an organism that was either the causative agent of an infection, the agent of an exposure without infection, or used to vaccinate against future infection. Organisms can be specified using the broadly utilized NCBI Taxonomy [6], which has the key advantage of linking organism specifications to sequence information in NCBI. All taxa from the NCBI Taxonomy are valid entries for ‘Exposure material’, and can be looked up at https://www.ncbi.nlm.nih.gov/taxonomy. One potential concern with this choice is that NCBI does not assign a new taxon to every organism isolate identified, which in some cases is desirable, such as in the case of drug-resistant M. tuberculosis isolates, where it is of interest to relate even single nucleotide differences to the efficacy of drug treatments. We expect that a community consensus on how to handle this will develop, along the lines of grouping different isolates based on their NCBI GenBank ID under their closest parent taxon.

Not all ‘Exposure materials’ in HIPC studies submitted to us were whole organisms. In the case of vaccinations, specific antigens are often used instead of whole organisms, as in subunit vaccines. Also, in multi-valent vaccines, multiple organisms or antigens of organisms are combined into one vaccine. We plan to specify vaccines through the Vaccine Ontology (http://www.violinet.org/vaccineontology/) [7]. It may be necessary to add new entries to the Vaccine Ontology to capture new experimental vaccines, but as vaccines administered to humans have to go through a stringent approval process, we do not expect this to overwhelm the Vaccine Ontology development team.

To specify the ‘Disease name’, the IEDB uses values from the Disease Ontology (DO) (http://disease-ontology.org/) [8], which has the advantage of providing mappings to most of the other vocabularies that could be considered, such as ICD-10, SNOMED CT, MeSH and UMLS. The IEDB has been successful in mapping the disease terms encountered in the literature to DO terms. In addition, the Disease Ontology is part of the OBO Foundry [9] and thus more compatible with other basic research ontologies, providing explicit definitions and links to basic research domains, such as clarifying which infectious agent is causative for a given disease. Thus, our immune exposure model will continue to use DO, which was incorporated into the ImmPort submission templates by requiring submitters to enter DO terms to describe the diseases of the study subjects.

For ‘Disease stage’, the IEDB has defined three values that, in combination with the disease name, clarify some typical major distinctions in how a disease manifests in different study subjects:

(1) ‘acute/recent onset’ is used for subjects that currently have symptomatic disease and may or may not clear it.
(2) ‘chronic’ is used for subjects that persistently have a disease and are not considered highly likely to clear it soon without intervention.
(3) ‘post’ is used for subjects that have cleared a disease which they had in the past.

So far, these broad categories have proven sufficient to also describe HIPC needs, although a more detailed description of disease-specific stages could be desirable in the future and we are open to further discussion.

III. CHALLENGES AND CONCLUSIONS

The ability to formalize what otherwise would be free text is a significant step towards improving the integration of data across HIPC studies. More importantly, because this model was adopted by HIPC by adding columns to the Human Subject data submission template, all studies submitted to ImmPort can now include the same fields to describe immune exposures, so the HIPC studies will be better connected to other studies in ImmPort. To ease data entry for these fields and others into ImmPort spreadsheet templates, work is ongoing through the CEDAR [10] effort and others to create interactive forms that will ensure that only valid terms are entered.

Now that newly entered data will be formalized, improved queries and comparisons will be possible due to standardized terminology. We fully expect that as more data is submitted to ImmPort using this scheme for HIPC, questions will continue to arise and, based on our experience with the IEDB, we expect to handle them by consulting domain experts for the disease of interest. Controversial cases will be presented to the Clinical Subcommittee, to ensure that decisions are made uniformly across the HIPC program. Overall, it has to be stressed that the structured representation of immune exposures is not intended to fully represent every nuance of each study, but rather to enable a computable high-level comparison of immune exposures across studies. A reassessment of how well this model meets the needs of the community and how it improves the quality of the data after several months of use would be beneficial.

ACKNOWLEDGEMENTS

This work was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Numbers U19 AI118610 and U19 AI089992. It would not have been possible without strong support by the ImmPort team, and Patrick Dunn in particular.

REFERENCES

[1] S. Bhattacharya, S. Andorf, L. Gomes, et al, “ImmPort: disseminating data to the public for the future of immunology,” Immunol Res. 58(2-3), pp. 234-239, May 2014.
[2] https://www.immuneprofiling.org/hipc/page/show (accessed 6/1/2018).
[3] R. Vita, J.A. Overton, J.A. Greenbaum, et al, “The immune epitope database (IEDB) 3.0,” Nucleic Acids Res. 43(Database issue), pp. D405-D412, October 2014.
[4] A. Bandrowski, R. Brinkman, M. Brochhausen, et al, “The Ontology for Biomedical Investigations,” PLoS One 11(4), April 2016.
[5] J.A. Greenbaum, R. Vita, L. Zarebski, et al, “ONTology of Immune Epitopes (ONTIE) Representing the Immune Epitope Database in OWL,” The 12th Annual Bio-Ontologies Meeting, ISMB, pp. 45-48, 2009.
[6] E.W. Sayers, T. Barrett, D.A. Benson, et al, “Database resources of the National Center for Biotechnology Information,” Nucleic Acids Res. 37, pp. D5-D15, May 2009.
[7] Y. He, L. Cowell, A.D. Diehl, “VO: Vaccine Ontology,” The 1st International Conference on Biomedical Ontology (ICBO 2009), Buffalo, NY, USA. Nature Precedings. 2009.
[8] W.A. Kibbe, C. Arze, V. Felix, et al, “Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data,” Nucleic Acids Res. 43(Database issue), pp. D1071-D1078, January 2015.
[9] B. Smith, M. Ashburner, C. Rosse, et al, “The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration,” Nat Biotechnol. 25(11), pp. 1251-1255, November 2007.
[10] M.A. Musen, C.A. Bean, K.H. Cheung, et al, “The center for expanded data annotation and retrieval,” J Am Med Inform Assoc. 22(6), pp. 1148-52, November 2015.
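
The requirement rules of Table I lend themselves to a simple machine check at data entry. The following is a minimal sketch in Python, not the ImmPort, IEDB or CEDAR implementation: the names REQUIREMENTS, FIELDS and validate_exposure and the record keys are hypothetical, ‘X’ is read as "not applicable", and the row label ‘exposure (without disease)’ is written as ‘exposure without disease’ as in the running text. The two example records follow Table II.

```python
from typing import Dict, List, Optional, Tuple

# Rules transcribed from Table I; "X" is read here as "not applicable".
# Tuple order: (exposure_material, disease_name, disease_stage).
# All identifiers in this sketch are hypothetical, chosen for illustration.
REQUIREMENTS: Dict[str, Tuple[str, str, str]] = {
    "administration": ("required", "X", "X"),
    "vaccination": ("required", "X", "X"),
    "infectious challenge": ("required", "optional", "X"),
    "transplant/transfusion": ("required", "X", "X"),
    "disease": ("X", "required", "required"),
    "infectious disease": ("required", "required", "required"),
    "allergic disease": ("required", "required", "required"),
    "autoimmune disease": ("X", "required", "required"),
    "cancer": ("X", "required", "required"),
    "exposure without disease": ("required", "X", "X"),
    "asymptomatic infection/colonization": ("required", "X", "X"),
    "exposure with immune reactivity": ("required", "X", "X"),
    "exposure with documentation": ("required", "X", "X"),
    "exposure to endemic/ubiquitous agent": ("required", "X", "X"),
    "no exposure": ("X", "X", "X"),
    "unknown": ("X", "X", "X"),
}

FIELDS = ("exposure_material", "disease_name", "disease_stage")


def validate_exposure(record: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems; an empty list means the record fits Table I."""
    process = record.get("exposure_process")
    if not process:
        return ["exposure_process is mandatory"]
    if process not in REQUIREMENTS:
        return [f"unrecognized exposure_process: {process!r}"]
    problems = []
    for field, rule in zip(FIELDS, REQUIREMENTS[process]):
        value = record.get(field)
        if rule == "required" and not value:
            problems.append(f"{field} is required for {process!r}")
        elif rule == "X" and value:
            problems.append(f"{field} is not applicable for {process!r}")
        # "optional" fields are accepted whether filled in or empty.
    return problems


# Two of the Table II examples, encoded with the identifiers given there.
varicella = {
    "exposure_process": "vaccination",
    "exposure_material": "VO:0000669",
}
dengue = {
    "exposure_process": "infectious disease",
    "exposure_material": "NCBITaxon:12637",
    "disease_name": "DOID:12206",
    "disease_stage": "acute/recent onset",
}

if __name__ == "__main__":
    for example in (varicella, dengue):
        print(example["exposure_process"], validate_exposure(example))
```

A check of this kind only enforces which fields must be filled in or left empty; it does not verify that the entered identifiers exist in the Vaccine Ontology, NCBI Taxonomy or Disease Ontology.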