Towards a Common Semantic Representation of Informed Consent for Biobank Specimens

Frank J. Manion1*, Yongqun He2, Elizabeth Eisenhauer3, Yu Lin2, Alla Karnovsky4, Marcelline R. Harris3

1 Comprehensive Cancer Center, University of Michigan Medical School, Ann Arbor, MI 48109
2 Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
3 Division of Systems Leadership and Effectiveness Science, University of Michigan School of Nursing, Ann Arbor, MI 48109
4 Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109

* Corresponding author.

Abstract: Biospecimen-based research is growing rapidly in the post-genomic era, and includes the need to retrieve specimens from distributed biobanks of various size and complexity in a fashion that ethically preserves the expressed wishes of specimen donors as represented by the informed consent process and its artifacts. This paper briefly describes existing work along these lines, presents some challenges unique to biobanks, and presents our own work on an ontology of informed consent.

Keywords: biobank; informed consent; ontology; ICO; OBO Foundry; Basic Formal Ontology (BFO); OBI ontology

I. INTRODUCTION

Research in the post-genomic era requires access to high quality biospecimens, often annotated with or linked to clinical data. Many groups at varying levels of institutional complexity, ranging from small scale individual laboratories to distributed international collaboratives, have established and operate biorepositories (also referred to by various names such as biobanks, biolibraries, and even collections). Data and specimens often need to be shared among multiple biobanks [1-3]. The act of requesting specimens from a biorepository may demand a complex series of transactions, each of which in turn may convey a series of rights, obligations, and permissions for access to specimens and data. Despite over a decade of experience incorporating biospecimens in the research process, formal models that describe the use of biorepositories in human research are a relatively recent development. Without a common formal model of consent and the associated permissions on collection and distribution of specimens and data, integration of data across the translational spectrum, or from multiple banks and institutions, will remain a difficult, manually intensive problem.

In this paper we briefly review current efforts toward such models, describe our own work toward a formal model for informed consent, and describe what we consider challenges and opportunities for supporting biorepository-based research with ontologies. A simple example that provides a motivation for this effort follows.

II. EXAMPLE OF THE CHALLENGE

Clinical or translational research often involves the extraction and usage of biospecimens from humans. Different biospecimens may be stored and processed differently, and may be collected under different models of consent. A typical scenario might read something like this:

"For my study, I want to use samples from my organization's biobank, collected under a blanket biobank informed consent form. I discover that I will need more samples, so I contact another organization's biobank to determine if they hold relevant and available specimens. That organization's samples were collected under a tiered biobank informed consent form. While some samples are shared with me, I still need more samples to address the requirements of my study. I then collect additional samples using a consent form specific to my study."

In this example there are three informed consent forms to account for: a blanket consent, a tiered consent, and the investigator's single study consent. In an effort to support the expressed wishes of the donors, informed consent documents impose a series of legal and ethical restrictions, obligations, and permissions on biobank operators and research teams using the specimens and data collected in these banks. Often these rights, obligations, and permissions accrue from multiple sources of authority and are represented in multiple legal documents. Consequently, the biobanking domain presents a series of modeling challenges, including:

The operational model of the biobank. A biobank can be a single, dedicated resource that provides samples to single or closely allied groups of studies using a common consent model. It might be a virtual or distributed biorepository using precoordinated consent models. Another organizational structure might be that of a shared biobank facility containing multiple sets of tissues from multiple projects and attempting to maximize use of these tissue resources by making them available to requestors.

The consent model used for the biobank. This can be opt-in or opt-out. In the case of an opt-in consent model, a tiered consent may be used to present the participant or volunteer with choices of the type of data the participant may want shared, and for what types of research or other constraints.

The protocol model the bank operates under. Typically a biobank serving more than one project would operate under one or more Institutional Review Board (IRB)-approved collection protocols and Health Insurance Portability and Accountability Act (HIPAA) authorizations. Researchers subsequently requesting specimens and data would operate under separate IRB-approved protocols, and depending on this protocol, separate consent and HIPAA authorization may be required for use of a previously banked specimen. Such a model is sometimes called a two-protocol model [4].

Rights, obligations, and permissions accrue from multiple sources and must be consistent across time. Properly modeling the decisions typically made by human review boards and regulatory personnel considering sample and data distribution for research requires modeling not just the consent documents, but also the protocols, data use agreements, and possibly other information artifacts used both in depositing samples into a biobank and in withdrawing them for subsequent research.

In a research-oriented university such as the University of Michigan, thousands of informed consent forms have been generated, and there are over 100 biobanks in the Medical School alone. Queries supporting appropriate use of banked biospecimens and data must be linked to the signed informed consent agreements with the biospecimen donor.

III. EXISTING EFFORTS

Several current efforts focus on modeling aspects of the biobanking domain. At least two BFO-aligned ontologies relate to biobanking. The Ontologized Minimum Information About BIobank data Sharing (OMIABIS) expresses data concepts in an ontology of biobank administration [5]. OMIABIS is based on work by Norlin and colleagues [6] to develop a minimum data set for eight countries participating in the EU Biobanking and Biomolecular Resources Research Infrastructure project. A limitation of this effort is that it is intended to serve only as a description of a biobank's contents, and does not describe the collection criteria, consenting, and protocol provenance of individual specimens. A group at the University of Pennsylvania is developing an ontology for the representation of biobanks, although the work is in early stages [7]. Similarly, we are aware that a group at Duke University is working on a collaborative effort to develop a normative set of data elements and terms to recommend as best practice to the International Society for Biological and Environmental Repositories (ISBER), although this work is not yet published [8, 9].

There are also non-BFO-aligned ontologies in related areas, including a Permission Ontology used for development and evaluation of software tools for reasoning about consent permission, published by a group at the University of California San Diego (UCSD) [10]. Related work to build a Research Permission Management System was done at the Medical University of South Carolina (MUSC) to support a statewide research network [11]. A search for the term "consent" in the NCBO BioPortal identified the notion of informed consent at the class level in 19 different systems (http://bioportal.bioontology.org/search).

Our efforts to develop a BFO-aligned informed consent ontology (ICO) emphasize the broad domain of informed consent. Although motivated by a biobanking use case, the initial development reported here is not restricted to that domain.

IV. THE INFORMED CONSENT ONTOLOGY (ICO)

Development of ICO, a BFO-based ontology represented in the Web Ontology Language (OWL2) [12], follows OBO Foundry principles of openness and collaboration. ICO is aligned with the BFO [13], making it possible to align and integrate with other BFO-based ontologies. The initial release of the ontology focuses on modeling informed consent documents. As of Aug. 14, 2014, ICO contains 471 terms including 137 ICO-specific terms and other terms imported from other BFO-aligned ontologies. Detailed ICO statistics can be found on the Ontobee ICO web page: http://www.ontobee.org/ontostat.php?ontology=ICO. ICO is released under an open Creative Commons 3.0 License.

The ontology was developed using a combination of top-down and bottom-up approaches. Protégé-OWL 4.2 was used for the ontology authoring and editing. To build the OBI-based framework of ICO we manually identified informed consent concepts from existing OBO Foundry library ontologies. These were imported to ICO using Ontodog [14] and OntoFox [15], which allowed for recursive inclusion of all defined axioms and related terms. The results were then manually reviewed for final approval before inclusion in the ICO framework.

Bottom-up construction proceeded by manually identifying and extracting a list of candidate terms from two informed consent templates used at the University of Michigan (one from the Medical School Institutional Review Board, another from the Health Sciences and Behavioral Sciences Institutional Review Board). We also identified terms from a consent form used for the University of Michigan Medical School biorepository, and from World Health Organization (WHO) informed consent templates. The candidate terms identified from these templates were then enriched with metadata including definitions, concept identifiers, preferred terms, synonyms, and URIs extracted from three ontology repositories: the National Library of Medicine's Unified Medical Language System (UMLS®) Metathesaurus [16]; the National Center for Biomedical Ontology (NCBO) BioPortal [17]; and Ontobee [18]. When textual definitions were not provided, other sources such as clinical research glossaries or the current literature were used. These enriched candidate terms were manually mapped to several pre-identified resources containing terms and definitions developed and vetted by the United States regulatory community. This process yielded candidate preferred terms containing definitions accepted as robust and well defined by that community. Resources used in this step included the National Cancer Institute Thesaurus (NCIt), the Biomedical Research Integrated Domain Group (BRIDG) [19], the Ontology of Clinical Research (OCRe) [20], the Consumer Health Vocabulary (CHV) and the University of California San Diego permission ontology [10].

The pool of enriched candidate terms was organized into categories of like terms according to their definitions. For example, the category 'authorization' included the terms 'authorization for medical records release', 'authorization documentation', and 'authorization'. Enriched candidate terms grouped by category formed the terms to be included in ICO. The final set of categories (or modeling units) was then mapped to branches of BFO. For example, terms categorized under 'authorization' were considered to be subclasses of BFO:process. Informed consent workflows in a typical clinical research study were modeled as three processes: (i) pre-informed consent processes, (ii) obtaining informed consent processes, and (iii) processes after signing informed consent documents. Relations between entities involved in the above processes were defined. Finally, all terms and relations were aligned with BFO.

V. DISCUSSION AND CONCLUSIONS

Modeling informed consent is a necessary but not sufficient part of the modeling needed to support responsible use of biospecimens and data in research. Biospecimen and data release is complex, and informed consent plays a major role in the regulatory and scientific governance used by biorepositories to release specimens and data. In follow-on work we plan to examine the specific area of specimen and data release involving the longitudinal agreements of rights, permissions, and obligations. Other work is needed in the complex areas of protocol representation, data use agreements and material transfer agreements.

Limitations of our preliminary work will inform further development efforts toward a robust Informed Consent Ontology. First, the ICO is admittedly preliminary work and is currently focused on informed consent documents and processes. More work is needed to validate the coverage and completeness in the domain. Concepts from the US Common Rule and the EU Prior Informed Consent legislation need to be included. Our current models of informed consent processes likely lack the richness and complexity of real-life informed consent processes, and they need validation with research study teams from a variety of domain areas. Aspects of rights, obligations, permissions, and ethics must be modeled and used to extend the ontology. Finally, axioms must be developed, and competency validation of the ICO must be conducted using a series of competency questions derived from use cases that are still to be defined.

We have described our work on ICO, a preliminary ontology of informed consent that provides a general classification of the content contained in general informed consent documents. It requires expansion, revisions and collaboration to build a robust model, and to move toward a representation of the complex area of biobank data sharing and specimen release. We hope to collaborate with the broader community in this effort.

ACKNOWLEDGMENT

This research was supported by a University of Michigan interdisciplinary research award (MCubed) and by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number 2UL1TR000433-06. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

REFERENCES

[1] A. Cambon-Thomsen, E. Rial-Sebbag, and B. M. Knoppers, "Trends in ethical and legal frameworks for the use of human biobanks," Eur Respir J, vol. 30, pp. 373-82, Aug 2007.
[2] R. E. Hewitt, "Biobanking: the foundation of personalized medicine," Curr Opin Oncol, vol. 23, pp. 112-9, Jan 2011.
[3] G. E. Henderson, R. J. Cadigan, T. P. Edwards, I. Conlon, A. G. Nelson, J. P. Evans, A. M. Davis, C. Zimmer, and B. J. Weiner, "Characterizing biobank organizations in the U.S.: results from a national survey," Genome Medicine, vol. 5, no. 1, p. 3, Jan. 2013.
[4] F. J. Manion, R. J. Robbins, W. A. Weems, and R. S. Crowley, "Security and privacy requirements for a multi-institutional cancer research data grid: an interview-based study," BMC Med Inform Decis Mak, vol. 9, p. 31, 2009.
[5] M. Brochhausen, M. N. Fransson, N. V. Kanaskar, M. Eriksson, R. Merino-Martinez, R. A. Hall, L. Norlin, S. Kjellqvist, M. Hortlund, U. Topaloglu, W. R. Hogan, and J.-E. Litton, "Developing a semantically rich ontology for the biobank-administration domain," J Biomed Semantics, vol. 4, p. 23, 2013.
[6] L. Norlin, M. N. Fransson, M. Eriksson, R. Merino-Martinez, M. Anderberg, S. Kurtovic, and J.-E. Litton, "A Minimum Data Set for Sharing Biobank Samples, Information, and Data: MIABIS," Biopreservation and Biobanking, vol. 10, no. 4, pp. 343-348, Aug. 2012.
[7] J. Zheng, Personal Communication to Yongqun He, 2014.
[8] H. Ellis, Personal Communication to Frank J. Manion, 2014.
[9] ISBER. (2014). ISBER - International Society for Biological and Environmental Repositories. Available: http://www.isber.org/
[10] A. Grando and R. Schwab, "Building and evaluating an ontology-based tool for reasoning about consent permission," AMIA Annu Symp Proc, vol. 2013, pp. 514-23, 2013.
[11] J. S. Obeid, K. Gerken, K. C. Madathil, D. Rugg, C. E. Alstad, K. Fryar, R. Alexander, A. K. Gramopadhye, J. Moskowitz, and I. C. Sanderson, "Development of an Electronic Research Permissions Management System to Enhance Informed Consents and Capture Research Authorizations Data," AMIA Summits Transl Sci Proc, vol. 2013, pp. 189-193, 2013.
[12] W3C, "OWL 2 Web Ontology Language document overview," 2009. Available: http://www.w3.org/TR/2009/REC-owl2-overview-20091027/. Accessed on March 1, 2014.
[13] P. Grenon and B. Smith, "SNAP and SPAN: Towards dynamic spatial ontology," Spatial Cognition and Computation, vol. 4, pp. 69-103, 2004.
[14] J. Zheng, Z. Xiang, C. J. Stoeckert, and Y. He, "Ontodog: a web-based ontology community view generation tool," Bioinformatics, vol. 30, no. 9, pp. 1340-1342, May 2014.
[15] Z. Xiang, M. Courtot, R. R. Brinkman, A. Ruttenberg, and Y. He, "OntoFox: web-based support for ontology reuse," BMC Res Notes, vol. 3, p. 175, Jun. 2010.
[16] NLM, "U.S. National Library of Medicine (NLM). Unified Medical Language System (UMLS®) Metathesaurus [version 2013AA][Internet]. Bethesda (MD): National Library of Medicine," 2013.
[17] P. L. Whetzel, N. F. Noy, N. H. Shah, P. R. Alexander, C. Nyulas, T. Tudorache, and M. A. Musen, "BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications," Nucleic Acids Res, vol. 39, no. Web Server issue, pp. W541-W545, Jul. 2011.
[18] Z. Xiang, C. Mungall, A. Ruttenberg, and Y. He, "Ontobee: A Linked Data Server and Browser for Ontology Terms," in Proceedings of the 2nd International Conference on Biomedical Ontologies (ICBO), Buffalo, NY, USA, 2011, pp. 279-281.
[19] D. B. Fridsma, J. Evans, and C. N. Mead, "The BRIDG Project: A Technical Report," Journal of the American Medical Informatics Association, vol. 15, no. 2, p. 130, 2008.
[20] I. Sim, S. W. Tu, S. Carini, H. P. Lehmann, B. H. Pollock, M. Peleg, and K. M. Wittkowski, "The Ontology of Clinical Research (OCRe): An informatics foundation for the science of clinical research," Journal of Biomedical Informatics, Nov 13 2013.
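
APPENDIX

The scenario in Section II involves specimens collected under three consent models: blanket, tiered, and study-specific. The following is a minimal Python sketch of how a release decision could depend on the consent model attached to each specimen. It is an illustration only and is not how ICO represents consent. All class, field, and function names (ConsentModel, Consent, Specimen, may_release), the specimen and study identifiers, and the research-use labels are hypothetical. The sketch also assumes, as a simplification, that a blanket consent permits any research use, and it ignores the protocol, HIPAA authorization, and data use agreement constraints discussed in Section II.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class ConsentModel(Enum):
    """Hypothetical labels for the three consent forms in the Section II scenario."""
    BLANKET = "blanket"
    TIERED = "tiered"
    STUDY_SPECIFIC = "study-specific"


@dataclass(frozen=True)
class Consent:
    model: ConsentModel
    # Research uses the donor opted into; only meaningful for TIERED consent.
    permitted_uses: FrozenSet[str] = frozenset()
    # The single study the donor consented to; only meaningful for STUDY_SPECIFIC.
    study_id: Optional[str] = None


@dataclass(frozen=True)
class Specimen:
    specimen_id: str
    consent: Consent


def may_release(specimen: Specimen, study_id: str, research_use: str) -> bool:
    """Return True if the specimen's consent covers the requesting study.

    Simplifying assumption: blanket consent covers every research use.
    """
    consent = specimen.consent
    if consent.model is ConsentModel.BLANKET:
        return True
    if consent.model is ConsentModel.TIERED:
        return research_use in consent.permitted_uses
    # STUDY_SPECIFIC: only the named study may use the specimen.
    return consent.study_id == study_id


# Illustrative data: one specimen per bank, plus a tiered specimen that opted out.
specimens = [
    Specimen("A-1", Consent(ConsentModel.BLANKET)),
    Specimen("B-1", Consent(ConsentModel.TIERED, frozenset({"cancer research"}))),
    Specimen("B-2", Consent(ConsentModel.TIERED, frozenset({"cardiac research"}))),
    Specimen("C-1", Consent(ConsentModel.STUDY_SPECIFIC, study_id="STUDY-1")),
]

releasable = [
    s.specimen_id
    for s in specimens
    if may_release(s, study_id="STUDY-1", research_use="cancer research")
]
print(releasable)  # ['A-1', 'B-1', 'C-1']
```

Even in this reduced form, the decision logic differs for each consent model, which is why a shared formal representation of consent would be needed before such queries could run across banks.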