=Paper= {{Paper |id=Vol-1747/IT705_ICBO2016 |storemode=property |title=A Realist Representation of Social Identity Data |pdfUrl=https://ceur-ws.org/Vol-1747/IT705_ICBO2016.pdf |volume=Vol-1747 |authors=Amanda Hicks |dblpUrl=https://dblp.org/rec/conf/icbo/Hicks16 }} ==A Realist Representation of Social Identity Data == https://ceur-ws.org/Vol-1747/IT705_ICBO2016.pdf
        A Realist Representation of Social Identity Data
                                                               Amanda Hicks, Ph.D.
                                                   Department of Health Outcomes and Policy
                                                            University of Florida
                                                               Gainesville, USA
                                                               aehicks@ufl.edu

   Abstract—Social identities merit special treatment in realist                identities. Section Four describes a framework for
ontologies. Their ontological status is unsettled, so we should                 ontologically representing social identities in OMRSE to
model them in a manner that is agnostic with respect to their                   support semantic integration of demographic data. Section Five
ontological status. Nevertheless, there is a clear criterion for                describes the results of the validation of our representation with
determining whether a specific person has a particular identity,                competency questions. Section Six discusses results and future
namely, whether that person asserts that they do. This social act               work.
forms the basis for a realist representation, not of social identities
themselves, but of data about social identities. We report the
representation of social identities in the Ontology of Medically                       II.    BACKGROUND ASSUMPTIONS AND HYPOTHESES
Related Social Entities and show that it supports data integration                  [2] notes that demographic data are about a heterogeneous
and retrieval.                                                                  group of things; they may include data about preferred
                                                                                language, biological sex, gender identity, race, date of birth,
   Keywords—data integration; demographic information;                          and marital status. The ontological status of some of these
ethnicity; gender identity; identity; Ontology of Medically Related
                                                                                entities is clear. Biological sex is a quality of an organism [7];
Social Entities; race
                                                                                date of birth is a time interval; and marital status is the result of
                                                                                a contractual act. However, the ontological status of race,
                          I.    INTRODUCTION                                    ethnicity, and gender identity is controversial [8, 9]. For this
    Demographic information is widely used in information                       reason, this paper does not attempt to answer the question,
systems. In medical and health information systems they                         what kind of things are race, ethnicity, and gender identities?
support a variety of biomedical and informatics tasks such as                   Instead, it places the process of asserting an identity at the
cohort discovery, statistical comparison of groups of people,                   center of a realist representation of social identity data in
and record linkage [2]. Common demographic data collected in                    OMRSE.
medical settings include birth date, preferred language, race,                      We begin our work with the assumption that there is a
ethnicity and sex or gender. In 2011 the Institute of Medicine                  difference between demographic data such as gender identity,
recommended collecting information on sexual orientation and                    race, ethnicity, on the one hand, and sex, birth date, and marital
gender identity (as distinct from biological sex) in electronic                 status on the other. Although the latter group is heterogeneous,
health records [3], and Stage 3 for Meaningful Use requires                     its members do share something significant in common;
that electronic health records (EHR) certified for meaningful                   statements about each can be verified as inter-subjective facts
use have fields for collecting information on sexual identity by                about the world. Although we often gather data about a person
2018 [4-6]. It is, therefore, increasingly important to                         by asking questions such as Are you male or female?, What is
semantically represent gender identity and other social                         your birth date?, and Are you married?, biological sex, birth
identities coherently to support data retrieval and integration.                date, and marital status refer to inter-subjective features of the
[2] discusses previous work on realist representations of                       world. If by ‘sex’ we mean karyotypic or phenotypic sex, we
demographic information in general in the Ontology of                           can perform genetic testing to determine a person’s karyotype
Medically Related Social Entities (OMRSE).                                      or a physical examination to determine phenotype. While we
    This paper describes social identities as a special subset of               cannot directly observe the date of a person’s birth, once the
demographic information and describes a realist representation                  event is completed, a birth date is something that multiple
of social identities to support data retrieval and data                         people observe and come to consensus on. We can determine
integration. This representation supports integration and                       that a person is married by producing a marriage certificate; if
retrieval of data about people according to their social                        there is no marriage certificate, there is no marriage. In this
identities. For the purpose of this paper, social identities                    sense, reports of one’s own sex, birth date, and marital status
include (but are not limited to) race, ethnicity, and gender                    are corrigible in the face of facts about the inter-subjective
identity.                                                                       world. However, reports of one’s own gender identity, race,
                                                                                and ethnicity are not similarly corrigible. That is, if Jane says
    Section Two describes the background assumptions and                        that she is a black, Latina, woman, she has already provided all
hypothesis of this paper. Section Three provides background                     the information we can hope to acquire to determine and verify
on data collection for gender identity, sexual orientation, race                her race, ethnicity, and gender identity. There is nothing in
and ethnicity, drawing important distinctions for understanding                 either the physical or the social world that we can consult to
the semantics of terms used to describe these types of social
This work was supported in part by the NIH/NCATS Clinical and Translational Science Awards to the University of Florida UL1 TR000064 and by
award CDRN-1501-26692 from the Patient Centered Outcomes Research Institute (PCORI). The content is solely the responsibility of the authors and does not
necessarily represent the official views of NIH/NCATS or PCORI.
verify the truth of these claims unless it is to return to Jane      subjective report of their identity rather than an objective or
herself and ask her to verify these statements.                      inter-subjective criterion.
    Nevertheless, it seems that it is possible for Jane to provide       Gender identity does not refer to biological and
misinformation about at least some aspects of her identity. For      physiological characteristics since it is distinct from biological
example, one might object that if Jane has white, non-Latino         sex. Furthermore, gender identity cannot be ascertained or
parents who insist that Jane herself is neither black nor Latina,    verified by gender expression. Consider two cases. 1) Some
this constitutes inter-subjective evidence that her claims are       trans individuals have not socially transitioned to their
false. This scenario underscores the importance of the context       perceived identity. A biological male who lives as a man but
of data collection for determining the meaning of the data           has a subjective sense of being a woman may have a masculine
collected. As we will see in the next section, the race and          gender expression that would not be indicative of their
ethnicity data collection practices and guidelines prevalent in      feminine gender identity. 2) Some people adopt the cultural
the U.S. healthcare system explicitly rule out defining race and     norms associated with a particular gender expression, but
ethnicity in terms of “blood” quotas or other inclusion criteria.    identify differently. For example, a non-binary person may
Furthermore, the definitions that do exist for these terms are       have a masculine gender expression without identifying as a
seldom presented to respondents. The result is that the data that    man.
are currently and routinely collected only tell us how the person
actually identifies themselves. Notice how this affects the case     B. Race and Ethnicity
where Jane’s parents are white, non-Latino. In the absence of            The Office of Management and Budget (OMB) has defined
clear inclusion and exclusion criteria for “white” and “Latino”,     a minimal set of categories for collecting data on race and
all we know is that Jane’s parents identify themselves as white      ethnicity in the U.S. Census. These categories are also used in
and non-Latino. This does not rule out Jane having reasons to        health care settings and health research in the U.S. [11, 12]. It
identify some other way. Finally, we may be concerned that           is important to note that, while the OMB defines the minimum
Jane has deliberately provided misinformation about her              race and ethnicity categories partially in terms of genealogy,
identity. There are two things to note about this scenario. First,   they explicitly do not regard the categories as naturalistic,
no ontology can get around the problem of potential dishonesty       anthropological, or scientific, but instead as social-constructs.
or bad data collection practices, nor are they intended to.          Furthermore, they encourage self-identification in the data
Second, even in the broader context of data management we do         collection process wherever possible [11].
not regard this as a pressing issue since we have no reason to
suspect that providing deliberately misleading information
about one’s identity is a common enough practice to affect the       TABLE I.          DEFINITIONS FROM THE IOM 2011 REPORT ON THE HEALTH
                                                                                                  OF LGBT PEOPLE
results of data quality and data analysis significantly.
                                                                           TERM                                DEFINITION
    Our hypothesis was that representing social identity data
with respect to the process of identifying rather than in terms of   Sex                    a biological construct, referring to the genetic,
identities themselves can support data integration and retrieval                            hormonal,       anatomical,      and      physiological
                                                                                            characteristics on whose basis one is labeled at birth
in a realist framework while avoiding controversial ontological                             as either male or female
commitments.                                                         Gender                 the cultural meanings of patterns of behavior,
                                                                                            experience, and personality that are labeled masculine
                                                                                            or feminine
 III.   DATA COLLECTION FOR GENDER IDENTITY, RACE, AND
                                                                     Gender Expression      the manifestation of characteristics in one’s
                      ETHNICITY                                                             personality, appearance, and behavior that are
    For the purpose of this work we have adopted the definition                             culturally defined as masculine or feminine
and characterization of gender identity in [1]. For race and         Gender Identity        a person’s subjective sense of his or her gender
ethnicity we use the Office of Management and Budget (OMB)
definitions and guidelines [10] since this standard is already           The OMB definitions for race characterize racial categories
widely used in biomedicine. Most medical terminologies,              on the basis of their descent from the original peoples of some
coding schemes, and surveys use terms that are intended to           geographic region (Table 2). This characterization poses
comply with the Office of Management and Budget (OMB)                problems for a realist representation. First, the criterion is
minimum set of categories for race and ethnicity [11, 12].           ambiguous insofar as it does not define ‘original peoples’. At
                                                                     what point in human history are original peoples determined?
A. Gender identity                                                   Second, the criterion is not applied consistently. ‘American
    Table 1 contains definitions of terms related to sex and         Indian or Alaska Native’ is defined as a person who has origins
gender as presented in [1]. These definitions have been              in any of the original peoples of North and South America
influential in shaping the discussion of the collection of data      (including Central America), and maintains tribal affiliation or
about gender identity [11] and conform to standard usage             community attachment (emphasis added). This is the only race
where the distinctions between (a) sex and gender and (b)            category that has the extra requirement of a social relationship,
gender expression and gender identity are observed.                  which renders the categories not exhaustive. For example,
                                                                     Mexican-Americans who have origins in the original peoples
    By examining these definitions we can see that the               of South or Central America but do not maintain a tribal
verification criterion for gender identity is the individual’s own
affiliation or community attachment do not fit any of the OMB                        •     “Respect for individual dignity should guide the
categories for race.                                                                       processes and methods for collecting data on race and
                                                                                           ethnicity; ideally, respondent self-identification should
    However, despite the genealogical criterion in the
                                                                                           be facilitated to the greatest extent possible, recognizing
definitions of these terms, the OMB guidelines stress
                                                                                           that in some data collection systems observer
interpreting statements about race as socio-cultural
                                                                                           identification is more practical.”
characteristics that involve ancestry rather than as biological or
genetic characteristics. This connection to ancestry suggests                        •     “do not establish criteria or qualifications (such as
that the verification criterion for an OMB-based statement                                 blood quantum levels) that are to be used in
about racial identity is about a historical fact since ancestry is                         determining a particular individual's racial or ethnic
determined by inter-subjective criteria. However, this contrasts                           classification.” (original emphasis)
with additional guidelines for data collection that indicate
that the verification criteria are the subject’s response to OMB                     •     “do not tell an individual who he or she is, or specify
questions about race.                                                                      how an individual should classify himself or herself.”
                                                                                           (original emphasis) [11].




                       TABLE II.        DEFINITIONS FOR THE OFFICE OF MANAGEMENT AND BUDGET MINIMUM CATEGORIES FOR RACE
           OMB CATEGORY                                                                    OMB DEFINITIONS
American Indian or Alaska Native             A person having origins in any of the original peoples of North and South America (including Central America),
                                             and who maintains tribal affiliation or community attachment.
Asian                                        A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent
                                             including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands,
                                             Thailand, and Vietnam.
Black or African American                    A person having origins in any of the black racial groups of Africa. Terms such as “Haitian” or “Negro” can be
                                             used in addition to “Black or African American.”
Native Hawaiian or Other Pacific Islander    A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.
White                                        A person having origins in any of the original peoples of Europe, the Middle East, or North Africa.
                                                                                     In short, the ontological types of things that a race and
    Similarly to race, the OMB’s definition of ethnicity also                    ethnicity datum might be about are heterogeneous, and to make
invokes genealogy. The term ‘Hispanic’ refers to persons who                     matters worse, there is often not a single type that is common
trace their origin or descent to Mexico, Puerto Rico, Cuba,                      to all of them that would provide either necessary or sufficient
Central and South America, and other Spanish cultures.                           conditions. Furthermore, these categories are not historically
                                                                                 stable and stem from contingent circumstances. Even if an
    However, the same caveats that were discussed for race                       ontologist were confident that there are universals for social
apply to ethnicity, namely, 1) ‘ethnicity’ should not be                         identities, the historical contingency of identity categories
interpreted as referring to biological or genetic characteristics,               makes ontologically representing these social identities as
but rather as referring to ancestry, and 2) the verification                     stable universals impractical. Nevertheless, ontologists can
criterion for OMB-based statements about ethnicity is the                        provide a realistic representation of how people actually
subject’s response to OMB-based questions about ethnicity.                       identify when asked to do so. The lack of inter-subjective
    Finally, we should not expect existing data on race and                      verification criteria for identity statements in tandem with the
ethnicity to reflect a consistent, genealogical criterion since                  stress on self-identification in the instructions provides a
most patients are not presented with definitions of racial and                   principled basis for representing social identity data differently
ethnic terms during the intake process at a clinic or on a survey                from data with an inter-subjective or objective verification
and because the language used to describe these categories may                   criterion such as birth date and diagnosis.
vary at the discretion and preference of the person(s) designing
the form. For example, ‘black’, ‘African American’, and ‘black                           IV.   A REALIST REPRESENTATION OF IDENTIFICATION
or African American’ can all be used to describe the same                                           PROCESSES AND IDENTITY DATA
racial category.
                                                                              categories as long as they are extensions of and mappable to
                                                                              the OMB minimum categories, i.e., as long as they do not
                                                                              introduce new categories but are equivalent to or subcategories of
                                                                              those in the minimal set [10]. In cases where the expanded set
                                                                              includes subcategories of OMB classes, corresponding identity
                                                                              data can be introduced as a subclass of the appropriate OMB
                                                                              datum. For example, Fig. 3 shows CDC Spanish Basque datum
                                                                              as a subclass of OMB Hispanic or Latino datum.
 Fig. 1. Representation of Identification Data and Identification Processes
 in OMRSE.                                                                    [Figure 2 diagram. Node labels: Racial Identity Datum, Racial
    In light of the fact that it is not clear what kinds of things            Identification Process, OMB Racial Identity Datum, OMB Racial
identities are, OMRSE does not model identities as such.                      Identification Process, PCORnet Racial Identity Datum, PCORnet
However, we do know how identity data are collected and that                  Racial Identification Process, OMB Asian Identity Datum,
their verification criterion involves the process of identifying.             PCORnet Asian Identity Datum. Edge label: has specified output.]
For this reason, we make the processes of asserting an identity               Fig. 2. An example of how to represent heterogeneous social identity
central to representing social identity data, rather than identities          data using
themselves. An identification process is a planned process that
might utilize a specific vocabulary or common data model,
such as the OMB minimal categories for race and ethnicity.
However, some identification processes might not use a
common vocabulary or common data elements. For example,
some may only utilize a free text field. Identification processes,
as we represent them here, are planned processes that record an               B. Integrating Heterogeneous Data
identity statement about an individual person. They should not                    Despite the similar categories and identical definitions, the
be confused with the private and internal mental or emotional                 PCORnet CDM and the OMB racial categories describe different
processes that involve or give rise to a subjective sense of one’s            classes of people. The OMB
identity. Identification processes, as we describe them here, are             guidelines allow people to select more than one race [14].
planned, social, and result in identity data. OMRSE represents                PCORnet CDM does not. Instead, the PCORnet CDM has a
these data as information content entities that are the outputs of            class for multiple race. Consider a person who identifies as
identification processes. Conversely, all identity data are the               both Black and Asian according to the OMB definitions.
specified outputs of an identification process. Fig. 1                        According to the OMB guidelines, under which a person can select
illustrates the representation of identity data and identity                  more than one race, someone could identify as both Black and
processes in OMRSE.                                                           as Asian, and that person would be retrieved by a query for
    Subclasses of identification process include racial                       people who identified as Black, people who identified as
identification process, ethnic identification process, and gender             Asian, and people who identified as both. If the same person
identification process. Identification processes that use a                   were filling out a medical intake form using the PCORnet
particular set of terms or coding scheme can be the basis of                  CDM guidelines, they would be instructed to choose only one
further descendant classes of identification process. For                     race. They could, therefore, choose either Black or Asian or
example, OMB racial identification process and PCORnet                        multiple race, but they could not choose both Black and Asian.
racial identification process are subclasses of racial                        With OMB standards, the classes of people who identify as
identification process (Fig. 2). The latter represents racial                 Black and who identify as Asian can overlap. For the PCORnet
identification used in the PCORnet Common Data Model                          CDM, they are disjoint. Therefore, the class of people who can
(CDM), a data standard for representing clinical patient data                 identify with OMB Asian is not identical with the class of
from clinical sites across the US for use in the National Patient-            people who can identify with PCORnet Asian but is actually a
Centered Clinical Research Network (PCORnet) [13].                            superclass. It is worth noting that transforming OMB
                                                                              compliant racial data into the PCORnet CDM results in an
    Table 3 contains definitions related to representing OMB’s                irretrievable loss of information. Namely, persons who have
categories related to OMB Asian as an example of how                          identified with multiple OMB races will be indicated as
identities that employ a common data model or common                          identifying with the semantically less rich category “multiple
vocabulary are represented with this approach.                                races” in the PCORnet CDM. This loss of information is
                                                                              revealed by accurately representing the semantics of these
A. Extended categories                                                        coding schemes, but, in such cases of loss of information, not
    The OMB guidelines for race and ethnicity allow data                      even a good ontology can recover information that has not
collectors to use a larger number of race and ethnicity                       been stored.
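The contrast between OMB and PCORnet racial identification can be illustrated with a minimal Python sketch. It is not OMRSE itself or any OMRSE tooling: the class names `IdentificationProcess` and `IdentityDatum`, the helper functions, the person identifiers, and the short category labels ("Black", "Asian", "Multiple race") are all hypothetical and chosen only for illustration. The sketch assumes, as described above, that every identity datum is the output of exactly one identification process, that OMB identification allows several selections, and that PCORnet identification records a single value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentificationProcess:
    """Hypothetical stand-in for a planned process recording identity statements."""
    scheme: str      # coding scheme used, e.g. "OMB" or "PCORnet"
    person_id: str   # the participant who asserts the identity


@dataclass(frozen=True)
class IdentityDatum:
    """Hypothetical stand-in for an information content entity output by a process."""
    category: str
    output_of: IdentificationProcess


def identify(scheme, person_id, selections):
    """Run one identification process and return the identity data it outputs."""
    if scheme == "PCORnet" and len(selections) != 1:
        # Assumption from the text: PCORnet respondents choose only one race value.
        raise ValueError("PCORnet identification records exactly one race value")
    process = IdentificationProcess(scheme, person_id)
    return [IdentityDatum(category, process) for category in selections]


def people_identifying_as(data, scheme, category):
    """Retrieve people by the identity data recorded about them under one scheme."""
    return {
        datum.output_of.person_id
        for datum in data
        if datum.output_of.scheme == scheme and datum.category == category
    }


def omb_to_pcornet(data):
    """Convert OMB data to PCORnet data; several OMB races collapse to one value."""
    selections_by_person = {}
    for datum in data:
        if datum.output_of.scheme == "OMB":
            selections_by_person.setdefault(datum.output_of.person_id, []).append(
                datum.category
            )
    converted = []
    for person_id, categories in selections_by_person.items():
        value = categories[0] if len(categories) == 1 else "Multiple race"
        converted.extend(identify("PCORnet", person_id, [value]))
    return converted


# Two hypothetical respondents: one selects two OMB races, one selects a single race.
omb_data = identify("OMB", "jane", ["Black", "Asian"]) + identify("OMB", "sam", ["Asian"])
pcornet_data = omb_to_pcornet(omb_data)

omb_asian = people_identifying_as(omb_data, "OMB", "Asian")
omb_black = people_identifying_as(omb_data, "OMB", "Black")
pcornet_asian = people_identifying_as(pcornet_data, "PCORnet", "Asian")
pcornet_multiple = people_identifying_as(pcornet_data, "PCORnet", "Multiple race")
```

In this sketch the OMB queries for "Asian" and "Black" overlap on the respondent who selected both, while the converted PCORnet data place that respondent only under "Multiple race". The specific races selected cannot be reconstructed from the PCORnet data alone, which is the loss of information described above.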
[Figure 3 diagram. Node labels: Homo sapiens, OMB Hispanic or Latino                                                                racial identity categories actually have a different meaning
datum, CDC Spanish Basque datum, CDC Ethnic Identification Process,                                                                 from the OMB racial identity categories, it would be
HS1, EI1, USCSP1. Edge labels: is about, is specified output of,                                                                    inappropriate to use subclass relations to connect them. We are
has participant.]                                                                                                                   currently considering using SKOS:broader and
                                                                                                                                    SKOS:narrower to describe the relations between the
                                                                                                                                    intensional meanings of the terms, but it is not clear that this
                                                                                                                                    will support data retrieval.

                                                                                                                                                                    V.          VALIDATION AND RESULTS
                                                                                                                                         Competency questions are frequently used to validate
                                                                                                                                    modeling decisions in ontologies. They are questions that
   Fig. 3. Representation of Instance Level Social Identity Data                                                                    reflect the needs of the end user and that the ontology ought to
                                                                                                                                    be able to support. We partially validated the suitability of this
    We developed a strategy for representing social identity
                                                                                                                                    representation for data retrieval and data integration with the
data that supports integrating OMB and PCORnet CMD data.
                                                                                                                                    following competency questions below. This validation is only
This strategy is not idiosyncratic to these data models, but is
                                                                                                                                    partial since there are outstanding competency questions that
generalizable. This representation involves articulating the
                                                                                                                                    require additional modelling decisions. We generated an OWL
relations among classes of people who identify with OMB
                                                                                                                                    file with synthetic individuals and constructed Description
Asian and those who identify with PCORnet Asian, as an
                                                                                                                                    Logic queries that answered three out of four of the
example. The OMB category Asian means the person has
                                                                                                                                    competency questions. These queries in Manchester syntax are
declared some Asian descent. The PCORnet CDM category
                                                                                                                                    listed below. The OWL file with synthetic individuals is
Asian means the person has declared only Asian descent. Fig. 2
                                                                                                                                    available at https://github.com/ufbmi/socid.
illustrates how identification processes and identification data
that result from these two heterogeneous coding schemes are                                                                            1. Which people are racially identified as Asian
related. Notice that PCORnet racial identity datum is not a                                                                         according to the OMB criteria?
subclass of OMB racial identity datum. Since the PCORnet

          TABLE III.    SAMPLE DEFINITIONS FOR REPRESENTING RACIAL IDENTITY DATA

Term                                              Ontological Definition

OMB racial identity datum                         A racial identity that is the output of a racial
                                                  identification process that uses OMB terminology for
                                                  race or terminology that is mapped to the OMB race
                                                  terms.

OMB Asian identity datum                          An OMB racial identity datum about a person who is
                                                  identified as having origins in any of the original
                                                  peoples of the Far East, Southeast Asia, or the Indian
                                                  subcontinent.

Subject of an OMB Asian identity datum            A human being who is the subject of an OMB Asian
                                                  identity datum.

Subject of a self-identified OMB Asian identity   A human being who is the subject of an OMB Asian
                                                  identity datum and who is the agent of the planned
                                                  process for which that identity is a specified output.
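
The definitions in Table III can be read procedurally. The following is a minimal Python sketch, not the OMRSE OWL representation itself: the class names, field names, and sample individuals are hypothetical, and it assumes a closed world in which every identity datum and identification process is explicitly recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentificationProcess:
    """A planned process whose specified output is an identity datum (hypothetical model)."""
    process_id: str
    agent: str        # the person who performed the identification
    terminology: str  # the coding scheme used, e.g. "OMB"


@dataclass(frozen=True)
class IdentityDatum:
    """A racial identity datum that is about a person (hypothetical model)."""
    datum_id: str
    category: str                     # e.g. "Asian"
    is_about: str                     # the person the datum is about
    output_of: IdentificationProcess  # the process that produced the datum


def subjects_of(data, terminology, category):
    """People who are the subject of a datum of the given scheme and category."""
    return {
        d.is_about
        for d in data
        if d.output_of.terminology == terminology and d.category == category
    }


def self_identified_subjects_of(data, terminology, category):
    """Subjects who were also the agent of the process that produced the datum."""
    return {
        d.is_about
        for d in data
        if d.output_of.terminology == terminology
        and d.category == category
        and d.output_of.agent == d.is_about
    }


# Synthetic individuals, invented for illustration only.
PROC_1 = IdentificationProcess("proc1", agent="p1", terminology="OMB")
PROC_2 = IdentificationProcess("proc2", agent="clerk1", terminology="OMB")
PROC_3 = IdentificationProcess("proc3", agent="p3", terminology="OMB")

DATA = [
    IdentityDatum("d1", "Asian", "p1", PROC_1),  # p1 self-identifies as Asian
    IdentityDatum("d2", "Asian", "p2", PROC_2),  # p2 is recorded as Asian by a clerk
    IdentityDatum("d3", "White", "p2", PROC_2),
    IdentityDatum("d4", "White", "p3", PROC_3),
]

print(sorted(subjects_of(DATA, "OMB", "Asian")))
print(sorted(self_identified_subjects_of(DATA, "OMB", "Asian")))
```

In this sketch the first function corresponds to "Subject of an OMB Asian identity datum" and the second to "Subject of a self-identified OMB Asian identity". Neither function says anything about what an Asian identity is; both depend only on recorded identification processes and their outputs.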
  


      inverse 'is about' some 'Asian identity'

    2. Which people are racially identified with multiple races according to OMB criteria?

      inverse 'is about' min 2 'OMB racial identity'

    3. Which people are racially identified with more than one race in either OMB or PCORnet
CDM?

      inverse 'is about' min 2 'OMB racial identity' or inverse 'is about' some 'PCORnet multiple race identity'

    4. Which people are racially identified only as Asian according to OMB or PCORnet
criteria?

    Competency Question 4 requires stating that the OMB race categories are distinct from one
another. For example, we must decide whether the classes OMB Asian identity datum and OMB
Alaska Native or Native American datum are disjoint. Adding a disjointness axiom would rule
out the possibility of a single identity datum that indicates that a person has both
identities, but may support this competency question. Future work will focus on the best way
to represent this situation.

    We have included this representation of identity data in OMRSE, available at
www.github.com/ufbmi/omrse.

                              VI.    DISCUSSION

    This proposal diverges from traditional realist approaches insofar as it advocates
representing social identities in terms of their verification criteria rather than according
to their ontological properties. This approach has the advantage of supporting data
integration and retrieval according to realist principles, without making dubious ontological
commitments. It also does not sacrifice clear semantics, interoperability of data, or data
retrieval. While our competency questions only address racial identity, they do show that
different types of social identity data that have been gathered according to different
criteria can be adequately represented according to the general ontological principles
described in this paper. Analogous questions involving ethnicity and gender identity can be
expected to be handled by this approach since they have the same logical form.

    Future work includes representing relations between types of identity data to handle the
remaining competency question,
developing a set of gender identity terms to include in OMRSE, and querying real patient data
to assess the impact of this representation on cohort discovery tasks that include race and
ethnicity.

                              VII.    CONCLUSIONS

    Our hypothesis was that representing social identity data with respect to processes of
identifying rather than identities themselves can support data integration and retrieval in a
realist framework while avoiding controversial ontological commitments.

    We have produced a BFO-based representation of race and ethnicity identities and developed
strategies for semantically integrating social identity data that have been collected using
a) the OMB minimal categories for race and ethnicity, b) extensions of the OMB minimal
categories for race and ethnicity, and c) common data models such as the PCORnet CDM whose
semantics differ from the OMB minimum categories due to pick one/pick many discrepancies. We
have added this representation to OMRSE and produced a synthetic data set in an OWL file to
test our competency questions. Our representation to date handles three out of four of our
competency questions.

                              ACKNOWLEDGMENTS

    Thanks to William R. Hogan for reviewing and commenting on the manuscript and to the
Clinical and Translational Science Ontology Group for providing feedback on a presentation of
earlier work at the Charleston, SC meeting in September 2015.

                              REFERENCES

[1]  Institute of Medicine (US) Committee on Lesbian, Gay, Bisexual, and Transgender Health
     Issues and Research Gaps and Opportunities. The health of lesbian, gay, bisexual, and
     transgender people: Building a foundation for better understanding. Washington (DC):
     National Academies Press (US); 2011.
[2]  Hogan WR, Garimalla S, Tariq SA, editors. Representing the reality underlying demographic
     data. International Conference on Biomedical Ontologies (ICBO); 2011.
[3]  Institute of Medicine (US) Committee on Lesbian, Gay, Bisexual, and Transgender Health
     Issues and Research Gaps and Opportunities. Collecting sexual orientation and gender
     identity data in electronic health records: Workshop summary. Washington DC: The National
     Academies Press, 2013. ISBN 0309268044, 9780309268042.
[4]  Cahill SR, Baker K, Deutsch MB, Keatley J, Makadon HJ. Inclusion of sexual orientation
     and gender identity in Stage 3 Meaningful Use Guidelines: A huge step forward for LGBT
     health. LGBT Health. 2015.
[5]  Department of Health and Human Services CfMaMS. 42 CFR Parts 412 and 495 [CMS-3310-FC and
     CMS-3311-FC], RINs 0938-AS26 and 0938-AS58. Medicare and Medicaid programs; Electronic
     Health Record Incentive Program—Stage 3 and modifications to Meaningful Use in 2015
     through 2017. 2015 October 7.
[6]  Department of Health and Human Services CfMaMS. 45 CFR Part 170, RIN 0991-AB93. 2015
     edition health information technology (Health IT) certification criteria, 2015 edition
     base electronic health record (EHR) definition, and ONC Health IT certification program
     modifications. 2015 October 6.
[7]  Smith B, Ceusters W. Ontological realism: A methodology for coordinated evolution of
     scientific ontologies. Applied Ontology. 2010;5(3-4):139.
[8]  James M. Race 2016 [updated March 16, 2016; cited 2016 April 19]. Available from:
     http://plato.stanford.edu/archives/spr2016/entries/race/.
[9]  Mikkola M. Feminist perspectives on sex and gender 2016 [updated January 29, 2016; cited
     2016 April 19]. Available from:
     http://plato.stanford.edu/archives/spr2016/entries/feminism-gender/.
[10] Revisions to the standards for the classification of federal data on race and ethnicity,
     (1997).
[11] Helsing K, editor. Capturing social and behavioral domains and measures in electronic
     health records. 143rd APHA Annual Meeting and Exposition (October 31-November 4, 2015);
     2015: APHA.
[12] Racial and ethnic categories and definitions for NIH diversity programs and for other
     reporting purposes, NOT-OD-15-089 (2015).
[13] PCORnet Common Data Model (CDM) [updated December 18, 2015; cited 2015 April 25].
     Available from: http://www.pcornet.org/pcornet-common-data-model/.