<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Discrimination-aware Data Transformations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chiara Accinelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Advised by: prof. Barbara Catania University of Genoa</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A deep use of people-related data in automated decision processes might lead to an amplification of inequities already implicit in real-world data. Nowadays, the development of technological solutions satisfying nondiscriminatory requirements is therefore one of the main challenges for the data management and data analytics communities. Nondiscrimination can be characterized in terms of different properties, like fairness, diversity, and coverage, and many approaches have been proposed so far for guaranteeing nondiscrimination through the satisfaction of such properties during specific steps of the data processing pipeline. In this PhD project, we are interested in investigating the impact of coverage-based constraints on data transformations. Coverage aims at guaranteeing that the input dataset includes enough examples for each (protected) category of interest, thus increasing diversity with the aim of limiting the introduction of bias during the next analytical steps. We propose coverage-based queries as a means to achieve coverage constraint satisfaction on the result of data transformations defined in terms of selection-based queries. Both precise and approximate algorithms are designed to guarantee a good compromise between efficiency and accuracy. The applicability of the approach is evaluated by integrating it in a data processing Python toolkit.</p>
      </abstract>
      <kwd-group>
        <kwd>nondiscrimination</kwd>
        <kwd>data transformation</kwd>
        <kwd>coverage</kwd>
        <kwd>rewriting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Nowadays, we are surrounded by data that are increasingly exploited to make decisions that might impact people’s lives. It is therefore very important to understand the nature of that impact at the social level and take responsibility for it. The design of data-driven decision-support systems ensuring a responsible and ethical use of data is therefore a must and it has been recognized that both data management and data analytic communities should contribute [<xref ref-type="bibr" rid="ref1 ref21">1, 21</xref>]. Such systems should ensure on one hand transparency and interpretability, making the process and the decisions easy to understand, and on the other nondiscrimination with respect to all the reference groups of individuals, usually defined in terms of sensitive attributes, like, e.g., gender.</p>
      <p>Nondiscrimination can be characterized in terms of different properties like fairness, i.e., lack of bias [<xref ref-type="bibr" rid="ref16">16</xref>], diversity, i.e., the degree to which different kinds of objects are represented in a dataset [<xref ref-type="bibr" rid="ref11">11</xref>], and coverage [<xref ref-type="bibr" rid="ref7">7</xref>], guaranteeing a sufficient representation of any category of interest in a dataset. As first pointed out in [<xref ref-type="bibr" rid="ref1">1</xref>] and remarked in, e.g., [<xref ref-type="bibr" rid="ref11 ref21">11, 21</xref>], such properties should be achieved through a holistic approach, incrementally enforcing nondiscrimination constraints along all the stages of the data processing life-cycle, through individually independent choices rather than as a constraint on the final result: the sooner you spot the problem, the fewer problems you will get in the last analytical steps of the chain (see, e.g., Google’s gorilla classification incident [<xref ref-type="bibr" rid="ref20">20</xref>]).</p>
      <p>In this PhD project, we are interested in investigating the impact of coverage-based constraints on the, possibly intermediate, datasets generated through data preparation, with a special focus on data transformations. This topic is relevant since any data preparation step that transforms the input datasets might lead to a violation of the coverage of protected categories, affecting subsequent analytical tasks. Notice that the input dataset can correspond to either raw data that have not been transformed yet (and in this case, solutions like those proposed in [<xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>] can be used to determine how to modify the input dataset and collect new data) or the result of, potentially many, data transformation queries. We are interested in this second case.</p>
      <p>As an example, suppose you are interested in analyzing data of the well-known Adult dataset<sup>1</sup> (e.g., predicting through classification which individuals make over 50k a year), after filtering it according to specific criteria (e.g., only senior job positions should be considered, qualified in terms of selection conditions over age, weekly working hours, and education level). Suppose you would like to guarantee nondiscrimination with respect to gender by training a model whose accuracy does not deeply depend on this attribute. It has already been recognized that the quality of the classifier might depend, among other factors, on the number of instances, i.e., the coverage, of each group in the dataset [<xref ref-type="bibr" rid="ref18">18</xref>]. Thus, if the selection query returning senior job positions includes few females,</p>
      <sec id="sec-1-1">
        <p><sup>1</sup> https://archive.ics.uci.edu/ml/datasets/Adult</p>
        <p>the result of the classifier can be biased.</p>
        <p>In order to solve this problem without going back to the data collection step, additional female individuals could be added to the dataset generated through query execution.</p>
        <p>We tackled this issue by defining and processing coverage-based queries, i.e., selection-based queries that, given a set of coverage constraints, always return a result satisfying the input constraints while staying close to the original request. In order to avoid disparate treatment discrimination [<xref ref-type="bibr" rid="ref9">9</xref>] and guarantee transparency, the initial query is rewritten into a new one, satisfying the coverage constraint while staying close to the original request.</p>
        <p>The main research questions addressed by the PhD project can be summarized as follows:</p>
        <p>(RQ1) How can coverage-based queries be defined and how can they be characterized?</p>
        <p>(RQ2) How can coverage-based queries be efficiently processed?</p>
        <p>(RQ3) How can coverage-based queries be integrated in data processing environments?</p>
        <p>The remainder of the paper is organized as follows. Section 2 compares our work with other existing approaches. Coverage-based queries are defined in Section 3 (RQ1) and solutions developed so far for their processing are described in Section 4 (RQ2) (some preliminary results on (RQ1) and (RQ2) can be found in [<xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>]). Details about a Python data processing toolkit integrating the proposed techniques in Pandas (RQ3) can be found in [<xref ref-type="bibr" rid="ref3">3</xref>] but are not presented in the paper for space constraints. Finally, Section 5 concludes and presents some directions for further developments.</p>
        <sec id="sec-1-1-1">
          <title>2. Related work</title>
          <p>Discrimination-aware approaches have been proposed both with reference to data analysis (e.g., OLAP queries [<xref ref-type="bibr" rid="ref15">15</xref>], set selection [<xref ref-type="bibr" rid="ref22">22</xref>], ranking [<xref ref-type="bibr" rid="ref23">23</xref>]) and data preparation (e.g., dataset repair during data acquisition [<xref ref-type="bibr" rid="ref16 ref7">7, 16</xref>], with a special focus on coverage in [<xref ref-type="bibr" rid="ref12 ref7 ref8">7, 8, 12</xref>], data cleaning [<xref ref-type="bibr" rid="ref17">17</xref>] and data integration [<xref ref-type="bibr" rid="ref14">14</xref>]).</p>
          <p>Similarly to [<xref ref-type="bibr" rid="ref12 ref7 ref8">7, 8, 12</xref>], we consider coverage as a means to limit discrimination. However, rather than checking coverage over raw datasets and repairing them in case of coverage unsatisfaction through new data acquisitions, we guarantee coverage satisfaction along data transformation chains defined in terms of selection-based queries.</p>
          <p>Coverage-based queries, presented in this paper, change the result of an input query through rewriting rather than through the usage of ad-hoc query execution algorithms. This avoids disparate treatment discrimination during selection-based query execution since a well-defined criterion, i.e., a new selection condition, is used to retrieve the new result, guaranteeing at the same time transparency. In this respect, coverage-based queries differ from other similarity-based query approaches, like fuzzy queries [<xref ref-type="bibr" rid="ref13">13</xref>]. Other rewriting-based approaches have been proposed so far to tackle discrimination issues defined in terms of other properties and queries. Rewriting has been used for OLAP queries and causal fairness in [<xref ref-type="bibr" rid="ref15">15</xref>] and, more recently, for range queries and fairness in [<xref ref-type="bibr" rid="ref19">19</xref>]. As far as we know and according to [<xref ref-type="bibr" rid="ref18">18</xref>], no other solutions addressing coverage-based rewriting in the context of selection-based queries have been proposed so far.</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>3. Coverage-based queries</title>
          <p>Preliminaries. We consider data stored in tabular datasets (e.g., relations in a relational database, data frames in the Pandas environment). We assume that some discrete valued attributes <inline-formula><tex-math>S = (s_1, \ldots, s_n)</tex-math></inline-formula> of the input dataset (e.g., gender and race) identify protected groups, since they are of particular interest, and are called sensitive attributes. We focus on selection-based data transformations (or queries) over stored or computed (e.g., joined or aggregated) datasets, in analytical processes, that might alter the representation (i.e., the coverage) of specific groups of interest, defined in terms of sensitive attribute values (e.g., SQL selections over relational data, data slicing operations in Pandas,<sup>2</sup> ColumnTransformers in Scikit-Learn).<sup>3</sup></p>
          <p><sup>2</sup> https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/03_subset_data.html</p>
          <p>We consider boolean combinations of atomic selection conditions <inline-formula><tex-math>sel_i \equiv A_i \, \theta \, v_i</tex-math></inline-formula>, <inline-formula><tex-math>v_i \in D_{A_i}</tex-math></inline-formula>, <inline-formula><tex-math>\theta \in \{=, &lt;, \leq, \geq, &gt;\}</tex-math></inline-formula>, <inline-formula><tex-math>A_i</tex-math></inline-formula> numeric attribute, <inline-formula><tex-math>A_i \neq A_j</tex-math></inline-formula>, <inline-formula><tex-math>i = 1, \ldots, d</tex-math></inline-formula>, that do not refer to, as usually assumed, sensitive attributes, i.e., <inline-formula><tex-math>A_i \notin S</tex-math></inline-formula>. A selection-based query <inline-formula><tex-math>Q</tex-math></inline-formula> is thus denoted by <inline-formula><tex-math>Q\langle v_1, \ldots, v_d \rangle</tex-math></inline-formula> or <inline-formula><tex-math>Q\langle v \rangle</tex-math></inline-formula>, <inline-formula><tex-math>v \equiv (v_1, \ldots, v_d)</tex-math></inline-formula>, and <inline-formula><tex-math>v</tex-math></inline-formula> is called selection vector.</p>
          <p>A coverage constraint has the form <inline-formula><tex-math>\downarrow^{s_1, \ldots, s_h}_{x_1, \ldots, x_h} \geq k</tex-math></inline-formula> and specifies that the minimum number of instances with sensitive attribute <inline-formula><tex-math>s_i</tex-math></inline-formula> equal to <inline-formula><tex-math>x_i</tex-math></inline-formula>, <inline-formula><tex-math>i = 1, \ldots, h</tex-math></inline-formula>, in a query result has to be <inline-formula><tex-math>k</tex-math></inline-formula>. As an example, <inline-formula><tex-math>\downarrow^{\mathit{gender}}_{\mathit{female}} \geq 10</tex-math></inline-formula> specifies that the result should include at least 10 female individuals. The group referred to by a coverage constraint is called protected group.</p>
          <p>Definition of coverage-based queries. Let <inline-formula><tex-math>CC</tex-math></inline-formula> be a set of coverage constraints over a set of sensitive attributes <inline-formula><tex-math>S</tex-math></inline-formula> and <inline-formula><tex-math>Q\langle v \rangle</tex-math></inline-formula> a selection-based query. A coverage-based query <inline-formula><tex-math>Q^{CC}</tex-math></inline-formula> for <inline-formula><tex-math>CC</tex-math></inline-formula> and <inline-formula><tex-math>Q\langle v \rangle</tex-math></inline-formula> is a selection-based query that, given a dataset <inline-formula><tex-math>I</tex-math></inline-formula>, stretches the result <inline-formula><tex-math>Q\langle v \rangle(I)</tex-math></inline-formula> as little as possible so that the result satisfies the constraints in <inline-formula><tex-math>CC</tex-math></inline-formula>. More precisely, when considering a dataset <inline-formula><tex-math>I</tex-math></inline-formula>: (i) <inline-formula><tex-math>Q^{CC}</tex-math></inline-formula> returns the result of a query <inline-formula><tex-math>Q\langle u \rangle</tex-math></inline-formula> over <inline-formula><tex-math>I</tex-math></inline-formula>; <inline-formula><tex-math>Q\langle u \rangle</tex-math></inline-formula> is obtained from <inline-formula><tex-math>Q</tex-math></inline-formula> by only changing the selection constants that</p>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <p>might depend on <inline-formula><tex-math>I</tex-math></inline-formula>; (ii) <inline-formula><tex-math>\forall I \; Q\langle v \rangle(I) \subseteq Q^{CC}(I)</tex-math></inline-formula>; (iii) all coverage constraints are satisfied by <inline-formula><tex-math>Q^{CC}(I)</tex-math></inline-formula>; (iv) <inline-formula><tex-math>Q\langle u \rangle</tex-math></inline-formula> is minimal. Minimality means that any other query <inline-formula><tex-math>Q'</tex-math></inline-formula> satisfying conditions (i)–(iii) is such that either <inline-formula><tex-math>\mathit{card}(Q'(I)) &gt; \mathit{card}(Q\langle u \rangle(I))</tex-math></inline-formula> or <inline-formula><tex-math>\mathit{card}(Q'(I)) = \mathit{card}(Q\langle u \rangle(I))</tex-math></inline-formula> and <inline-formula><tex-math>Q\langle u \rangle</tex-math></inline-formula> is syntactically closer than <inline-formula><tex-math>Q'</tex-math></inline-formula> to <inline-formula><tex-math>Q\langle v \rangle</tex-math></inline-formula>, according to the Euclidean distance (defined in a unit space) between selection vectors.</p>
        <p>Properties. A coverage-based query <inline-formula><tex-math>Q^{CC}</tex-math></inline-formula> satisfies the following properties:</p>
        <p>(P1) It can be represented in a canonical form in which each selection condition has the form <inline-formula><tex-math>A_i \leq v_i</tex-math></inline-formula> or <inline-formula><tex-math>A_i &lt; v_i</tex-math></inline-formula>; <inline-formula><tex-math>Q\langle v \rangle</tex-math></inline-formula> is then represented as point <inline-formula><tex-math>v</tex-math></inline-formula> in the <inline-formula><tex-math>d</tex-math></inline-formula>-dimensional space defined by selection attributes (see <inline-formula><tex-math>v</tex-math></inline-formula> in Figure 1(a)).</p>
        <p>(P2) Let <inline-formula><tex-math>Q^{CC}(I) \equiv Q\langle u \rangle(I)</tex-math></inline-formula>. We proved that <inline-formula><tex-math>u</tex-math></inline-formula> coincides with the upper right vertex of the minimum bounding box of at most <inline-formula><tex-math>d</tex-math></inline-formula> distinct points in <inline-formula><tex-math>I</tex-math></inline-formula>, <inline-formula><tex-math>v</tex-math></inline-formula>, and the origin of the space (see the green triangle in Figure 1(a)) [<xref ref-type="bibr" rid="ref2">2</xref>]. Such vertices are called induced points and the set of all induced points corresponds to the search space for coverage-based queries.</p>
        <p>(P3) There is a relationship between <inline-formula><tex-math>u</tex-math></inline-formula> and the skyline of induced points corresponding to queries that, when executed over <inline-formula><tex-math>I</tex-math></inline-formula>, satisfy <inline-formula><tex-math>CC</tex-math></inline-formula>; the dominance relation, needed for the skyline computation, is defined over selection attributes, assuming the lower the better (see Figure 1(b)). It can be proved that <inline-formula><tex-math>u</tex-math></inline-formula> coincides with the skyline point corresponding to the query with the minimal cardinality at the lowest distance from <inline-formula><tex-math>v</tex-math></inline-formula>. Thus, <inline-formula><tex-math>u</tex-math></inline-formula> can be identified by combining skyline and top-1 computations (possibly mixed, as pointed out in [<xref ref-type="bibr" rid="ref10">10</xref>]).</p>
        <p>Figure 1: Coverage-based query properties. (a) Input and induced queries; (b) Skyline points.</p>
        <p><sup>3</sup> https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html</p>
        <sec id="sec-1-2-1">
          <title>4. Coverage-based query processing</title>
          <p>Properties P1, P2, and P3 suggest a naïve but inefficient approach for processing coverage-based queries, due to the size of the search space and skyline computation. We therefore improved such basic strategy under two directions, briefly described in the following. The designed algorithms, for each <inline-formula><tex-math>I</tex-math></inline-formula>, return one minimal solution<sup>4</sup> and do not rely on any index data structure, so that both stored and computed datasets can be considered.</p>
          <p>A grid-based approximate approach. The first approach is approximate because, for each dataset <inline-formula><tex-math>I</tex-math></inline-formula>, it relies on a discretized search space and a sample-based approach for cardinality estimation, needed for constraint and minimality checking (property P3). It can be applied over any dataset for which a sample is available or can be easily computed on the fly.</p>
          <p>The discretized search space is generated by considering the intersection points of a grid obtained by discretizing each axis (one for each selection attribute in <inline-formula><tex-math>Q</tex-math></inline-formula>, from the corresponding selection value in the query to the maximum value in the dataset), using standard binning approaches (e.g., equi-width and equi-depth). Each point on the grid corresponds to a selection-based query of type <inline-formula><tex-math>Q\langle u \rangle</tex-math></inline-formula>, thus satisfying conditions (i) and (ii) of the reference problem. <inline-formula><tex-math>Q\langle u \rangle</tex-math></inline-formula> is then determined by visiting the discretized search space starting from <inline-formula><tex-math>v</tex-math></inline-formula>, one point after the other, at increasing distance from <inline-formula><tex-math>v</tex-math></inline-formula> (a first algorithm version). The properties of the discretized search space and the canonical form are considered for pruning the space (a second version), possibly increasing the number of points to be visited at different iterations (two further versions).</p>
          <p>Details on all the algorithm versions and an exhaustive experimental evaluation, on both synthetic and real-world datasets, have been presented in [<xref ref-type="bibr" rid="ref5">5</xref>]. The obtained results depend on the density of the search space and show that: (i) equi-depth guarantees better performance over non-uniformly distributed data; (ii) a multi-level processing approach greatly helps in reducing the curse of dimensionality when the query contains a high number of selection conditions; (iii) the processing performance linearly depends on the number of coverage constraints; (iv) a good level of accuracy can be obtained with relatively small samples; (v) coverage constraint satisfaction has an obvious impact on the rate of different groups of protected instances, i.e., on fairness.</p>
          <p>An iteration-based precise approach. More recently, we started from the naïve approach, derived from P1, P2, and P3, to design a family of algorithms for the precise computation of coverage-based queries. The designed algorithms rely on the following considerations: (i) the induced query space can be computed in up to <inline-formula><tex-math>d</tex-math></inline-formula> iterations; the computation of new points at iteration <inline-formula><tex-math>i</tex-math></inline-formula> can be pruned by considering only points obtained at iteration <inline-formula><tex-math>i - 1</tex-math></inline-formula> that do not satisfy <inline-formula><tex-math>CC</tex-math></inline-formula>; (ii) the iterated computation of induced points and skyline dominance checks can be interleaved so that the considered space at each iteration is further reduced; (iii) minimality can be checked either during the skyline computation, to reduce the number of dominance comparisons, or after the skyline has been computed, limiting in this way the number of cardinality estimations; (iv) the grid-based approximate approach can be used as a filtering step, for further reducing the space before applying one precise algorithm. The proposed algorithms are currently under evaluation, on both synthetic and real datasets.</p>
          <p><sup>4</sup> The proposed algorithms can be easily customized to return all minimal solutions or a specific one, according to some further optimality criteria.</p>
        </sec>
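        <sec id="sec-1-2-ex">
          <p>The notions of selection vector and coverage constraint introduced in Section 3 can be illustrated with a minimal sketch. It uses plain Python lists of dictionaries rather than Pandas data frames, and it assumes queries in canonical form (every condition is an upper bound on a numeric attribute). The toy rows, the attribute names, and the helper names select, coverage, and satisfies are assumptions made for illustration only; they are not taken from the toolkit described in [<xref ref-type="bibr" rid="ref3">3</xref>].</p>
          <preformat><![CDATA[
```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]

# Toy dataset (hypothetical values, not taken from the Adult dataset).
DATA: List[Row] = [
    {"gender": "male", "age": 30, "hours": 40},
    {"gender": "male", "age": 35, "hours": 45},
    {"gender": "female", "age": 32, "hours": 38},
    {"gender": "female", "age": 45, "hours": 50},
    {"gender": "female", "age": 50, "hours": 42},
    {"gender": "male", "age": 28, "hours": 36},
]


def select(rows: List[Row], attrs: Sequence[str], vector: Sequence[float]) -> List[Row]:
    """Canonical-form selection query: keep rows with A_i <= v_i for every i."""
    return [r for r in rows if all(r[a] <= v for a, v in zip(attrs, vector))]


def coverage(rows: List[Row], group: Dict[str, object]) -> int:
    """Number of rows whose sensitive attributes match every entry of `group`."""
    return sum(all(r[s] == x for s, x in group.items()) for r in rows)


def satisfies(rows: List[Row], constraints: List[Tuple[Dict[str, object], int]]) -> bool:
    """True if each (group, k) constraint is met by at least k rows."""
    return all(coverage(rows, group) >= k for group, k in constraints)


ATTRS = ("age", "hours")
# One coverage constraint: at least 2 female individuals in the result.
CONSTRAINTS = [({"gender": "female"}, 2)]

original = select(DATA, ATTRS, (35, 45))   # selection vector v
stretched = select(DATA, ATTRS, (45, 50))  # a candidate vector u >= v

print(satisfies(original, CONSTRAINTS), satisfies(stretched, CONSTRAINTS))
```
]]></preformat>
          <p>In this toy instance the original query returns a single female individual, so the constraint is violated, while the stretched query contains the original result and satisfies the constraint. Whether the stretched query is also minimal is a separate question that this sketch does not address.</p>
        </sec>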
      </sec>
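      <sec id="sec-1-ex">
        <p>The grid-based search described in Section 4 can be sketched as follows, under several simplifying assumptions: queries are in canonical form, each axis is discretized with equi-width bins, cardinalities are computed exactly instead of being estimated on a sample, and all grid points are enumerated without any pruning. The function name grid_rewrite, the way the distance is normalized to a unit space, and the toy data are hypothetical and only meant to make the idea concrete.</p>
        <preformat><![CDATA[
```python
import itertools
import math

# Toy rows (hypothetical values).
DATA = [
    {"gender": "male", "age": 30, "hours": 40},
    {"gender": "male", "age": 35, "hours": 45},
    {"gender": "female", "age": 32, "hours": 38},
    {"gender": "female", "age": 45, "hours": 50},
    {"gender": "female", "age": 50, "hours": 42},
    {"gender": "male", "age": 28, "hours": 36},
]


def select(rows, attrs, vector):
    """Canonical-form query: keep rows with A_i <= v_i for every attribute."""
    return [r for r in rows if all(r[a] <= v for a, v in zip(attrs, vector))]


def satisfies(rows, constraints):
    """Check every (sensitive attribute, value, k) coverage constraint."""
    return all(sum(r[s] == x for r in rows) >= k for s, x, k in constraints)


def grid_rewrite(rows, attrs, v, constraints, bins=5):
    """Return the grid point with minimal result size, ties broken by distance.

    Returns None when no grid point satisfies the constraints.
    """
    axes, spans = [], []
    for a, vi in zip(attrs, v):
        hi = max(vi, max(r[a] for r in rows))
        span = hi - vi
        step = span / bins
        # Equi-width grid from the original constant up to the maximum value.
        axes.append([vi + i * step for i in range(bins + 1)] if span else [vi])
        spans.append(span or 1.0)
    best_key, best_u = None, None
    for u in itertools.product(*axes):
        result = select(rows, attrs, u)
        if not satisfies(result, constraints):
            continue
        # Euclidean distance with each axis rescaled to the interval [0, 1].
        dist = math.sqrt(sum(((ui - vi) / s) ** 2 for ui, vi, s in zip(u, v, spans)))
        key = (len(result), dist)
        if best_key is None or key < best_key:
            best_key, best_u = key, u
    return best_u


u = grid_rewrite(DATA, ("age", "hours"), (35, 45), [("gender", "female", 2)])
print(u)
```
]]></preformat>
        <p>Because only grid points are visited, the returned vector is in general an approximation of the minimal one; a finer grid, or equi-depth bins on skewed data, trades running time for accuracy.</p>
      </sec>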
    </sec>
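    <sec id="sec-ex">
      <p>Property P3 combines a skyline computation with a top-1 selection. The following sketch shows that combination on a handful of candidate points, using a naive quadratic skyline rather than any of the algorithms designed in the project. The candidate points, their cardinalities, and the axis spans used to normalize distances are hypothetical inputs, assumed to come from queries that already satisfy the coverage constraints.</p>
      <preformat><![CDATA[
```python
import math


def dominates(p, q):
    """p dominates q if it is no larger on every axis and smaller on at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def unit_distance(u, v, spans):
    """Euclidean distance after rescaling each axis by its span."""
    return math.sqrt(sum(((ui - vi) / s) ** 2 for ui, vi, s in zip(u, v, spans)))


# Hypothetical candidate selection vectors mapped to the assumed
# cardinality of the corresponding query results.
CANDIDATES = {(50, 45): 5, (47, 50): 5, (50, 50): 6, (45, 50): 5, (50, 46): 5}
V = (35, 45)     # original selection vector
SPANS = (15, 5)  # assumed axis ranges used for normalization

sky = skyline(list(CANDIDATES))
# Top-1: minimal cardinality first, then lowest distance from V.
best = min(sky, key=lambda u: (CANDIDATES[u], unit_distance(u, V, SPANS)))
print(sky, best)
```
]]></preformat>
      <p>Dominated points can be discarded before any cardinality is computed, which is why interleaving dominance checks with point generation, as mentioned in Section 4, can reduce the number of cardinality estimations.</p>
    </sec>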
    <sec id="sec-2">
      <title>5. Conclusions and further developments</title>
      <p>In this PhD project, we investigate the impact of
coverage constraints on data transformations, as a means for
limiting bias in the next analytical steps. After defining
coverage-based queries, we designed and experimentally
evaluated both approximate and precise algorithms for
their processing. The proposed solutions rely on query
rewriting, a key approach for enforcing specific
nondiscrimination constraints while guaranteeing transparency
and avoiding disparate treatment discrimination.</p>
      <p>
        Future work includes the integration of the proposed
queries in a relational DBMS and the extension of the
proposed solutions to consider further nondiscrimination
constraints. To this aim, an interesting approach is to rely
on a constraint-based optimization approach for
specifying different types of constraints, possibly inherently
different, such as coverage and fairness [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and determining
the best data transformation rewriting.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          et al.
          <article-title>Research directions for principles of data management</article-title>
          .
          <source>Dagstuhl Manifestos</source>
          ,
          <volume>7</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Accinelli</surname>
          </string-name>
          .
          <article-title>Discrimination-aware data transformations</article-title>
          (doctoral dissertation, in preparation). University of Genoa, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Accinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Catania</surname>
          </string-name>
          , G. Guerrini, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Minisi</surname>
          </string-name>
          .
          <article-title>covRew: a Python toolkit for pre-processing pipeline rewriting ensuring coverage constraint satisfaction</article-title>
          .
          <source>In Proc. EDBT</source>
          , pages
          <fpage>698</fpage>
          -
          <lpage>701</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Accinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Catania</surname>
          </string-name>
          , G. Guerrini, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Minisi</surname>
          </string-name>
          .
          <article-title>The impact of rewriting on coverage constraint satisfaction</article-title>
          .
          <source>In Proc. EDBT/ICDT Workshops</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Accinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Catania</surname>
          </string-name>
          , G. Guerrini, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Minisi</surname>
          </string-name>
          .
          <article-title>A coverage-based approach to nondiscrimination-aware data transformation</article-title>
          .
          <source>ACM J. Data Inf. Qual.</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Accinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Minisi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Catania</surname>
          </string-name>
          .
          <article-title>Coverage-based rewriting for data preparation</article-title>
          .
          <source>In Proc. EDBT/ICDT Workshops</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Asudeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          .
          <article-title>Assessing and remedying coverage for a given dataset</article-title>
          .
          <source>In Proc. ICDE</source>
          , pages
          <fpage>554</fpage>
          -
          <lpage>565</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Asudeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shahbazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          .
          <article-title>Identifying insufficient data coverage for ordinal continuous-valued attributes</article-title>
          .
          <source>In Proc. SIGMOD</source>
          , pages
          <fpage>129</fpage>
          -
          <lpage>141</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Barocas</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Selbst</surname>
          </string-name>
          .
          <article-title>Big data's disparate impact</article-title>
          . Calif. L. Rev.,
          <volume>104</volume>
          :
          <fpage>671</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Börzsönyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Stocker</surname>
          </string-name>
          .
          <article-title>The skyline operator</article-title>
          .
          <source>In Proc. ICDE</source>
          , pages
          <fpage>421</fpage>
          -
          <lpage>430</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Drosou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          , E. Pitoura, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          .
          <article-title>Diversity in big data: A review</article-title>
          .
          <source>Big Data</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>73</fpage>
          -
          <lpage>84</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Asudeh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          .
          <article-title>Identifying insufficient data coverage in databases with multiple relations</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .,
          <volume>13</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2229</fpage>
          -
          <lpage>2242</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z. M.</given-names>
            <surname>Ma</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Yan</surname>
          </string-name>
          .
          <article-title>A literature overview of fuzzy database models</article-title>
          .
          <source>J. Inf. Sci. Eng</source>
          .,
          <volume>24</volume>
          (
          <issue>1</issue>
          ):
          <fpage>189</fpage>
          -
          <lpage>202</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Mazilu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. W.</given-names>
            <surname>Paton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Konstantinou</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. A. A.</given-names>
            <surname>Fernandes</surname>
          </string-name>
          .
          <article-title>Fairness in data wrangling</article-title>
          .
          <source>In Proc. of the Int. Conf. on Information Reuse and Integration for Data Science, IRI</source>
          <year>2020</year>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>Salimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Suciu</surname>
          </string-name>
          .
          <article-title>Bias in OLAP queries: Detection, explanation, and removal</article-title>
          .
          <source>In Proc. SIGMOD</source>
          , pages
          <fpage>1021</fpage>
          -
          <lpage>1035</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>B.</given-names>
            <surname>Salimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Howe</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Suciu</surname>
          </string-name>
          .
          <article-title>Database repair meets algorithmic fairness</article-title>
          .
          <source>SIGMOD Rec.</source>
          ,
          <volume>49</volume>
          (
          <issue>1</issue>
          ):
          <fpage>34</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schelter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Khilnani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          .
          <article-title>FairPrep: Promoting data to a first-class citizen in studies on fairness-enhancing interventions</article-title>
          .
          <source>In Proc. of the Int. Conf. on Extending Database Technology, EDBT 2020</source>
          , pages
          <fpage>395</fpage>
          -
          <lpage>398</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Shahbazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Asudeh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          .
          <article-title>A survey on techniques for identifying and resolving representation bias in data</article-title>
          .
          <source>CoRR, abs/2203.11852</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shetiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. P.</given-names>
            <surname>Swift</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Asudeh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Das</surname>
          </string-name>
          .
          <article-title>Fairness-aware range queries for selecting unbiased data</article-title>
          .
          <source>In Proc. ICDE</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Simonite</surname>
          </string-name>
          .
          <article-title>When it comes to gorillas, Google photos remains blind</article-title>
          .
          <source>Wired</source>
          , Jan.
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Howe</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          .
          <article-title>Responsible data management</article-title>
          .
          <source>Proc. VLDB Endow.</source>
          ,
          <volume>13</volume>
          (
          <issue>12</issue>
          ):
          <fpage>3474</fpage>
          -
          <lpage>3488</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          .
          <article-title>Online set selection with fairness and diversity constraints</article-title>
          .
          <source>In Proc. EDBT</source>
          , pages
          <fpage>241</fpage>
          -
          <lpage>252</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zehlike</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          .
          <article-title>Fairness in ranking, part I: score-based ranking</article-title>
          .
          <source>ACM Comput. Surv.</source>
          ,
          <volume>55</volume>
          (
          <issue>6</issue>
          ):
          <fpage>118:1</fpage>
          -
          <lpage>118:36</lpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>