<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>ProxiMix: Enhancing Fairness with Proximity Samples in Subgroups</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jingyu Hu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jun Hong</string-name>
          <email>jun.hong@uwe.ac.uk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mengnan Du</string-name>
          <email>mengnan.du@njit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weiru Liu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>New Jersey Institute of Technology</institution>
          ,
          <addr-line>323 Dr Martin Luther King Jr Blvd, Newark</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Bristol</institution>
          ,
          <addr-line>Beacon House, Queens Rd, Bristol</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of the West of England</institution>
          ,
          <addr-line>Coldharbour Ln, Stoke Gifford, Bristol</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Many bias mitigation methods have been developed to address fairness issues in machine learning. We have found that using linear mixup, a data augmentation technique, on its own for bias mitigation can still retain biases present in dataset labels. This paper addresses the issue by proposing a novel pre-processing strategy in which both an existing mixup method and our new bias mitigation algorithm are used to improve the generation of labels for augmented samples, making the labels proximity aware. Specifically, we propose ProxiMix, which keeps both pairwise and proximity relationships for fairer data augmentation. We have conducted thorough experiments with three datasets, three ML models, and different hyperparameter settings. Our experimental results show the effectiveness of ProxiMix from both the fairness of predictions and the fairness of recourse perspectives.</p>
      </abstract>
      <kwd-group>
        <kwd>Group Fairness</kwd>
        <kwd>Bias Mitigations</kwd>
        <kwd>Mixup</kwd>
        <kwd>Data Augmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>To bridge the research gap, in this work we propose ProxiMix to address the issue of biased
labels in pre-processing for bias mitigation. Motivated by the relabeling method [12], which
assigns labels to instances based on their K-nearest neighbors to ensure that similar
individuals have similar labels, our approach adds proximity samples for re-auditing mixed
labels to mitigate potential bias in mixup. The intuition is that, compared with focusing on
pairwise labels alone, treating the labels of proximity samples as latent label relationships
can reduce the probability of generating biased labels. We have conducted experiments to
compare the existing pairwise mixup with the proposed proximity-aware mixup on multiple
models and datasets. The results show that ProxiMix achieves higher fairness, particularly
when the original labels in the dataset are highly biased.</p>
      <p>Our main contributions can be summarised as follows: (1) We propose a new bias mitigation
algorithm to address the label-bias retention issue in the current mixup method; (2) Subgroup
preference analysis: we explore how different subgroups perform during the sampling process;
(3) Trade-off analysis: we explore the trade-off between using our proximity-based strategy
and the traditional mixup; (4) Validation: we validate the effectiveness of our method using
prediction-based metrics and the cost of counterfactual explanations from an XAI perspective.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>The fairness problem can be divided into individual and group levels. Individual fairness
measures bias by checking whether similar predictions are made for similar individuals. Group
fairness compares how unprivileged and privileged groups are treated, and is achieved when the
treatment is equal between groups. Prediction-based fairness and recourse-based fairness are
two perspectives for evaluating model fairness. In this paper, we focus on group fairness in
machine learning.</p>
      <p>Fairness of Prediction Outcomes. Most fairness metrics are based on predicted outcomes.
Demographic Parity (DP) [13] based metrics use predicted outcomes to assess whether different
demographic groups are equally favored by the model. DP aims at having equal proportions of
positive outcomes across subgroups. The DP difference between groups is called the Statistical
Parity Difference (SP), and the DP ratio between groups is called Disparate Impact (DI). In
addition to metrics that depend on predictions only, there are some fairness metrics [14] that
consider both predicted and actual outcomes. Equality of Opportunity (EO) compares the True
Positive Rate (TPR) of subgroups. Equalized odds (Eodds) compares both the True Positive Rate
(TPR) and the False Positive Rate (FPR) of each group.</p>
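      <p>As a minimal sketch (not taken from the paper's implementation), the group metrics named above can be computed from binary predictions and a binary sensitive attribute. The function names, the sign convention (unprivileged minus privileged) and the ratio orientation (unprivileged over privileged) are assumptions made for illustration; every group is assumed to be non-empty.</p>
      <preformat>
```python
def positive_rate(y_pred, z, group):
    """Share of positive predictions inside one sensitive group."""
    preds = [p for p, g in zip(y_pred, z) if g == group]
    return sum(preds) / len(preds)


def true_positive_rate(y_true, y_pred, z, group):
    """TPR inside one group: positives predicted among actual positives."""
    hits = [p for t, p, g in zip(y_true, y_pred, z) if g == group and t == 1]
    return sum(hits) / len(hits)


def statistical_parity_difference(y_pred, z):
    # Assumed sign convention: unprivileged (0) minus privileged (1).
    return positive_rate(y_pred, z, 0) - positive_rate(y_pred, z, 1)


def disparate_impact(y_pred, z):
    # Assumed orientation: unprivileged rate divided by privileged rate.
    return positive_rate(y_pred, z, 0) / positive_rate(y_pred, z, 1)


def equal_opportunity_difference(y_true, y_pred, z):
    return (true_positive_rate(y_true, y_pred, z, 0)
            - true_positive_rate(y_true, y_pred, z, 1))


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
z = [0, 0, 0, 0, 1, 1, 1, 1]
print(statistical_parity_difference(y_pred, z))
print(disparate_impact(y_pred, z))
print(equal_opportunity_difference(y_true, y_pred, z))
```
      </preformat>
      <p>A statistical parity difference of 0 and a disparate impact of 1 correspond to equal positive rates in both groups.</p>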
      <p>Fairness of Recourse. Another recent research trend is to apply Explainable Artificial
Intelligence (XAI) methods to address fairness issues. One of the key components in this area
is the counterfactual explanation (CE), sometimes also called algorithmic recourse. CE focuses
on explaining why a particular outcome occurred instead of an alternative plausible outcome
[15, 16]. Recourse refers to identifying the closest counterfactuals that could alter the result
with minimal feature changes. Several algorithms have been developed to generate such
counterfactual explanations for machine learning models [17, 18, 19]. The concept of fairness
of recourse was proposed by [20] and is defined as the disparity of the mean cost to achieve the
desirable recourse among the unprivileged subgroups. [6, 21] propose metrics based on the cost
of counterfactual explanations to measure fairness performance across subgroups. Predictive
Counterfactual Fairness (PreCoF) [22] utilises CEs to detect the underlying patterns of
discrimination in the model.</p>
      <p>Bias Mitigation Methods. Bias mitigation methods can be categorized into three stages:
pre-processing, in-processing, and post-processing [8, 23]. Pre-processing mitigations aim
to reduce bias by modifying the training data and creating a fairer training dataset [24, 25, 26].
In-processing mitigation occurs during training by adding regularization terms and constraints
to models [11, 27]. Mitigations in the post-processing stage, such as calibration, are applied
after a model has been successfully trained [21, 28]. Both pre-processing and post-processing
methods are model-agnostic, as they occur before and after the model training, respectively.</p>
      <p>Over-sampling in the pre-processing stage refers to changing the distribution of the training
dataset by adding more samples. Duplicating instances of the unprivileged group is one
straightforward strategy [29, 30]. [31, 32] generate synthetic samples around the unprivileged
group to mitigate bias. mixSG [10] takes both the privileged and unprivileged groups into
consideration when synthesizing new data using mixup, but the potential bias in the generated
labels has not been discussed yet.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Preliminaries and Problem Statement</title>
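      <p>Notations. Given the dataset <inline-formula><tex-math>D = \{(x_i, y_i, z_i)\}_{i=1}^{n}</tex-math></inline-formula> with <inline-formula><tex-math>n</tex-math></inline-formula> samples, where <inline-formula><tex-math>x \in X</tex-math></inline-formula> lies in the feature space and each feature in <inline-formula><tex-math>X</tex-math></inline-formula> has its own set of values, each sample has a label <inline-formula><tex-math>y \in Y := \{0, 1\}</tex-math></inline-formula> and a sensitive attribute <inline-formula><tex-math>z \in Z := \{0, 1\}</tex-math></inline-formula>. The dataset is divided into a training set <inline-formula><tex-math>D_{train}</tex-math></inline-formula> and a test set <inline-formula><tex-math>D_{test}</tex-math></inline-formula>. We use <inline-formula><tex-math>D_{train}</tex-math></inline-formula> to fit a classifier model <inline-formula><tex-math>f : X \to Y</tex-math></inline-formula> and <inline-formula><tex-math>D_{test}</tex-math></inline-formula> to assess the model’s prediction and fairness performance. Fairness is measured by the difference in the model’s performance between the subgroups identified by <inline-formula><tex-math>Z</tex-math></inline-formula>. We define <inline-formula><tex-math>Z = 0</tex-math></inline-formula> as the unprivileged/minority group and <inline-formula><tex-math>Z = 1</tex-math></inline-formula> as the privileged/majority group.</p>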
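      <p>Mixup Strategy in Fairness. Mixup [9] is a data augmentation technique that involves
blending pairs of samples to create new synthetic training examples. The premise of mixup is
that linear combinations of features will result in the same linear combinations of target labels.
Thus, mixup applies stochastic linear combinations to samples <inline-formula><tex-math>s_0(x_0, y_0)</tex-math></inline-formula> and <inline-formula><tex-math>s_1(x_1, y_1)</tex-math></inline-formula> to generate
a new sample <inline-formula><tex-math>\tilde{s}(\tilde{x}, \tilde{y})</tex-math></inline-formula>, with a random parameter <inline-formula><tex-math>\lambda</tex-math></inline-formula> drawn from a Beta distribution:
        <disp-formula id="eq1"><label>(1)</label><tex-math>\tilde{x} = \lambda x_0 + (1 - \lambda) x_1, \quad \text{where } x_0, x_1 \text{ are input vectors}</tex-math></disp-formula>
        <disp-formula id="eq2"><label>(2)</label><tex-math>\tilde{y} = \lambda y_0 + (1 - \lambda) y_1, \quad \text{where } y_0, y_1 \text{ are target labels}</tex-math></disp-formula>
      </p>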
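      <p>The two mixup equations can be illustrated with a short sketch. This is an illustration under stated assumptions rather than the implementation of [9] or [10]: the function name and the list-based feature representation are hypothetical, and both Beta parameters are set to the same value, which is a common choice.</p>
      <preformat>
```python
import random


def mixup_pair(x0, y0, x1, y1, alpha=1.0):
    """Blend two samples with a ratio lam drawn from Beta(alpha, alpha)."""
    lam = random.betavariate(alpha, alpha)
    # Features and labels share the same linear combination.
    x_mix = [lam * a + (1 - lam) * b for a, b in zip(x0, x1)]
    y_mix = lam * y0 + (1 - lam) * y1
    return x_mix, y_mix, lam


random.seed(42)
x_mix, y_mix, lam = mixup_pair([0.0, 2.0], 0, [1.0, 4.0], 1)
print(lam, x_mix, y_mix)
```
      </preformat>
      <p>Because the label is a weighted average of the two original labels, any bias in those labels is carried over to the synthetic sample.</p>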
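      <p>To address fairness concerns, previous research has explored the practice of sampling <inline-formula><tex-math>s_0</tex-math></inline-formula> and
<inline-formula><tex-math>s_1</tex-math></inline-formula> from different subgroups, applying this step as a bias mitigation method both in the
pre-processing stage, as in mixSG [10], and in the in-processing stage, as in fairMixup [11].</p>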
      <p>Bias Persists After Mixup. The premise of mixup lies in the linear relationship between
features and labels. The challenge is that, if the original labels in the dataset are biased, the
labels of mixed samples can retain this bias. The newly generated biased samples can then
impact the fairness of the trained model.</p>
      <p>Gender is considered as the sensitive attribute, dividing the data
into subgroups. Here, we consider the female subgroup as unprivileged.</p>
      <p>The table shows that the individual features of the male samples (M1 and M2) and the female
sample (F2) are remarkably similar (Officer with similar capital gain and age), but their income
labels differ. This shows an initial bias: the female and male groups are treated unequally.</p>
      <p>We follow the mixSG method to select one sample from one subgroup and another from
the other subgroup to generate <inline-formula><tex-math>(\tilde{x}, \tilde{y})</tex-math></inline-formula>. Assume we have chosen the sample F2 from the female
subgroup; F2 will be randomly paired with either M1 or M2 from the male subgroup. If the
mixture ratio of the female sample F2 is over 50%, we say the mixed sample is female.
Otherwise, it is male.</p>
      <p>When the random <inline-formula><tex-math>\lambda = 0.8</tex-math></inline-formula>, the mixed sample <inline-formula><tex-math>\tilde{x}</tex-math></inline-formula> will be a female sample, and the label <inline-formula><tex-math>\tilde{y}</tex-math></inline-formula>
of the mixed female sample will primarily depend on the label of the female sample F2, meaning
that both combinations of F2 with M1 or M2 will have a high probability of low income (≤50K).
Although the individual features of the high-income men (M1 and M2) and the low-income woman
(F2) are remarkably similar (Officer with similar capital gain and age), the mixed label still
indicates a tendency toward lower income for females. If <inline-formula><tex-math>\lambda = 0.2</tex-math></inline-formula>, the mixed sample will
mostly depend on the label of the male sample, and the generated sample becomes male with high
income. The labels of mixed samples are therefore heavily influenced by gender. Considering the
initial bias in the dataset, new samples generated by mixup can deepen gender bias against
unprivileged groups, causing the model to be more likely to predict male samples as high-income
and female samples as low-income under similar conditions.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Methodology and Experiment Design</title>
      <p>To address the issue of possibly biased labels in mixup, we propose a method called ProxiMix.
It synthesizes <inline-formula><tex-math>D_{new} = \{(\tilde{x}_i, \tilde{y}_i, \tilde{z}_i)\}_{i=1}^{m}</tex-math></inline-formula> from <inline-formula><tex-math>D_{train}</tex-math></inline-formula>
with the consideration of both pairwise and proximity samples, to reduce dataset bias. Fitting
the model with the fairer dataset <inline-formula><tex-math>D'_{train} = D_{train} \cup D_{new}</tex-math></inline-formula> is expected to improve its
fairness performance.</p>
      <p>[Figure 1: Three cases (Case 1, Case 2, Case 3) of mixing sample <inline-formula><tex-math>s_0</tex-math></inline-formula> with sample <inline-formula><tex-math>s_1</tex-math></inline-formula>, comparing the label of the sample from Mixup with the label of the sample from ProxiMix in each case.]</p>
      <sec id="sec-5-1">
        <title>4.1. ProxiMix Algorithm</title>
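        <p>The Importance of Proximity Awareness. Given a sample <inline-formula><tex-math>s_0</tex-math></inline-formula> from group <inline-formula><tex-math>D_{train}(Z = 0)</tex-math></inline-formula>,
and another sample <inline-formula><tex-math>s_1</tex-math></inline-formula> from <inline-formula><tex-math>D_{train}(Z = 1)</tex-math></inline-formula>, the proximity samples set of <inline-formula><tex-math>s_1</tex-math></inline-formula> is defined as
<inline-formula><tex-math>\mathit{ProxiSet} = \{p_0, p_1, \ldots, p_k\}</tex-math></inline-formula>. The label value of each sample can be either 0 or 1. We illustrate
three cases when mixing up two samples <inline-formula><tex-math>s_0</tex-math></inline-formula> and <inline-formula><tex-math>s_1</tex-math></inline-formula>: (1) Case 1: the labels of <inline-formula><tex-math>s_0</tex-math></inline-formula>, <inline-formula><tex-math>s_1</tex-math></inline-formula> and all of
their proximity samples are the same. (2) Case 2: the labels of <inline-formula><tex-math>s_0</tex-math></inline-formula> and <inline-formula><tex-math>s_1</tex-math></inline-formula> are the same, but there
exist different labels among the proximity samples. (3) Case 3: the labels of <inline-formula><tex-math>s_0</tex-math></inline-formula> and <inline-formula><tex-math>s_1</tex-math></inline-formula> are
different. Figure 1 presents these three cases.</p>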
        <p>In Case 1, linear mixing and proximity yield the same results because there are no impurities
between the two samples. In Case 2, both samples <inline-formula><tex-math>s_0</tex-math></inline-formula> and <inline-formula><tex-math>s_1</tex-math></inline-formula> have the same label. This implies
that direct mixing will result in all labels becoming 0 regardless of the mixing ratio. This
approach ignores the samples with label 1 in between and can potentially introduce bias when
predicting subgroups with the 1 label. In Case 3, the mixed label depends on the mixing rate
when using mixup directly. Specifically, the mixed label becomes 1 when the mixing rate
exceeds 0.5. However, we can see in the example that the majority of the proximity samples
between <inline-formula><tex-math>s_0</tex-math></inline-formula> and <inline-formula><tex-math>s_1</tex-math></inline-formula> belong to 0. This suggests that the probability of being classified as 0
should be higher. Considering the proportion of proximity labels can enhance the probability of
being classified as 0.</p>
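        <p>ProxiMix Algorithm Design. ProxiMix consists of two parts: we first introduce the
proximity-based mixed label <inline-formula><tex-math>y_p</tex-math></inline-formula> and then combine <inline-formula><tex-math>y_p</tex-math></inline-formula> with <inline-formula><tex-math>y_\lambda</tex-math></inline-formula> from the existing mixup [10]
using a <inline-formula><tex-math>d</tex-math></inline-formula>-adjusted balancing degree.</p>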
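        <p>As discussed above, the current mixup approach does not account for potential biases in
labels. Our proposal aims to determine the mixed label by considering the proportions of labels
in proximity samples. Specifically, when mixing two samples, <inline-formula><tex-math>s_0</tex-math></inline-formula> and <inline-formula><tex-math>s_1</tex-math></inline-formula>, we calculate the
Euclidean distance between their one-hot encoded features<sup>1</sup>, denoted as <inline-formula><tex-math>\mathit{dist}_{01} = \lVert x_0 - x_1 \rVert</tex-math></inline-formula>, to
measure their proximity. Then, we select all the samples that are within the distance <inline-formula><tex-math>\mathit{dist}_{01}</tex-math></inline-formula> from <inline-formula><tex-math>s_0</tex-math></inline-formula> to
form a potential proximity samples set ProxiSet. The final mixed label for <inline-formula><tex-math>s_0</tex-math></inline-formula> and <inline-formula><tex-math>s_1</tex-math></inline-formula> is assigned
based on the label with the larger proportion within <inline-formula><tex-math>s_0 \cup \mathit{ProxiSet}</tex-math></inline-formula>.</p>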
        <p>Let us look back at the toy example: ProxiSet := {F2, M1, M2} when we want to mix F2 with
either M1 or M2. Two-thirds of the labels in the ProxiSet are high income, so the
proximity-based mixed label <inline-formula><tex-math>y_p</tex-math></inline-formula> is high income.</p>
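        <p>We combine our proximity-based <inline-formula><tex-math>y_p</tex-math></inline-formula> with <inline-formula><tex-math>y_\lambda</tex-math></inline-formula> from the current mixup to form the new
definition of the mixed label <inline-formula><tex-math>\tilde{y}</tex-math></inline-formula>, achieved by calculating <inline-formula><tex-math>d \cdot y_\lambda + (1 - d) \cdot y_p</tex-math></inline-formula>, where <inline-formula><tex-math>d</tex-math></inline-formula> is a
balancing degree between 0 and 1. The algorithm pseudocode is described in Algorithm 1.</p>
        <p>Algorithm 1 ProxiMix Algorithm</p>
        <preformat>Input: s_0(x_0, y_0, z_0) ~ D_train(Z = 0), s_1(x_1, y_1, z_1) ~ D_train(Z = 1)
procedure ProxiMix(s_0, s_1, D_train, d)
    procedure Proximity-Based-Mixed(s_0, s_1, D_train)
        ProxiSet = []
        dist_01 = ||x_0 - x_1||
        for each sample s_i(x_i, y_i, z_i) in D_train(Z = 1) do
            dist_i = ||x_i - x_0||
            if dist_i ≤ dist_01 then
                Add s_i to ProxiSet
            end if
        end for
        ProxiSet = s_0 ∪ ProxiSet
        y_p = label with the larger proportion in ProxiSet
    end procedure
    procedure Lambda-Based-Mix(s_0, s_1)
        λ = Beta(α, α)
        y_λ = λ * y_0 + (1 - λ) * y_1
    end procedure
    ỹ = d * y_λ + (1 - d) * y_p,  d ∈ [0, 1]
    Return ỹ
end procedure</preformat>
        <p><sup>1</sup> scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html</p>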
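        <p>The label computation of Algorithm 1 can be sketched in a few lines. This is a hedged reading of the pseudocode, not the authors' code: samples are assumed to be (features, label) pairs with already-encoded numeric features, the function names are hypothetical, the proximity-based label is taken to be the majority label, and ties are resolved in favour of the label of the first sample, which the paper does not specify.</p>
        <preformat>
```python
import math
from collections import Counter


def proximity_label(s0, s1, candidates):
    """Majority label among s0 and the candidates no further from s0 than s1."""
    x0, y0 = s0
    x1, _ = s1
    radius = math.dist(x0, x1)
    proxi_set = [(x, y) for x, y in candidates if radius >= math.dist(x, x0)]
    counts = Counter([y0] + [y for _, y in proxi_set])
    return counts.most_common(1)[0][0]


def proximix_label(s0, s1, candidates, lam, d):
    """Blend the mixup label and the proximity label with balancing degree d."""
    y_lambda = lam * s0[1] + (1 - lam) * s1[1]
    y_p = proximity_label(s0, s1, candidates)
    return d * y_lambda + (1 - d) * y_p


# Toy data: s0 has label 0, two nearby samples of the other group have label 1.
s0 = ([0.0, 0.0], 0)
s1 = ([1.0, 0.0], 1)
candidates = [s1, ([0.5, 0.0], 1), ([3.0, 0.0], 0)]
print(proximix_label(s0, s1, candidates, lam=0.8, d=0.5))
```
        </preformat>
        <p>With a balancing degree of 1 the function returns the plain mixup label, and with 0 it returns the proximity-based label only.</p>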
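        <p>Fig. 2 shows an example of how ProxiMix works. Samples are categorized into two subgroups,
green and blue, based on their colors. The shape of each sample represents its label: circles
for label 0, and plus-signs for label 1. Specifically, the green circle (<inline-formula><tex-math>s_0</tex-math></inline-formula>) and the blue plus-sign
(<inline-formula><tex-math>s_1</tex-math></inline-formula>) are the two samples selected for ProxiMix. The new label of the mixed sample changes with
different values of the balancing parameter <inline-formula><tex-math>d</tex-math></inline-formula>. The varying shades of the blue samples represent
the impact degree of <inline-formula><tex-math>y_p</tex-math></inline-formula>, while the thickness of the red lines between <inline-formula><tex-math>s_0</tex-math></inline-formula> and <inline-formula><tex-math>s_1</tex-math></inline-formula> represents
the strength of <inline-formula><tex-math>y_\lambda</tex-math></inline-formula>. The black line indicates no consideration of <inline-formula><tex-math>y_\lambda</tex-math></inline-formula>. For <inline-formula><tex-math>d = 1</tex-math></inline-formula>, ProxiMix employs
the original mixup <inline-formula><tex-math>y_\lambda</tex-math></inline-formula>; for <inline-formula><tex-math>d = 0</tex-math></inline-formula>, it utilizes our proximity-based <inline-formula><tex-math>y_p</tex-math></inline-formula> exclusively; and it
combines the two for values in between. We will discuss how different values of <inline-formula><tex-math>d</tex-math></inline-formula> impact the
model performance in Section 5.2.</p>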
        <p>Accelerating Calculation of ProxiMix in Practice. Our core idea is to introduce the label
set of the proximity samples as a reference when performing label mixup. To enhance
computational efficiency, we find the ProxiSet first in practice. Our implementation is as
follows: (1) Given a randomly selected sample <inline-formula><tex-math>s_0</tex-math></inline-formula> from <inline-formula><tex-math>D_{train}(Z = z)</tex-math></inline-formula>, we first find its
ProxiSet from <inline-formula><tex-math>D_{train}(Z = \neg z)</tex-math></inline-formula>. The ProxiSet contains the samples that are proximal to <inline-formula><tex-math>s_0</tex-math></inline-formula>;
(2) Then, we treat each sample in the ProxiSet as <inline-formula><tex-math>s_1</tex-math></inline-formula> and sequentially mix it with <inline-formula><tex-math>s_0</tex-math></inline-formula>, following
the ‘furthest-first’ rule: the mixing begins with the sample in the ProxiSet that is furthest
from <inline-formula><tex-math>s_0</tex-math></inline-formula>. After each mix, we remove the used sample from the ProxiSet; (3) We repeat this
process until the desired number of new samples has been generated. The generated samples are
merged into <inline-formula><tex-math>D_{train}</tex-math></inline-formula> as training samples for the classification model.</p>
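        <p>One possible reading of the furthest-first procedure is sketched below. It is an assumption-laden illustration, not the released implementation: samples are (features, label) pairs, the proximity set is taken to be the k nearest samples from the other subgroup, the proximity-based label is the majority label of the samples still in the set (ties go to the label of the selected sample), and the minimum-neighbor check used in the experiments is omitted. Under this reading, a single sort replaces a distance scan per mixed pair, because the samples remaining in the set are exactly those no further from the selected sample than the current partner.</p>
        <preformat>
```python
import math
import random
from collections import Counter


def furthest_first_mix(s0, other_group, k=25, d=0.5, alpha=1.0):
    """Mix s0 with its k nearest samples from the other subgroup, furthest first.

    Samples are (features, label) pairs. Returns a list of (x_mix, y_mix).
    """
    x0, y0 = s0
    # One sort finds the proximity set; the last element is the furthest member.
    proxi_set = sorted(other_group, key=lambda s: math.dist(s[0], x0))[:k]
    new_samples = []
    while proxi_set:
        # Everything still in proxi_set is at least as close to s0 as the
        # furthest member, so it doubles as that member's proximity set.
        labels = [y0] + [y for _, y in proxi_set]
        y_p = Counter(labels).most_common(1)[0][0]
        x1, y1 = proxi_set.pop()  # use and remove the furthest sample
        lam = random.betavariate(alpha, alpha)
        x_mix = [lam * a + (1 - lam) * b for a, b in zip(x0, x1)]
        y_lambda = lam * y0 + (1 - lam) * y1
        new_samples.append((x_mix, d * y_lambda + (1 - d) * y_p))
    return new_samples


s0 = ([0.0], 1)
other_group = [([1.0], 0), ([2.0], 0), ([3.0], 0), ([10.0], 1)]
for x_mix, y_mix in furthest_first_mix(s0, other_group, k=3, d=0.0):
    print(x_mix, y_mix)
```
        </preformat>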
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Experiment Setting</title>
        <p>Fig. 3 presents the overall workflow of our experiment. The balancing degree <inline-formula><tex-math>d</tex-math></inline-formula> in our
mixup algorithm is tested with values ranging from 0 to 1, in increments of 0.1. The number of
proximity samples for each round is set to 25. We consider proximity only when there are at
least 5 neighbors, to ensure credibility. The mixing ratio <inline-formula><tex-math>\lambda</tex-math></inline-formula> is randomly generated from the
Beta(1, 1) distribution.</p>
        <p>Datasets. The experiment is conducted on three datasets for classification problems: (1) Adult
income dataset [33]: predicting whether a person’s annual income exceeds 50K (high/low
income); (2) Law school dataset [34]: predicting whether a person in law school will fail/pass
the exam; (3) Credit default dataset [35]: predicting whether a person’s credit payment will be
on-time/overdue.</p>
        <p>Models. Three models, namely logistic regression (LogReg), decision trees (DT) and
multi-layer perceptron (MLP), are tested. All implementations are based on scikit-learn. The
maximum depth is 7 in the decision tree. We use a three-layer MLP with 128 neurons in the ith
hidden layer, ‘relu’ as the activation function, and a maximum of 1500 iterations. The random
seed is set to 42 for reproducible results.</p>
        <p>[Figure 3: Experiment workflow. The dataset (Adult/Credit/Law) is split into Train D<sub>train</sub> and Test D<sub>test</sub>; (1) ProxiMix produces Train D<sub>train</sub>’; (2) the model (LogReg, DTree, MLP) is trained; (3) the model predicts; (4) evaluation metrics: prediction, fairness, CF cost.]</p>
        <p>Metrics. Prediction performance metrics are based on the True Positive (TP), False Positive
(FP), False Negative (FN), and True Negative (TN) counts in the confusion matrix. The equations
of Precision, Recall, and F1-score are as follows. Recall is also called the True Positive Rate.
          <disp-formula><tex-math>\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad \mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}</tex-math></disp-formula>
        </p>
        <p>The cost of a counterfactual explanation is the distance between a sample and its
counterfactual. In this way, we can compute the counterfactual cost for each sample in the
dataset. The average costs of counterfactuals across different groups can be considered as a
measure of fairness: as the cost gap between groups (e.g., females and males) increases, the
model’s unfairness also grows. Our evaluation follows the implementation of the counterfactual
explanation cost package<sup>4</sup>; specifically, we opt for the cost of counterfactual explanations
without constraints as the metric.</p>
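        <p>For concreteness, the prediction metrics named in this section can be computed directly from confusion-matrix counts using their standard definitions. The helper name and the example counts below are illustrative only and do not come from the experiments.</p>
        <preformat>
```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also the True Positive Rate
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(precision, recall, round(f1, 4))
```
        </preformat>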
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Results</title>
      <p>In Section 5.1, we fix the balancing degree of ProxiMix and examine the impact of different
sampling modes for subgroups on the outcomes. In Section 5.2, we fix the sampling mode and
explore the impact of different balancing degrees <inline-formula><tex-math>d</tex-math></inline-formula> on the results. To ensure the consistency
of findings, Section 5.3 assesses the effectiveness of ProxiMix from the counterfactual cost
perspective.</p>
      <sec>
        <title>5.1. Sampling Mode Preferences in ProxiMix with Fixed Balancing Degree</title>
        <p>ProxiMix is built on the mixup concept, which involves continuously selecting and mixing two
samples to generate new data. To identify which combinations of samples have a more positive
impact on the model’s performance, we divide the dataset into different subgroups and sample
from them.</p>
        <p>There are four subgroups, defined by both the label and the value of a single sensitive
feature in <inline-formula><tex-math>D</tex-math></inline-formula>. The first sample, selected from each subgroup <inline-formula><tex-math>D(Z = z, Y = y)</tex-math></inline-formula>, is denoted C1, C2, C3, C4;
the second sample, selected from the subgroup <inline-formula><tex-math>D(Z = \bar{z})</tex-math></inline-formula> with the opposite sensitive value, is
denoted C1′, C2′, C3′, C4′, respectively. In Table 2, C1 is sampled from the &lt;female,
low-income&gt; subgroup in the Adult dataset, from the &lt;female, failed&gt; subgroup in the Law dataset,
and from the &lt;female, on-time&gt; subgroup in the Credit dataset, respectively. C1′ refers to
the sample selected from the male group in the Adult, Law and Credit datasets. All sampling
combinations are listed in Table 2. We denote the sample derived from ProxiMix with different
sampling combination modes as Ci ⊙ Cj, where Ci ∈ {C1, C2, C3, C4} and Cj ∈ {C1′, C3′}.</p>
        <p>Table 3 presents the models’ performance using ProxiMix under four sampling combinations
Ci ⊙ Cj and compares it with the performance without any augmentation (baseline).</p>
        <p><sup>4</sup> github.com/HammerLabML/ModelAgnosticGroupFairnessCounterfactuals/</p>
        <p>In the Adult dataset, we found that different subgroup sampling combinations have different
impacts on ProxiMix performance. The C2 ⊙ C1′ combination (augmenting high-income females)
significantly improves the fairness performance of both the decision tree and the logistic
regression models. In contrast, C1 ⊙ C1′ (augmenting low-income females) degrades the fairness
of both models, suggesting that it introduces extra bias against the underrepresented group.
This implies that focusing on underrepresented labels in the unprivileged group when generating
samples (such as high income) can greatly improve fairness performance.</p>
        <p>In the Law dataset, nearly all mixup methods enhance model prediction performance, but
only marginally improve fairness. This is because the fairness performance DP% without any
augmentation already exceeds 90%, indicating minimal bias in the model. Therefore, the
improvement potential is limited.</p>
        <p>Overall, ProxiMix enhances fairness when a model displays significant bias. The choice
of the subgroups for sampling during mixup is also important: some combinations enhance
fairness, while others can even worsen it.</p>
      </sec>
      <sec id="sec-6-1">
        <title>5.2. The Impact of Balancing Degree in ProxiMix</title>
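        <p>In the above section, we discussed the different sampling strategies with a balanced mixup
(<inline-formula><tex-math>d = 0.5</tex-math></inline-formula>). This section explores how different values of <inline-formula><tex-math>d</tex-math></inline-formula> in ProxiMix can impact model
performance. Here, we fix the strategy Ci ⊙ Cj while changing the balancing degree <inline-formula><tex-math>d</tex-math></inline-formula>.</p>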
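        <p>Fig. 4 illustrates the impact of data augmentation on model fairness in the Credit dataset, under
the C1 ⊙ C1′ and C3 ⊙ C3′ strategies, with different degrees <inline-formula><tex-math>d</tex-math></inline-formula>. The trend shows that most
combinations positively affect model fairness, with an optimal <inline-formula><tex-math>d</tex-math></inline-formula> that maximizes the fairness
improvement. The best performance is achieved at <inline-formula><tex-math>d = 0.7</tex-math></inline-formula> for the C1 ⊙ C1′ strategy, while for
C3 ⊙ C3′ the optimal performance is reached at <inline-formula><tex-math>d = 0.2</tex-math></inline-formula>.</p>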
        <p>Similar patterns are observed in the Adult dataset (Fig. 5): the impact of different values of <inline-formula><tex-math>d</tex-math></inline-formula>
on the model also shows a trend. Specifically, data generated with the C2 ⊙ C2′ strategy shows
the better improvement in model fairness when <inline-formula><tex-math>d</tex-math></inline-formula> ranges from 0.2 to 0.5.</p>
        <p>We noticed that the best fairness DP% and Eodds% occur at <inline-formula><tex-math>d = 1</tex-math></inline-formula> under C4 ⊙ C3′. However, the
TPR of both the female and male groups declines when <inline-formula><tex-math>d</tex-math></inline-formula> exceeds 0.5. [36] mentions a similar
scenario and suggests considering both relative and absolute values in fairness performance. To
further investigate performance in absolute values, Table 4 presents the model’s performance
across different subgroups. We can see that the model trained with data augmentation in the 0 to
0.5 range, although having lower fairness metrics compared to <inline-formula><tex-math>d = 1</tex-math></inline-formula>, shows an absolute
improvement in model performance. Therefore, we conclude that the optimal balancing degree for
the C4 ⊙ C3′ strategy is 0.2.</p>
        <p>[Figure 4: Fairness Performance on Credit Dataset; panels (C1 ⊙ C1′) and (C3 ⊙ C3′).]</p>
        <p>[Table caption: Counterfactual explanations cost comparison on the Adult dataset with Decision Tree across female (F)
and male (M) subgroups with different balancing degrees <inline-formula><tex-math>d = [0, 0.5, 1]</tex-math></inline-formula>.]</p>
      </sec>
      <sec id="sec-6-2">
        <title>5.3. Counterfactual Cost across Diferent Groups</title>
        <p>We now evaluate the effectiveness of our algorithm from the XAI perspective, and the results
are consistent with the above observations. First, we calculate the average (avg) and standard
deviation (std) of the counterfactual cost across the female (F) and male (M) subgroups. Then, we
compare the cost gap between the two groups. A smaller gap indicates fairer counterfactual
explanations across the groups. In the Adult dataset, C2 ⊙ C1′ continues to show the more
significant bias mitigation performance. In the Law school dataset, as discussed above,
the improvement is limited because the bias in the original dataset is not significant.</p>
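        <p>The group comparison described above can be sketched as follows. The per-sample costs are assumed to have been produced by a counterfactual generator beforehand; the function name, the group codes and the use of the population standard deviation are assumptions made for illustration, and the numbers are not experimental results.</p>
        <preformat>
```python
from statistics import mean, pstdev


def cost_gap(costs, groups):
    """Average and standard deviation of counterfactual cost per group,
    plus the absolute gap between the female (F) and male (M) averages."""
    by_group = {}
    for cost, group in zip(costs, groups):
        by_group.setdefault(group, []).append(cost)
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
    gap = abs(stats["F"][0] - stats["M"][0])
    return stats, gap


# Illustrative costs only.
stats, gap = cost_gap([2.0, 4.0, 1.0, 1.0], ["F", "F", "M", "M"])
print(stats, gap)
```
        </preformat>
        <p>A gap close to 0 means that both groups need similar changes, on average, to reach the desired outcome.</p>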
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion</title>
      <p>This paper proposes a new debiasing algorithm called ProxiMix. It extends the mixup technique
by considering labels from proximity samples in the subgroup to mitigate potential bias in the
pre-processing stage. Our experiments evaluated the performance of ProxiMix with different
sampling combinations and balancing degrees. The results indicate that adding proximity-based
labels improves fairness performance, and that there exists an optimal balancing degree that
achieves the most significant enhancement. These observations were further supported by the
experimental results on the cost comparison of counterfactual explanations. In future work, we
plan to extend ProxiMix to multi-class tasks and consider intersectional fairness.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work is funded by Doctoral Training Partnership Studentship of Engineering and Physical
Sciences Research Council (EPSRC-DTP, EP/W524414/1/2894964).
of the stanford heuristic programming project (the Addison-Wesley series in artificial
intelligence), Addison-Wesley Longman Publishing Co., Inc., 1984.
[16] S. Gregor, I. Benbasat, Explanations from intelligent systems: Theoretical foundations and
implications for practice, MIS quarterly (1999) 497–530.
[17] R. K. Mothilal, A. Sharma, C. Tan, Explaining machine learning classifiers through
diverse counterfactual explanations, in: Proceedings of the 2020 conference on fairness,
accountability, and transparency, 2020, pp. 607–617.
[18] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations without opening the
black box: Automated decisions and the gdpr, Harv. JL Tech. 31 (2017) 841.
[19] D. Brughmans, P. Leyman, D. Martens, Nice: an algorithm for nearest instance
counterfactual explanations, Data Mining and Knowledge Discovery (2023) 1–39.
[20] V. Gupta, P. Nokhiz, C. D. Roy, S. Venkatasubramanian, Equalizing recourse across groups,
arXiv preprint arXiv:1909.03166 (2019).
[21] A. Artelt, B. Hammer, Explain it in the same way!–model-agnostic group fairness of
counterfactual explanations, arXiv preprint arXiv:2211.14858 (2022).
[22] S. Goethals, D. Martens, T. Calders, Precof: counterfactual explanations for fairness,</p>
      <p>Machine Learning (2023) 1–32.
[23] S. A. Friedler, C. Scheidegger, S. Venkatasubramanian, S. Choudhary, E. P. Hamilton,
D. Roth, A comparative study of fairness-enhancing interventions in machine learning,
in: Proceedings of the conference on fairness, accountability, and transparency, 2019, pp.
329–338.
[24] F. Kamiran, T. Calders, Classifying without discriminating, in: 2009 2nd international
conference on computer, control and communication, IEEE, 2009, pp. 1–6.
[25] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, S. Venkatasubramanian, Certifying
and removing disparate impact, in: proceedings of the 21th ACM SIGKDD international
conference on knowledge discovery and data mining, 2015, pp. 259–268.
[26] H. Sun, K. Wu, T. Wang, W. H. Wang, Towards fair and robust classification, in: 2022 IEEE
7th European Symposium on Security and Privacy (EuroS&amp;P), IEEE, 2022, pp. 356–376.
[27] F. Kamiran, T. Calders, M. Pechenizkiy, Discrimination aware decision tree learning, in:
2010 IEEE international conference on data mining, IEEE, 2010, pp. 869–874.
[28] G. Pleiss, M. Raghavan, F. Wu, J. Kleinberg, K. Q. Weinberger, On fairness and calibration,</p>
      <p>Advances in neural information processing systems 30 (2017).
[29] J. J. Amend, S. Spurlock, Improving machine learning fairness with sampling and
adversarial learning, J. Comput. Sci. Coll 36 (2021) 14–23.
[30] A. Morano, Bias mitigation for automated decision making systems, Politecnico di Torino
(2020).
[31] D. Dablain, B. Krawczyk, N. Chawla, Towards a holistic view of bias in machine learning:
Bridging algorithmic fairness and imbalanced learning, arXiv preprint arXiv:2207.06084
(2022).
[32] J. Chakraborty, S. Majumder, T. Menzies, Bias in machine learning software: Why? how?
what to do?, CoRR (2021).
[33] R. Kohavi, et al., Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid.,
in: Kdd, volume 96, 1996, pp. 202–207.
[34] K. Xivuri, H. Twinomurinzi, A systematic review of fairness in artificial intelligence
algorithms, in: Responsible AI and Analytics for an Ethical and Inclusive Digitized Society:
20th IFIP WG 6.11 Conference on e-Business, e-Services and e-Society, I3E 2021, Galway,
Ireland, September 1–3, 2021, Proceedings 20, Springer, 2021, pp. 271–284.
[35] I.-C. Yeh, C.-h. Lien, The comparisons of data mining techniques for the predictive accuracy
of probability of default of credit card clients, Expert systems with applications 36 (2009)
2473–2480.
[36] G. Maheshwari, A. Bellet, P. Denis, M. Keller, Fair without leveling down: A new
intersectional fairness definition, in: EMNLP 2023-The 2023 Conference on Empirical Methods in
Natural Language Processing, 2023.
[37] T. Le Quy, A. Roy, V. Iosifidis, W. Zhang, E. Ntoutsi, A survey on datasets for
fairness-aware machine learning, Wiley Interdisciplinary Reviews: Data Mining and Knowledge
Discovery 12 (2022) e1452.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Appendices: Dataset Description</title>
      <sec id="sec-9-1">
        <title>A.1. Adult Income Dataset</title>
        <p>The Adult Income dataset is also known as the Census Income dataset. Its documentation
(https://www.cs.toronto.edu/~delve/data/adult/adultDetail.html) provides a detailed description of the 14 features in the dataset. We omitted some features, such as
‘fnlwgt’, and the final features we used after data cleaning are as follows.</p>
        <p>A.2. Law School Dataset</p>
        <p>The Law School dataset contains admission records for law schools. We followed the description
provided in [37] and the data cleaning pipeline in [21], extracting the following features for the
experiment.</p>
      </sec>
      <sec id="sec-9-2">
        <title>A.3. Credit Default Dataset</title>
        <p>The Credit Default dataset, also known as the credit card clients dataset, explores default
payments on credit cards. The features and their descriptions are as follows.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>B. Appendices: Results</title>
      <p>B.1. ProxiMix in Credit Default Dataset with MLP model</p>
      <p>B.2. ProxiMix in Adult Income Dataset with MLP model</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O. A.</given-names>
            <surname>Osoba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Welser</surname>
            <suffix>IV</suffix>
          </string-name>
          ,
          <article-title>An intelligence in our image: The risks of bias and errors in artificial intelligence</article-title>
          ,
          <source>Rand Corporation</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Burkart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Huber</surname>
          </string-name>
          ,
          <article-title>A survey on the explainability of supervised machine learning</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>70</volume>
          (
          <year>2021</year>
          )
          <fpage>245</fpage>
          -
          <lpage>317</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Arrieta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Díaz-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Del Ser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bennetot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tabik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barbado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gil-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Benjamins</surname>
          </string-name>
          , et al.,
          <article-title>Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai</article-title>
          ,
          <source>Information Fusion</source>
          <volume>58</volume>
          (
          <year>2020</year>
          )
          <fpage>82</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciatto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sabbatini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agiollo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Omicini</surname>
          </string-name>
          ,
          <article-title>Symbolic knowledge extraction and injection with sub-symbolic predictors: A systematic literature review</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>56</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I. D.</given-names>
            <surname>Raji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Buolamwini</surname>
          </string-name>
          ,
          <article-title>Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial ai products</article-title>
          ,
          <source>in: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>429</fpage>
          -
          <lpage>435</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kavouras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tsopelas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Giannopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sacharidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Psaroudaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Theologitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rontogiannis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fotakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Emiris</surname>
          </string-name>
          ,
          <article-title>Fairness aware counterfactuals for subgroups</article-title>
          ,
          <source>in: Thirty-seventh Conference on Neural Information Processing Systems</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>U.</given-names>
            <surname>Gohar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <article-title>A survey on intersectional fairness in machine learning: Notions, mitigation, and challenges</article-title>
          ,
          <source>arXiv preprint arXiv:2305.06969</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sarro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Harman</surname>
          </string-name>
          ,
          <article-title>Bias mitigation for machine learning classifiers: A comprehensive survey</article-title>
          ,
          <source>arXiv preprint arXiv:2207.07068</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cisse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. N.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lopez-Paz</surname>
          </string-name>
          ,
          <article-title>mixup: Beyond empirical risk minimization</article-title>
          ,
          <source>arXiv preprint arXiv:1710.09412</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Navarro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Little</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Allen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Segarra</surname>
          </string-name>
          ,
          <article-title>Data augmentation via subgroup mixup for improving fairness</article-title>
          ,
          <source>arXiv preprint arXiv:2309.07110</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Chuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mroueh</surname>
          </string-name>
          ,
          <article-title>Fair mixup: Fairness via interpolation</article-title>
          ,
          <source>arXiv preprint arXiv:2103.06503</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B. T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Turini</surname>
          </string-name>
          ,
          <article-title>K-nn as an implementation of situation testing for discrimination discovery and prevention</article-title>
          ,
          <source>in: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>502</fpage>
          -
          <lpage>510</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Zafar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Valera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gomez Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Gummadi</surname>
          </string-name>
          ,
          <article-title>Fairness beyond disparate treatment &amp; disparate impact: Learning classification without disparate mistreatment</article-title>
          ,
          <source>in: Proceedings of the 26th international conference on world wide web</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1171</fpage>
          -
          <lpage>1180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Price</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Srebro</surname>
          </string-name>
          ,
          <article-title>Equality of opportunity in supervised learning</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>29</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B. G.</given-names>
            <surname>Buchanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Shortliffe</surname>
          </string-name>
          ,
          <article-title>Rule based expert systems: the mycin experiments</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>