<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysing the Impact of Data Distribution Shifts on Model Fairness in Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Federico Motta</string-name>
          <email>federico.motta@unimore.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yijie Li</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Huiping Chen</string-name>
          <email>h.chen.13@bham.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Mandreoli</string-name>
          <email>federica.mandreoli@unimore.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Missier</string-name>
          <email>p.missier@bham.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Physics, Informatics and Mathematics</institution>
          ,
          <institution>University of Modena and Reggio Emilia</institution>
          ,
          <addr-line>Via Campi 213/B, 41125, Modena</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science, University of Birmingham</institution>
          ,
          <addr-line>Edgbaston, Birmingham, B15 2TT</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>As decision-making processes are increasingly automated by the deployment of machine learning techniques, being able to a priori ensure the fairness of these models has become a concern of paramount importance. This is especially true in high-stakes domains, where data are often provided as they are, i.e., without any additional insight from domain experts about the presence of possible sensitive attributes, such as gender, nationality, or religion. At the same time, data distributions may evolve over time, i.e., drift, potentially harming the performance of the deployed models. Thus, the interplay between (i) proactively monitoring for degradation in accuracy and promptly retraining the models and (ii) guaranteeing fairness regardless of the possible bias within the data further complicates this already tricky challenge. In this paper, we present and analyse a synthetically generated example of a data distribution shift affecting the model performance, including its fairness. Then, we show how singularly addressing either the accuracy drop or the introduced bias cannot completely solve the issues. In conclusion, we hint at the need for a holistic approach mitigating both problems as a possible research direction in this field.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Drift</kwd>
        <kwd>Distribution Shifts</kwd>
        <kwd>Model Performance</kwd>
        <kwd>Fairness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The growing spread of Machine Learning (ML) models across different sectors is having a huge impact
on people’s lives. However, the quality of these models is tightly related to the quality of the data
used to train them; thus, producing an accurate model may not always be enough. For example,
when operating in scenarios where the data contain sensitive attributes, such as healthcare [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or
lending [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], it is of paramount importance to also ensure that the produced models are fair [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], i.e.,
that they do not discriminate or perpetuate any form of bias which may be present in the data.
      </p>
      <p>
        At the same time, real-world data are rarely static, and the characteristics of the given input data
may change over time, due to natural changes in the observed/sampled population, changes in the data
collection procedures [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or in the treatment protocols of a disease [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Thus, many ML models, once
built for a specific task, are then deployed and fed with new data, potentially different from the samples
seen during the training phase. These so-called data shifts, if unnoticed, can generate concept drifts and
degrade the model performance, both in terms of accuracy and fairness. Therefore, taking measures to
detect them is crucial to keep the models fair and accurate even in dynamic scenarios.
      </p>
      <p>In this paper, we provide background on the key concepts of data shift and model fairness, and then
illustrate, through a simple example, the problem of preserving fairness in the presence of data shift,
and how fairness and performance stand in contrast to each other in such dynamic scenarios.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and related work</title>
      <p>The training of machine learning models relies on the assumption that input data are independent and
identically distributed (i.i.d.). However, the dynamism of real-world environments can cause unexpected
changes in data distributions, which might lead to a degradation of the model performance or a change
in the fairness of the ML-model outputs. Moreover, the absence of a clear and unified terminology
for the various types of changes in the input data further complicates this challenge, especially in the
already intricate Big Data scenario, where data distributions are often unknown.</p>
      <sec id="sec-2-1">
        <title>2.1. Data distribution shifts</title>
        <p>
          Given a target variable and a model attempting to predict it, e.g., over time, the term concept drift
refers to any alteration of the statistical properties of that variable [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Different types of concept drifts
exist [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]: covariate shifts [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] concern changes in distribution between the training data and the test
data, while maintaining the conditional probability distribution of the target given the input; depending
on the effect this has on the target variable, we have virtual and real drifts [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], where the former
do not affect the concept, whilst the latter do. Prior-probability shifts (sometimes referred to as label
or target shifts) [
          <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
          ] are the opposite of covariate shifts, namely the conditional probability
distributions of the outcome given the input are coherent across the training and testing phases, but the
target probability distributions change. Moreover, depending on the extent of the change in the data
distribution, we may have local drifts [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], i.e., only occurring in some regions of the instance space
and not at a dataset level; or just the addition of new attributes/target classes, such as in the case of
feature/concept evolution [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] or their disappearance (concept deletion) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          Time-wise, concept drifts are instead classified depending on the pattern followed by their arrival [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]:
sudden drifts [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] occur when the transition from a concept to another is abrupt in time, gradual drifts [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]
happen when the target distribution undergoes progressive transformations, recurring drifts [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] are
like gradual ones but characterised by a periodic transition which evenly phases in/out the new/old
concepts, and finally, incremental drifts [20] concern the replacement of the old concept in a slow and
continuous manner (with respect to gradual drifts, they do not have a clear boundary separating the
two concepts, but rather a fading window).
        </p>
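        <p>For concreteness, the following minimal sketch (ours; lengths, means and rates are arbitrary illustrative choices) simulates the temporal patterns above as 1-D streams whose mean moves from an old concept to a new one:</p>
        <preformat>
# Toy streams for the temporal drift patterns above; all constants invented.
import numpy as np

rng = np.random.default_rng(0)
T, mu0, mu1 = 1_000, 0.0, 3.0
t = np.arange(T)

sudden      = np.where(t >= T // 2, mu1, mu0)            # abrupt switch
incremental = mu0 + (mu1 - mu0) * t / T                  # slow, continuous fade
gradual     = np.where(t / T > rng.random(T), mu1, mu0)  # new concept phased in
streams     = {name: rng.normal(mean, 1.0)               # observed feature values
               for name, mean in [("sudden", sudden),
                                  ("incremental", incremental),
                                  ("gradual", gradual)]}
        </preformat>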
        <sec id="sec-2-1-1">
          <title>2.1.1. Drift detection methods</title>
          <p>
            Being able to promptly detect concept drifts is crucially important to keep stable machine learning
models’ performance in dynamic environments where the data distributions of the underlying data
evolve over time [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]. Although many supervised methods capable of detecting concept drift exist in
literature [21], they are often impractical to use because of their reliance on the availability of true
labels, requiring instead unsupervised techniques. A taxonomy of the possible categories has been
proposed to reflect use case scenarios [ 22]. For streaming data, online-based approaches use a reference
ifxed [ 23, 24] or sliding [25, 26] window to check for drift at every arriving instance. Batch-based
approaches accumulate instances and use them all, as in whole-batch approaches [27, 28], or only a subset
of them, like in partial-batch approaches [29, 30] to detect the drift. DriftLens [31] is an unsupervised
drift detection method that applies specifically to deep learning models and real-time applications. Its
execution consists of an ofline phase, where the baseline distributions and thresholds are estimated,
and an online phase, where new data streams are analysed using fixed-size windows. Moreover, in each
window per-batch and per-label distributions are computed and compared with the baseline; drifts are
predicted whenever the distances between the distributions exceed the thresholds.
          </p>
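          <p>As an illustration of the fixed-reference-window, online-based scheme described above, the following skeleton (our sketch; the window size and significance level are placeholders) flags a drift whenever a statistical test rejects the hypothesis that the current window matches the baseline:</p>
          <preformat>
# Skeleton of an online drift monitor with a fixed reference window.
import numpy as np
from scipy.stats import ks_2samp

def monitor(stream, reference, window=200, alpha=0.01):
    """Yield the index at which each drifting window is detected."""
    buf = []
    for i, x in enumerate(stream):
        buf.append(x)
        if len(buf) == window:
            # Kolmogorov-Smirnov test between baseline and current window
            if alpha > ks_2samp(reference, np.asarray(buf)).pvalue:
                yield i
            buf.clear()
          </preformat>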
      <p>The majority of the methods mentioned above rely on distances [32] between data distributions to
quantify the severity of drifts, including: the Kullback-Leibler (KL) divergence (a measure of relative
entropy), the Hellinger distance (a symmetric version of the KL-divergence), the Jensen-Shannon
divergence (a smoothed version of the KL-divergence), the Wasserstein-1 distance, the Fréchet Inception
Distance [31] (between normal distributions), but also techniques based on intersecting neighbourhoods, or
leveraging statistical tests such as the Maximum Mean Discrepancy, the Kolmogorov-Smirnov test
(which allows incremental sampling over time) and Hoeffding’s inequality-based bound identification [33]
(for independent and bounded random variables).</p>
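          <p>As a concrete illustration (ours, with invented data), three of these distances can be computed between a baseline window and a mean-shifted one as follows:</p>
          <preformat>
# Wasserstein-1, KL and Jensen-Shannon between two sample windows.
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 2_000)
window   = rng.normal(0.5, 1.0, 2_000)   # mean-shifted "drifted" window

print("W1:", wasserstein_distance(baseline, window))  # works on raw samples

# KL and JS require discretised densities over a shared support
bins = np.histogram_bin_edges(np.concatenate([baseline, window]), 50)
p, _ = np.histogram(baseline, bins, density=True)
q, _ = np.histogram(window, bins, density=True)
eps = 1e-12
kl = np.sum(np.where(p > 0, p * np.log((p + eps) / (q + eps)), 0.0)) * np.diff(bins)[0]
print("KL:", kl)
print("JS:", jensenshannon(p, q))        # square root of the JS divergence
          </preformat>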
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. Approaches to handle drifts</title>
          <p>
The simplest reaction to drift is to retrain the entire model. This, however, is inefficient, and more
practical methods have been developed [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ].
          </p>
          <p>
            Instance selection This technique selects only relevant instances based on the currently learned
target, whilst filtering out irrelevant, noisy or redundant samples. Mostly adopted in online learning
contexts, these methods use a sliding window to define a short-term memory and sample relevant
instances [
            <xref ref-type="bibr" rid="ref10">10, 27</xref>
            ]. However, they are vulnerable to local or recurrent drifts, when the fixed window
size is shorter than the drift transition time [34].
          </p>
          <p>
            Instance weighting This approach leverages the capability of some models, such as Support Vector
Machines, to weight instances according to their age or relevance [34]; this allows to gradually shift
the focus as the target evolves, e.g., using age as a proxy for an exponentially decaying memory. At the
same time, this technique is unfortunately more prone to over-fitting [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ], hinting that more complex
instance selection heuristics, like those keeping the window size dynamic, usually perform better.
          </p>
          <p>
            Ensemble learning This method leverages a set of models built over different time periods. The
key to keeping the focus on the current actual concept is combining several sub-models’ predictions. For
example, tuning the low/high diversity of the newly trained ensembles with respect to the type
of detected drift may outperform models trained from scratch after the drift has already occurred [35].
In [
            <xref ref-type="bibr" rid="ref16">16, 33</xref>
            ] the instance weighting approach is instead extended with simpler (e.g., binary) classifiers
specialised on each class. These are later combined with aggregation techniques like majority/weighted
voting. However, maintaining a complex learning architecture may not always justify its benefits over
less accurate but simpler predictors.
          </p>
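          <p>A minimal sketch of the instance-weighting idea (ours; the decay rate and data are invented) using age as an exponentially decaying weight via scikit-learn's sample_weight:</p>
          <preformat>
# Exponentially decaying instance weights for an SVM on a toy stream.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + rng.normal(scale=0.3, size=500) > 0).astype(int)

age = np.arange(500)[::-1]         # 0 = most recent instance
weights = np.exp(-0.01 * age)      # older instances fade from memory
clf = SVC(kernel="linear").fit(X, y, sample_weight=weights)
          </preformat>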
          <p>
            Active learning on harmful data drifts Finally, recent works [
            <xref ref-type="bibr" rid="ref7">36, 7</xref>
            ] introduced the notions of harmful and
benign data drifts, i.e., the ability to state whether a drift in the input data can cause a concept drift
capable of degrading the ML model performance. In [36], ensembles of Constrained Disagreement
Classifiers (CDCs), trained to agree on training data and disagree on test data, are used to analyse the
ratio of disagreement and thus detect harmful data drifts. In [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ], identifiers of Data Distributions
with Low Accuracy (DDLAs) are instead introduced, i.e., subsets of the feature space where the accuracy of the
model is lower than its overall performance. These, if respectively measured on the training/test data,
can determine the harmfulness too, suggesting the need to either completely retrain the model or just fine-tune
it on a sampling of the misclassified testing instances, once they have been labelled by experts.
          </p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Fairness</title>
        <p>
          A crucial aspect of the application of ML algorithms to real-world data is the ability to ensure their
fairness, i.e., to guarantee that their behaviour will not perpetuate prejudice or societal bias against any
sub-population of individuals because of their inherent or acquired characteristics [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In order to
measure the fairness of an algorithm, we first need to introduce the concept of protected or
sensitive attribute [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], i.e., any feature capable of partitioning individuals into groups sharing similar
benefits [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]; a non-exhaustive list of attributes traditionally behaving like this includes disability, gender
expression/identity, health/marital status, nationality, race, religion, sex and sexual orientation [37].
        </p>
        <sec id="sec-2-2-1">
          <title>2.2.1. Fairness metrics</title>
          <p>
            Multiple formal definitions of algorithmic fairness have been proposed, and taxonomies are
emerging [
            <xref ref-type="bibr" rid="ref1 ref2 ref4">38, 2, 1, 4</xref>
            ]. These definitions are usually built around three fundamental aspects of a classifier: (i)
independence, i.e., requiring the prediction to be statistically independent of the sensitive
attribute, (ii) separation, i.e., requiring the prediction to be independent of the sensitive
attribute conditionally on the target variable, and (iii) sufficiency, which requires the target
variable to be independent of the sensitive attribute conditionally on the prediction. With
these abstract fairness criteria in mind, a coarse-grained distinction between group, sub-group and
individual fairness definitions can be identified.
          </p>
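          <p>In probabilistic notation, the three criteria above can be compactly stated as follows (our formalisation of the standard definitions):</p>
          <preformat>
\text{independence:}\quad \hat{Y} \perp A
\qquad
\text{separation:}\quad \hat{Y} \perp A \mid Y
\qquad
\text{sufficiency:}\quad Y \perp A \mid \hat{Y}
          </preformat>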
          <p>Group fairness According to [38], group-based fairness metrics essentially compare the outcomes
of a classifier across two or more groups defined by considering the sensitive attribute. Among these:
• Parity-based metrics compare the predicted Positive Rate (PR) between the groups. For instance,
Statistical/Demographic Parity (DP) [39, 40] ensures that individuals from the protected and
non-protected groups are equally likely to receive a positive outcome, i.e.:

$$P(\hat{Y} = 1 \mid A = a_1) = \cdots = P(\hat{Y} = 1 \mid A = a_K) \qquad (1)$$

where $\hat{Y}$ is the predictor, and $A$ the sensitive attribute defining the $K$ groups $a_1, \ldots, a_K$. For
simplicity, $A$ is often considered Boolean, thus only taking values in $\{0, 1\}$. Inversely, Disparate
Impact [41] measures the ratio between the fraction of positive predictions given the sensitive
attribute being unset and the fraction given the sensitive attribute being set;
• Confusion matrix-based metrics leverage instead the true/false positive/negative rates. For
example, Equal Opportunity (EO) [39, 42] ensures the same chances of positive outcomes for all
individuals regardless of the group they belong to, Equalised Odds [43, 42] aims at achieving the
same rate of true positives and false positives on different groups, overall accuracy equality seeks
the same accuracy on each protected group, conditional use accuracy equality tries to balance
the false omission rate and the false discovery rate, Treatment Equality (TE) aims at the same ratio of
false negatives and false positives across the groups [44], conditional equal opportunity [45] permits
equally weighting opportunities on a given sensitive attribute, Average Odds Difference [46] is
the average of the differences in False PR and True PR across groups, and conditional statistical
parity [47] ensures that, given a limited set of sensitive attributes, an equal proportion of individuals
sharing the same values are detained in each group;
• Calibration-based metrics only consider the predicted probability or score, like well-calibration,
or test fairness/calibration/matching conditional frequencies fairness [48];
• Score-based metrics, finally, try to balance the positive and negative classes, like Bayesian
fairness [49] does.</p>
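          <p>For illustration, DP in its ratio form (consistent with the values reported in Sec. 3.2.2, where 1 indicates parity; the function and data below are our own toy example) can be computed directly from predictions and a binary sensitive attribute:</p>
          <preformat>
# Demographic parity as a ratio of positive prediction rates (Eq. 1).
import numpy as np

def demographic_parity_ratio(y_pred, a):
    """P(Y_hat = 1 | A = 0) / P(Y_hat = 1 | A = 1); 1.0 = perfect parity."""
    return y_pred[a == 0].mean() / y_pred[a == 1].mean()

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
a      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_ratio(y_pred, a))   # 0.75 / 0.25 = 3.0
          </preformat>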
          <p>Individual fairness As opposed to group-based metrics, individual fairness considers the outcome
for each participating individual. For instance, counterfactual fairness [50] stems from the intuition that
a decision is fair if it holds both in the actual world and in a counterfactual one, where the individual
belongs to a different group; contrastive fairness [51] instead compares the outcomes between individuals
that are similar under all the relevant aspects except for their values on the sensitive attribute; equality of
effort [52] focuses on the effort which should be made in order to achieve the same outcome predicted for
individuals having a different value of the sensitive attribute. All these three metrics have their roots in
causal models, while the Generalised Entropy Index (GEI) [53] measures the individual impact on the
prediction in a manner similar to the Gini Index [54]. The Theil Index is just the special case of GEI
with parameter $\alpha = 1$. Finally, Fairness through Awareness/Unawareness [55, 56, 50] falls under
individual fairness too. In detail, the former states that, given a similarity metric, similar individuals
should receive similar outcomes; whilst the latter defines the fairness of an algorithm as its capability
of not explicitly using sensitive attributes in the decision-making process.</p>
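          <p>A sketch of the GEI just mentioned (our implementation of the index from [53], using the benefit term b_i = y_hat_i - y_i + 1 proposed there; data invented):</p>
          <preformat>
# Generalised Entropy Index; alpha = 1 recovers the Theil index.
import numpy as np

def generalized_entropy_index(y_true, y_pred, alpha=2.0):
    b = y_pred - y_true + 1.0          # per-individual benefit
    r = b / b.mean()
    if alpha == 1:                     # Theil index; 0 * log(0) taken as 0
        return np.mean(np.where(r > 0, r * np.log(np.where(r > 0, r, 1.0)), 0.0))
    return np.mean(r ** alpha - 1.0) / (alpha * (alpha - 1.0))

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])
print(generalized_entropy_index(y_true, y_pred))           # GEI, alpha = 2
print(generalized_entropy_index(y_true, y_pred, alpha=1))  # Theil index
          </preformat>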
          <p>
            Sub-group fairness Last but not least, sub-group fairness [
            <xref ref-type="bibr" rid="ref20">57, 58</xref>
            ] tries to harmonise both the group
and individual fairness objectives, e.g., by picking a Group Fairness Indicator (GFI) [
            <xref ref-type="bibr" rid="ref2 ref21">59, 2</xref>
            ] and checking
whether it also holds over a wider collection of subgroups.
          </p>
      <p>In this paper, we primarily focus on group fairness. Given its simplicity and ease of visualisation,
we mainly use DP (as defined in Eq. 1) to measure fairness when illustrating the problem in Sec. 3.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Fairness-specific distribution shifts</title>
        <p>
          Among all the data distribution shifts described in Sec. 2.1, some influence fairness more than others.
Depending on whether they centre on the sensitive attributes themselves or on the relationship between
sensitive attributes and labels, these shifts can be distinguished into three primary categories: demographic shifts [
          <xref ref-type="bibr" rid="ref22 ref23">60, 61</xref>
          ] denote
distribution changes of sensitive variables that are highly associated with fairness; therefore, a
model that is fair on the training data may struggle to maintain fairness on the deployment data due to
the altered group proportions. Sub-population shifts [
          <xref ref-type="bibr" rid="ref24">62</xref>
          ] refer to a particular subgroup, with specific
values of sensitive attributes and labels, having fewer positively labelled samples in the training phase
and an increased proportion in the deployment phase. Last but not least, correlation shifts [
          <xref ref-type="bibr" rid="ref25 ref26">63, 64</xref>
          ]
change the dependence relationship between sensitive attributes and labels; the cited works propose
strategies to address the resulting fairness problems in the context of dynamic environments.
        </p>
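        <p>A toy simulation (ours; all probabilities invented) makes the distinction concrete: a demographic shift changes only P(A), whereas a correlation shift changes P(Y | A):</p>
        <preformat>
# Sampling (A, Y) pairs under a demographic vs. a correlation shift.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, p_a, p_y_given_a):
    a = rng.binomial(1, p_a, n)
    y = rng.binomial(1, np.where(a == 1, p_y_given_a[1], p_y_given_a[0]))
    return a, y

a0, y0 = sample(10_000, p_a=0.5, p_y_given_a=(0.4, 0.6))  # training data
a1, y1 = sample(10_000, p_a=0.2, p_y_given_a=(0.4, 0.6))  # demographic shift
a2, y2 = sample(10_000, p_a=0.5, p_y_given_a=(0.6, 0.4))  # correlation shift
        </preformat>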
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Problem Illustration</title>
      <p>We now present an example to show the effect of data shift on the accuracy of a binary classifier, and
on its fairness relative to a single binary sensitive attribute. The example shows that a perfect and fair
classifier, trained on a linearly separable dataset, loses both accuracy and fairness in the presence of shift,
specifically when there is a shift in the correlation between the sensitive attribute ($A$) and the other covariates
($\mathbf{x}$), denoted $\mathbf{x}$–$A$ correlation shift. Furthermore, we show that the simple approach of retraining the
model to optimise for either of the two objectives degrades the other. This justifies further work into
preserving accuracy-fairness trade-offs in the presence of data shift.</p>
      <sec id="sec-3-1">
        <title>3.1. Notation and Problem Setup</title>
        <p>Let $\mathbf{x} \in \mathcal{X}$, where $\mathcal{X}$ denotes the feature space, and let $y \in \mathcal{Y}$, where $\mathcal{Y}$ denotes the label space. The
sensitive attributes (e.g., gender, race) are denoted $\{A_j\}_{j=1}^{m}$, $A_j \in \mathcal{X}$; for simplicity, we focus on the
case with a single sensitive attribute $A$. At time $t_0$, a model $f^{(0)}$ is trained on an original dataset
$D_0 = \{(\mathbf{x}_i, a_i, y_i)\}_{i=1}^{|D_0|}$, where there exist specific correlations among the feature vector $\mathbf{x}$, the sensitive
attribute $A$, and the label $y$. However, at time $t_1$, a shift may occur in the data distribution such that the
original correlations between the three variables are altered or even broken. This change can adversely
affect the performance of $f^{(0)}$, impacting both predictive accuracy and fairness properties. In this
section, we illustrate a relatively unnoticed circumstance, leveraging a classifier $f_{\theta}$, parameterised by
$\theta \in \Theta$, and investigating a shift in the data, namely the disruption of the $\mathbf{x}$–$A$ correlation, which can
lead to degradation in model accuracy and fairness.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Running Example with x–A Correlation Changes</title>
        <p>Consider a model designed to decide whether an individual can be granted a loan. Here $x$ = income
and $A$ = age are the only covariates. This yields two sensitive groups: the older group (GO), with
age ≥ 40, and the younger group (GY), with age &lt; 40. In addition, being granted a loan is the positive
class ($y = 1$), and being rejected is the negative class ($y = 0$). Sec. 3.2.1 describes the settings of our
experiments in detail. The results and analysis are shown in Sec. 3.2.2.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Simulation Settings</title>
          <p>
            To characterise and evaluate the impact of shifts in the data distribution on model performance,
we design an experiment using synthetic data. This synthetic dataset consists of 2,000 records and
comprises two continuous, normally distributed and independently generated features, income and age,
plus loan approval, a binary variable used as the classification target. More specifically:
1. income follows a bimodal distribution: high-income cases are generated to be normally distributed
around 70, whilst low-income ones are centred on 30; both have a standard deviation (std. dev.)
of 20. The resulting average ($\overline{\text{income}} = 50$) is equally distant from the two peaks. This feature
is not affected by the shift, i.e., it is the same in both $D_0$ and $D_1$.
2. in $D_0$ the sensitive group attribute age is normally distributed around 40 ($\mu^{0}_{\text{age}}$) with std. dev. 10,
in the [18, 80] range, i.e., it does not depend on the income. In $D_1$ a shift is instead introduced to
reflect a new negative correlation between age and income; in detail, a Gaussian with the same
std. dev. but mean $\mu^{1}_{\text{age}} = \mu^{0}_{\text{age}} - \text{factor} \cdot (\text{income} - \overline{\text{income}})$ is used. In our experiments, factor was set
to 0.2. The resulting shift reflects a potential systemic change in the population structure.
3. in order to have target labels aligned with the shift, we sampled them with the following sigmoid
probabilities: $p_0 = \sigma(\text{income} - \overline{\text{income}})$ before the shift, and $p_1 = \sigma((\text{income} - \overline{\text{income}}) - (\text{age} - \mu^{0}_{\text{age}}))$
after, where $\sigma$ denotes the sigmoid function. Hence, if we draw a sample above 0.5, we assign it to Class 1 (C1), i.e., loan
approved, or to Class 0 (C0) otherwise. This choice keeps the decision boundary as balanced
as possible between the different classes, thereby reducing the impact of class imbalance
and focusing more on the change in correlation.
          </p>
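          <p>A minimal sketch of this generation process follows (ours; the 50/50 split between high- and low-income cases and the thresholding of the sigmoid probability at 0.5 are our reading of the description above):</p>
          <preformat>
# Synthetic D0 (no shift) and D1 (age negatively correlated with income).
import numpy as np

rng = np.random.default_rng(42)
N, FACTOR, INCOME_MEAN = 2_000, 0.2, 50.0

def make_dataset(shifted):
    # bimodal income: peaks at 70 and 30, both with std. dev. 20
    income = np.concatenate([rng.normal(70, 20, N // 2),
                             rng.normal(30, 20, N // 2)])
    mu_age = 40.0 - (FACTOR * (income - INCOME_MEAN) if shifted else 0.0)
    age = np.clip(rng.normal(mu_age, 10.0, N), 18, 80)   # [18, 80] range
    z = income - INCOME_MEAN
    if shifted:
        z = z - (age - 40.0)                             # p1 after the shift
    p = 1.0 / (1.0 + np.exp(-z))                         # sigmoid probability
    y = (p > 0.5).astype(int)                            # 1 = loan approved
    return np.column_stack([income, age]), y

X0, y0 = make_dataset(shifted=False)   # D0
X1, y1 = make_dataset(shifted=True)    # D1
          </preformat>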
          <p>This setting creates a realistic scenario, where the correlations between features and the feature–outcome
relationships might evolve over time, simulating how real-world decision processes might change.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Experimental Results</title>
          <p>To ensure rigorous evaluation, we split both the original ($D_0$) and shifted ($D_1$) data into 80/20% training
and test sets. After standardising the data, a 5-fold cross-validation procedure is employed
to optimise the hyper-parameters during training, while the test set is exclusively used for the final
evaluation. Three models were trained in this experiment: (i) a standard logistic regression classifier
$M_0$ trained on $D_0$, yielding the weights $w^{0}_{\text{income}} = 9.347$ and $w^{0}_{\text{age}} = 0.011$, and an intercept of 0.255; (ii) a
new logistic regression model $M_{\text{acc}}$ trained directly on $D_1$ and optimised for accuracy, with weights
$w_{\text{income}} = 8.428$, $w_{\text{age}} = -3.451$ and an intercept of 0.247; (iii) a model $M_{\text{fair}}$ optimised for fairness, i.e.,
with a linear decision boundary that only optimises model fairness while ensuring the overall accuracy
remains above a threshold. Specifically, $M_{\text{fair}}$ is obtained by performing a grid search to optimise a linear
classifier of the form $f(\mathbf{x}) = w_1 \cdot \text{income} + w_2 \cdot \text{age} + b$, subject to minimising the absolute difference
in positive PR between the two sensitive groups under the constraint that the drop in accuracy is lower
than 30%. The optimal parameters obtained are $w_1 = 0.036$, $w_2 = 0.053$, indicating that age and income
are similarly weighted, and $b = -4.090$. Additionally, the original model $M_0$ is evaluated on both $D_0$
and $D_1$, while $M_{\text{acc}}$ and $M_{\text{fair}}$ are only assessed on $D_1$.</p>
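          <p>A hedged sketch of the grid search behind $M_{\text{fair}}$ follows (our reconstruction; the paper does not report grid ranges or step sizes, so those below are invented, with the accuracy floor corresponding to the 30% drop constraint):</p>
          <preformat>
# Grid search for a linear classifier minimising the positive-rate gap
# between sensitive groups, subject to an accuracy floor.
import itertools
import numpy as np

def fair_grid_search(X, y, a, acc_floor=0.70):
    best, best_gap = None, np.inf
    grid = np.linspace(-5, 5, 21)                  # invented range/resolution
    for w1, w2, b in itertools.product(grid, grid, grid):
        pred = (w1 * X[:, 0] + w2 * X[:, 1] + b > 0).astype(int)
        if acc_floor > (pred == y).mean():
            continue                               # accuracy constraint violated
        gap = abs(pred[a == 1].mean() - pred[a == 0].mean())  # |delta PR|
        if best_gap > gap:
            best, best_gap = (w1, w2, b), gap
    return best, best_gap
          </preformat>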
          <p>Fig. 1 depicts the changes in the data vertically, i.e., $D_0$ in the top panel and $D_1$ in the bottom ones,
while the lines represent how the three models’ decision boundaries partition the feature space. Tab. 1
reports the number of data points in each specific sub-group, both in the original and shifted datasets.
One can observe that in the original test set the older group is slightly favoured, whilst after adding
negative correlations between income and age, the younger group is clearly more favoured.</p>
          <p>Before the $\mathbf{x}$–$A$ correlation shift, labels only depend on income because they are generated by a
logistic model predicting the probability of an approved loan application. Consequently, as shown in
Tab. 2, $M_0$, since trained on $D_0$, achieves perfect accuracy (1.000) with low bias (DP = 1.114). Accordingly,
we can also see from the top panel in Fig. 1 that $M_0$ separates green and blue dots precisely, and the
points of the same shape are evenly distributed on both sides of the decision boundary. Thus, the original model is
both accurate and fair. However, when $M_0$ is used for inference on shifted data, its accuracy drops to
0.915 and its DP to 0.499, clearly favouring the younger group. The shifted data are generated to have
the age feature negatively correlated with the income. In this situation, the positive class is predicted by
a logistic model that depends both on income and age. In the bottom panels of Fig. 1, the shift makes
the younger group (GY) over-represented in the high-income range while the older group (GO) is
more concentrated in the low-income range, resulting in imbalanced proportions of approved samples
in the two groups. The red line in the bottom-left panel of Fig. 1 presents the newly trained model $M_{\text{acc}}$ that
optimises accuracy, leading to a perfect accuracy of 1.000 (again, because the two classes are linearly
separable) but an even more biased DP of 0.322. On the contrary, the purple line representing $M_{\text{fair}}$ in
the bottom-right panel is optimised for fairness. Thus, this model has nearly zero bias, with a DP of
1.039, whilst still experiencing a significant degradation in terms of accuracy (0.718).</p>
          <p>These results confirm that although the initially perfect model $M_0$ achieves high accuracy and fairness
on $D_0$, when the data distribution changes in a way that alters the prior $\mathbf{x}$–$A$ correlations, its performance
can degrade. This indicates that a model trained on the old data may suffer from both reduced accuracy
and increased bias in the presence of data drifts. Meanwhile, the accurate model $M_{\text{acc}}$, since it is not
fairness-aware, can have poor fairness on shifted data despite maintaining high accuracy.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Concluding remarks</title>
      <p>In this paper, we considered the problem of performance degradation in ML models, including both
accuracy and fairness, under data distribution shifts and illustrated the problem in a specific scenario.
On the one hand, the experiments in Sec. 3 highlight the impact of correlation shifts between sensitive
and non-sensitive attributes on model fairness as well as performance, illustrating the challenges of
maintaining both high accuracy and fairness in dynamically changing environments. On the other hand,
Sec. 3.2.2 demonstrates that focusing solely on accuracy to handle data drifts is not a viable solution for
ensuring fairness. In fact, it may even amplify model bias. Instead, explicitly optimising for fairness
metrics can effectively mitigate bias caused by distribution shifts, though this comes at a significant
cost to accuracy, emphasising the need to carefully balance these trade-offs in real-world applications.</p>
      <p>
        In this respect, future work will further explore this framework’s generalisation capabilities,
considering a wider set of models (e.g., tree-based ensembles) and fairness measures (e.g., EO, TE). This
comparative analysis will leverage benchmarks that are more complex and closer to real-world scenarios [
        <xref ref-type="bibr" rid="ref27 ref28">65,
66</xref>
        ]. Another interesting research direction will be the study of the robustness to different types of data
drift (e.g., sudden, gradual, recurring, etc.) in more dynamic online learning scenarios. Our analysis of
the literature and experimental findings clearly indicate the need for methods that not only control
fairness during model training but also ensure robustness against distribution shifts during deployment.
      </p>
      <p>
        Focusing on the data, it is worth noting that real-world data are rarely provided in a way that is
readily suitable for the development of ML models [
        <xref ref-type="bibr" rid="ref29">67</xref>
        ], thus data engineering pipelines are often used to
clean the raw input data through a series of step-by-step transformations resulting in clean, ML-ready
datasets. At the same time, it is also known that this data preparation process is often neglected [
        <xref ref-type="bibr" rid="ref30">68</xref>
        ]; i.e.,
once the pre-processing pipeline is built, all the effort is put into the model development and deployment.
      </p>
      <p>
        In contrast, we argue that this robustness can be achieved in two ways: (i) proactively monitoring
the raw input data going through the data preparation process and alerting the data scientist whenever
a shift in the data is detected; or (ii) detecting drifts within the data fed to the ML model and monitoring
it for performance degradation. In the former, and less explored, case, system functionality is maintained
by acting upstream on the data preprocessing pipeline, e.g., updating it to preserve by design the fairness
of the downstream model by providing it with already balanced and unbiased data. In the latter, and
more adopted, case, system reliability is achieved by repairing the model, e.g., by just fine-tuning it
on a set of misclassified samples belonging to a harmful data drift [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], rather than completely
retraining it, as done in Sec. 3.2.2.
      </p>
      <p>One approach does not exclude the other: minor shifts could be fixed downstream,
whilst major accuracy/fairness drops, since they usually involve retraining ML models from scratch, could
probably benefit from repairs in the data engineering pipeline and in the model design, e.g., by choosing
a different trade-off between accuracy and fairness, as suggested by this paper’s results.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
      <p>Feature Space, IEEE Transactions on Neural Networks and Learning Systems 25 (2014) 95–110. doi:10.1109/TNNLS.2013.2271915.
[20] R. P. J. C. Bose, W. M. P. Van Der Aalst, I. Žliobaitė, M. Pechenizkiy, Handling Concept Drift in Process Mining, in: Advanced Information Systems Engineering, volume 141, 2011, pp. 391–405. doi:10.1007/978-3-642-21640-4_30.
[21] R. S. M. Barros, S. G. T. C. Santos, A large-scale comparison of concept drift detectors, Information Sciences 451-452 (2018) 348–370. doi:10.1016/j.ins.2018.04.014.
[22] R. N. Gemaque, A. F. J. Costa, R. Giusti, E. M. Dos Santos, An overview of unsupervised drift detection methods, WIREs Data Mining and Knowledge Discovery 10 (2020) e1381. doi:10.1002/widm.1381.
[23] Y. Kim, C. H. Park, An Efficient Concept Drift Detection Method for Streaming Data under Limited Labeling, IEICE Transactions on Information and Systems E100.D (2017) 2537–2546. doi:10.1587/transinf.2017EDP7091.
[24] A. M. Mustafa, G. Ayoade, K. Al-Naami, L. Khan, K. W. Hamlen, B. Thuraisingham, F. Araujo, Unsupervised deep embedding for novel class detection over data stream, in: 2017 IEEE International Conference on Big Data (Big Data), 2017, pp. 1830–1839. doi:10.1109/BigData.2017.8258127.
[25] R. F. De Mello, Y. Vaz, C. H. Grossi, A. Bifet, On learning guarantees to unsupervised concept drift detection on data streams, Expert Systems with Applications 117 (2019) 90–102. doi:10.1016/j.eswa.2018.08.054.
[26] F. Pinagé, E. M. Dos Santos, J. Gama, A drift detection method based on dynamic classifier selection, Data Mining and Knowledge Discovery 34 (2020) 50–74. doi:10.1007/s10618-019-00656-w.
[27] A. G. Maletzke, D. M. Dos Reis, G. E. A. P. A. Batista, Combining instance selection and self-training to improve data stream quantification, Journal of the Brazilian Computer Society 24 (2018) 12. doi:10.1186/s13173-018-0076-0.
[28] B. Li, Y.-j. Wang, D.-s. Yang, Y.-m. Li, X.-k. Ma, FAAD: an unsupervised fast and accurate anomaly detection method for a multi-dimensional sequence over data stream, Frontiers of Information Technology &amp; Electronic Engineering 20 (2019) 388–404. doi:10.1631/FITEE.1800038.
[29] A. F. J. Costa, R. A. S. Albuquerque, E. M. D. Santos, A Drift Detection Method Based on Active Learning, in: 2018 International Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–8. doi:10.1109/IJCNN.2018.8489364.
[30] T. S. Sethi, M. Kantardzic, Handling adversarial concept drift in streaming data, Expert Systems with Applications 97 (2018) 18–40. doi:10.1016/j.eswa.2017.12.022.
[31] S. Greco, B. Vacchetti, D. Apiletti, T. Cerquitelli, DriftLens: A concept drift detection tool, in: Proceedings of the 27th International Conference on Extending Database Technology, EDBT 2024, Paestum, Italy, March 25-28, 2024, pp. 806–809. doi:10.48786/EDBT.2024.75.
[32] G. I. Webb, R. Hyde, H. Cao, H. L. Nguyen, F. Petitjean, Characterizing concept drift, Data Mining and Knowledge Discovery 30 (2016) 964–994. doi:10.1007/s10618-015-0448-4.
[33] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, G. Zhang, Learning under Concept Drift: A Review, IEEE Transactions on Knowledge and Data Engineering (2018) 1–1. doi:10.1109/TKDE.2018.2876857.
[34] R. Klinkenberg, Learning drifting concepts: Example selection vs. example weighting, Intelligent Data Analysis 8 (2004) 281–300. doi:10.3233/IDA-2004-8305.
[35] L. L. Minku, X. Yao, DDD: A New Ensemble Approach for Dealing with Concept Drift, IEEE Transactions on Knowledge and Data Engineering 24 (2012) 619–633. doi:10.1109/TKDE.2011.58.
[36] T. Ginsberg, Z. Liang, R. G. Krishnan, A Learning Based Hypothesis Test for Harmful Covariate Shift, in: The Eleventh International Conference on Learning Representations, 2023, pp. 1–34. URL: https://openreview.net/forum?id=rdfgqiwz7lZ.
[37] Geneva: Joint United Nations Programme on HIV/AIDS, UNAIDS terminology guidelines, 2024. URL: https://www.unaids.org/en/resources/documents/2024/terminology_guidelines.
[38] S. Caton, C. Haas, Fairness in Machine Learning: A Survey, ACM Computing Surveys 56 (2024) 1–38. doi:10.1145/3616865.
[39] M. Scutari, F. Panero, M. Proissl, Achieving fairness with a simple ridge penalty, Statistics and Computing 32 (2022) 77. doi:10.1007/s11222-022-10143-w.
[40] T. Zhao, E. Dai, K. Shu, S. Wang, Towards Fair Classifiers Without Sensitive Attributes: Exploring Biases in Related Features, in: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 2022, pp. 1433–1442. doi:10.1145/3488560.3498493.
[41] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, S. Venkatasubramanian, Certifying and Removing Disparate Impact, in: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 259–268. doi:10.1145/2783258.2783311.
[42] J. Wang, Y. Li, C. Wang, Synthesizing Fair Decision Trees via Iterative Constraint Solving, in: Computer Aided Verification, volume 13372, 2022, pp. 364–385. doi:10.1007/978-3-031-13188-2_18.
[43] V. Perrone, M. Donini, M. B. Zafar, R. Schmucker, K. Kenthapadi, C. Archambeau, Fair Bayesian Optimization, in: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 2021, pp. 854–863. doi:10.1145/3461702.3462629.
[44] R. Berk, H. Heidari, S. Jabbari, M. Kearns, A. Roth, Fairness in Criminal Justice Risk Assessments: The State of the Art, Sociological Methods &amp; Research 50 (2021) 3–44. doi:10.1177/0049124118782533.
[45] A. Beutel, J. Chen, T. Doshi, H. Qian, A. Woodruff, C. Luu, P. Kreitmann, J. Bischof, E. H. Chi, Putting Fairness Principles into Practice: Challenges, Metrics, and Improvements, in: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019, pp. 453–459. doi:10.1145/3306618.3314234.
[46] M. A. U. Alam, AI-Fairness Towards Activity Recognition of Older Adults, in: MobiQuitous 2020 - 17th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, 2020, pp. 108–117. doi:10.1145/3448891.3448943.
[47] S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, A. Huq, Algorithmic Decision Making and the Cost of Fairness, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 797–806. doi:10.1145/3097983.3098095.
[48] A. Chouldechova, Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments, Big Data 5 (2017) 153–163. doi:10.1089/big.2016.0047.
[49] C. Dimitrakakis, Y. Liu, D. C. Parkes, G. Radanovic, Bayesian Fairness, Proceedings of the AAAI Conference on Artificial Intelligence 33 (2019) 509–516. doi:10.1609/aaai.v33i01.3301509.
[50] M. J. Kusner, J. Loftus, C. Russell, R. Silva, Counterfactual Fairness, in: Advances in Neural Information Processing Systems, volume 30, 2017, pp. 4066–4076. URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/a486cd07e4ac3d270571622f4f316ec5-Paper.pdf.
[51] T. Chakraborti, A. Patra, J. A. Noble, Contrastive Fairness in Machine Learning, IEEE Letters of the Computer Society 3 (2020) 38–41. doi:10.1109/LOCS.2020.3007845.
[52] W. Huang, Y. Wu, L. Zhang, X. Wu, Fairness through Equality of Effort, in: Companion Proceedings of the Web Conference 2020, 2020, pp. 743–751. doi:10.1145/3366424.3383558.
[53] T. Speicher, H. Heidari, N. Grgic-Hlaca, K. P. Gummadi, A. Singla, A. Weller, M. B. Zafar, A Unified Approach to Quantifying Algorithmic Unfairness: Measuring Individual &amp; Group Unfairness via Inequality Indices, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, 2018, pp. 2239–2248. doi:10.1145/3219819.3220046.
[54] C. Gini, On the measure of concentration with special reference to income and statistics, Colorado College Publication, General series 208 (1936).
[55] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, R. Zemel, Fairness through awareness, in: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 2012, pp. 214–226. doi:10.1145/2090236.2090255.
[56] N. Grgic-Hlaca, M. B. Zafar, K. P. Gummadi, A. Weller, The case for process fairness in learning: Feature selection for fair decision making, in: Symposium on Machine Learning and the Law at the 29th Conference on Neural Information Processing Systems (NIPS 2016), 2016, pp. 45–55. URL: https://www.mlandthelaw.org/papers/grgic.pdf.
[57] M. Kearns, S. Neel, A. Roth, Z. S. Wu, Preventing Fairness Gerrymandering: Auditing and Learning</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Rabonato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Berton</surname>
          </string-name>
          ,
          <article-title>A systematic review of fairness in machine learning</article-title>
          ,
          <source>AI and Ethics</source>
          (
          <year>2024</year>
          ).
          doi:10.1007/s43681-024-00577-5.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T. D.</given-names>
            <surname>Jui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rivas</surname>
          </string-name>
          ,
          <article-title>Fairness issues, current approaches, and challenges in machine learning models</article-title>
          ,
          <source>International Journal of Machine Learning and Cybernetics</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>3095</fpage>
          -
          <lpage>3125</lpage>
          . doi:10.1007/s13042-023-02083-2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Howe</surname>
          </string-name>
          ,
          <article-title>The Many Facets of Data Equity</article-title>
          ,
          <source>Journal of Data and Information Quality</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          . doi:10.1145/3533425.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mehrabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Morstatter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Galstyan</surname>
          </string-name>
          ,
          <article-title>A Survey on Bias and Fairness in Machine Learning</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>54</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          . doi:10.1145/3457607.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zadorozhny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Thoral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Elbers</surname>
          </string-name>
          , G. Cinà,
          <article-title>Out-of-Distribution Detection for Medical Applications: Guidelines for Practical Evaluation, in: Multimodal AI in Healthcare: A Paradigm Shift in Health Intelligence</article-title>
          ,
          <year>2023</year>
          , pp.
          <fpage>137</fpage>
          -
          <lpage>153</lpage>
          . doi:10.1007/978-3-031-14771-5_10.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. Abbasi</given-names>
            <surname>Bavil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Subasri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdalla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fine</surname>
          </string-name>
          , E. Dolatabadi,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdalla</surname>
          </string-name>
          ,
          <article-title>Empirical data drift detection experiments on real-world medical imaging data</article-title>
          ,
          <source>Nature Communications</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          1887. doi:10.1038/s41467-024-46142-w.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sahri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>Efficiently Mitigating the Impact of Data Drift on Machine Learning Pipelines</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>17</volume>
          (
          <year>2024</year>
          )
          <fpage>3072</fpage>
          -
          <lpage>3081</lpage>
          . doi:10.14778/3681954.3681984.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bayram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kassler</surname>
          </string-name>
          ,
          <article-title>From concept drift to model degradation: An overview on performance-aware drift detectors</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>245</volume>
          (
          <year>2022</year>
          )
          108632. doi:10.1016/j.knosys.2022.108632.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sugiyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kawanabe</surname>
          </string-name>
          ,
          <article-title>Machine learning in non-stationary environments: introduction to covariate shift adaptation</article-title>
          , MIT Press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsymbal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pechenizkiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Puuronen</surname>
          </string-name>
          ,
          <article-title>Dynamic integration of classifiers for handling concept drift</article-title>
          ,
          <source>Information Fusion 9</source>
          (
          <year>2008</year>
          )
          <fpage>56</fpage>
          -
          <lpage>68</lpage>
          . doi:10.1016/j.inffus.2006.11.002.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Moreno-Torres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Raeder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Alaiz-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <article-title>A unifying view on dataset shift in classification</article-title>
          ,
          <source>Pattern Recognition</source>
          <volume>45</volume>
          (
          <year>2012</year>
          )
          <fpage>521</fpage>
          -
          <lpage>530</lpage>
          . doi:10.1016/j.patcog.2011.06.019.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lipton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smola</surname>
          </string-name>
          ,
          <article-title>Detecting and Correcting for Label Shift with Black Box Predictors</article-title>
          ,
          <source>in: Proceedings of the 35th International Conference on Machine Learning</source>
          , volume
          <volume>80</volume>
          ,
          <year>2018</year>
          , pp.
          <fpage>3122</fpage>
          -
          <lpage>3130</lpage>
          . URL: https://proceedings.mlr.press/v80/lipton18a.html.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Muandet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Domain Adaptation under Target and Conditional Shift</article-title>
          ,
          <source>in: Proceedings of the 30th International Conference on Machine Learning</source>
          , volume
          <volume>28</volume>
          ,
          <year>2013</year>
          , pp.
          <fpage>819</fpage>
          -
          <lpage>827</lpage>
          . URL: http://proceedings.mlr.press/v28/zhang13d.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Masud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thuraisingham</surname>
          </string-name>
          ,
          <article-title>Classification and Novel Class Detection of Data Streams in a Dynamic Feature Space</article-title>
          ,
          <source>in: Machine Learning and Knowledge Discovery in Databases</source>
          , volume
          <volume>6322</volume>
          ,
          <year>2010</year>
          , pp.
          <fpage>337</fpage>
          -
          <lpage>352</lpage>
          . doi:10.1007/978-3-642-15883-4_22.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Elwell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Polikar</surname>
          </string-name>
          ,
          <article-title>Incremental Learning of Concept Drift in Nonstationary Environments</article-title>
          ,
          <source>IEEE Transactions on Neural Networks</source>
          <volume>22</volume>
          (
          <year>2011</year>
          )
          <fpage>1517</fpage>
          -
          <lpage>1531</lpage>
          . doi:10.1109/TNN.2011.2160459.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>B.</given-names>
            <surname>Krawczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Minku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stefanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <article-title>Ensemble learning for data stream analysis: A survey</article-title>
          ,
          <source>Information Fusion</source>
          <volume>37</volume>
          (
          <year>2017</year>
          )
          <fpage>132</fpage>
          -
          <lpage>156</lpage>
          . doi:10.1016/j.inffus.2017.02.004.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsymbal</surname>
          </string-name>
          ,
          <article-title>The problem of concept drift: definitions and related work</article-title>
          ,
          <source>Computer Science Department, Trinity College Dublin</source>
          <volume>106</volume>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Hickey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Black</surname>
          </string-name>
          ,
          <article-title>Refined Time Stamps for Concept Drift Detection During Mining for Classification Rules</article-title>
          ,
          <source>in: Temporal, Spatial, and Spatio-Temporal Data Mining</source>
          , volume
          <volume>2007</volume>
          ,
          <year>2001</year>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>30</lpage>
          . doi:10.1007/3-540-45244-3_3.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Gomes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Gaber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A. C.</given-names>
            <surname>Sousa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Menasalvas</surname>
          </string-name>
          ,
          <article-title>Mining Recurring Concepts in a Dynamic Feature Space</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          <volume>25</volume>
          (
          <year>2014</year>
          )
          <fpage>95</fpage>
          -
          <lpage>110</lpage>
          . doi:10.1109/TNNLS.2013.2271915.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [58]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kearns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>An Empirical Study of Rich Subgroup Fairness for Machine Learning</article-title>
          ,
          <source>in: Proceedings of the Conference on Fairness, Accountability, and Transparency</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>100</fpage>
          -
          <lpage>109</lpage>
          . doi:10.1145/3287560.3287592.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [59]
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Efros</surname>
          </string-name>
          ,
          <article-title>Unbiased look at dataset bias</article-title>
          ,
          <source>in: CVPR 2011</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>1521</fpage>
          -
          <lpage>1528</lpage>
          . doi:10.1109/CVPR.2011.5995347.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [60]
          <string-name>
            <given-names>B.</given-names>
            <surname>An</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Che</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Transferring Fairness under Distribution Shifts via Fair Consistency Regularization</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>35</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>32582</fpage>
          -
          <lpage>32597</lpage>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2022/file/d1dbaabf454a479ca86309e66592c7f6-Paper-Conference.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [61]
          <string-name>
            <given-names>S.</given-names>
            <surname>Giguere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Metevier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Brun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>da Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Niekum</surname>
          </string-name>
          ,
          <article-title>Fairness Guarantees under Demographic Shift</article-title>
          ,
          <source>Proceedings of the 10th International Conference on Learning Representations (ICLR)</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          . URL: https://par.nsf.gov/biblio/10334581.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [62]
          <string-name>
            <given-names>S.</given-names>
            <surname>Maity</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yurochkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Does enforcing fairness mitigate biases caused by subpopulation shift?</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>34</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>25773</fpage>
          -
          <lpage>25784</lpage>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2021/file/d800149d2f947ad4d64f34668f8b20f6-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [63]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Roh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Whang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Suh</surname>
          </string-name>
          ,
          <article-title>Improving Fair Training under Correlation Shifts</article-title>
          ,
          <source>in: Proceedings of the 40th International Conference on Machine Learning</source>
          , volume
          <volume>202</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>29179</fpage>
          -
          <lpage>29209</lpage>
          . URL: https://proceedings.mlr.press/v202/roh23a.html.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [64]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Grant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Towards Fair Disentangled Online Learning for Changing Environments</article-title>
          ,
          <source>in: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>3480</fpage>
          -
          <lpage>3491</lpage>
          . doi:10.1145/3580305.3599523.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [65]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <article-title>Statlog (German Credit Data)</article-title>
          ,
          <source>UCI Machine Learning Repository</source>
          ,
          <year>1994</year>
          . doi:10.24432/C5NC77.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [66]
          <collab>Agency for Healthcare Research and Quality (AHRQ)</collab>
          ,
          <source>Medical Expenditure Panel Survey (MEPS)</source>
          ,
          <year>1996</year>
          . URL: https://meps.ahrq.gov/mepsweb.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [67]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mandreoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ferrari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Guidetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Motta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Missier</surname>
          </string-name>
          ,
          <article-title>Real-world data mining meets clinical practice: Research challenges and perspective</article-title>
          ,
          <source>Frontiers in Big Data</source>
          <volume>5</volume>
          (
          <year>2022</year>
          ). doi:10.3389/fdata.2022.1021621.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [68]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sambasivan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kapania</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Highfill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Akrong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paritosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Aroyo</surname>
          </string-name>
          ,
          <article-title>"Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI</article-title>
          ,
          <source>in: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . doi:10.1145/3411764.3445518.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>