<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning Fairer Representations with FairVIC</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Charmaine Barker</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Bethell</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimitar Kazakov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of York</institution>
          ,
          <addr-line>York</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Mitigating bias in automated decision-making systems, particularly in deep learning models, is a critical challenge due to nuanced definitions of fairness, dataset-specific biases, and the inherent trade-off between fairness and accuracy. To address these issues, we introduce FairVIC, an innovative approach that enhances fairness in neural networks by integrating variance, invariance, and covariance terms into the loss function during training. Unlike methods based on predefined criteria, FairVIC abstracts fairness to minimise dependency on protected attributes. We evaluate FairVIC against comparable bias mitigation techniques on benchmark datasets, considering both group and individual fairness, and conduct an ablation study on the accuracy-fairness trade-off. FairVIC demonstrates significant improvements (≈ 70%) in fairness across all tested metrics without significantly compromising accuracy (≈ −5%), thus offering a robust, generalisable solution for fair deep learning across diverse tasks and datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning Bias Mitigation</kwd>
        <kwd>Fairness In-Processing</kwd>
        <kwd>Fairness-Accuracy Trade-off</kwd>
        <kwd>Ethical Basis for Trustworthy AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the ever-increasing utilisation of Artificial Intelligence (AI) in everyday applications, neural
networks have emerged as pivotal tools for automated decision making systems in sectors such as
healthcare [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], finance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and recruitment [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, bias in the data–stemming from historical
inequalities, imbalanced distributions, or flawed feature representations–is often learned by these
models, posing significant challenges to fairness. Such bias can lead to real-world harms. For instance,
several studies have shown how bias in facial recognition technologies disproportionately misidentifies
individuals of certain ethnic backgrounds [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], leading to potential discrimination in law enforcement
and hiring practices.
      </p>
      <p>This highlights the urgent need to address AI bias. Ensuring fairness in deep learning models presents
complex challenges, primarily due to the black-box nature of these models, which often complicates
understanding and interpreting decisions. Moreover, the dynamic and high-dimensional nature of the
data involved, combined with nuances in fairness definitions, further complicates the detection and
correction of bias. This complexity necessitates the development of more sophisticated, inherently fair
algorithms.</p>
      <p>
        Previous mitigation strategies dealing with algorithmic bias–whether through pre-processing,
in-processing, or post-processing–have significant limitations. Pre-processing techniques, which attempt
to cleanse biased data, are labour-intensive, dependent on expert intervention [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and only eliminate
biases that have been explicitly considered. Current in-processing methods frequently lead to unstable models and often rely
upon arbitrary definitions of fairness [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Post-processing techniques, which adjust model predictions
directly, treat the symptoms without addressing the underlying biases in the data and model. These
approaches lack stability, generalisability, and the ability to ensure fairness across multiple metrics [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>In this paper, we introduce FairVIC (Fairness through Variance, Invariance, and Covariance), a novel
approach that embeds fairness directly into neural networks by optimising a custom loss function.</p>
      <p>This function is designed to minimise the correlation between binary decisions and binary protected
characteristics while maximising overall prediction performance. FairVIC integrates fairness through
the concepts of variance, invariance, and covariance during the training process, making it more
principled and intuitive, and universally applicable to diverse datasets. Unlike previous methods
that often optimise to a chosen fairness metric, FairVIC offers a robust, generalisable solution that
introduces an abstract concept of fairness to significantly reduce bias. Our experimental evaluations
demonstrate FairVIC’s ability to significantly improve performance in all fairness metrics tested without
compromising prediction accuracy. We compare our proposed method against comparable in-processing
bias mitigation techniques, such as regularisation and constraint approaches, and highlight the improved,
robust performance of our FairVIC model.</p>
      <p>Our contributions in this paper are multi-fold:
• A novel, generalisable in-processing bias mitigation technique for neural networks;
• A comprehensive experimental evaluation, using a multitude of comparable methods on a variety
of metrics across several datasets, including different modalities such as tabular, text, and image;
• An extended analysis of our proposed method to examine its robustness, including a full ablation
study on the lambda weight terms within our loss function.</p>
      <p>This paper is structured as follows: Section 2 discusses current approaches to mitigating bias
throughout each processing stage. Section 3 describes any preliminary details for this work, including the
fairness metrics used in the evaluation. Section 4 outlines our method, including how each term in our
loss function is calculated and an algorithm detailing how these terms are applied. Section 5 describes
the experiments carried out, Section 6 outlines the results with discussion, and Section 7 concludes
this work. Extra information, including the dataset metadata and more extensive experiments, is to be
found in the Appendix.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>There exist three broad categories of mitigation strategies for algorithmic bias: pre-processing,
in-processing, and post-processing. Each aims to increase fairness differently by acting upon either the
training data, the model itself, or the predictions outputted by the model, respectively.</p>
      <p>
        Pre-processing methods aim to fix the data before training, recognising that bias is primarily an issue
with the data itself [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In practice, this can be done in a number of different ways, such as representative
sampling, or re-sampling the data to reflect the full population [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], reweighing the data such that
different groups influence the model in a representative way [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ], or generating synthetic data to
balance out the representation of each group [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Another set of approaches utilises causal methods to
delineate relationships between sensitive attributes and the target variables within the data [
        <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15, 16</xref>
        ].
Techniques such as these are labour-intensive and do not generalise well, requiring an expert with
knowledge of the data to manually process each new dataset [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. They also cannot provide
assurances that all bias has been removed–a model may draw upon non-linear or complex relationships
between features that lead to bias, which are hard for the expert or method to spot.
      </p>
      <p>
        In-processing methods aim to train models to make fairer predictions, even upon biased data. There
are a plethora of ways in which this has been done. For example, Celis et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and Agarwal et al.
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] utilise a chosen fairness metric and perform constraint optimisation during training. This requires
choosing a fairness metric, introducing human bias and limiting generalisation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Therefore, fairness
cannot be achieved across multiple definitions in this way [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Another approach involves incorporating
an adversarial component during model training that penalises the model if protected characteristics can
be predicted from its outputs [
        <xref ref-type="bibr" rid="ref19 ref20 ref21">19, 20, 21</xref>
        ]. These methods are often effective but their main shortcoming
is seen in their instability [22]. Finally, the most relevant comparisons from previous work to our
proposed method are regularisation-based techniques that incorporate fairness constraints or penalties
directly into the model’s loss function during training. There are a number of ways that this has been
done, such as through data augmentation strategies to promote less sensitive decision boundaries [23]
or by incorporating fairness adjustments into the boosting process [24]. The performance of these
models differs from approach to approach, and those that work by constraining the model with a fairness
metric directly suffer from the issue of human bias and of misrepresenting the bias within the data or model.
      </p>
      <p>Post-processing techniques involve adjusting model predictions or decision rules after training to
ensure fair outcomes. In practice, decision thresholds have been adjusted for different groups to achieve
equal outcomes in a particular metric [25]. Alternatively, labels near the decision boundary can be
altered to favour less biased outcomes [26, 27]. Calibration [28, 29] adjusts the predictions of the model
directly so that the proportion of positive instances is equal across each sub-group. These methods
oversimplify fairness and ignore model-level bias. For those techniques that require the specification of
a single fairness metric, the same issue applies surrounding this choice as before.</p>
      <p>
        To summarise, a number of issues remain that have not yet been solved in parallel
within one technique. These are: stability, generalisability, equal improvements to fairness across
metrics [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and operation without requiring user-defined notions of fairness. In this paper, we
address all of these requirements with FairVIC, an effective, generalisable approach to mitigating bias.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminaries</title>
      <sec id="sec-3-1">
        <title>3.1. VICReg</title>
        <p>Variance-Invariance-Covariance Regularization (VICReg) [30] has previously been used in
self-supervised learning to tackle feature collapse and redundancy. It maximises variance across features
to ensure the model produces diverse outputs for different inputs, minimises invariance between
augmented representations of the same input to enhance stability, and reduces covariance among features
to capture a broader range of information. VICReg has mostly been confined to self-supervised learning,
with little exploration beyond. To this extent, FairVIC reworks the VIC components completely to
serve its application in supervised learning for bias mitigation, a principle that has so far remained
unexplored. This adaptation addresses the challenges of fairness in decision-making systems, expanding
the application of VIC principles beyond their original scope and offering a novel, generalisable solution
to fairness in supervised learning models.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Group Fairness Metrics</title>
        <p>In this section, we introduce notation and state the fairness measures that we use to quantify bias.
Throughout, $u$ denotes the unprivileged group, $p$ the privileged group, and $\hat{y} = 1$ a positive prediction.
Equalized Odds Difference requires that both the True Positive Rate (TPR) and False Positive Rate
(FPR) are the same across groups defined by the protected attribute, where $\mathrm{TPR} = \frac{TP}{TP + FN}$ and
$\mathrm{FPR} = \frac{FP}{FP + TN}$ [25]. Therefore, we calculate $\max(|\mathrm{TPR}_u - \mathrm{TPR}_p|, |\mathrm{FPR}_u - \mathrm{FPR}_p|)$, where 0 signifies perfect fairness.
Average Absolute Odds Difference averages the absolute differences in the false positive rates and
true positive rates between groups, defined as $\frac{1}{2}\big(|\mathrm{FPR}_u - \mathrm{FPR}_p| + |\mathrm{TPR}_u - \mathrm{TPR}_p|\big)$, with 0 signifying perfect fairness.
Statistical Parity Difference evaluates the difference in the probability of a positive prediction between
groups, aiming for 0 to signify perfect fairness. Formally, $\mathrm{SPD} = |P(\hat{y} = 1 \mid u) - P(\hat{y} = 1 \mid p)|$ [31].
Disparate Impact compares the proportion of positive outcomes for the unprivileged group to that of
the privileged group, with a ratio of 1 indicating no disparate impact, and therefore perfect fairness.
Denoted as $\mathrm{DI} = \frac{P(\hat{y} = 1 \mid u)}{P(\hat{y} = 1 \mid p)}$ [32].</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Individual Fairness</title>
        <p>
          While FairVIC aims to increase group fairness, the invariance term promotes direct improvements
in individual fairness. This can be observed in our evaluations through counterfactual fairness [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
Counterfactual fairness ensures that decisions made by an algorithm are fair even when considering
hypothetical (counterfactual) scenarios. For each individual, the sensitive attribute is switched to assess
the model’s ability to perform equally in both the original and counterfactual scenarios.
        </p>
        <p>Formally, if $u$ denotes the unprivileged group, $p$ the privileged group, and $\hat{y}$ is the decision outcome,
then the model is considered counterfactually fair if $\hat{y}_u = \hat{y}_p$ for the different groups $u$ and $p$ of the
sensitive attribute while all non-sensitive features remain the same.</p>
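        <p>A minimal sketch of this counterfactual evaluation, assuming a binary protected attribute stored as a feature column and a model exposing a predict method, is given below; all names are illustrative.</p>
        <preformat>
import numpy as np

def counterfactual_gap(model, X, protected_idx):
    """Illustrative sketch: flip the binary protected attribute for every
    individual and measure how much the model's decisions change.

    model: any object with a predict(X) method returning binary decisions.
    X: feature matrix; protected_idx: column index of the protected attribute.
    """
    X_cf = X.copy()
    X_cf[:, protected_idx] = 1 - X_cf[:, protected_idx]   # counterfactual inputs
    y_hat = model.predict(X)
    y_hat_cf = model.predict(X_cf)
    # Fraction of individuals whose decision flips; 0 indicates counterfactual fairness.
    return np.mean(y_hat != y_hat_cf)
        </preformat>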
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Approach</title>
      <p>We propose FairVIC (Fairness through Variance, Invariance, and Covariance), a novel loss function
that enables a model to learn fairness in a robust manner. FairVIC is comprised of three terms:
variance, invariance, and covariance. Optimising for these three terms encourages the model to be
stable and consistent across protected characteristics, thereby reducing bias during training. This broad,
generalised approach to defining bias improves performance across a range of fairness metrics. This
makes it an effective strategy for reducing bias across various applications, ensuring more equitable
outcomes in diverse settings.</p>
      <sec id="sec-4-1">
        <title>4.1. FairVIC Training</title>
        <p>
          To understand how FairVIC operates, it is crucial to define variance, invariance, and covariance in their
adapted forms: each is based on a classical statistical concept but used to express a specific fairness
objective within the model. We consider binary classification tasks, where the model output $\hat{y} \in [0, 1]$
represents the predicted probability of the favourable class, obtained via a sigmoid activation.
        </p>
        <p>Variance: This term promotes diversity in the latent representations by penalising low variance
across features in the bottleneck embeddings, denoted $z \in \mathbb{R}^d$, where $z$ is the output of the encoder for
each input. It ensures the embeddings capture sufficient information, rather than collapsing to a trivial
relationship such as the protected characteristic.</p>
        <p>$$\mathcal{L}_{\mathrm{var}} = \frac{1}{d} \sum_{j=1}^{d} \max\!\big(0, \gamma - \sigma(z_j)\big) \quad (1)$$</p>
        <p>where $\sigma(z_j)$ represents the standard deviation of the $j$-th feature across the batch, we set $\gamma$ to 1.0 as a
margin parameter to encourage unit-scale variability in the latent representation, and $d$ is the number
of features in the embedding.</p>
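        <p>A minimal PyTorch-style sketch of this variance term, assuming the bottleneck embeddings are provided as a (batch, d) tensor, could look as follows; it is an illustration of Equation 1 rather than the authors' code.</p>
        <preformat>
import torch

def variance_loss(z, gamma=1.0):
    """Illustrative sketch of the FairVIC variance term for a (batch, d)
    tensor of bottleneck embeddings: penalise features whose batch standard
    deviation falls below the margin gamma."""
    std = z.std(dim=0)                        # sigma(z_j) for each of the d features
    return torch.clamp(gamma - std, min=0).mean()
        </preformat>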
        <p>Invariance: This term ensures the model's predictions remain consistent when the protected attribute
is flipped, promoting individual (counterfactual) fairness.</p>
        <p>$$\mathcal{L}_{\mathrm{inv}} = \frac{1}{n} \sum_{i=1}^{n} \big(\hat{y}_i - \hat{y}_i^{*}\big)^2 \quad (2)$$</p>
        <p>where $\hat{y}$ is the prediction for the original input, $\hat{y}^{*}$ is the prediction for the input with the protected
attribute flipped to its complement, and $n$ is the number of samples.</p>
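        <p>The invariance term of Equation 2 can be sketched as follows, assuming predictions are available for both the original inputs and the inputs with the protected attribute flipped; this is an illustrative implementation only.</p>
        <preformat>
import torch

def invariance_loss(y_hat, y_hat_flipped):
    """Illustrative sketch of the FairVIC invariance term: mean squared
    difference between predictions on the original inputs and on the same
    inputs with the binary protected attribute flipped."""
    return torch.mean((y_hat - y_hat_flipped) ** 2)
        </preformat>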
        <p>Covariance: This component seeks to penalise any contribution of the protected attribute to the
decision process of the classifier, ensuring that predictions are not systematically skewed for the
privileged group. By doing so, it promotes group fairness. The loss function is designed to minimise
this covariance, as defined by the following equation:</p>
        <p>$$\mathcal{L}_{\mathrm{cov}} = \frac{1}{n} \sqrt{\sum_{i=1}^{n} \big((\hat{y}_i - \mathbb{E}[\hat{y}])\, s_i\big)^2} \quad (3)$$</p>
        <p>where $\hat{y}_i \in (0, 1)$ is the model's predicted probability, $s_i \in \{0, 1\}$ is the binary protected attribute with
$s_i = 1$ denoting the privileged group, and $n$ is the number of samples. In general, $\mathbb{E}[\hat{y}]$ reflects
the empirical average of predictions across the batch, though it may approach 0.5 in a balanced and
well-calibrated setting. During training, this loss reduces the deviation of privileged group predictions
from the batch mean. In practice, it tends to soften overly confident predictions for the privileged group
without necessarily changing the predicted class. This term does not directly affect accuracy, which is
controlled by a separate loss, but its interaction with the accuracy objective discourages reliance on
group membership in decision-making, thereby promoting group fairness.</p>
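        <p>An illustrative sketch of the covariance term of Equation 3, assuming a batch of predicted probabilities and a binary privileged-group indicator, is shown below.</p>
        <preformat>
import torch

def covariance_loss(y_hat, s):
    """Illustrative sketch of the FairVIC covariance term, assuming y_hat is a
    (batch,) tensor of predicted probabilities and s a (batch,) tensor with
    1 for the privileged group and 0 otherwise."""
    n = y_hat.shape[0]
    deviation = (y_hat - y_hat.mean()) * s     # privileged-group deviation from the batch mean
    return torch.sqrt(torch.sum(deviation ** 2)) / n
        </preformat>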
        <p>Together, alongside a suitable accuracy loss, FairVIC jointly optimises for the total loss equation seen
in Equation 4.</p>
        <p>$$\mathcal{L}_{\mathrm{total}} = \lambda_{\mathrm{acc}}\mathcal{L}_{\mathrm{acc}} + \lambda_{\mathrm{var}}\mathcal{L}_{\mathrm{var}} + \lambda_{\mathrm{inv}}\mathcal{L}_{\mathrm{inv}} + \lambda_{\mathrm{cov}}\mathcal{L}_{\mathrm{cov}} \quad (4)$$
where each $\lambda$ is a non-negative weighting coefficient balancing the contribution of its corresponding
term, subject to the normalisation constraint $\lambda_{\mathrm{acc}} + \lambda_{\mathrm{var}} + \lambda_{\mathrm{inv}} + \lambda_{\mathrm{cov}} = 1$.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <sec id="sec-5-1">
        <title>5.1. Datasets</title>
        <p>In our experimental evaluation, we assess the performance of FairVIC1 against a set of comparable
in-processing bias mitigation methods on a series of datasets known for their bias. Here, we describe
the datasets used and the methods we compare against.</p>
        <p>We evaluate FairVIC on seven widely used bias mitigation benchmarks across tabular, text, and image
modalities. These datasets contain known demographic disparities, allowing us to assess FairVIC’s
generalisability. For each, we designate one attribute as the protected characteristic on which fairness
is to be improved.</p>
        <p>Tabular datasets. We use Adult Income [33], COMPAS [34], and German Credit [35], all binary
classification tasks with known biases. Adult Income predicts whether income exceeds $50K and
exhibits gender and racial bias. COMPAS predicts recidivism risk and is notorious for racial bias.
German Credit classifies creditworthiness, with biases related to age and gender [36].
Language datasets. CivilComments-WILDS [37] and BiasBios [38] are used to assess FairVIC on text
data. We sample 50,000 stratified examples per dataset to ensure balance. CivilComments classifies
online comments as toxic or non-toxic, using race as the protected attribute. BiasBios consists of
professional biographies classified into favourable vs. unfavourable occupations (e.g., surgeon vs.
nurse), with gender as the protected feature.</p>
        <p>Image datasets. CelebA [39] contains celebrity portraits labelled for attributes such as blond hair; we
predict whether a person has blond hair or not, using gender as the protected attribute. UTKFace [40]
includes facial images labelled with age and race; we predict whether a subject is above or below 30,
with race as the protected attribute.</p>
        <p>Detailed metadata for each dataset, including our selections for protected groups and classification
goals, can be found in Appendix A.2.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Comparable Techniques</title>
        <p>
          To highlight the performance of FairVIC, we evaluate against five comparable in-processing bias
mitigation methods. These are:
Adversarial Debiasing. This method leverages an adversarial network that aims to predict protected
characteristics based on the predictions of the main model. The primary model seeks to maximise its
own prediction accuracy while minimising the adversary’s prediction accuracy [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
1The code for our FairVIC implementation is provided at: https://github.com/CharmaineBarker/FairVIC.
Exponentiated Gradient Reduction. This technique reduces fair classification to a sequence of
cost-sensitive classification problems, returning a randomised classifier with the lowest empirical error
subject to a chosen fairness constraint [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
        <p>Meta Fair Classifier. This classifier takes a fairness metric as an input and optimises the model with
respect to regular performance and the chosen fairness metric [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].</p>
          <p>Fair MixUp. This technique generates synthetic samples by linearly interpolating between pairs of
training data points by protected attribute to smooth decision boundaries. The loss function is then
further constrained by a fairness metric [23].</p>
          <p>FairGBM. This method uses a gradient-boosting decision tree model that integrates fairness constraints
directly into the boosting process by adjusting the loss function to account for fairness metrics [24].</p>
          <p>A baseline neural network trained with binary cross-entropy was also implemented to reflect dataset
biases. Model architecture and hyperparameters for both baseline and FairVIC are detailed in
Appendix A.3.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Evaluation</title>
      <sec id="sec-6-1">
        <title>6.1. Core Results Analysis</title>
        <p>To assess the prediction and fairness performance of FairVIC2 and state-of-the-art approaches, we test
all methods across each tabular dataset to enable a fair comparison. Table 1 shows these results. We
have also provided Figure 3, which visualises the absolute difference from the ideal value of each metric,
highlighting how far each method deviates from perfect accuracy and fairness on each tabular dataset.</p>
        <p>Table 1: Accuracy and fairness results of FairVIC versus the baseline and five in-processing methods on three tabular
datasets. Bold indicates the best overall trade-off between fairness and accuracy, reflecting our aim of holistic
improvement (see Figure 3). Underlined values show the best score for each individual metric.</p>
        <sec id="sec-6-1-1">
          <title>2See Table 6, Appendix B.3 for our FairVIC lambdas, and discussion on these selections.</title>
          <p>could misleadingly suggest that the model is fair, when in reality, the bias may only become evident
when captured through a diferent perspective. Single-metric approaches like EGR often fail to address
significant bias.</p>
          <p>Overall, FairVIC outperforms all other comparable methods by demonstrating consistent
improvements in both fairness and accuracy retention. As seen in Figure 1, our FairVIC model achieves the
lowest cumulative absolute error from perfect accuracy and fairness in the Adult Income dataset,
effectively balancing the fairness-accuracy trade-off. The trend is also consistent across the COMPAS
and German Credit datasets seen in Figure 3, Appendix B. This further highlights FairVIC’s ability to
generalise across datasets.</p>
          <p>Other comparable methods are generally not as effective as FairVIC, each exhibiting different
shortcomings. For instance, MetaFair often struggles to improve even upon the baseline in cumulative
absolute difference from the ideal value, and many techniques struggle to balance the improvements
across all fairness metrics, often prioritising Equalised and Absolute Odds over Disparate Impact,
particularly in the Adult Income dataset. Similarly, FairMixUp, though initially promising and achieving
second place after FairVIC in the COMPAS and German Credit datasets, fails to maintain its
performance on the Adult Income dataset, where its results only just beat the baseline. In many cases, such as
FairMixUp on the COMPAS and German Credit datasets, comparable techniques improve fairness but
at the cost of accuracy, failing to achieve a balanced trade-off. Exponentiated Gradient Reduction (EGR)
sees similar struggles. While it performs the best on each individual fairness metric in the
COMPAS dataset, it does this at the expense of accuracy, where it sees a drop of 15.83% in accuracy.
Further to this, EGR then performs poorly on the Adult Income dataset, only achieving a disparate
impact of 0.4602, suggesting an inability to generalise across datasets.</p>
          <p>Following this paper’s objective to create an approach that performs well across multiple fairness
metrics without significantly compromising accuracy, we find that FairVIC demonstrates a consistent
ability to balance fairness and accuracy across diverse datasets. Its adaptability, strong performance on
all fairness metrics, and robustness to dataset shifts position it as the most effective method overall.
This is further supported by its consistently low cumulative absolute error across both fairness and
performance measures, highlighting its advantage over existing in-processing techniques.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Individual Fairness Analysis</title>
        <p>To emphasise further FairVIC’s ability to perform well across all fairness metrics, we also evaluate upon
individual fairness by outputting the results of the counterfactual model, as described in Section 3.3.
The full results, alongside the absolute difference in averages for each metric across the regular and
counterfactual models, are seen in Table 5, Appendix B.2.</p>
        <p>The FairVIC model shows considerable promise in enhancing individual fairness across different
datasets when compared with the baseline models. The counterfactual results from the FairVIC model
with the invariance term weighted heavily (FairVIC Invariance) exhibit lower absolute differences in
metrics across all datasets. For example, in the German Credit dataset, the mean absolute difference
across all six metrics between the regular and the counterfactual baseline model is 0.0277, while for
FairVIC Invariance's regular and counterfactual models it is lower at 0.0108. This suggests a more stable
and fair performance under counterfactual conditions. This capability highlights FairVIC’s strength in
not only addressing group fairness but also ensuring that individual decisions remain consistent and
fair when hypothetical scenarios are considered. In the FairVIC model with recommended lambdas,
we prioritise group fairness so invariance is weighted less. Even with this lower invariance weighting,
FairVIC still achieved improved individual fairness.</p>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Language and Image Dataset Results</title>
        <p>To demonstrate FairVIC’s versatility, we apply it to CivilComments-WILDS and BiasBios. Results
are shown in Table 2; our selected lambda configurations are given in Table 6. Across all fairness
metrics, FairVIC improves upon the baseline, consistent with trends seen in the tabular datasets.
Notably, Disparate Impact improves from 1.9390 to 1.1344 (CivilComments-WILDS), 0.6038 to 0.7817
(BiasBios), 0.4353 to 0.8259 (CelebA), and 1.3136 to 1.0076 (UTKFace). For individual fairness, the
mean absolute difference between regular and counterfactual models is reduced substantially: from
0.3303 to 0.0503 on CivilComments-WILDS, 0.2563 to 0.1207 on BiasBios, 0.4618 to 0.1001 on CelebA,
and 0.1468 to 0.0125 on UTKFace. These results confirm FairVIC's effectiveness across modalities and
architectures, improving both group and individual fairness with minimal accuracy trade-off.</p>
      </sec>
      <sec id="sec-6-4">
        <title>6.4. Lambda Ablation Study Analysis</title>
        <p>Finally, to better understand the internal dynamics of FairVIC, we conduct an ablation study on the $\lambda$
weightings of FairVIC's loss terms to analyse their individual contributions. The FairVIC loss terms are
combined with binary cross entropy for training the neural network to enable optimisation of both
accuracy and fairness, minimising the trade-off. The effect of FairVIC on the overall loss function can
be increased and decreased by changing the weight $\lambda$ for each FairVIC term. To evaluate this effect, we
train a number of neural networks with the architecture described in Appendix A.3, with a different
$\lambda_{\mathrm{acc}}$ weighting each time. In this experiment, we evaluate the effect of weighting the FairVIC loss terms
equally, so that $\lambda_{\mathrm{var}} = \lambda_{\mathrm{inv}} = \lambda_{\mathrm{cov}} = \frac{1 - \lambda_{\mathrm{acc}}}{3}$, where $0 &lt; \lambda_{\mathrm{acc}} &lt; 1$. The performance and fairness
measures for each model are listed in Table 7, Appendix C, and the absolute differences in performance
and fairness from ideal values for each run are visualised in Figure 7, Appendix C.</p>
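        <p>For concreteness, the equal-weighting rule can be written as the small sketch below; the specific grid of $\lambda_{\mathrm{acc}}$ values shown is illustrative rather than the exact grid used in Table 7.</p>
        <preformat>
# Equal-weighting sweep used in the ablation: for each accuracy weight, the
# remaining mass is split evenly across the three FairVIC terms.
def equal_fairvic_weights(lambda_acc):
    lam = (1.0 - lambda_acc) / 3.0
    return {"acc": lambda_acc, "var": lam, "inv": lam, "cov": lam}

for lambda_acc in (0.1, 0.3, 0.5, 0.7, 0.9):   # illustrative grid
    print(equal_fairvic_weights(lambda_acc))
        </preformat>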
        <p>In Figure 7 (Appendix C), the trade-off between accuracy and fairness is evident. As $\lambda_{\mathrm{acc}}$ increases,
predictive performance improves, but the fairness metrics deviate further from the ideal value. In
contrast, when $\lambda_{\mathrm{acc}}$ is lower, fairness improves, this time with only a negligible drop in accuracy.
This suggests that lower $\lambda_{\mathrm{acc}}$ values provide a better overall performance balance. This trend is much
more prevalent for the larger Adult dataset, where more complex relationships could lead to a larger
accuracy-fairness trade-off. In the COMPAS and German Credit datasets, this trade-off, while still
following the same pattern, is much smaller.</p>
        <p>To evaluate the effect of each individual VIC term within the loss function, we can suppress the
lambda terms of two out of three of variance, invariance, and covariance to leave only one remaining.
We keep $\lambda_{\mathrm{acc}} = 0.1$ since the previous lambda experiment showed this to be most effective and revealing
in terms of the effect on fairness, while the chosen FairVIC loss term is assigned a weighting of 0.9.
Similarly, we can also suppress a single term at a time, assigning two out of the three VIC terms a
weighting of 0.45 each. The performance and fairness results for each experiment with different weightings
are listed in Table 8.</p>
        <p>It can be concluded that each term has a different effect. The variance term is shown to have the
lowest standard deviation across all metrics and all tabular data in Table 8, offering stability to FairVIC.
The covariance term makes the greatest contribution to group fairness, as seen in Table 8. The invariance
term aims to give similar outputs to similar inputs, regardless of the protected attribute; therefore, it
should have more of an effect on individual fairness. Table 5 corroborates this hypothesis, as
the FairVIC Invariance model (FairVIC with the invariance loss term weighted to 0.9, and an accuracy
loss weight of 0.1) consistently has a lower absolute difference than the baseline between the regular and
counterfactual models across all metrics and tabular datasets, signalling greater individual fairness.
Therefore, we conclude that the combination of all three terms aims to improve both group and
individual fairness, and to increase stability.</p>
        <p>Based on these findings, we can offer our recommendations for default lambda values to achieve
effective results. We suggest beginning with a configuration of $\lambda_{\mathrm{acc}} = 0.1$, $\lambda_{\mathrm{var}} = 0.1$, $\lambda_{\mathrm{inv}} = 0.1$, and
$\lambda_{\mathrm{cov}} = 0.7$, which consistently performs well across datasets. This configuration provides a strong
default, with further tuning advised based on the relative importance of group fairness, individual
fairness, or accuracy in a given application. A detailed discussion of dataset-specific configurations and
practical guidance for adjusting these weights is provided in Appendix B.3.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion and Future Work</title>
      <p>In this paper, we introduced FairVIC, an in-processing bias mitigation technique that introduces three
new terms into the loss function of a neural network: variance, invariance, and covariance. Across our
experimental evaluation, FairVIC significantly improves scores on all fairness metrics, with a minimal
drop in accuracy, compared to previous comparable methods, which typically aim to improve only upon
a single metric. This balance showcases FairVIC's strength in providing a robust and effective solution
applicable across various tasks and datasets. Future work will extend FairVIC to multiple protected
attributes and targets.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <sec id="sec-8-1">
        <title>The author(s) have not employed any Generative AI tools.</title>
        <p>networks, in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial
Intelligence, 2019.
[22] X. Han, J. Chi, Y. Chen, Q. Wang, H. Zhao, N. Zou, X. Hu, Ffb: A fair fairness benchmark for
in-processing group fairness methods, arXiv preprint arXiv:2306.09468 (2023).
[23] C.-Y. Chuang, Y. Mroueh, Fair mixup: Fairness via interpolation, in: International Conference on</p>
        <p>Learning Representations, 2021.
[24] A. Cruz, C. G. Belém, J. Bravo, P. Saleiro, P. Bizarro, Fairgbm: Gradient boosting with fairness
constraints, in: The Eleventh International Conference on Learning Representations, 2023.
[25] M. Hardt, E. Price, N. Srebro, Equality of opportunity in supervised learning, Advances in neural
information processing systems 29 (2016).
[26] F. Kamiran, A. Karim, X. Zhang, Decision theory for discrimination-aware classification, in: 2012</p>
        <p>IEEE 12th international conference on data mining, IEEE, 2012, pp. 924–929.
[27] F. Kamiran, S. Mansha, A. Karim, X. Zhang, Exploiting reject option in classification for social
discrimination control, Information Sciences 425 (2018) 18–33.
[28] M. Kim, O. Reingold, G. Rothblum, Fairness through computationally-bounded awareness, in:
Advances in Neural Information Processing Systems, volume 31, Curran Associates, Inc., 2018. URL:
https://proceedings.neurips.cc/paper/2018/hash/c8dfece5cc68249206e4690fc4737a8d-Abstract.
html.
[29] A. Noriega-Campero, M. A. Bakker, B. Garcia-Bulle, A. Pentland, Active fairness in algorithmic
decision making, in: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society,
AIES ’19, Association for Computing Machinery, New York, NY, USA, 2019, p. 77–83. URL: https:
//dl.acm.org/doi/10.1145/3306618.3314277. doi:10.1145/3306618.3314277.
[30] A. Bardes, J. Ponce, Y. LeCun, VICReg: Variance-Invariance-Covariance Regularization for
self-supervised learning, arXiv preprint arXiv:2105.04906 (2021).
[31] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, R. Zemel, Fairness through awareness, in: Proceedings
of the 3rd innovations in theoretical computer science conference, 2012, pp. 214–226.
[32] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, S. Venkatasubramanian, Certifying and
removing disparate impact, in: proceedings of the 21th ACM SIGKDD international conference
on knowledge discovery and data mining, 2015, pp. 259–268.
[33] B. Becker, R. Kohavi, Adult, UCI Machine Learning Repository, 1996. DOI:
https://doi.org/10.24432/C5XW20.
[34] J. Angwin, J. Larson, S. Mattu, L. Kirchner, Machine bias, in: Ethics of data and analytics, Auerbach</p>
        <p>Publications, 2022, pp. 254–264.
[35] H. Hofmann, Statlog (German Credit Data), UCI Machine Learning Repository, 1994. DOI:
https://doi.org/10.24432/C5NC77.
[36] F. Kamiran, T. Calders, Classifying without discriminating, in: 2009 2nd international conference
on computer, control and communication, IEEE, 2009, pp. 1–6.
[37] P. W. Koh, S. Sagawa, H. Marklund, S. M. Xie, M. Zhang, A. Balsubramani, W. Hu, M. Yasunaga,
R. L. Phillips, I. Gao, et al., Wilds: A benchmark of in-the-wild distribution shifts, in: International
conference on machine learning, PMLR, 2021, pp. 5637–5664.
[38] M. De-Arteaga, A. Romanov, H. Wallach, J. Chayes, C. Borgs, A. Chouldechova, S. Geyik, K.
Kenthapadi, A. T. Kalai, Bias in bios: A case study of semantic representation bias in a high-stakes
setting, in: proceedings of the Conference on Fairness, Accountability, and Transparency, 2019,
pp. 120–128.
[39] Z. Liu, P. Luo, X. Wang, X. Tang, Deep learning face attributes in the wild, in: Proceedings of</p>
        <p>International Conference on Computer Vision (ICCV), 2015.
[40] Z. Zhang, Y. Song, H. Qi, Age progression/regression by conditional adversarial autoencoder,
in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp.
5810–5818.
[41] J. Kleinberg, S. Mullainathan, M. Raghavan, Inherent trade-offs in the fair determination of risk
scores, arXiv preprint arXiv:1609.05807 (2016).</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>A. Experiment Details</title>
      <sec id="sec-9-1">
        <title>A.1. FairVIC Training Algorithm</title>
        <p>During training, the model iterates over epochs $e$, with the data shuffled into batches. For each batch,
the model produces predictions $\hat{y}$, which are compared with the true labels $y$ using a suitable accuracy
loss function (e.g., binary cross-entropy, hinge loss, or Huber loss). The resulting loss is then minimised
by an optimiser.</p>
        <p>The pseudocode below summarises the full training procedure for FairVIC. It integrates the accuracy
loss alongside the proposed fairness-promoting objectives (variance, invariance, and covariance) as
defined in Section 4.1. The total loss is computed as a weighted combination of these terms. Subsequently,
gradients are computed, and the optimiser adjusts the model parameters with respect to this combined
loss.</p>
        <p>The multipliers $\lambda$ enable users to balance the trade-off between fairness and predictive performance,
which is typical in bias mitigation techniques. Assigning a higher weight to $\lambda_{\mathrm{acc}}$ directs the model to
prioritise accuracy, while increasing the weights of ($\lambda_{\mathrm{var}}$, $\lambda_{\mathrm{inv}}$, $\lambda_{\mathrm{cov}}$) shifts the focus towards enhancing
fairness in the model's predictions. In our implementation, the lambda coefficients ($\lambda_{\mathrm{acc}}$, $\lambda_{\mathrm{var}}$, $\lambda_{\mathrm{inv}}$, $\lambda_{\mathrm{cov}}$)
are constrained such that their sum equals one. In other words, $\lambda_{\mathrm{acc}} = 1 - \lambda_{\mathrm{var}} - \lambda_{\mathrm{inv}} - \lambda_{\mathrm{cov}}$. This
normalisation ensures the optimisation will not admit multiple equivalent solutions of the form
$\{c\,\lambda_{\mathrm{acc}},\, c\,\lambda_{\mathrm{var}},\, c\,\lambda_{\mathrm{inv}},\, c\,\lambda_{\mathrm{cov}}\}$, $c \in \mathbb{R}$.</p>
        <sec id="sec-9-1-1">
          <title>Algorithm 1: FairVIC Loss Function</title>
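          <p>A minimal PyTorch-style sketch of the training loop described above is given below; it assumes the model returns both the prediction and the bottleneck embedding, and the function and variable names are illustrative rather than taken from the FairVIC implementation.</p>
          <preformat>
import torch
import torch.nn.functional as F

def train_fairvic(model, loader, optimizer, protected_idx,
                  lambdas=(0.1, 0.1, 0.1, 0.7), gamma=1.0, epochs=10):
    """Minimal sketch of a FairVIC training loop (illustrative only).

    model is assumed to return (y_hat, z): predicted probability and bottleneck
    embedding. lambdas = (acc, var, inv, cov) weights, assumed to sum to one.
    """
    l_acc, l_var, l_inv, l_cov = lambdas
    for _ in range(epochs):
        for X, y, s in loader:                                 # s: binary protected attribute
            X_cf = X.clone()
            X_cf[:, protected_idx] = 1 - X_cf[:, protected_idx]  # flip the protected attribute

            y_hat, z = model(X)
            y_hat_cf, _ = model(X_cf)

            acc = F.binary_cross_entropy(y_hat, y.float())
            var = torch.clamp(gamma - z.std(dim=0), min=0).mean()
            inv = torch.mean((y_hat - y_hat_cf) ** 2)
            cov = torch.sqrt(torch.sum(((y_hat - y_hat.mean()) * s) ** 2)) / y_hat.shape[0]

            loss = l_acc * acc + l_var * var + l_inv * inv + l_cov * cov
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
          </preformat>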
        </sec>
      </sec>
      <sec id="sec-9-1b">
        <title>A.2. Dataset Metadata</title>
        <p>Detailed metadata for each dataset, including our selection of privileged group, can be found in Table 3.
Note that for the language datasets, the number of features is obtained by combining the protected
characteristic and the target label with the 50 tokenised text features, and for the image datasets, this is the
number of pixels in an image plus the protected attribute and target label. For BiasBios, we take architect, attorney,
dentist, physician, professor, software engineer, and surgeon as the favourable professions, and interior
designer, journalist, model, nurse, poet, teacher, and yoga teacher as the unfavourable professions for
our binary classification task.</p>
      </sec>
      <sec id="sec-9-2">
        <title>A.3. Neural Network Configuration</title>
        <p>The configurations for the neural networks utilised for the tabular, language, and image data can be
seen in Table 4. To obtain results, each model was run 10 times over random seeds, with a randomised
train/test split each time. The averages and standard deviations were then computed across all 10 runs.</p>
        <p>A visualisation for our neural network architecture for tabular data is seen in Figure 2, alongside our
loss terms to illustrate where FairVIC components are applied.</p>
        <p>All models were run with minimal and consistent data preprocessing. While some models, such
as MetaFair, may underperform due to their reliance on specific sampling techniques, all comparable
methods are treated uniformly as in-processing techniques. This allows them to be applied to any
dataset, ensuring a fair evaluation across models.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>B. Full Training Results</title>
      <p>In addition to the results and analysis presented in Section 6, this section provides supplementary
experiments and figures. First, the qualitative visualisations for the COMPAS and German Credit datasets can
be seen in Figure 3, following the analysis of the Adult Income dataset in Figure 1. Discussion on these
results can be seen in Section 6.1.</p>
      <p>Figure 3: (a) COMPAS dataset. (b) German Credit dataset.</p>
      <sec id="sec-10-1">
        <title>B.1. Feature Importances</title>
        <p>Figure 4 shows the feature importance of the baseline and FairVIC models across the three tabular
datasets. In all baseline models, the protected attributes show some importance to the decision-making
process, such as in the COMPAS dataset, where race is a dominant feature. Combined with the results
presented in Section 6.1, this suggests that the baseline models are prone to using the protected attribute
to propagate bias. Additionally, proxy variables (highlighted with their importance in black), which
are strongly correlated with the protected attributes, further show how bias can be perpetuated in the
baseline model. For example, in the Adult Income dataset, relationship has a mean feature importance
of 0.0124. This indicates that even though the model appears to have limited reliance on the protected
attribute sex (which is among the least used features), it may still propagate bias through proxies such
as relationship.</p>
        <p>In contrast, the FairVIC models for all three datasets demonstrate a strong reduction in the mean
importance of protected attributes and proxy variables. This reduction is due to the three additional
terms used in FairVIC: variance, invariance, and covariance. We can see that the covariance term directly
minimises the model's dependency on the protected characteristic, which, in combination with the results
in Section 6.1, suggests a fairer decision-making process. The reduction in proxy variables should also
be noted. Not only does FairVIC successfully reduce the reliance on the protected attribute, but it can
also reduce the reliance on any features strongly correlated with the protected attribute. For example,
in the Adult Income dataset, sex and relationship have a strong negative correlation (−0.58), meaning
a model can propagate bias not only through the use of sex but also through the use of relationship,
which we see the baseline model rely upon. The FairVIC model sees the mean feature importance of
relationship drop by approximately a third and the importance of sex drop by half. This shows FairVIC's
ability to mitigate both direct and indirect biases, leading to more equitable outcomes. On the COMPAS
dataset, while race remains the second most important feature, its actual importance dropped by
≈ 41%.</p>
        <p>Figure 4: (a) Adult Income dataset. (b) COMPAS dataset. (c) German Credit dataset.</p>
      </sec>
      <sec id="sec-10-2">
        <title>B.2. Individual Fairness Results</title>
        <p>Following the analysis found in Section 6.2, Table 5 shows the individual fairness of the baseline,
FairVIC with our recommended lambdas, and FairVIC Invariance ($\lambda_{\mathrm{acc}} = 0.1$, $\lambda_{\mathrm{inv}} = 0.9$, $\lambda_{\mathrm{var}}, \lambda_{\mathrm{cov}} = 0.0$)
models using their absolute differences to their counterfactual model results. In the Adult Income
dataset, the mean absolute difference across all six metrics combined for the baseline model is 0.0094,
while for FairVIC Invariance it is 0.0055. In the COMPAS dataset, the mean absolute difference for
the baseline model is 0.0285, while for FairVIC Invariance it is 0.0050. Finally, for the German Credit
dataset, the mean absolute difference for the baseline model is 0.0277, while for FairVIC Invariance it is
0.0108. FairVIC's invariance term, designed to enhance individual fairness, proves to be effective. The
FairVIC Invariance model consistently achieves significantly lower absolute differences, demonstrating the
success of the approach. In our selection of FairVIC terms, we prioritise group fairness by weighting
invariance lower, yet the model still maintains low counterfactual absolute differences.</p>
        <p>For discussion on the FairVIC Invariance model individual fairness results, see Section 6.2.</p>
        <p>Table 5: Counterfactual (CF) model results and absolute differences (ADs) for the baseline, FairVIC ($\lambda_{\mathrm{acc}}, \lambda_{\mathrm{var}}, \lambda_{\mathrm{inv}} = 0.1$, $\lambda_{\mathrm{cov}} = 0.7$), and FairVIC Invariance ($\lambda_{\mathrm{acc}} = 0.1$, $\lambda_{\mathrm{inv}} = 0.9$, $\lambda_{\mathrm{var}}, \lambda_{\mathrm{cov}} = 0.0$) models.</p>
      </sec>
      <sec id="sec-10-3">
        <title>B.3. Hyperparameter Recommendations</title>
        <p>The weights for the loss terms in FairVIC ($\lambda_{\mathrm{acc}}$, $\lambda_{\mathrm{var}}$, $\lambda_{\mathrm{inv}}$, $\lambda_{\mathrm{cov}}$) were chosen based on insights from our
ablation studies. We have outlined the weights we used in our evaluation in Table 6.</p>
        <p>To help users configure FairVIC effectively, we provide a recommended starting point and guidance
for adapting the loss term weights to suit different fairness and performance goals. While our paper
focuses on generalisable defaults, in a real-world deployment, one could envisage a tuning process to
identify optimal results for a given application or domain.</p>
        <p>We recommend starting with the configuration $\lambda_{\mathrm{acc}} = 0.1$, $\lambda_{\mathrm{var}} = 0.1$, $\lambda_{\mathrm{inv}} = 0.1$, and $\lambda_{\mathrm{cov}} = 0.7$,
which provided strong results across multiple datasets including COMPAS, German Credit, and
CivilComments-WILDS. This balanced setting encourages individual fairness, supports diverse
representations, and strongly targets group fairness without significantly compromising accuracy. The decision
to use a relatively low weight for accuracy ($\lambda_{\mathrm{acc}} = 0.1$) stems from the equal ablation study results,
which demonstrated that this value achieves the best fairness-accuracy trade-off for these datasets.
Group fairness is given significant emphasis, as shown by the higher weight assigned to the covariance
term ($\lambda_{\mathrm{cov}} = 0.7$), which plays a key role in minimising disparities across protected groups. Meanwhile,
the variance ($\lambda_{\mathrm{var}}$) and invariance ($\lambda_{\mathrm{inv}}$) terms were assigned a weight of 0.1, as this value still allowed
for their individual fairness aims to be achieved effectively, thus balancing all fairness and accuracy
objectives.</p>
        <p>To adjust for different dataset characteristics or application goals, users should consider the intended
role of each loss term:</p>
        <p>If accuracy needs improvement while fairness is already strong, increasing the accuracy term weight
can help the model prioritise predictive performance. For instance, in the Adult Income and UTKFace
datasets, increasing $\lambda_{\mathrm{acc}}$ to 0.2 (while slightly lowering $\lambda_{\mathrm{cov}}$) led to better trade-offs. This is likely because
both datasets have relatively high-quality features and well-separated class distributions, allowing
the model to benefit from a greater emphasis on discriminative capacity once basic fairness has been
addressed.</p>
        <p>When group fairness is a higher priority, the covariance term should be weighted more heavily.
This term explicitly penalises statistical dependence between predictions and group membership, and
is most impactful when datasets show strong baseline disparities between protected and privileged
groups. For example, image-based datasets such as CelebA often contain visually encoded group cues
that strongly correlate with target labels, while text-based datasets like BiasBios may exhibit linguistic
patterns linked to social or demographic attributes. In such cases, increasing $\lambda_{\mathrm{cov}}$ (e.g. to 0.8 or 0.85)
helps reduce spurious correlations and promotes more group-independent predictions.</p>
        <p>To enhance individual fairness, a higher weight on the invariance term helps the model maintain
consistency under counterfactual changes to the protected attribute. This is especially useful for datasets
where protected features are not strongly entangled with the label, such that counterfactual consistency
is a plausible fairness goal. Although our default already includes invariance, users with specific fairness
requirements may find that increasing $\lambda_{\mathrm{inv}}$ offers more robust guarantees.</p>
        <p>In practice, the ideal configuration depends on the dataset size, complexity, and the
fairness-performance trade-off required by the application. Larger or more complex datasets may require
a slightly higher weight on the accuracy term to avoid underfitting, while fairness-sensitive domains
may justify higher weights on the fairness terms.</p>
        <p>• Default configuration: $\lambda_{\mathrm{acc}} = 0.1$, $\lambda_{\mathrm{var}} = 0.1$, $\lambda_{\mathrm{inv}} = 0.1$, $\lambda_{\mathrm{cov}} = 0.7$
• To improve accuracy: Increase $\lambda_{\mathrm{acc}}$
• To emphasise group fairness: Increase $\lambda_{\mathrm{cov}}$
• To emphasise individual fairness: Increase $\lambda_{\mathrm{inv}}$</p>
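        <p>The sketch below captures this guidance as a simple configuration, with a helper that shifts weight onto one term and renormalises; the dictionary keys and helper name are illustrative rather than part of the FairVIC implementation.</p>
        <preformat>
# Minimal sketch of the recommended default weighting and simple adjustments.
default_lambdas = {"acc": 0.1, "var": 0.1, "inv": 0.1, "cov": 0.7}
assert round(sum(default_lambdas.values()), 9) == 1.0  # normalisation constraint

def emphasise(lambdas, term, amount=0.1):
    """Shift extra weight onto one term and renormalise so the weights still sum to one."""
    adjusted = dict(lambdas)
    adjusted[term] += amount
    total = sum(adjusted.values())
    return {k: v / total for k, v in adjusted.items()}

group_fair_lambdas = emphasise(default_lambdas, "cov")        # emphasise group fairness
individual_fair_lambdas = emphasise(default_lambdas, "inv")   # emphasise individual fairness
        </preformat>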
        <p>If users have a specific optimisation goal—such as maximising a particular fairness or accuracy
metric—a targeted grid search over the loss weights may be appropriate, though this falls outside the
primary scope of our work.</p>
      </sec>
      <sec id="sec-10-4">
        <title>B.4. Model Representation Analysis</title>
        <p>An example latent space visualisation from the baseline model and FairVIC can be seen in Figure 5.
In the baseline model, we observe a separation between subgroups, where women (subgroup 0) are
predominantly located in the upper region and men (subgroup 1) in the lower region of the latent
space. This separation suggests that the baseline model’s representations may be influenced by the
protected attribute, leading to the biased decision-making reported in Table 1. In contrast, the FairVIC
model shows a more condensed and overlapping distribution of both subgroups within the same latent
space. This, alongside results in Table 1 and feature importance in Figure 4, indicates that FairVIC
has successfully reduced the model’s reliance on the protected characteristic and any proxy variables,
thereby promoting more equitable representations. The overlapping and compact structure in the
FairVIC latent space demonstrates that similar data points, regardless of their subgroup membership,
are mapped closer together, ensuring that the model’s predictions are not unfairly biased towards one
group over the other.</p>
      </sec>
      <sec id="sec-10-5">
        <title>B.5. Model Optimization Analysis</title>
        <p>Figure 6 illustrates the loss landscapes of the baseline and FairVIC models on the Adult Income dataset.
Both models exhibit smooth loss surfaces, indicating that they are relatively well-optimized. The
baseline model (left) shows a stable loss landscape with a slight gradient. The FairVIC model (right),
despite incorporating additional fairness constraints, maintains a similarly smooth surface albeit with
tiny peaks in various places. This demonstrates that the inclusion of variance, invariance, and covariance
terms in the loss function does not introduce instability or optimisation challenges.</p>
      </sec>
      <sec id="sec-10-6">
        <title>B.6. Theoretical Analysis and Discussion</title>
        <p>In this section, we theoretically analyse FairVIC and show that each individual loss term is
sub-differentiable.</p>
        <p>Theorem 1. Each individual term in FairVIC, $\mathcal{L}_{\mathrm{var}}$, $\mathcal{L}_{\mathrm{inv}}$, $\mathcal{L}_{\mathrm{cov}}$, is sub-differentiable everywhere in the
model's parameters $\theta$.</p>
        <p>Proof. The variance term is defined as:
$$\mathcal{L}_{\mathrm{var}} = \frac{1}{d} \sum_{j=1}^{d} \max\!\big(0, \gamma - \sigma(z_j)\big) \quad (5)$$
where $z$ is the latent embedding for the input $x$. $z = f(x)$ is continuous in $\theta$, where $f$ is the
function/layer that maps input $x$ to the latent embedding $z$. $\sigma(\cdot)$ is the standard deviation of a continuous
variable in a finite sample, which is continuous except in the rare instance where all $z$ are identical. Even
in this degenerate case, $\sigma(\cdot)$ is sub-differentiable. The $\max(0, \cdot)$ operator is only non-differentiable at 0,
where the sub-derivative set is $[0, 1]$. Hence $\max(0, \cdot)$ is sub-differentiable w.r.t. $\theta$. The invariance term
is defined as:
$$\mathcal{L}_{\mathrm{inv}} = \frac{1}{n} \sum_{i=1}^{n} \big(\hat{y}_i - \hat{y}_i^{*}\big)^2 \quad (6)$$
where $\hat{y}$ is the model's predictions, and $\hat{y}^{*}$ is the model's predictions where the protected attribute
is flipped. As $\hat{y}$ is differentiable in $\theta$, then $(\hat{y} - \hat{y}^{*})^2$ is differentiable as it is the composition of smooth
functions. The covariance term is defined as:
$$\mathcal{L}_{\mathrm{cov}} = \frac{1}{n} \sqrt{\sum_{i=1}^{n} \big((\hat{y}_i - \mathbb{E}[\hat{y}])\, s_i\big)^2} \quad (7)$$
where $\sum_{i=1}^{n} \big((\hat{y}_i - \mathbb{E}[\hat{y}])\, s_i\big)^2$ is a sum of squares, which is smooth and differentiable. The square
root is differentiable for non-zero input and sub-differentiable at 0.</p>
        <p>Each of the three terms is (sub-)differentiable everywhere in $\theta$. Hence a gradient-based or
subgradient-based method can be applied directly with FairVIC.</p>
      </sec>
      <sec id="sec-10-7">
        <title>B.7. Trade-Of Discussion</title>
        <p>The design of FairVIC is grounded in the recognition that group fairness metrics such as Statistical
Parity Difference, Equalized Odds Difference, and Disparate Impact capture distinct and often conflicting
notions of fairness. Foundational theoretical work by Kleinberg et al. [41] has shown that, under realistic
conditions, such as differing base rates between groups, it is impossible for any prediction system
to simultaneously satisfy multiple fairness criteria, such as calibration and equalized odds, without
introducing trade-offs.</p>
        <p>FairVIC does not attempt to satisfy these fairness criteria simultaneously in a formal sense. Rather
than optimising for any specific fairness definition directly, FairVIC introduces inductive biases at the
representation level through variance, invariance, and covariance regularisation. These objectives are
agnostic to downstream fairness metrics and aim to induce feature representations that are disentangled
from protected attributes while preserving predictive signal.</p>
        <p>The improvements observed across multiple fairness metrics are therefore not the result of directly
encoding incompatible fairness constraints, but an emergent property of the learned representations. As
shown in Table 1, FairVIC achieves significant reductions in Statistical Parity Difference and Equalized
Odds Difference across all datasets, and improves Disparate Impact in most cases. In the case of our
experiments, these fairness improvements come with modest reductions in predictive performance,
consistent with expected trade-offs. Sometimes, improvements to one fairness metric, such as Disparate
Impact, slightly affect the results for another fairness metric, highlighting that such trade-offs are
context-dependent and not inevitable.</p>
        <p>Importantly, FairVIC does not commit to a normative fairness definition during training. This
design choice reflects a practical and ethical consideration: in many real-world settings, there may
be no consensus on which fairness metric best captures the relevant notion of harm or justice. By
focusing on representation-level properties rather than metric-specific constraints, FairVIC supports
post hoc evaluation across multiple fairness metrics, encouraging empirical pluralism and transparency
in fairness assessments.</p>
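        <p>As an illustration of such post hoc evaluation, the short sketch below (not the authors' code; the function name and the 0/1 encoding of the protected attribute are assumptions) computes Statistical Parity Difference, Disparate Impact, and Equalized Odds Difference for binary predictions.</p>
        <preformat>
import numpy as np

def group_fairness_metrics(y_true, y_pred, protected):
    # protected: 1 for the privileged group, 0 for the unprivileged group.
    priv, unpriv = protected == 1, protected == 0

    # Selection rates per group.
    rate_priv = y_pred[priv].mean()
    rate_unpriv = y_pred[unpriv].mean()

    # Statistical Parity Difference and Disparate Impact.
    spd = rate_unpriv - rate_priv
    di = rate_unpriv / rate_priv if rate_priv > 0 else np.nan

    # Equalized Odds Difference: worst-case gap in TPR and FPR across groups.
    def cond_rate(y_t, y_p, label):
        mask = y_t == label
        return y_p[mask].mean() if mask.any() else np.nan

    tpr_gap = abs(cond_rate(y_true[priv], y_pred[priv], 1)
                  - cond_rate(y_true[unpriv], y_pred[unpriv], 1))
    fpr_gap = abs(cond_rate(y_true[priv], y_pred[priv], 0)
                  - cond_rate(y_true[unpriv], y_pred[unpriv], 0))
    eod = max(tpr_gap, fpr_gap)

    return {"SPD": spd, "DI": di, "EOD": eod}
        </preformat>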
        <p>In summary, FairVIC does not circumvent impossibility theorems, nor does it attempt to. Rather, it
sidesteps the need to encode conflicting fairness notions directly, and instead fosters representation
learning that generalises well across subgroups. The observed metric improvements, and their trade-offs,
are empirical outcomes of this strategy, not theoretical contradictions.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>C. Lambda Ablation Study Results</title>
      <sec id="sec-11-1">
        <title>Tables 7 and 8 show the full results for each</title>
        <p>adapted. Table 7 shows the efect of changing
model when the weights on the FairVIC terms are
 acc while keeping the FairVIC terms equal so that
 var, inv, cov = 1− 3 acc , where 0 &lt;  acc &lt; 1, and Table 8 sets  acc = 0.1, and suppresses one or two
FairVIC terms to explore the efect of only utilising one or two term(s) at a time. For full discussion and
analysis of the results of the lambda ablation study, see Section 6.4.</p>
        <p>Accuracy</p>
      </sec>
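      <p>As a small illustration of the weighting scheme explored in Table 7, the sketch below (assumed names, not the authors' code) combines the task loss with the three FairVIC terms so that the accuracy weight λ_acc is varied and the remaining weight is split equally across the fairness terms.</p>
      <preformat>
def fairvic_objective(l_acc, l_var, l_inv, l_cov, lam_acc=0.1):
    # Remaining weight is shared equally: lambda_var = lambda_inv = lambda_cov.
    lam_fair = (1.0 - lam_acc) / 3.0
    return lam_acc * l_acc + lam_fair * (l_var + l_inv + l_cov)
      </preformat>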
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Esteva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kuprel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Novoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Swetter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Blau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thrun</surname>
          </string-name>
          ,
          <article-title>Dermatologist-level classification of skin cancer with deep neural networks</article-title>
          ,
          <source>Nature</source>
          <volume>542</volume>
          (
          <year>2017</year>
          )
          <fpage>115</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Klabjan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <article-title>Classification-based financial markets prediction using deep neural networks</article-title>
          ,
          <source>Algorithmic Finance</source>
          <volume>6</volume>
          (
          <year>2017</year>
          )
          <fpage>67</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Vardarlier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zafer</surname>
          </string-name>
          ,
          <article-title>Use of artificial intelligence as business strategy in recruitment process and social perspective</article-title>
          ,
          <source>Digital business strategies in blockchain ecosystems: Transformational design and future of global business</source>
          (
          <year>2020</year>
          )
          <fpage>355</fpage>
          -
          <lpage>373</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Birhane</surname>
          </string-name>
          ,
          <article-title>The unseen Black faces of AI algorithms</article-title>
          ,
          <source>Nature</source>
          <volume>610</volume>
          (
          <year>2022</year>
          )
          <fpage>451</fpage>
          -
          <lpage>452</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Cavazos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Phillips</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Castillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>O'Toole</surname>
          </string-name>
          ,
          <article-title>Accuracy comparison across face recognition algorithms: Where are we on measuring race bias?</article-title>
          ,
          <source>IEEE transactions on biometrics, behavior, and identity science 3</source>
          (
          <year>2020</year>
          )
          <fpage>101</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Salimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Howe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Suciu</surname>
          </string-name>
          ,
          <article-title>Interventional fairness: Causal database repair for algorithmic fairness</article-title>
          ,
          <source>in: Proceedings of the 2019 International Conference on Management of Data</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>793</fpage>
          -
          <lpage>810</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Caton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Haas</surname>
          </string-name>
          ,
          <article-title>Fairness in machine learning: A survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          (
          <year>2020</year>
          ). URL: https://api.semanticscholar.org/CorpusID:222208640.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Berk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Heidari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jabbari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joseph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kearns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Morgenstern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>A convex framework for fair regression</article-title>
          ,
          <source>arXiv preprint arXiv:1706.02409</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shekhar</surname>
          </string-name>
          , G. Fields,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghavamzadeh</surname>
          </string-name>
          , T. Javidi,
          <article-title>Adaptive sampling for minimax fair classification</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>34</volume>
          ,
          Curran Associates, Inc.,
          <year>2021</year>
          , p.
          <fpage>24535</fpage>
          -
          <lpage>24544</lpage>
          . URL: https://proceedings.neurips.cc/paper/2021/hash/cd7c230fc5deb01f5f7b1be1acef9cf-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ustun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parkes</surname>
          </string-name>
          ,
          <article-title>Fairness without harm: Decoupled classifiers with preference guarantees</article-title>
          ,
          <source>in: Proceedings of the 36th International Conference on Machine Learning, PMLR</source>
          ,
          <year>2019</year>
          , p.
          <fpage>6373</fpage>
          -
          <lpage>6382</lpage>
          . URL: https://proceedings.mlr.press/v97/ustun19a.html.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Calders</surname>
          </string-name>
          , I. Žliobaitė, Why Unbiased Computational Processes Can Lead to Discriminative Decision Procedures, Springer, Berlin, Heidelberg,
          <year>2013</year>
          , p.
          <fpage>43</fpage>
          -
          <lpage>57</lpage>
          . URL: https://doi.org/10.1007/978-3-642-30487-3_3. doi:10.1007/978-3-642-30487-3_3.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Kamiran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Calders</surname>
          </string-name>
          ,
          <article-title>Data preprocessing techniques for classification without discrimination</article-title>
          ,
          <source>Knowledge and Information Systems</source>
          <volume>33</volume>
          (
          <year>2012</year>
          )
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          . doi:10.1007/s10115-011-0463-8.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Jang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Constructing a fair classifier with generated fair data</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>35</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>7908</fpage>
          -
          <lpage>7916</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chiappa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Isaac</surname>
          </string-name>
          ,
          <article-title>A causal bayesian networks viewpoint on fairness</article-title>
          ,
          <source>Privacy and Identity Management. Fairness, Accountability, and Transparency in the Age of Big Data: 13th IFIP WG 9.2, 9.6/11.7, 11.6/SIG 9.2.2 International Summer School, Vienna, Austria, August 20-24, 2018, Revised Selected Papers</source>
          <volume>13</volume>
          (
          <year>2019</year>
          )
          <fpage>3</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Kusner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Loftus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <article-title>Counterfactual fairness</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Kusner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Loftus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <article-title>When worlds collide: integrating different counterfactual assumptions in fairness</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Celis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Keswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. K.</given-names>
            <surname>Vishnoi</surname>
          </string-name>
          ,
          <article-title>Classification with fairness constraints: A metaalgorithm with provable guarantees</article-title>
          ,
          <source>in: Proceedings of the conference on fairness, accountability, and transparency</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>319</fpage>
          -
          <lpage>328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beygelzimer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dudík</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Langford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <article-title>A reductions approach to fair classification</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>60</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>B. H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lemoine</surname>
          </string-name>
          , M. Mitchell,
          <article-title>Mitigating unwanted biases with adversarial learning</article-title>
          ,
          <source>in: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>335</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wadsworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Piech</surname>
          </string-name>
          ,
          <article-title>Achieving fairness through adversarial learning: an application to recidivism prediction</article-title>
          ,
          <source>arXiv preprint arXiv:1807.00199</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , X. Wu,
          <article-title>Achieving causal fairness through generative adversarial</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>