<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">FairX: A comprehensive benchmarking tool for model analysis using fairness, utility, and explainability</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Md</forename><surname>Fahim Sikder</surname></persName>
							<email>fahim.sikder@liu.se</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer and Information Science (IDA)</orgName>
								<orgName type="institution">Linköping University</orgName>
								<address>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Resmi</forename><surname>Ramachandranpillai</surname></persName>
							<email>r.ramachandranpillai@northeastern.edu</email>
							<affiliation key="aff1">
								<orgName type="department">Institute for Experiential AI</orgName>
								<orgName type="institution">Northeastern University</orgName>
								<address>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Daniel</forename><surname>De Leng</surname></persName>
							<email>daniel.de.leng@liu.se</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer and Information Science (IDA)</orgName>
								<orgName type="institution">Linköping University</orgName>
								<address>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Fredrik</forename><surname>Heintz</surname></persName>
							<email>fredrik.heintz@liu.se</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer and Information Science (IDA)</orgName>
								<orgName type="institution">Linköping University</orgName>
								<address>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">FairX: A comprehensive benchmarking tool for model analysis using fairness, utility, and explainability</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">167902C2C17B8883A484D636B2143E2C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:37+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Fair evaluation</term>
					<term>Benchmarking tool</term>
					<term>Synthetic data</term>
					<term>Data utility</term>
					<term>Explainability</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We present FairX, an open-source Python-based benchmarking tool designed for the comprehensive analysis of models under the umbrella of fairness, utility, and eXplainability (XAI). FairX enables users to train benchmark bias-mitigation models, evaluate their fairness using a wide array of fairness and data utility metrics, and generate explanations for model predictions, all within a unified framework. Existing benchmarking tools have no way to evaluate synthetic data generated by fair generative models, nor do they support training fair generative models in the first place. In FairX, we add fair generative models to our fair-model library (alongside pre-processing, in-processing, and post-processing models), together with evaluation metrics for assessing the quality of synthetic fair data. This version of FairX supports both tabular and image datasets, and also allows users to provide their own custom datasets. The open-source FairX benchmarking package is publicly available at https://github.com/fahim-sikder/FairX.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>With the rapid development of artificial intelligence-based systems to aid us in our daily lives, it is important for these systems to produce outcomes that are acceptable to all users, including, but not limited to, from a demographic perspective. Troublingly, as the available data is filled with human or machine bias, models trained on such datasets often give unfair outcomes towards some demographics <ref type="bibr" target="#b0">[1]</ref>. It is therefore critical to mitigate bias in both the dataset and the model. Over the years, researchers have used different techniques to achieve this <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>. These techniques can be roughly grouped into three families: 1) Pre-processing, i.e. where the dataset is processed in such a manner that it produces less biased outcomes, before passing it to a model for training; 2) In-processing, i.e. where the model learns the original data distribution and shifts it towards a fair distribution by adding constraints during the training process; and 3) Post-processing, i.e. where the model's outcome is changed in such a manner that it gives fair outcomes relative to protected attributes. The performance of these models or datasets can be measured by evaluation metrics that reflect both fairness and data utility. To simplify training and evaluating such models, researchers have developed benchmarking tools that bring training and evaluation into one framework. Recently, research on fair generative models has attracted considerable attention, and measuring the quality of the synthetic data is as crucial as evaluating fairness and data utility.</p><p>Existing fairness-related benchmarking tools focus on creating benchmarks and measuring their fairness on different datasets. For example, FairLearn <ref type="bibr" target="#b3">[4]</ref> by Microsoft contains several fair models and evaluation metrics for checking fairness and data utility. AI Fairness 360 (AIF360) <ref type="bibr" target="#b4">[5]</ref> by IBM also contains fairness evaluation metrics and basic data utility metrics. However, both of these frameworks lack the ability to train fair generative models and to measure the data utility of synthetic data. For synthetic fair data, it is important to validate the quality of the generated data alongside measuring fairness and other data utilities. Explainability is an essential property of fair models because it makes the model's decision-making process more transparent. These modules should therefore be included in such benchmarking tools.</p><p>In this work, we present FairX, an open-source modular fairness benchmarking tool, available at https://github.com/fahim-sikder/FairX. A high-level system overview is given in Figure <ref type="figure" target="#fig_0">1</ref>. FairX contains data processing techniques and benchmark fairness models (incorporating pre-processing, in-processing, and post-processing), including generative fair models. We evaluate these models in terms of fairness and data utility. We also add evaluation methods for synthetic fair data (Advanced Utility) to check the quality of the generated samples. FairX supports both tabular and image data and can plot feature importance for downstream tasks using explainable algorithms.</p><p>The remainder of this paper is organised as follows. In Section 2 we discuss background information that will help the reader understand the rest of the paper. We then present FairX in Section 3. Section 4 shows some fairness results obtained by FairX for a number of datasets and models. Finally, the paper looks ahead towards future improvements in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head><p>In this section, we provide the necessary details to follow the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Bias mitigation methods</head><p>A variety of bias mitigation methods have been proposed in the literature, operating on the data, the training process, or the predictions. These methods can be broadly categorized into three main approaches: pre-processing, in-processing, and post-processing techniques.</p><p>Pre-processing. These techniques involve altering the training data to resolve potential causes of bias before it is fed to the model. Various techniques exist in the literature, such as the disparate impact remover <ref type="bibr" target="#b5">[6]</ref>, data cleaning and augmentation, and fair representation learning <ref type="bibr" target="#b6">[7]</ref>. This involves balancing the representation of different groups or generating synthetic data to augment underrepresented groups, assigning weights to support minority groups, and transforming the data into a representation that obscures protected features while maintaining feature attributions.</p><p>In-processing. This involves mitigating biases during training. The techniques include fairness constraints, adversarial de-biasing <ref type="bibr" target="#b7">[8]</ref>, and fairness-aware learning. In training with fairness constraints, a multi-objective optimization combining a prediction loss and a fairness penalty is used, for instance by adding regularization terms to the objective function that penalize unfairness or by incorporating fairness metrics into the optimization process. In adversarial de-biasing <ref type="bibr" target="#b7">[8]</ref>, adversarial training is used to reduce bias: the model is trained to perform well on the primary classification/prediction task while simultaneously trying to prevent an adversary from predicting the protected features, thus forcing the model to learn less biased representations.</p><p>Post-processing. These methods are applied to the predictions of a classifier. Techniques such as threshold adjustment, calibration <ref type="bibr" target="#b8">[9]</ref>, and Reject Option Classification <ref type="bibr" target="#b9">[10]</ref> fall under this category. In threshold adjustment, the decision thresholds of a trained model are adjusted so that the outcomes meet the chosen fairness metric. Calibration <ref type="bibr" target="#b8">[9]</ref> ensures that the predicted probabilities reflect the true likelihood of outcomes equally across different demographic groups. Techniques like equalized odds post-processing are used, where the model's outputs are adjusted to satisfy fairness constraints. Reject Option-Based Classification (ROC) <ref type="bibr" target="#b9">[10]</ref> allows the model to abstain from making a decision when its confidence is low for the chosen sensitive attributes. This can reduce the likelihood of biased or unfair decisions in uncertain instances.</p></div>
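The threshold-adjustment idea above can be sketched in a few lines of Python. This is a hypothetical, stdlib-only illustration (not the FairX or Fairlearn implementation): for each demographic group we search a grid of candidate thresholds for the one whose positive-prediction rate comes closest to a shared target rate.

```python
# Hypothetical sketch of per-group threshold adjustment.
# Function names and the grid search are illustrative assumptions,
# not part of any real library's API.

def positive_rate(scores, threshold):
    """Fraction of samples predicted positive at this threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def adjust_thresholds(scores_by_group, target_rate, candidates=None):
    """Choose one threshold per group so each group's positive rate
    comes as close as possible to target_rate."""
    if candidates is None:
        candidates = [i / 100 for i in range(101)]
    thresholds = {}
    for group, scores in scores_by_group.items():
        thresholds[group] = min(
            candidates,
            key=lambda t: abs(positive_rate(scores, t) - target_rate),
        )
    return thresholds
```

For example, `adjust_thresholds({"A": [0.9, 0.8, 0.2], "B": [0.6, 0.4, 0.3]}, target_rate=1/3)` picks a different cut-off for each group so that both end up accepting one sample out of three.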
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Evaluation metrics</head><p>To measure the performance of models or datasets, various evaluation methods are used. For evaluating a fair model, or for checking a dataset for potential bias, different kinds of fairness metrics exist. For example, demographic parity checks whether the decision from a downstream task is equal across the classes of the sensitive attributes. Fairness through unawareness <ref type="bibr" target="#b10">[11]</ref> checks </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Benchmarking Fairness Evaluation Synthetic Data Explainability Generative Model Tools</head><p>Evaluation Training</p><formula xml:id="formula_0">Fairlearn [4] ✓ ✗ ✗ ✗ AIF360 [5] ✓ ✗ ✓ ✗ Jurity [14] ✓ ✗ ✗ ✗ AEQUITAS [15] ✓ ✗ ✗ ✗ REVISE [17] ✓ ✗ ✗ ✗ FairBench [16] ✓ ✗ ✗ ✗ FairX (ours) ✓ ✓ ✓ ✓</formula><p>how the accuracy of a downstream task is affected if no sensitive attributes are used during the training and prediction phases. Adding fairness constraints to models or datasets may change the data distributions and thereby affect the performance of the dataset or models <ref type="bibr" target="#b11">[12]</ref>. To check data utility performance, we commonly use the Accuracy score, F1-score, Precision, and Recall. To evaluate the quality of synthetic data, researchers use 𝛼-precision <ref type="bibr" target="#b12">[13]</ref> and 𝛽-recall <ref type="bibr" target="#b12">[13]</ref>. In addition, to check whether a generative model is truly generating new content, the authenticity metric <ref type="bibr" target="#b12">[13]</ref> is used.</p></div>
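As a concrete illustration of the demographic parity metric mentioned above, the following stdlib-only sketch (our illustration, not FairX's implementation) computes the demographic parity ratio: the lowest group-wise positive-prediction rate divided by the highest, with 1.0 indicating perfect parity.

```python
from collections import defaultdict

def demographic_parity_ratio(predictions, sensitive):
    """predictions: binary 0/1 decisions; sensitive: group label per sample.
    Returns min group positive rate / max group positive rate."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(predictions, sensitive):
        positives[group] += pred
        totals[group] += 1
    rates = [positives[g] / totals[g] for g in totals]
    return min(rates) / max(rates)
```

For instance, `demographic_parity_ratio([1, 1, 0, 1, 0, 0], ["m", "m", "m", "f", "f", "f"])` returns 0.5, because the "m" group receives positive decisions at twice the rate of the "f" group.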
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Comparison of existing benchmarking tools</head><p>Over the years researchers have developed various fairness benchmarking tools, which commonly include a dataset loader, different bias mitigation techniques, and evaluation metrics. Fairlearn <ref type="bibr" target="#b3">[4]</ref> by Microsoft is one such benchmarking tool. It supports different algorithms for bias mitigation and for measuring the fairness of a model. AIF360 <ref type="bibr" target="#b4">[5]</ref> by IBM is another benchmarking tool. It supports a wide range of evaluation metrics (both for fairness and data utility) and bias-removal algorithms (in-processing, pre-processing, and post-processing). Another example is Jurity <ref type="bibr" target="#b13">[14]</ref>, which contains recommender system evaluations and various fairness and data utility functions. AEQUITAS <ref type="bibr" target="#b14">[15]</ref> and FairBench <ref type="bibr" target="#b15">[16]</ref> generate fairness reports, and REVISE <ref type="bibr" target="#b16">[17]</ref> is a tool to detect and mitigate bias in image datasets. More recently, in the area of generative models, there has been an increased interest in generating fair data in the image, tabular, and medical domains <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b0">1,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b21">22]</ref>. However, the aforementioned benchmarking tools do not contain these models. Moreover, when evaluating models, other benchmarking tools only measure the fairness and data utility of the models themselves, whereas evaluation methods for the generated data are also needed: we need to verify the quality of the synthetic data, and we need to verify its authenticity, to show that the generative models are actually generating new content rather than just copying the data. FairX bridges this gap: we add support for evaluating synthetic data and add generative models to our benchmarking tool. Table <ref type="table" target="#tab_0">1</ref> shows the comparison of these tools with FairX.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">FairX</head><p>In this section we present FairX in detail. FairX is built on three primary modules: 1) the Data Loading Module, 2) the Bias-mitigating Techniques Module, and 3) the Evaluation Module. The main pipeline (shown in Figure <ref type="figure" target="#fig_0">1</ref>) works as follows. Given a dataset, FairX pre-processes it in a way that is compatible with the benchmark model. Next, the model is trained on the dataset. After training, the evaluation module reports results in terms of fairness and data utility, and explains the outcome using explainability methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data loading module</head><p>The BaseDataClass handles the internal processing of datasets, making them compatible with the bias-mitigating models present in our framework as well as easier to handle for bias-mitigating models that are not included in this tool. This class contains methods for handling different file formats (CSV, and others). We add three widely used tabular datasets (Adult-Income, COMPAS, and Credit Card) and two image datasets (Colored MNIST and CelebA) to the benchmarking tool, and we plan to add more. The BaseDataClass processes datasets based on numerical and categorical features. It also provides methods to normalize the dataset and is equipped with functionality for various encodings (e.g. one-hot encoding, QuantileTransformer). It also has a dataset-splitting function to split the dataset for training and testing purposes. We also add functionality to prepare the dataset for explainability algorithms. Sample usage of the datasets is described in Appendix Section A, Listing 1.</p></div>
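To give a flavour of what this pre-processing entails, the sketch below implements two of the steps in plain Python: one-hot encoding of a categorical column and a seeded train/test split. It is an illustrative stand-in, not the actual BaseDataClass API, which additionally covers normalisation, QuantileTransformer encodings, and image data.

```python
import random

def one_hot(column):
    """Encode a list of categorical values as one-hot vectors,
    with categories ordered alphabetically."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in column]

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle rows with a fixed seed and split into train/test parts."""
    rows = rows[:]  # avoid mutating the caller's list
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]
```

For example, `one_hot(["a", "b", "a"])` yields `[[1, 0], [0, 1], [1, 0]]`, and `train_test_split` on ten rows with `test_ratio=0.2` returns an 8/2 partition.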
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Custom Dataset Loader.</head><p>Besides adding widely used benchmark datasets for fair data research, we also provide the option to use custom datasets. Using the CustomDataClass, users can load their own dataset (CSV, TXT, etc.) and train the models. Users need to specify the sensitive attributes and target attributes when using the CustomDataClass. Pre-processing and other functionalities are also available in this class, as in the BaseDataClass. We present sample usage of the CustomDataClass in Listing 4 of Appendix Section A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Bias-mitigating techniques module</head><p>One of FairX's main aims is to benchmark different bias-mitigation techniques on various datasets. Over the years, different techniques have been proposed, and we add models from these techniques to the tool. For the benchmarking process, we use the same hyper-parameters as in their respective works. We create a common format for all the bias-mitigation techniques to make them easy to use. For example, each bias-mitigation technique has its own class, which exposes a model.fit() function. This fit() function takes the dataset and processes it (if needed for the specific model). For the generative models (in-processing techniques), this function also generates synthetic data and saves it as a Pandas dataframe. Sample usage of the models is described in Appendix Section A, Listing 2. Pre-processing. We add support for the Correlation Remover <ref type="bibr" target="#b3">[4]</ref> (CorrRemover in FairX). The Correlation Remover removes the correlation between the sensitive attributes and the other data features by using a linear transformation, while keeping as much information as possible. It is also possible to control how much correlation to remove using the remove_intensity parameter: a value of 1.0 results in maximum correlation removal, while 0.0 does the opposite. The pre-processing algorithms can be accessed via fairx.models.preprocessing.</p><p>In-processing. Most recent in-processing bias mitigation techniques are based on generative models, and the fairness benchmarking tools mentioned in this work do not contain these models. One of the contributions of FairX is that we add several fair generative models, such as TabFairGAN <ref type="bibr" target="#b20">[21]</ref>, Decaf <ref type="bibr" target="#b21">[22]</ref>, and FairDisco <ref type="bibr" target="#b0">[1]</ref>. The in-processing algorithms can be accessed via the fairx.models.inprocessing module. After training, these models generate and save the samples automatically.</p></div>
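The common fit() interface described above can be pictured as follows. This is a hypothetical sketch of the shared shape only: the class names and the dataset layout are our illustrative assumptions, not actual FairX signatures. Every technique is a class whose fit() consumes the loaded dataset; here, a trivial sensitive-column dropper stands in for a real transformation such as correlation removal.

```python
# Illustrative sketch of a uniform bias-mitigation interface.
# All names below are hypothetical, not the real FairX classes.

class BiasMitigationTechnique:
    """Common shape shared by pre-, in-, and post-processing models."""

    def __init__(self, **hyperparams):
        self.hyperparams = hyperparams  # defaults follow the original papers
        self.fitted = False

    def fit(self, dataset):
        """Process/train on the dataset; a generative model would also
        generate and save synthetic samples at this point."""
        self.fitted = True
        return self

class ExampleRemover(BiasMitigationTechnique):
    """Toy pre-processing step: drop the sensitive column entirely."""

    def fit(self, dataset):
        super().fit(dataset)
        self.transformed = [
            {k: v for k, v in row.items() if k != dataset["sensitive"]}
            for row in dataset["rows"]
        ]
        return self
```

Usage mirrors the paper's description: construct the technique, then call fit() with the loaded dataset and read off the transformed result.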
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Post-processing. For the post-processing bias mitigation technique, we add the Threshold</head><p>Optimizer <ref type="bibr" target="#b3">[4]</ref>. This technique operates on a classifier and improves its output based on a fairness constraint. In this case, we use demographic_parity as the fairness constraint to improve the outcome of the classifier, as presented in <ref type="bibr" target="#b3">[4]</ref>. The post-processing algorithm can be used via the fairx.models.postprocessing module.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Evaluation module</head><p>In FairX, we evaluate the performance of a model or dataset using a wide range of evaluation metrics, covering both fairness and data utility. Other existing fairness benchmarking tools lack the capability to measure the data quality of synthetic data, yet it is necessary to check the quality of the synthetic data as well as the fairness criteria. Here, we present FairX's evaluation module. We use XGBoost as the classifier, and also keep the option to use scikit-learn's LogisticRegression.</p><p>Fairness Evaluation. We create the FairnessUtils class to accommodate the fairness evaluation metrics. In this class, we currently support the Demographic Parity Ratio, Equalized Odds Ratio, and Fairness Through Unawareness (FTU) metrics. We plan to add more metrics over time. Fairness metrics can be accessed via the fairx.metrics.FairnessUtils module.</p><p>Data Utility. Besides checking the fairness criteria of the datasets or models, we also add functionality to check data utility with FairX. We support Accuracy, Precision, Recall, AUROC, and F1-score. These functions can be accessed via the fairx.metrics.DataUtilsMetrics module.</p></div>
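To make the utility metrics concrete, here is a stdlib-only sketch of Accuracy, Precision, Recall, and F1-score computed from binary labels. It only illustrates the definitions; FairX itself computes these through a trained XGBoost or LogisticRegression classifier, and AUROC is omitted here since it needs ranked scores rather than hard predictions.

```python
# Illustrative definitions of the data utility metrics;
# not the fairx.metrics.DataUtilsMetrics implementation.

def utility_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```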
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Synthetic Data Evaluation.</head><p>In FairX, we add functionality to evaluate the quality of the data generated by the fair generative models. It is important to validate the quality of the synthetic data along with the fairness and data utility criteria. Existing fairness benchmarks do not provide functionality to evaluate synthetic data quality. We evaluate synthetic data quality in terms of fidelity and diversity, and check whether the synthetic data contains any trace of the original data <ref type="bibr" target="#b24">[25]</ref>. We use 𝛼-precision <ref type="bibr" target="#b12">[13]</ref> to evaluate the fidelity of the synthetic data, 𝛽-recall <ref type="bibr" target="#b12">[13]</ref> to check its diversity, and Authenticity <ref type="bibr" target="#b12">[13]</ref> to check whether the generative models are merely memorising the training data. The synthetic data evaluation module can be accessed via fairx.metrics.SyntheticEvaluation. We also add t-SNE and PCA plots to inspect the fidelity and diversity of the synthetic data; the plots are discussed further in Section 3.4.</p></div>
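The memorisation concern behind the Authenticity metric can be illustrated with a deliberately simplified proxy. The sketch below is our own illustration, not the 𝛼-precision/𝛽-recall/Authenticity metrics of [13]: it merely reports the fraction of synthetic rows lying within a tiny distance of some real row, i.e. near-copies of the training data.

```python
# Simplified memorisation proxy (hypothetical; not the Authenticity metric).

def memorisation_rate(real, synthetic, eps=1e-6):
    """real/synthetic: lists of equal-length numeric tuples.
    Returns the fraction of synthetic rows that near-duplicate a real row."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    copies = sum(
        min(dist(s, r) for r in real) < eps
        for s in synthetic
    )
    return copies / len(synthetic)
```

A high rate would suggest the generator is copying training records rather than generating genuinely new content.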
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Explainability.</head><p>We add explainability functionality to FairX to explain the predictions of a model. We train a classifier (XGBoost) on the benchmark datasets, and then explain the predictions using the fairx.explainability.ExplainUtils module. This module is based on the TreeExplainer of SHAP <ref type="bibr" target="#b25">[26]</ref>. Besides this, we provide functionality to show the feature importance used in making a decision. This is especially useful when we want to see how much importance is given to the sensitive attributes while making a decision.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Plotting</head><p>We add various plotting supports to FairX. They can be accessed under the fairx.utils.plotting module. We add support for showing the trade-off between model accuracy and fairness performance. We also plot the feature importance to show which features are responsible for the prediction outcome. This comes in handy when analyzing original data and synthetic fair data, to see how much the fair model reduces the feature importance of the sensitive attributes. (Table 3: Evaluation on the Adult-Income dataset using different models in FairX. Bold indicates the best result, and higher metric scores are better. Synthetic Data Evaluation is only applicable to the fair generative models, i.e. TabFairGAN and Decaf.)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fairness Metrics</head><p>To show the quality of the synthetic data generated by the fair generative models, we add PCA and t-SNE plots. These plots show how close the synthetic data is to the original data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results and discussion</head><p>We now consider the fairness, data utility, and synthetic data evaluation (only for in-processing generative models) of the models presented in this benchmarking tool. We also present an explainability analysis in which we use the data generated by the in-processing generative models, and show how the fair generated data performs on downstream tasks and how the prediction is affected by the sensitive attributes. We also show the feature importance using this explainability analysis.</p><p>Tables <ref type="table">3</ref> and 4 show the performance of the bias mitigation algorithms for the Adult-Income dataset and the Compas dataset, respectively. We run experiments using different protected attributes<ref type="foot" target="#foot_0">1</ref>. Besides fairness and data utility, we add synthetic data evaluation for the output of TabFairGAN<ref type="foot" target="#foot_1">2</ref> and Decaf<ref type="foot" target="#foot_2">3</ref>.</p><p>From the tables, we see that among the generative fair models, TabFairGAN performs well compared with Decaf on both datasets and both protected attributes. The 𝛼-precision and 𝛽-recall scores of TabFairGAN are better than those of Decaf, indicating that the synthetic data quality of TabFairGAN is superior to that of Decaf. On the other hand, TabFairGAN performs poorly in the fairness evaluation for the 'race' protected attribute of the Adult-Income dataset, whereas the in-processing technique FairDisco<ref type="foot" target="#foot_3">4</ref> performs well in terms of both fairness and data utility.</p><p>For the visual evaluation of fair synthetic data, we use the synthetic data generated by TabFairGAN. Figure <ref type="figure" target="#fig_1">2</ref> shows the PCA and t-SNE plots of the synthetic data generated by TabFairGAN. We show how closely the synthetic data distribution matches the original data. If the generative model captures the original data distribution, the original and synthetic data should overlap on the PCA and t-SNE plots. Figure <ref type="figure" target="#fig_1">2</ref> shows that the data generated by TabFairGAN partially learned the distribution of the original data.</p><p>In Figure <ref type="figure" target="#fig_2">3</ref>, we show the feature importance for a downstream task predicting the target attribute of the Adult-Income dataset, where the sensitive attribute is 'sex'. We compared the feature importance of the original data with that of the synthetic data generated by TabFairGAN. We can see that the feature importance in the synthetic data is lower than in the original data, meaning that the synthetic data generated by TabFairGAN is less biased with respect to the sensitive attribute.</p><p>Finally, Figure <ref type="figure" target="#fig_3">4</ref> shows the intersectional bias in the Adult-Income dataset. We plot the percentage of 'salary-income' for both the 'race' and 'sex' protected attributes. We see that in the dataset, decisions favor white people.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and future work</head><p>Massive amounts of data are produced every day. Unfortunately, much of this data contains human or machine biases. Furthermore, the usage of recommendation systems has increased with advancements in artificial intelligence, but if we use biased data to train a recommendation system, there is a high chance that it will yield unfair decisions towards some demographics. To mitigate this issue, researchers have developed various measures to remove bias from datasets, or to train models in such a way that they produce bias-free data. To help in this process, benchmarking tools equipped with different bias-mitigation techniques and evaluation metrics have been developed over the years. However, these benchmarking tools commonly lack the option to train or evaluate generative models. We therefore presented FairX, an open-source, modular fairness benchmarking tool. FairX comes with a data loader, supports model training, and has an evaluation module. FairX provides support for training fair generative models and for evaluating the synthetic data they create. FairX also contains various fairness evaluation metrics, data utility evaluation metrics, and different plotting techniques to help users evaluate models and visualize outcomes. FairX comes with support for explainability analysis of predictions using the dataset (both original and synthetic) and shows feature importance. We believe FairX will help researchers by closing the gap left by the absence of fair generative models and of ways to evaluate synthetic data in existing tools.</p><p>In the future, we intend to extend FairX to handle other modalities in addition to tabular and image data, for example text and video. We will also add a wider range of evaluation metrics for both synthetic data utility and fairness. For the models, we plan to add text-based and more tabular and image-based fair generative models <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b26">27,</ref><ref type="bibr" target="#b17">18]</ref>. In this version of FairX, there is no option to add custom models, but we plan to add this feature in a future version, so users can plug in their own models and use all the functionalities of FairX. We also plan to add a hyper-parameter optimization feature for the models, so we can find the optimal parameters and best results. Finally, we plan to add functionalities to evaluate the output of large language models.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: A high-level overview of FairX. An input dataset (possibly custom) is fed to the FairX data loading module, followed by a bias-mitigation module and an extensive evaluation module providing multi-faceted evaluations.</figDesc><graphic coords="2,89.30,84.19,416.69,127.56" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: PCA and t-SNE plots of the original data and the synthetic data generated by TabFairGAN. Each dot represents a record; if the generative model learns the original data distribution, the dots should overlap. Dataset: 'Adult-Income', protected attribute: 'sex'.</figDesc><graphic coords="9,150.55,84.19,291.69,145.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>3 :</head><label>3</label><figDesc>Feature importance on the prediction task for the original data (left) and the synthetic data (right) generated by TabFairGAN. The sensitive feature here is 'sex'; the feature value of the sensitive attribute in the synthetic data is lower than in the original data.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Representation of the 'sex' and 'race' features on the target class; here we can see that the dataset is heavily in favor of white people.</figDesc><graphic coords="10,89.29,308.79,416.69,166.67" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Comparison of existing benchmarking tools with FairX over different key areas of interests: Fairness Evaluation; Synthetic Data Evaluation; Model Explainability; and Generative Fair Model Training.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Breakdown of FairX-supported features.</figDesc><table><row><cell></cell><cell></cell><cell>Adult-Income, Compas, Credit-card</cell></row><row><cell>Dataset</cell><cell></cell><cell>Colored MNIST (Image)</cell></row><row><cell></cell><cell></cell><cell>CelebA (Image)</cell></row><row><cell></cell><cell>Pre-processing</cell><cell>Correlation Remover</cell></row><row><cell></cell><cell></cell><cell>TabFairGAN [21]</cell></row><row><cell>Models</cell><cell>In-processing</cell><cell>FairDisco [1]</cell></row><row><cell></cell><cell></cell><cell>Decaf [22]</cell></row><row><cell></cell><cell>Post-processing</cell><cell>Threshold Optimizer</cell></row><row><cell></cell><cell></cell><cell>Demographic Parity Ratio (DPR)</cell></row><row><cell></cell><cell>Fairness</cell><cell>Equalized Odds Ratio (EOR)</cell></row><row><cell></cell><cell></cell><cell>Fairness through Unawareness (FTU)</cell></row><row><cell>Metrics</cell><cell>Data Utility</cell><cell>AUROC, F1-score, Precision, Recall, Accuracy</cell></row><row><cell></cell><cell></cell><cell>𝛼-precision [13]</cell></row><row><cell></cell><cell>Synthetic Data Evaluation</cell><cell>𝛽-recall [13]</cell></row><row><cell></cell><cell></cell><cell>Authenticity [13]</cell></row><row><cell></cell><cell></cell><cell>PCA [23] &amp; t-SNE [24] plots</cell></row><row><cell>Plotting</cell><cell></cell><cell>Feature Importance Fairness vs Accuracy</cell></row><row><cell></cell><cell></cell><cell>Intersectional Bias</cell></row><row><cell>Explainability</cell><cell></cell><cell>Explain prediction of a model Feature Importance</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Evaluation on the Compas dataset using the different models available in FairX. Bold indicates the best result; for all metrics, higher is better. Synthetic data evaluation is only applicable to the fair generative models (i.e. TabFairGAN and Decaf).</figDesc><table><row><cell></cell><cell></cell><cell cols="2">Fairness Metrics</cell><cell></cell><cell>Data Utility</cell><cell></cell><cell cols="3">Synthetic Data Evaluation</cell></row><row><cell></cell><cell>Protected</cell><cell>DPR</cell><cell>EOR</cell><cell>ACC</cell><cell>AUC</cell><cell>F1-</cell><cell>𝛼-</cell><cell>𝛽-</cell><cell>Authenticity</cell></row><row><cell></cell><cell>Attribute</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>Score</cell><cell>precision</cell><cell>recall</cell></row><row><cell>Correlation-</cell><cell>Gender</cell><cell cols="3">0.43 ± .01 0.33 ± .01 0.64 ± .01</cell><cell>0.64 ± .01</cell><cell>0.59 ± .01</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell></row><row><cell>Remover</cell><cell>Race</cell><cell cols="3">0.58 ± .01 0.63 ± .01 0.65 ± .01</cell><cell>0.64 ± .01</cell><cell>0.60 ± .01</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell></row><row><cell>TabFairGAN</cell><cell>Gender</cell><cell cols="7">0.52 ± .01 0.42 ± .01 0.68 ± .01 0.68 ± .01 0.66 ± .01 0.84 ± .01 0.70 ± .01</cell><cell>0.37 ± .01</cell></row><row><cell></cell><cell>Race</cell><cell cols="7">0.50 ± .01 0.49 ± .01 0.69 ± .01 0.68 ± .01 0.64 ± .01 0.94 ± .01 0.75 ± .01</cell><cell>0.33 ± .01</cell></row><row><cell>Decaf</cell><cell>Gender</cell><cell cols="3">0.87 ± .01 0.84 ± .01 0.45 ± .01</cell><cell>0.45 ± .01</cell><cell cols="3">0.42 ± .01 0.77 ± .01 0.45 ± .01</cell><cell>0.61 ± .01</cell></row><row><cell></cell><cell>Race</cell><cell cols="3">0.99 ± .01 0.96 ± .01 0.45 ± .01</cell><cell>0.45 ± .01</cell><cell cols="3">0.42 ± .01 0.77 ± .01 0.45 ± .01</cell><cell>0.61 ± .01</cell></row><row><cell>FairDisco</cell><cell>Gender</cell><cell cols="3">0.97 ± .01 0.92 ± .01 0.55 ± .01</cell><cell>0.54 ± .01</cell><cell>0.43 ± .01</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell></row><row><cell></cell><cell>Race</cell><cell cols="3">0.87 ± .01 0.76 ± .01 0.53 ± .01</cell><cell>0.53 ± .01</cell><cell>0.44 ± .01</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell></row><row><cell>Threshold</cell><cell>Gender</cell><cell cols="3">0.92 ± .01 0.98 ± .01 0.65 ± .01</cell><cell>0.65 ± .01</cell><cell>0.61 ± .01</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell></row><row><cell>Optimizer</cell><cell>Race</cell><cell cols="3">0.99 ± .01 0.76 ± .01 0.63 ± .01</cell><cell>0.63 ± .01</cell><cell>0.60 ± .01</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell></row><row><cell>Original Data</cell><cell>Gender</cell><cell cols="3">0.37 ± .01 0.28 ± .01 0.66 ± .01</cell><cell>0.65 ± .01</cell><cell>0.61 ± .01</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell></row><row><cell></cell><cell>Race</cell><cell cols="3">0.54 ± .01 0.58 ± .01 0.66 ± .01</cell><cell>0.65 ± .01</cell><cell>0.61 ± .01</cell><cell>n/a</cell><cell>n/a</cell><cell>n/a</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">For the sake of brevity, we could not include additional results using other datasets; we refer the reader to the FairX repository for these results. Some metrics, such as precision, recall, and fairness through unawareness (FTU), and plots such as fairness-accuracy trade-offs, were similarly omitted.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://github.com/amirarsalan90/TabFairGAN</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://github.com/vanderschaarlab/synthcity</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://github.com/SoftWiser-group/FairDisCo</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The work was partially funded by the Knut and Alice Wallenberg Foundation, and the TAILOR Network of Excellence for trustworthy AI (EC Grant Agreement 952215). Portions of this work were carried out using the AIOps/Stellar facilities funded by the Excellence Center at Linköping-Lund in Information Technology (ELLIIT).</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Detailed Usage</head><p>In this section, we present sample code examples for our tool. We give a brief description of each module, its corresponding class, and its function details.</p><p>Dataset usage. To use a dataset that comes pre-loaded with the tool, we use the BaseDataClass. This class takes three parameters as input: dataset_name, sensitive_attribute, and a boolean flag for attaching the target variable to the main dataframe. BaseDataClass has two functions, preprocess_data() and split_data(), which respectively preprocess the dataset using categorical and numerical transformations and split the dataset into training and testing sets. Model usage. We provide three kinds of bias-removal techniques under the models folder of FairX. The list of available models can be found in Table <ref type="table">2</ref>. Here is an example usage of the in-processing algorithm TabFairGAN. After initializing the model, we train it by calling the fit() function, which takes the dataset, batch size, and number of epochs as parameters.</p><p>After training, for the fair generative models (TabFairGAN and Decaf), the synthetic data is automatically saved in the working directory. </p></div>
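The data-loading pattern described above can be sketched as follows. This is a minimal illustrative stand-in, not FairX's actual implementation: the class and function names (BaseDataClass, preprocess_data(), split_data()) follow the text, while the toy records and all internals are invented for illustration.

```python
import random

class BaseDataClass:
    """Illustrative stand-in for FairX's BaseDataClass interface."""

    def __init__(self, dataset_name, sensitive_attribute, attach_target=True):
        self.dataset_name = dataset_name
        self.sensitive_attribute = sensitive_attribute
        self.attach_target = attach_target
        # Toy records standing in for a real tabular dataset (hypothetical values).
        self.data = [
            {"age": 39, "sex": "Male", "income": ">50K"},
            {"age": 28, "sex": "Female", "income": "<=50K"},
            {"age": 45, "sex": "Female", "income": ">50K"},
            {"age": 52, "sex": "Male", "income": "<=50K"},
        ]

    def preprocess_data(self):
        # Categorical transformation: map string categories to integer codes;
        # numerical columns are left as-is.
        codes = {}
        for row in self.data:
            for key, val in row.items():
                if isinstance(val, str):
                    table = codes.setdefault(key, {})
                    row[key] = table.setdefault(val, len(table))
        return self.data

    def split_data(self, test_ratio=0.25, seed=0):
        # Shuffle deterministically, then split into train/test partitions.
        rows = self.data[:]
        random.Random(seed).shuffle(rows)
        cut = int(len(rows) * (1 - test_ratio))
        return rows[:cut], rows[cut:]

data = BaseDataClass("Adult-Income", sensitive_attribute="sex")
data.preprocess_data()
train, test = data.split_data()
```

The real class additionally wires the loaded dataframe into the bias-mitigation and evaluation modules; only the call pattern shown here is taken from the text.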
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Metrics usage.</head><p>Here, we give sample code for measuring fairness and data utility on a dataset that is already part of the FairX system. Both the FairnessUtils and DataUtilsMetrics classes take the dataset as input; we then call the evaluate_fairness() and evaluate_utility() functions to measure fairness and data utility, respectively. The result is stored as a dictionary. The following code example uses the CustomDataClass to load a custom dataset into FairX. We need to provide the dataset path, a list of sensitive attributes, and a boolean flag for attaching the target. This code also shows the usage of synthetic data evaluation via the SyntheticEvaluation class. </p></div>			</div>
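The two fairness metrics reported in the tables, the Demographic Parity Ratio (DPR) and the Equalized Odds Ratio (EOR), can be sketched as below. This is a hedged illustration of how evaluate_fairness() could compute them using the common min/max group-rate ratio definitions (as in Fairlearn); it is not FairX's actual code, and the example labels and groups are invented.

```python
def selection_rates(y_pred, groups):
    """Fraction of positive predictions per group."""
    rates = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = sum(y_pred[i] for i in idx) / len(idx)
    return rates

def demographic_parity_ratio(y_pred, groups):
    # DPR = smallest group selection rate / largest group selection rate.
    rates = selection_rates(y_pred, groups).values()
    return min(rates) / max(rates)

def equalized_odds_ratio(y_true, y_pred, groups):
    # EOR = the smaller of the min/max ratios of group TPRs and group FPRs.
    # Simplification: assumes every group has both positive and negative labels.
    def group_rate(label_cond):
        out = {}
        for g in set(groups):
            idx = [i for i, gi in enumerate(groups)
                   if gi == g and label_cond(y_true[i])]
            out[g] = sum(y_pred[i] for i in idx) / len(idx)
        return out
    tpr = group_rate(lambda y: y == 1).values()
    fpr = group_rate(lambda y: y == 0).values()
    return min(min(tpr) / max(tpr), min(fpr) / max(fpr))

# Invented toy predictions for two groups "a" and "b".
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
dpr = demographic_parity_ratio(y_pred, groups)
eor = equalized_odds_ratio(y_true, y_pred, groups)
```

A value of 1.0 for either ratio indicates parity across groups, which matches how the near-1.0 DPR/EOR scores in Table 4 are read as fairer outcomes.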
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Fair representation learning: An alternative to mutual information</title>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Tong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</title>
				<meeting>the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="1088" to="1097" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Bias in data-driven artificial intelligence systems: an introductory survey</title>
		<author>
			<persName><forename type="first">E</forename><surname>Ntoutsi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fafalios</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Gadiraju</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Iosifidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Nejdl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-E</forename><surname>Vidal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ruggieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Turini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Papadopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Krasanakis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">e1356</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A survey on bias and fairness in machine learning</title>
		<author>
			<persName><forename type="first">N</forename><surname>Mehrabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Morstatter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Saxena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lerman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Galstyan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM computing surveys (CSUR)</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="page" from="1" to="35" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Fairlearn: Assessing and improving fairness of ai systems</title>
		<author>
			<persName><forename type="first">H</forename><surname>Weerts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dudík</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Edgar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jalali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lutz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Madaio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="1" to="8" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Ai fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Bellamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Dey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hind</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C</forename><surname>Hoffman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Houde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kannan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lohia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mojsilović</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IBM Journal of Research and Development</title>
		<imprint>
			<biblScope unit="volume">63</biblScope>
			<biblScope unit="page" from="4" to="5" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Certifying and removing disparate impact</title>
		<author>
			<persName><forename type="first">M</forename><surname>Feldman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Friedler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Moeller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Scheidegger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Venkatasubramanian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</title>
				<meeting>the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="259" to="268" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Learning fair representations</title>
		<author>
			<persName><forename type="first">R</forename><surname>Zemel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Swersky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Pitassi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Dwork</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<imprint>
			<publisher>PMLR</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="325" to="333" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Mitigating unwanted biases with adversarial learning</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Lemoine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mitchell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society</title>
				<meeting>the 2018 AAAI/ACM Conference on AI, Ethics, and Society</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="335" to="340" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">On fairness and calibration</title>
		<author>
			<persName><forename type="first">G</forename><surname>Pleiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Raghavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kleinberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">Q</forename><surname>Weinberger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Decision theory for discrimination-aware classification</title>
		<author>
			<persName><forename type="first">F</forename><surname>Kamiran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Karim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2012 IEEE 12th international conference on data mining</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="924" to="929" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Auditing fairness under unawareness through counterfactual reasoning</title>
		<author>
			<persName><forename type="first">G</forename><surname>Cornacchia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">W</forename><surname>Anelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M</forename><surname>Biancofiore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Narducci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Pomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ragone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">Di</forename><surname>Sciascio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="page">103224</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Understanding instance-level impact of fairness constraints</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">E</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="23114" to="23130" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models</title>
		<author>
			<persName><forename type="first">A</forename><surname>Alaa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Van Breugel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">S</forename><surname>Saveliev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Der Schaar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="290" to="306" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Surrogate membership for inferred metrics in fairness evaluation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Thielbar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kadıoğlu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pack</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Dannull</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning and Intelligent Optimization</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="424" to="442" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Saleiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kuester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hinkson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>London</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Stevens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Anisfeld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">T</forename><surname>Rodolfa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ghani</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1811.05577</idno>
		<title level="m">Aequitas: A bias and fairness audit toolkit</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Krasanakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Papadopoulos</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2405.19022</idno>
		<title level="m">Towards standardizing ai bias exploration</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">REVISE: A tool for measuring and mitigating bias in visual datasets</title>
		<author>
			<persName><forename type="first">A</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Narayanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Russakovsky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Conference on Computer Vision (ECCV)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks</title>
		<author>
			<persName><forename type="first">R</forename><surname>Ramachandranpillai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Sikder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bergström</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Heintz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Artificial Intelligence Research (JAIR)</title>
		<imprint>
			<biblScope unit="volume">79</biblScope>
			<biblScope unit="page" from="1313" to="1341" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Fairgan: Gans-based fairness-aware learning for recommendations with implicit feedback</title>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Deng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM web conference 2022</title>
				<meeting>the ACM web conference 2022</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="297" to="307" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Fair Latent Deep Generative Models (FLDGMs) for Syntax-Agnostic and Fair Synthetic Generation</title>
		<author>
			<persName><forename type="first">R</forename><surname>Ramachandranpillai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Sikder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Heintz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECAI 2023</title>
				<imprint>
			<publisher>IOS Press</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1938" to="1945" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Tabfairgan: Fair tabular data generation with generative adversarial networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rajabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">O</forename><surname>Garibay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning and Knowledge Extraction</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="488" to="501" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Decaf: Generating fair synthetic data using causally-aware generative networks</title>
		<author>
			<persName><forename type="first">B</forename><surname>Van Breugel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kyono</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Berrevoets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Der Schaar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="22221" to="22233" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Principal-Components Analysis and Exploratory and Confirmatory Factor Analysis</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">B</forename><surname>Bryant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">R</forename><surname>Yarnold</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Visualizing Data using t-SNE</title>
		<author>
			<persName><forename type="first">L</forename><surname>Van Der Maaten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of machine learning research</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Sikder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ramachandranpillai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Heintz</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2307.12667</idno>
		<title level="m">Transfusion: Generating long, high fidelity time series using diffusion models with transformers</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">From local explanations to global understanding with explainable ai for trees</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Lundberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Erion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Degrave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Prutkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Nair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Katz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Himmelfarb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bansal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-I</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="56" to="67" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Fair generative modeling via weak supervision</title>
		<author>
			<persName><forename type="first">K</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Grover</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Shu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ermon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1887" to="1898" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
