<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Mitigating Bias in Medical Datasets: A Comparative Analysis of Generative Adversarial Networks (GANs) Based Data Generation Techniques</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Mohamed</forename><surname>Ashik</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Regulated Software Research Centre (RSRC)</orgName>
								<orgName type="institution">Dundalk Institute of Technology</orgName>
								<address>
									<settlement>Dundalk</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Shahul</forename><surname>Hameed</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Regulated Software Research Centre (RSRC)</orgName>
								<orgName type="institution">Dundalk Institute of Technology</orgName>
								<address>
									<settlement>Dundalk</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Asifa</forename><surname>Mehmood Qureshi</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Regulated Software Research Centre (RSRC)</orgName>
								<orgName type="institution">Dundalk Institute of Technology</orgName>
								<address>
									<settlement>Dundalk</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Abhishek</forename><surname>Kaushik</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Regulated Software Research Centre (RSRC)</orgName>
								<orgName type="institution">Dundalk Institute of Technology</orgName>
								<address>
									<settlement>Dundalk</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Mitigating Bias in Medical Datasets: A Comparative Analysis of Generative Adversarial Networks (GANs) Based Data Generation Techniques</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">0174F3DCD71F70231A151BA2779559D9</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:14+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Bias</term>
					<term>fairness</term>
					<term>medical datasets</term>
					<term>GANs</term>
					<term>TGAN</term>
					<term>CTGAN</term>
					<term>MedGAN</term>
					<term>MC-MedGAN</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The increasing use of Artificial Intelligence (AI) in the medical domain has highlighted a critical issue: bias in datasets. Biases in medical datasets can lead to skewed predictions, unfair clinical decisions, incorrect diagnoses and poor generalisation of AI models. Very often, these biases are a consequence of imbalance in the dataset. Generative Adversarial Networks (GANs) have emerged as a promising solution to the data imbalance problem: synthetic data can help mitigate bias by balancing the dataset with respect to sensitive attributes as well as class labels. However, the effectiveness of different GAN variants in mitigating bias remains largely unexplored in the medical domain. This paper investigates and compares various GAN variants to identify the most effective approach to producing balanced data. We evaluated different GAN variants on three medical datasets, with the aim of contributing to the development of fairer and more inclusive AI models in the medical domain. The study shows that the performance of Machine Learning (ML) models improves when the dataset is balanced using synthetic data samples. Moreover, the MedGAN variant performs better than the other GAN variants evaluated.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Bias in Artificial Intelligence (AI) refers to AI systems that produce biased results that reflect and amplify human prejudices within a society, including historical and contemporary social injustices <ref type="bibr" target="#b0">[1]</ref>. When such biases are replicated in medical datasets, they can have life-threatening consequences through incorrect diagnoses or treatment recommendations <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>. For example, German researchers built a skin cancer detection system using neural networks in 2016. The system detected 95% of melanoma cases accurately; it was trained on 10,000 skin images and outperformed 58 dermatologists. It was later found that the training data was dominated by images of white skin, so the system did not generalise well to a diverse population <ref type="bibr" target="#b3">[4]</ref>. Such biases can be handled at the pre-processing, algorithmic, or post-processing stage of AI model development <ref type="bibr" target="#b4">[5]</ref>. Handling bias helps achieve fair models that do not discriminate against or unfairly treat particular groups <ref type="bibr" target="#b5">[6]</ref>. Pre-processing techniques handle bias at the data level, and one of the most widely used among them is over-sampling. Over-sampling increases the number of samples from under-represented groups by generating synthetic data that mirrors the characteristics of real-world data. It reduces bias by balancing the representation of different demographic groups, so that machine learning models produce reasonable outcomes and generalise well over a diverse population <ref type="bibr" target="#b6">[7]</ref>. There are several techniques for generating synthetic data to ensure fairness in medical datasets <ref type="bibr" target="#b7">[8]</ref>. 
These techniques include SMOTE <ref type="bibr" target="#b8">[9]</ref>, FairSMOTE <ref type="bibr" target="#b9">[10]</ref>, BorderlineSMOTE <ref type="bibr" target="#b10">[11]</ref>, and cluster-based over-sampling <ref type="bibr" target="#b11">[12]</ref>. Deep learning is also widely used to generate synthetic data because of its efficiency and accuracy in generating data; the most commonly used approach is the Generative Adversarial Network (GAN), which has gained immense popularity in the research community <ref type="bibr" target="#b12">[13]</ref>.</p><p>A GAN is a deep learning model that consists of two neural networks: a generator that produces synthetic data and a discriminator that tries to distinguish real from synthetic data. Training the two networks against each other drives the generator to improve the quality of its samples. GANs were initially applied mainly to image data, but several variants were later proposed to handle tabular data as well. These variants include Tabular GAN (TGAN) <ref type="bibr" target="#b13">[14]</ref>, Conditional Tabular GAN (CTGAN) <ref type="bibr" target="#b14">[15]</ref>, Medical GAN (MedGAN) <ref type="bibr" target="#b15">[16]</ref>, Multi-Categorical MedGAN (MC-MedGAN) <ref type="bibr" target="#b16">[17]</ref> and many more.</p><p>In this study, we evaluated several GAN variants, namely the basic GAN, TGAN, CTGAN, MedGAN, and MC-MedGAN, for generating synthetic samples that balance the representation of different groups within medical datasets. The balanced datasets were then used to train several ML models, including Logistic Regression (LR), Random Forest (RF), Decision Tree (DT), and K-Nearest Neighbour (KNN), in order to compare the variants.
The GAN models are evaluated on three medical datasets that contain gender as the sensitive attribute to balance: the Asthma Disease Dataset <ref type="bibr" target="#b17">[18]</ref>, the Heart Disease Prediction Dataset <ref type="bibr" target="#b18">[19]</ref>, and the Cancer Prediction Dataset <ref type="bibr" target="#b19">[20]</ref>. Predictive performance is evaluated using accuracy, precision, recall, F1-score, and Area Under the Curve (AUC), while fairness is evaluated using Equal Opportunity (EO) <ref type="bibr" target="#b20">[21]</ref>, Propensity Score (PS) <ref type="bibr" target="#b21">[22]</ref>, and Statistical Parity (SP) <ref type="bibr" target="#b22">[23]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Motivation</head><p>AI has become an integral part of healthcare systems, and AI models used in this setting must be transparent and accountable. The goal of this research is to reduce bias in medical datasets that contain inherent biases due to the unequal representation of different demographic groups. AI models trained on such data can become unfair, which is particularly harmful in the healthcare sector, where underrepresented groups may receive inadequate care. Bias in medical datasets poses a significant challenge to the reliability of predictive models <ref type="bibr" target="#b23">[24]</ref>. This is critical for healthcare systems, since automated predictions directly affect patients: they can affect patients' mental health and quality of life, may put an individual's life at risk <ref type="bibr" target="#b24">[25]</ref>, and can also lead to financial loss <ref type="bibr" target="#b25">[26]</ref>. When a dataset is biased, certain populations may receive incorrect diagnoses or treatments because of the unreliable predictions it produces. GANs offer a potentially helpful way to generate synthetic data that can help balance underrepresented groups in health datasets. This effort is motivated by the aim of exploring how GAN-based data augmentation can reduce bias and enable more reliable and equitable Machine Learning (ML) models <ref type="bibr" target="#b12">[13]</ref>. The main goal of this comparative study is to identify the GAN variant best suited to reducing bias in medical datasets. We want to improve the quality of treatment by lowering bias and ensuring that AI systems generate reliable, accurate, and equitable predictions for a range of demographic groups.
Therefore, this study investigates different GAN variants, including TGAN, CTGAN, MedGAN, and MC-MedGAN, for their efficacy in mitigating bias and improving predictive performance on multiple medical datasets. This work is intended to serve as a foundation for further experimentation on GAN-based data generation to mitigate bias.</p><p>Hypothesis: GAN-based data generation methods can help to reduce biases and ensure fairness in medical datasets.</p><p>The research question formulated to explore this hypothesis is as follows:</p><p>• Does GAN-based synthetic data generation help reduce biases in medical datasets? If so, which GAN variant performs best among the basic GAN, TGAN, CTGAN, MedGAN, and MC-MedGAN?</p><p>The rest of the article is structured as follows: Section 3 highlights recent related work. Section 4 explains the methodology in detail. Section 5 presents the results. Section 6 discusses the hypothesis and research question, and Section 7 concludes the paper and outlines future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Related work</head><p>GANs have gained significant attention in recent years due to their capability to generate high-quality data. This section therefore reviews recent methodologies that leverage GAN models to generate synthetic data. A study <ref type="bibr" target="#b26">[27]</ref> presents the potential of GANs for generating synthetic data from observational health data and discusses some of the unique challenges associated with healthcare datasets, such as class imbalance. Observational Health Data (OHD) is highly valuable for medical research and health informatics, but its use is severely limited by strict regulations. The study highlights that GAN-generated synthetic data can help overcome common challenges such as bias, privacy and class imbalance, and the authors argue that GANs are useful for generating healthcare data to combat the scarcity of high-quality medical datasets. Moreover, to address drift and class imbalance in gas detection systems, the authors of <ref type="bibr" target="#b27">[28]</ref> employed CTGAN for data augmentation. The results show a significant improvement in per-class classification accuracy for both Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) classifiers, thus reducing bias toward the majority class. They conclude that CTGAN provides a feasible solution for generating a balanced dataset.</p><p>In another study <ref type="bibr" target="#b28">[29]</ref>, several GAN variants, including CTGAN, TGAN, and Wasserstein GAN (WGAN), are used to anonymise real data through data synthesis. These models were compared using precision, recall, and coverage scores to evaluate their ability to generate realistic tabular data, handle missing and class-imbalanced data, and ensure privacy.
The results show that, although no GAN method performs best on every evaluation metric, CTGAN and TGAN achieve better scores on most of them. Additionally, <ref type="bibr" target="#b29">[30]</ref> proposes a new GAN variant, the Multi-label Timeseries GAN (MTGAN), to generate sequential Electronic Health Record (EHR) data. Its generator uses a gated recurrent unit with a smooth conditional matrix, while its critic evaluates temporal features using the Wasserstein distance to improve the quality of the synthetic data. The results show that MTGAN generates realistic EHR data effectively and improves accuracy for uncommon diseases.</p><p>The above studies show that GANs have the potential to generate high-quality, diverse data that can be used to handle bias in real-world datasets. Therefore, to analyse the capabilities of different GAN variants, this study conducts multiple experiments and then assesses fairness on the newly generated, balanced medical datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Methodology</head><p>Figure <ref type="figure" target="#fig_0">1</ref> shows the methodology used to evaluate the different GAN variants. First, the data is preprocessed and split into training and test sets. Then, the data is fed into each GAN variant to generate synthetic data. The synthetic samples are combined with the real data to balance the number of samples for the sensitive attribute and the output label. Afterwards, ML models are trained on the balanced data to evaluate both the overall predictive performance and the fairness of the models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Preprocessing</head><p>Data preprocessing begins with one-hot encoding, which replaces each categorical variable with binary indicator columns. We then applied z-score normalisation to each numerical feature, which is suitable here because the features did not contain extreme outliers <ref type="bibr" target="#b30">[31]</ref>. Normalisation rescales each variable to zero mean and unit variance, putting all variables on a comparable scale and simplifying the model-learning process <ref type="bibr" target="#b31">[32]</ref>. Finally, the resulting dataset is split into training and test sets in a 70:30 ratio.</p></div>
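<div xmlns="http://www.tei-c.org/ns/1.0"><p>The preprocessing steps above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: rows are assumed to be dictionaries, the helper names one_hot, z_score and train_test_split and the toy records are hypothetical, and the fixed shuffle seed is an assumption. In practice, scikit-learn's OneHotEncoder, StandardScaler and train_test_split provide the same operations.</p>
```python
import random
import statistics


def one_hot(rows, column):
    """Replace a categorical column with one binary indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


def z_score(rows, column):
    """Rescale a numerical column in place to zero mean and unit population standard deviation."""
    values = [row[column] for row in rows]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    for row in rows:
        # A constant column carries no information; map it to 0.0 instead of dividing by zero.
        row[column] = (row[column] - mean) / std if std > 0 else 0.0
    return rows


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows with a fixed seed and split it into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy records; the real datasets have many more rows and features.
toy = [
    {"Gender": "Male" if i % 2 else "Female", "Age": 30.0 + 5 * i, "Diagnosis": i % 2}
    for i in range(10)
]
encoded = z_score(one_hot(toy, "Gender"), "Age")
train, test = train_test_split(encoded)
print(len(train), len(test))  # 7 3
```
<p>The sketch follows the order described in the text (normalise, then split). A common variant computes the z-score mean and standard deviation on the training split only and reuses them for the test split, so that no test-set statistics leak into training.</p></div>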
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Generate synthetic data</head><p>To balance the datasets for the sensitive attribute (gender) and the class labels, we employed five GAN architectures: the basic GAN, TGAN, CTGAN, MedGAN and MC-MedGAN. Apart from the basic GAN, these variants are specifically designed to handle tabular or medical datasets, which are the primary focus of our study. A GAN is a neural network architecture in which two networks, a generator and a discriminator, are trained simultaneously in an adversarial manner <ref type="bibr" target="#b32">[33,</ref><ref type="bibr" target="#b33">34,</ref><ref type="bibr" target="#b34">35]</ref>. Tabular GAN is an application-driven variant of the GAN designed to generate synthetic tabular data, that is, data organised in rows and columns as in a spreadsheet or database <ref type="bibr" target="#b32">[33,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b35">36]</ref>. CTGAN is an extension of Tabular GAN that uses a conditional generator, conditioned on the values of discrete columns, so that the synthetic data better reflects the distributions of these columns and the dependencies between columns <ref type="bibr" target="#b14">[15]</ref>. MedGAN is a GAN variant specialised for generating synthetic medical data, mainly tabular records that would otherwise contain sensitive patient information <ref type="bibr" target="#b15">[16]</ref>. MC-MedGAN is a variant of MedGAN designed to handle the multi-categorical variables commonly present in medical datasets <ref type="bibr" target="#b16">[17]</ref>.</p></div>
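<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the adversarial training described above is usually written as the minimax objective of the original GAN formulation, shown below. Here G maps a noise vector z drawn from a prior p_z to a synthetic sample, D(x) is the discriminator's estimated probability that x is real, and p_data is the distribution of the real training data. The tabular and medical variants keep this adversarial set-up but change the architecture, conditioning, or loss, so the expression describes their shared starting point rather than the exact objective of each variant used in this study.</p>
```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```
</div>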
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Train ML classifiers</head><p>After generating synthetic samples to balance the datasets for the sensitive attribute (gender) and the class labels, several commonly used ML classifiers, namely Logistic Regression (LR), Random Forest (RF), Decision Tree (DT), and K-Nearest Neighbour (KNN), are trained with default parameters on the balanced datasets to evaluate the performance of the GAN variants.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Datasets</head><p>To evaluate the performance of the GAN variants, we used three medical datasets that contain sensitive attributes. The details of each dataset are as follows:</p><p>Asthma Disease Dataset: The Asthma Disease Dataset <ref type="bibr" target="#b17">[18]</ref> contains 2,392 samples with 28 features. The output label is the diagnosis indicator, which is 0 when asthma is absent and 1 for a positive case. The classes are highly imbalanced, with 2,268 samples in class 0 compared with 124 samples in class 1. The dataset contains 1,212 male and 1,180 female samples.</p><p>Heart Disease Prediction Dataset: The Heart Disease Prediction Dataset <ref type="bibr" target="#b18">[19]</ref> consists of 303 samples with 13 features. The dataset contains 207 male and 96 female samples.</p><p>Cancer Prediction Dataset: The Cancer Prediction Dataset <ref type="bibr" target="#b19">[20]</ref> contains 1,500 samples with 8 features. The target variable 'diagnosis' indicates whether a patient has cancer (0 for no cancer and 1 for cancer). The diagnosis distribution shows 943 patients without cancer and 557 with cancer. In total, there are 736 female and 764 male samples.</p></div>
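<div xmlns="http://www.tei-c.org/ns/1.0"><p>Balancing for gender and class label requires deciding how many synthetic samples to generate for each group. The exact target sizes used in the study are not stated, so the following Python sketch assumes the simplest strategy of raising every group to the size of the largest one. It uses the marginal counts reported above (full datasets, before the 70:30 split) because joint gender-by-label counts are not reported; in the pipeline of Figure 1, the same calculation would presumably be applied to the training split. The function name synthetic_samples_needed is hypothetical.</p>
```python
def synthetic_samples_needed(group_counts):
    """Return, per group, how many synthetic rows would bring it up to the largest group's size."""
    target = max(group_counts.values())
    return {group: target - count for group, count in group_counts.items()}


# Marginal counts reported in Section 4.4 (full datasets, before the 70:30 split).
asthma_labels = {0: 2268, 1: 124}
asthma_gender = {"Male": 1212, "Female": 1180}
heart_gender = {"Male": 207, "Female": 96}
cancer_labels = {0: 943, 1: 557}
cancer_gender = {"Female": 736, "Male": 764}

for name, counts in [
    ("Asthma class label", asthma_labels),
    ("Asthma gender", asthma_gender),
    ("Heart gender", heart_gender),
    ("Cancer class label", cancer_labels),
    ("Cancer gender", cancer_gender),
]:
    print(name, synthetic_samples_needed(counts))
```
<p>For example, bringing the 124 positive asthma cases up to the 2,268 negative cases would require 2,144 synthetic positive samples, whereas the two gender groups of the same dataset differ by only 32 samples.</p></div>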
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head><p>Performance is evaluated by training the ML classifiers described in Section 4. The classifiers are assessed using accuracy, F1-score, precision, recall and AUC, while fairness is evaluated via EO, PS, and SP. EO requires that individuals who truly belong to the positive class have the same probability of receiving a positive prediction, i.e., equal true positive rates, across groups <ref type="bibr" target="#b20">[21]</ref>. PS can be defined as the conditional probability of being exposed to a treatment given the observed covariates <ref type="bibr" target="#b21">[22]</ref>. SP is a fairness criterion that requires the probability of a favourable outcome to be the same for each demographic group <ref type="bibr" target="#b22">[23]</ref>. Tables 1, <ref type="table" target="#tab_2">2</ref>, and <ref type="table">3</ref> show each classifier's performance on the original dataset and on each GAN-balanced dataset. MedGAN performs well on the Asthma Disease Dataset and the Cancer Prediction Dataset, while MC-MedGAN achieves better scores on the Heart Disease Prediction Dataset. Figure <ref type="figure" target="#fig_1">2</ref> shows the fairness metrics for the Asthma Disease Dataset. The SP, PS, and EO scores improve when the dataset is balanced for class label and gender. On this dataset, MedGAN achieves the best fairness scores, followed by MC-MedGAN and TGAN, and the same trend is observed for the other two datasets, whose graphs are given in Appendix A.</p><p>Overall, the results show that balancing the dataset for class labels and sensitive attributes improves both the predictive performance and the fairness of the models. Among the GAN variants, MedGAN produces good results with lower SP, PS, and EO scores, showing the strongest capability for reducing bias, followed by MC-MedGAN. Moreover, the RF classifier generally outperforms the other classifiers in terms of accuracy, precision, recall, F1-score, and AUC.</p></div>
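<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the SP and EO definitions above concrete, the following Python sketch computes both as absolute gaps between two groups, assuming binary labels and predictions and a binary sensitive attribute such as gender. Lower values therefore indicate a fairer model, which matches how lower scores are read above. The exact formulation used in the study (for example, a signed difference or a ratio) is not stated, and the function names and example data are hypothetical. PS is omitted because it requires fitting a separate model of the probability of treatment given the covariates.</p>
```python
def positive_rate(predictions):
    """Fraction of predictions equal to 1 (0.0 for an empty list)."""
    return sum(predictions) / len(predictions) if predictions else 0.0


def statistical_parity_gap(y_pred, groups, group_a, group_b):
    """|P(y_hat = 1 | A) - P(y_hat = 1 | B)|: 0 means both groups get positive predictions equally often."""
    rate_a = positive_rate([p for p, g in zip(y_pred, groups) if g == group_a])
    rate_b = positive_rate([p for p, g in zip(y_pred, groups) if g == group_b])
    return abs(rate_a - rate_b)


def equal_opportunity_gap(y_true, y_pred, groups, group_a, group_b):
    """|TPR_A - TPR_B|, where the true positive rate uses only samples whose true label is 1."""
    def true_positive_rate(group):
        return positive_rate(
            [p for t, p, g in zip(y_true, y_pred, groups) if g == group and t == 1]
        )
    return abs(true_positive_rate(group_a) - true_positive_rate(group_b))


# Hypothetical labels and predictions for eight patients.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]
print(statistical_parity_gap(y_pred, gender, "F", "M"))         # 0.5
print(equal_opportunity_gap(y_true, y_pred, gender, "F", "M"))  # 0.5
```
<p>In this example, the female group receives positive predictions 75% of the time and the male group 25% of the time, and among truly positive patients the true positive rates are 1.0 and 0.5, so both gaps equal 0.5.</p></div>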
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Discussion</head><p>This section discusses the overall findings of the study in view of the literature review and the extensive experiments conducted to analyse our hypothesis. With respect to our research question, the experiments show that both classifier performance and fairness scores improve when the datasets are balanced for sensitive attributes and class labels. Figure <ref type="figure" target="#fig_1">2</ref> shows the improvement in the fairness scores across each metric when the dataset is balanced via GAN-generated synthetic data, compared with the original dataset. Moreover, evaluating each GAN variant in terms of accuracy, precision, F1-score, recall, AUC, and the fairness metrics EO, PS, and SP indicates that MedGAN performs best, followed by MC-MedGAN, across all three datasets. To test whether the difference between these two methods is statistically significant, we applied a paired t-test to their EO, PS, and SP scores. The p-values for EO, PS, and SP were 0.34, 0.61, and 0.30, respectively. Since all three exceed the conventional 0.05 significance level, we fail to reject the null hypothesis of no difference and conclude that the two methods are not significantly different. Both variants are specifically designed for medical datasets and capture the interdependencies between variables, so that the synthetic data resembles the properties of the original data <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17]</ref>. However, further experiments on other datasets, including post-hoc tests, will be conducted in future to provide deeper insight into the capability of GAN variants for data generation.</p></div>
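<div xmlns="http://www.tei-c.org/ns/1.0"><p>The paired t-test used above compares two methods on matched scores; which scores were paired (for example, by classifier and dataset) is not stated in the text, so the pairing below is an assumption. The Python sketch computes the test statistic and its degrees of freedom with the standard library only; the two-sided p-value is then read from a Student t distribution with n - 1 degrees of freedom, which is what scipy.stats.ttest_rel reports when SciPy is available. The helper name and the scores are hypothetical and are not values from this study.</p>
```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic t = mean(d) / (sd(d) / sqrt(n)) for d = a - b, plus its n - 1 degrees of freedom."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n in (0, 1):
        raise ValueError("at least two pairs are needed")
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return statistics.fmean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical fairness scores of two methods, paired by classifier.
method_a = [0.10, 0.12, 0.08, 0.11]
method_b = [0.11, 0.12, 0.10, 0.10]
t, df = paired_t_statistic(method_a, method_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```
<p>A small absolute t value relative to the t distribution with the given degrees of freedom corresponds to a large p-value, which is the situation reported above for MedGAN and MC-MedGAN.</p></div>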
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion and future work</head><p>In this paper, we tested different types of GANs for their capacity to produce synthetic tabular data that decreases bias in medical datasets. Our key findings are that GAN-based models are effective for bias mitigation and that GANs can provide balanced datasets for training AI models that generalise better, supporting the goals of AI for all and AI for good. While general-purpose GANs were also successful, the medical-domain GANs performed better at generating high-quality, unbiased data, which motivates the development of more domain-specific models in the future. Despite these advantages of GANs, obstacles remain, notably in evaluation: more standardised and comprehensive evaluation metrics focused on bias reduction are needed. The experiments in this article suggest that synthetic data can help reduce bias and improve the effectiveness of the classifiers. Moreover, MedGAN performs better in terms of SP, PS, and EO. In future, we will extend our work to further GAN variants, focusing on refining GAN architectures to handle multimodal medical data, developing bias-sensitive evaluation mechanisms, and testing GAN-based techniques on real-world clinical data.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Systematic Methodology Diagram to Evaluate GAN Variants</figDesc><graphic coords="4,194.65,65.60,205.99,280.79" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Fairness Assessment for Asthma Disease dataset (a) Statistical Parity (b) Propensity Score, (c) Equal Opportunity</figDesc><graphic coords="7,72.00,502.04,451.29,197.48" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="11,72.00,85.05,451.29,193.22" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="11,72.00,293.23,451.29,195.19" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="11,72.00,503.39,451.29,196.50" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="12,72.00,84.23,451.29,194.21" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="12,72.00,293.40,451.29,194.86" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="12,72.00,503.22,451.29,197.48" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Accuracy, F1-score, Precision, Recall and AUC score comparison over the Asthma Disease Dataset</figDesc><table>
<row role="label"><cell>Method</cell><cell>Model</cell><cell>Accuracy</cell><cell>F1-score</cell><cell>Precision</cell><cell>Recall</cell><cell>AUC</cell></row>
<row><cell rows="4">Original Dataset</cell><cell>LR</cell><cell>0.4989</cell><cell>0.4594</cell><cell>0.4951</cell><cell>0.4285</cell><cell>0.4996</cell></row>
<row><cell>RF</cell><cell>0.4926</cell><cell>0.5050</cell><cell>0.4901</cell><cell>0.5210</cell><cell>0.4628</cell></row>
<row><cell>DT</cell><cell>0.4864</cell><cell>0.4029</cell><cell>0.4770</cell><cell>0.3487</cell><cell>0.4559</cell></row>
<row><cell>KNN</cell><cell>0.4926</cell><cell>0.4840</cell><cell>0.4892</cell><cell>0.4789</cell><cell>0.4903</cell></row>
<row><cell rows="4">GAN</cell><cell>LR</cell><cell>0.5381</cell><cell>0.5283</cell><cell>0.5600</cell><cell>0.5000</cell><cell>0.5487</cell></row>
<row><cell>RF</cell><cell>0.7124</cell><cell>0.6967</cell><cell>0.7667</cell><cell>0.6383</cell><cell>0.7912</cell></row>
<row><cell>DT</cell><cell>0.6662</cell><cell>0.6862</cell><cell>0.6680</cell><cell>0.7053</cell><cell>0.6648</cell></row>
<row><cell>KNN</cell><cell>0.5958</cell><cell>0.6128</cell><cell>0.6074</cell><cell>0.6183</cell><cell>0.6047</cell></row>
<row><cell rows="4">TGAN</cell><cell>LR</cell><cell>0.5147</cell><cell>0.4892</cell><cell>0.5380</cell><cell>0.4485</cell><cell>0.5254</cell></row>
<row><cell>RF</cell><cell>0.7267</cell><cell>0.7064</cell><cell>0.7967</cell><cell>0.6345</cell><cell>0.7797</cell></row>
<row><cell>DT</cell><cell>0.6689</cell><cell>0.6932</cell><cell>0.6666</cell><cell>0.7221</cell><cell>0.6669</cell></row>
<row><cell>KNN</cell><cell>0.5691</cell><cell>0.5878</cell><cell>0.5827</cell><cell>0.5929</cell><cell>0.5912</cell></row>
<row><cell rows="4">CTGAN</cell><cell>LR</cell><cell>0.5103</cell><cell>0.4952</cell><cell>0.5428</cell><cell>0.4553</cell><cell>0.5119</cell></row>
<row><cell>RF</cell><cell>0.7425</cell><cell>0.7248</cell><cell>0.8309</cell><cell>0.6427</cell><cell>0.7920</cell></row>
<row><cell>DT</cell><cell>0.6563</cell><cell>0.6952</cell><cell>0.6532</cell><cell>0.7429</cell><cell>0.6509</cell></row>
<row><cell>KNN</cell><cell>0.5597</cell><cell>0.5720</cell><cell>0.5871</cell><cell>0.5577</cell><cell>0.5786</cell></row>
<row><cell rows="4">MedGAN</cell><cell>LR</cell><cell>0.5272</cell><cell>0.5140</cell><cell>0.5472</cell><cell>0.4845</cell><cell>0.5425</cell></row>
<row><cell>RF</cell><cell>0.7409</cell><cell>0.7233</cell><cell>0.8054</cell><cell>0.6563</cell><cell>0.7883</cell></row>
<row><cell>DT</cell><cell>0.6715</cell><cell>0.6980</cell><cell>0.6640</cell><cell>0.7356</cell><cell>0.6694</cell></row>
<row><cell>KNN</cell><cell>0.5756</cell><cell>0.5898</cell><cell>0.6114</cell><cell>0.5695</cell><cell>0.5858</cell></row>
<row><cell rows="4">MC-MedGAN</cell><cell>LR</cell><cell>0.5134</cell><cell>0.5128</cell><cell>0.5167</cell><cell>0.5134</cell><cell>0.5084</cell></row>
<row><cell>RF</cell><cell>0.6927</cell><cell>0.6916</cell><cell>0.7015</cell><cell>0.6927</cell><cell>0.7635</cell></row>
<row><cell>DT</cell><cell>0.6226</cell><cell>0.6172</cell><cell>0.6238</cell><cell>0.6226</cell><cell>0.6220</cell></row>
<row><cell>KNN</cell><cell>0.5711</cell><cell>0.5706</cell><cell>0.5705</cell><cell>0.5711</cell><cell>0.6062</cell></row>
</table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Accuracy, F1-score, Precision, Recall and AUC score comparison over the Heart</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Disease Prediction Dataset</head><label></label><figDesc></figDesc><table>
<row><cell>Method</cell><cell>Model</cell><cell>Accuracy</cell><cell>F1-score</cell><cell>Precision</cell><cell>Recall</cell><cell>AUC</cell></row>
<row><cell>Original Dataset</cell><cell>LR</cell><cell>0.7049</cell><cell>0.8125</cell><cell>0.8125</cell><cell>0.8125</cell><cell>0.6522</cell></row>
<row><cell></cell><cell>RF</cell><cell>0.8032</cell><cell>0.8775</cell><cell>0.8600</cell><cell>0.8958</cell><cell>0.7996</cell></row>
<row><cell></cell><cell>DT</cell><cell>0.6885</cell><cell>0.7654</cell><cell>0.9393</cell><cell>0.6458</cell><cell>0.8261</cell></row>
<row><cell></cell><cell>KNN</cell><cell>0.7868</cell><cell>0.8631</cell><cell>0.8723</cell><cell>0.8541</cell><cell>0.7203</cell></row>
<row><cell>GAN</cell><cell>LR</cell><cell>0.7126</cell><cell>0.8166</cell><cell>0.8567</cell><cell>0.8957</cell><cell>0.7039</cell></row>
<row><cell></cell><cell>RF</cell><cell>0.8915</cell><cell>0.9010</cell><cell>0.9318</cell><cell>0.8723</cell><cell>0.9621</cell></row>
<row><cell></cell><cell>DT</cell><cell>0.8915</cell><cell>0.8988</cell><cell>0.9523</cell><cell>0.8510</cell><cell>0.8977</cell></row>
<row><cell></cell><cell>KNN</cell><cell>0.8674</cell><cell>0.8791</cell><cell>0.9090</cell><cell>0.8510</cell><cell>0.9255</cell></row>
<row><cell>TGAN</cell><cell>LR</cell><cell>0.7349</cell><cell>0.8441</cell><cell>0.8205</cell><cell>0.8808</cell><cell>0.7352</cell></row>
<row><cell></cell><cell>RF</cell><cell>0.8915</cell><cell>0.9010</cell><cell>0.9318</cell><cell>0.8723</cell><cell>0.9621</cell></row>
<row><cell></cell><cell>DT</cell><cell>0.9277</cell><cell>0.9333</cell><cell>0.9767</cell><cell>0.8936</cell><cell>0.9329</cell></row>
<row><cell></cell><cell>KNN</cell><cell>0.8674</cell><cell>0.8791</cell><cell>0.9090</cell><cell>0.8510</cell><cell>0.9137</cell></row>
<row><cell>CTGAN</cell><cell>LR</cell><cell>0.7250</cell><cell>0.8153</cell><cell>0.8500</cell><cell>0.9217</cell><cell>0.6876</cell></row>
<row><cell></cell><cell>RF</cell><cell>0.9083</cell><cell>0.9197</cell><cell>0.9264</cell><cell>0.9130</cell><cell>0.9766</cell></row>
<row><cell></cell><cell>DT</cell><cell>0.8250</cell><cell>0.8292</cell><cell>0.9444</cell><cell>0.7391</cell><cell>0.8975</cell></row>
<row><cell></cell><cell>KNN</cell><cell>0.8750</cell><cell>0.8800</cell><cell>0.9821</cell><cell>0.7971</cell><cell>0.9903</cell></row>
<row><cell>MedGAN</cell><cell>LR</cell><cell>0.7108</cell><cell>0.8891</cell><cell>0.8355</cell><cell>0.8734</cell><cell>0.7340</cell></row>
<row><cell></cell><cell>RF</cell><cell>0.9036</cell><cell>0.9130</cell><cell>0.9333</cell><cell>0.8936</cell><cell>0.9598</cell></row>
<row><cell></cell><cell>DT</cell><cell>0.8674</cell><cell>0.8791</cell><cell>0.9090</cell><cell>0.8510</cell><cell>0.9021</cell></row>
<row><cell></cell><cell>KNN</cell><cell>0.8674</cell><cell>0.8791</cell><cell>0.9090</cell><cell>0.8510</cell><cell>0.9284</cell></row>
<row><cell>MC-MedGAN</cell><cell>LR</cell><cell>0.7746</cell><cell>0.9096</cell><cell>0.9510</cell><cell>0.8382</cell><cell>0.7133</cell></row>
<row><cell></cell><cell>RF</cell><cell>0.9277</cell><cell>0.9347</cell><cell>0.9555</cell><cell>0.9148</cell><cell>0.9728</cell></row>
<row><cell></cell><cell>DT</cell><cell>0.9277</cell><cell>0.9333</cell><cell>0.9767</cell><cell>0.8936</cell><cell>0.9320</cell></row>
<row><cell></cell><cell>KNN</cell><cell>0.8433</cell><cell>0.8539</cell><cell>0.9047</cell><cell>0.8085</cell><cell>0.9414</cell></row>
<row><cell cols="7">Table 3: Accuracy, F1-score, Precision, Recall and AUC score comparison over the Cancer Prediction Dataset</cell></row>
<row><cell>Method</cell><cell>Model</cell><cell>Accuracy</cell><cell>F1-score</cell><cell>Precision</cell><cell>Recall</cell><cell>AUC</cell></row>
<row><cell>Original Dataset</cell><cell>LR</cell><cell>0.6000</cell><cell>0.5714</cell><cell>0.5882</cell><cell>0.5555</cell><cell>0.6846</cell></row>
<row><cell></cell><cell>RF</cell><cell>0.6133</cell><cell>0.5671</cell><cell>0.6129</cell><cell>0.5277</cell><cell>0.6588</cell></row>
<row><cell></cell><cell>DT</cell><cell>0.5666</cell><cell>0.5608</cell><cell>0.5460</cell><cell>0.5763</cell><cell>0.6028</cell></row>
<row><cell></cell><cell>KNN</cell><cell>0.6366</cell><cell>0.5657</cell><cell>0.6635</cell><cell>0.4930</cell><cell>0.6586</cell></row>
<row><cell>GAN</cell><cell>LR</cell><cell>0.6218</cell><cell>0.5776</cell><cell>0.5944</cell><cell>0.6431</cell><cell>0.6898</cell></row>
<row><cell></cell><cell>RF</cell><cell>0.7527</cell><cell>0.7580</cell><cell>0.7500</cell><cell>0.7661</cell><cell>0.8453</cell></row>
<row><cell></cell><cell>DT</cell><cell>0.6890</cell><cell>0.7154</cell><cell>0.6656</cell><cell>0.7733</cell><cell>0.6881</cell></row>
<row><cell></cell><cell>KNN</cell><cell>0.7090</cell><cell>0.7359</cell><cell>0.6798</cell><cell>0.8021</cell><cell>0.8002</cell></row>
<row><cell>TGAN</cell><cell>LR</cell><cell>0.6549</cell><cell>0.5891</cell><cell>0.5920</cell><cell>0.5921</cell><cell>0.6958</cell></row>
<row><cell></cell><cell>RF</cell><cell>0.7271</cell><cell>0.7478</cell><cell>0.7317</cell><cell>0.7647</cell><cell>0.8331</cell></row>
<row><cell></cell><cell>DT</cell><cell>0.6849</cell><cell>0.7304</cell><cell>0.6676</cell><cell>0.8062</cell><cell>0.6774</cell></row>
<row><cell></cell><cell>KNN</cell><cell>0.7161</cell><cell>0.7574</cell><cell>0.6914</cell><cell>0.8373</cell><cell>0.8369</cell></row>
<row><cell>CTGAN</cell><cell>LR</cell><cell>0.6696</cell><cell>0.6458</cell><cell>0.5958</cell><cell>0.5735</cell><cell>0.7680</cell></row>
<row><cell></cell><cell>RF</cell><cell>0.7287</cell><cell>0.7349</cell><cell>0.7375</cell><cell>0.7323</cell><cell>0.8342</cell></row>
<row><cell></cell><cell>DT</cell><cell>0.7269</cell><cell>0.7409</cell><cell>0.7224</cell><cell>0.7605</cell><cell>0.7260</cell></row>
<row><cell></cell><cell>KNN</cell><cell>0.6690</cell><cell>0.7053</cell><cell>0.6498</cell><cell>0.7711</cell><cell>0.7836</cell></row>
<row><cell>MedGAN</cell><cell>LR</cell><cell>0.7745</cell><cell>0.6698</cell><cell>0.5899</cell><cell>0.6563</cell><cell>0.7057</cell></row>
<row><cell></cell><cell>RF</cell><cell>0.7071</cell><cell>0.7028</cell><cell>0.7230</cell><cell>0.6836</cell><cell>0.7852</cell></row>
<row><cell></cell><cell>DT</cell><cell>0.6795</cell><cell>0.7119</cell><cell>0.6534</cell><cell>0.7818</cell><cell>0.6782</cell></row>
<row><cell></cell><cell>KNN</cell><cell>0.7071</cell><cell>0.7371</cell><cell>0.6757</cell><cell>0.8109</cell><cell>0.8123</cell></row>
<row><cell>MC-MedGAN</cell><cell>LR</cell><cell>0.7452</cell><cell>0.5923</cell><cell>0.5977</cell><cell>0.7090</cell><cell>0.7694</cell></row>
<row><cell></cell><cell>RF</cell><cell>0.7005</cell><cell>0.7011</cell><cell>0.7116</cell><cell>0.6909</cell><cell>0.7782</cell></row>
<row><cell></cell><cell>DT</cell><cell>0.6635</cell><cell>0.6862</cell><cell>0.6524</cell><cell>0.7236</cell><cell>0.6625</cell></row>
<row><cell></cell><cell>KNN</cell><cell>0.6543</cell><cell>0.6867</cell><cell>0.6366</cell><cell>0.7454</cell><cell>0.7558</cell></row>
</table></figure>
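<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a consistency check on the reported metrics, the F1-score in a row should equal the harmonic mean of that row's precision P and recall R, F1 = 2PR/(P + R). The short Python sketch below recomputes F1 for five rows copied from the result tables and compares each value with the reported one. It is an illustrative check, not part of the study's pipeline; the helper names (f1_from_precision_recall, check_rows) and the 0.001 tolerance, chosen to absorb four-decimal rounding, are assumptions.</p>
<p>
```python
import math

# Illustrative data: rows copied from the result tables as
# (dataset, method, model, precision, recall, reported F1).
ROWS = [
    ("Disease Prediction", "Original Dataset", "RF", 0.8600, 0.8958, 0.8775),
    ("Disease Prediction", "Original Dataset", "DT", 0.9393, 0.6458, 0.7654),
    ("Cancer Prediction", "Original Dataset", "LR", 0.5882, 0.5555, 0.5714),
    ("Cancer Prediction", "GAN", "RF", 0.7500, 0.7661, 0.7580),
    ("Cancer Prediction", "CTGAN", "RF", 0.7375, 0.7323, 0.7349),
]


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_rows(rows, tol=1e-3):
    """Return (row, recomputed F1) for each row whose reported F1 differs by more than tol."""
    mismatches = []
    for row in rows:
        *_, precision, recall, reported = row
        recomputed = f1_from_precision_recall(precision, recall)
        if not math.isclose(recomputed, reported, abs_tol=tol):
            mismatches.append((row, round(recomputed, 4)))
    return mismatches


for dataset, method, model, p, r, f1 in ROWS:
    print(f"{dataset} / {method} / {model}: reported {f1:.4f}, "
          f"recomputed {f1_from_precision_recall(p, r):.4f}")
print("rows outside tolerance:", check_rows(ROWS))
```
</p>
<p>Further rows from either table can be appended to ROWS; check_rows returns any row whose reported F1 does not match its precision and recall within the tolerance. Accuracy and AUC cannot be checked this way, because they depend on the class counts and the ranking of predicted scores, which the tables do not report.</p></div>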
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This research was managed by the CREATE-DkIT project, supported by the HEA's TU-Rise programme and co-financed by the Government of Ireland and the European Union through the ERDF Southern, Eastern &amp; Midland Regional Programme 2021-27 and the Northern &amp; Western Regional Programme 2021-27. This research is also partially supported by Research Ireland under Grant Number 21/FFP-A/9255.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Fairness Assessment Graphs</head></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Large language models struggle to learn long-tail knowledge</title>
		<author>
			<persName><forename type="first">N</forename><surname>Kandpal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
		<imprint>
			<publisher>PMLR</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="15696" to="15707" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Fairness and bias in artificial intelligence: A brief survey of sources, impacts, and mitigation strategies</title>
		<author>
			<persName><forename type="first">E</forename><surname>Ferrara</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sci</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page">3</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Dataset bias in diagnostic AI systems: Guidelines for dataset collection and usage</title>
		<author>
			<persName><forename type="first">J</forename><surname>Vaughn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Baral</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vadari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Boag</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM Conference on Health, Inference and Learning</title>
				<meeting>the ACM Conference on Health, Inference and Learning<address><addrLine>Toronto, ON, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="2" to="4" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A</forename><surname>Haenssle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Fink</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Schneiderbauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Toberer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Buhl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Blum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kalloo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">B H</forename><surname>Hassen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Enk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Annals of oncology</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="1836" to="1842" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A framework for understanding sources of harm throughout the machine learning life cycle</title>
		<author>
			<persName><forename type="first">H</forename><surname>Suresh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guttag</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization</title>
				<meeting>the 1st ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">A review of bias and fairness in artificial intelligence</title>
		<author>
			<persName><forename type="first">R</forename><surname>González-Sendino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Serrano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bajo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Novais</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">GAN-based data generation for speech emotion recognition</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">E</forename><surname>Eskimez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dimitriadis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gmyr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kumanati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">INTERSPEECH</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3446" to="3450" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Bias mitigation via synthetic data generation: A review</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Shahul Hameed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Qureshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kaushik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Electronics</title>
		<idno type="ISSN">2079-9292</idno>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">SMOTE: synthetic minority over-sampling technique</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Chawla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">W</forename><surname>Bowyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">O</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">P</forename><surname>Kegelmeyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of artificial intelligence research</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="321" to="357" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Bias in machine learning software: Why? how? what to do?</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chakraborty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Menzies</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering</title>
		<meeting>the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="429" to="440" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning</title>
		<author>
			<persName><forename type="first">H</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B.-H</forename><surname>Mao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on intelligent computing</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="878" to="887" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Class imbalances versus small disjuncts</title>
		<author>
			<persName><forename type="first">T</forename><surname>Jo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Japkowicz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Sigkdd Explorations Newsletter</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="40" to="49" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Survey on synthetic data generation, evaluation methods and GANs</title>
		<author>
			<persName><forename type="first">A</forename><surname>Figueira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Vaz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Mathematics</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">2733</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Veeramachaneni</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1811.11264</idno>
		<title level="m">Synthesizing tabular data using generative adversarial networks</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Modeling tabular data using conditional GAN</title>
		<author>
			<persName><forename type="first">L</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Skoularidou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cuesta-Infante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Veeramachaneni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Generating multi-label discrete patient records using generative adversarial networks</title>
		<author>
			<persName><forename type="first">E</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Biswal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Malin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Duke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">F</forename><surname>Stewart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine learning for healthcare conference</title>
				<imprint>
			<publisher>PMLR</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="286" to="305" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Generation and evaluation of synthetic patient data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Goncalves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Soper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Stevens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Coyle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Sales</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC medical research methodology</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="1" to="40" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Asthma disease dataset</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Kharoua</surname></persName>
		</author>
		<ptr target="https://www.kaggle.com/datasets/rabieelkharoua/asthma-disease-dataset" />
		<imprint>
			<date type="published" when="2024">2024</date>
			<date type="accessed" when="2024-08-23">2024-08-23</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Heart disease prediction dataset</title>
		<author>
			<persName><forename type="first">K</forename><surname>Ujeniya</surname></persName>
		</author>
		<ptr target="https://www.kaggle.com/datasets/krishujeniya/heart-diseae/data" />
		<imprint>
			<date type="published" when="2024">2024</date>
			<date type="accessed" when="2024-08-23">2024-08-23</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Cancer prediction dataset</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Kharoua</surname></persName>
		</author>
		<ptr target="https://www.kaggle.com/datasets/rabieelkharoua/cancer-prediction-dataset/data" />
		<imprint>
			<date type="published" when="2024">2024</date>
			<date type="accessed" when="2024-08-23">2024-08-23</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Equality of opportunity in supervised learning</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hardt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Price</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Srebro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A brief guide to propensity score analysis</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">E</forename><surname>Valojerdi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Janani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Medical journal of the Islamic Republic of Iran</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page">122</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">The statistical fairness field guide: perspectives from social and formal sciences</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Carey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">AI and Ethics</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="1" to="23" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Leveraging feature bias for scalable misprediction explanation of machine learning models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Gesi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Geng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ahmed</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE/ACM 45th International Conference on Software Engineering (ICSE)</title>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1559" to="1570" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">AI pitfalls and what not to do: mitigating bias in AI</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Gichoya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Celi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Safdar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Banerjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Banja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Seyyed-Kalantari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Trivedi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Purkayastha</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The British Journal of Radiology</title>
		<imprint>
			<biblScope unit="volume">96</biblScope>
			<biblScope unit="page">20230023</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Revolutionizing healthcare: the role of artificial intelligence in clinical practice</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Alowais</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Alghamdi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Alsuhebany</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Alqahtani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">I</forename><surname>Alshaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">N</forename><surname>Almohareb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Aldairem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Alrashed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Bin Saleh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A</forename><surname>Badreldin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC medical education</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page">689</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Georges-Filteau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cirillo</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.13510</idno>
		<title level="m">Synthetic observational health data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Data augmentation and class imbalance compensation using CTGAN to improve gas detection systems</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mahinnezhad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mahinnezhad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kaur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shih</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2024 IEEE International Instrumentation and Measurement Technology Conference (I2MTC)</title>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">Generating Synthetic Health Data Using Machine Learning GAN Methods</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">S</forename><surname>Shourmasti</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">Master&apos;s thesis</note>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Multi-label clinical time-series generation via conditional GAN</title>
		<author>
			<persName><forename type="first">C</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">K</forename><surname>Reddy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Feature-limited prediction on the UCI heart disease dataset</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Alfadli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">O</forename><surname>Almagrabi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers, Materials &amp; Continua</title>
		<imprint>
			<biblScope unit="volume">74</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Investigating the impact of data normalization on classification performance</title>
		<author>
			<persName><forename type="first">D</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Soft Computing</title>
		<imprint>
			<biblScope unit="volume">97</biblScope>
			<biblScope unit="page">105524</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Damage GAN: A generative model for imbalanced data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Anaissi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Braytee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Naji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Alyassine</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Australasian Conference on Data Science and Machine Learning</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="48" to="61" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">GAN-based approaches for generating structured data in the medical domain</title>
		<author>
			<persName><forename type="first">M</forename><surname>Abedi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hempel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sadeghi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kirsten</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page">7075</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">GAN-based one dimensional medical data augmentation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dekker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Traverso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Soft Computing</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="10481" to="10491" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Causal-TGAN: Modeling tabular data using causally-aware GAN</title>
		<author>
			<persName><forename type="first">B</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Subbalakshmi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Chandramouli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICLR Workshop on Deep Generative Models for Highly Structured Data</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
