Limitations and Applicability of GANs in Banking Domain

Anubha Pandey 1, Deepak Bhatt 2, and Tanmoy Bhowmik 3

1 Mastercard, India, email: Anubha.Pandey@mastercard.com
2 Mastercard, India, email: Deepak.Bhatt@mastercard.com
3 Mastercard, India, email: Tanmoy.Bhowmik@mastercard.com

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Threats due to payment-related frauds are a primary concern for financial institutions (FIs), often leading to huge losses and impacting consumer experience. To combat emerging frauds and improve the system's robustness, FIs need an efficient system to detect fraud while authorizing payments. The biggest challenge in developing a fraud detection system is the high degree of class imbalance between fraudulent and legitimate transactions. Recently, Generative Adversarial Networks (GANs) have been employed as an oversampling technique to augment the dataset with synthetic minority samples. In this paper, we present a systematic study of training GANs for synthetic fraud generation, demonstrating improved classifier performance in detecting fraud. GANs are trained in various settings, including the standard min-max objective with and without an auxiliary loss that discriminates synthetic and real fraud from non-fraud samples; the auxiliary loss is obtained using either a contrastive loss or a triplet loss. The quality of a trained GAN is estimated by the lift in classifier performance when the training set is augmented with synthetic fraud. Further, we study the effect of Discriminator Rejection Sampling (DRS) on the selection of the synthetic samples used for data augmentation. The settings proposed in this study are compared on a publicly available credit-card dataset and show an absolute improvement of up to 6% in Recall and 3% in Precision. We hope this paper advances the applicability of GANs with practical insight into the research done on this topic so far and opens doors to interesting future research directions.
1 INTRODUCTION

The credit card has become a ubiquitous method of online payment. Consequently, the rise of increasingly sophisticated fraudulent transactions is alarming. Fraud affects both users and businesses, resulting in financial loss and eroded customer trust. Banks and fintech companies need an efficient system to monitor the massive volume of transaction logs and detect frauds [6, 27]. However, such a system should not decline legitimate transactions, as this degrades the consumer experience.

The commonly used pipeline for a fraud detection system employs a binary classifier to distinguish fraudulent transactions from legitimate ones [26, 12, 36]. Fraudulent transactions are rare; they represent a tiny fraction of activity within an organization, resulting in class imbalance. The class imbalance biases binary classifiers towards the majority class and hence makes fraud detection a challenging problem [23, 22, 7]. A similarly high degree of class imbalance is observed in a variety of real-world applications such as medical diagnosis, information retrieval systems, and bioinformatics [31, 1, 34, 17, 21, 38, 39].

Several techniques exist for class-imbalance learning [35, 24, 15, 36]. [30] presents a comparative study of several supervised and unsupervised machine learning algorithms for handling class imbalance in credit card fraud detection. One solution to the class imbalance problem is to re-balance the training set used by the binary classifier [2, 11, 21]. Several oversampling techniques have proved effective in handling class imbalance; the most commonly used methods are variants of SMOTE (Synthetic Minority Oversampling Technique) [9, 20, 8]. SMOTE generates samples along the line between two samples of the minority class. However, such methods interpolate between existing samples in the dataset and fail to capture the minority class distribution; hence they cannot help detect novel fraudulent transactions.

Recently, Generative Adversarial Networks (GANs) [16, 29] have received a lot of attention from the credit card fraud detection research community. Several works [14, 33, 5, 40] have shown the efficacy of GANs for augmenting the dataset with synthetic minority (fraud) samples. However, mode collapse is a common phenomenon with GANs: the generator produces only a limited variety of samples and hence fails to capture the whole data distribution. To overcome mode collapse, researchers [33, 5] have used different GAN architectures such as WGAN [3], Least Squares GAN [28], and Relaxed WGAN [19] to augment the dataset and have shown an improvement in the classifier's performance. On the other hand, [40] trains a GAN-based architecture to generate complementary samples of the majority class (legitimate transactions), combining two WGANs and two autoencoders in a three-phase training process for fraud detection.

In this paper, we conduct a comprehensive study of several existing techniques for training GANs in the fraud detection scenario and highlight their merits and demerits. We report experiments on a conditional WGAN-GP that generates fraudulent data conditioned either on class labels for fraud samples obtained from k-means clustering or on non-fraud samples from the training set. We observe that using GANs alone may lead to boundary distortion and hence to a performance drop on the majority class (legitimate transactions). We propose an auxiliary loss, computed with either a Triplet Network or a Siamese Network on top of the WGAN-GP model, to learn more discriminative fraud samples. Further, we study the quality of the synthetic samples when the WGAN-GP network is trained end-to-end together with a neural-network-based classifier, and find this useful for dealing with the boundary distortion problem. Compared to [40], all our models have simple architectures with few parameters and are trained end-to-end for the generation of fraudulent data. We further show the applicability of Discriminator Rejection Sampling [4] for improving the quality of the synthetic fraud samples used for data augmentation. Finally, we highlight an open problem in data augmentation: how to decide the number of synthetic fraud samples to add.

The paper is organized as follows. Section 2 describes the configurations used to train the WGAN-GP model for improved data augmentation. Section 3 provides the structural details of these configurations, the dataset description, and the other experimental settings. Section 4 compares the performance of all the models, visualizes the synthetic samples obtained for data augmentation, and discusses the effect of increasing the number of synthetic samples in the augmented set on the classifier's performance. Finally, Section 5 concludes the article and outlines possible future research directions.
2 METHODOLOGY

2.1 Fraud detection framework

Fraud detection is formulated as a binary classification problem. For each transaction record in the dataset, we have a feature vector and a corresponding class label (fraud or non-fraud). The commonly used pipeline for credit card fraud detection using generative models [13, 33, 14, 5] is:

1. Train a GAN to generate fraudulent samples from the training set.
2. Augment the training set with the synthesized fraud samples.
3. Train a classifier on the original and the augmented training set separately and compare the performances.

2.2 Data augmentation using different configurations of WGAN-GP

2.2.1 WGAN-GP

We use a WGAN-GP [18] architecture to oversample the fraudulent (minority) class. It has a generator module $G : Z \to X$ parameterized by $\theta_G$ and a discriminator module $D : X \to [0, 1]$ parameterized by $\theta_D$, where $Z$ is a set of random noise vectors sampled from the unit Gaussian distribution $\mathcal{N}(0, 1)$ and $X$ is the set of feature vectors of the fraud samples. The loss functions used to train the discriminator (D) and generator (G) modules of WGAN-GP are:

$$L_D = \frac{1}{n}\sum_{i=1}^{n}\Big( D_{\theta_D}(\hat{x}_{f_i}) - D_{\theta_D}(x_{f_i}) + \lambda\big(\lVert \nabla_{\tilde{x}} D_{\theta_D}(\tilde{x}_{f_i}) \rVert_2 - 1\big)^2 \Big) \tag{1}$$

where $\tilde{x}_{f_i} = t\,\hat{x}_{f_i} + (1-t)\,x_{f_i}$ with $0 \le t \le 1$, and

$$L_G = \frac{1}{n}\sum_{i=1}^{n}\big( -D_{\theta_D}(G_{\theta_G}(z_i)) \big) \tag{2}$$

where $\hat{x}_{f_i}$ and $x_{f_i}$ are the generated and real fraud samples respectively, and $z_i$ is a random noise sample.
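To make the objective concrete, the sketch below reconstructs Equations 1 and 2 in PyTorch. This is illustrative rather than the authors' code: the helper names (gradient_penalty, d_loss, g_loss) are ours, and D and G stand for the MLP modules described later in Section 3.2.

```python
import torch

def gradient_penalty(D, x_real, x_fake, lam=10.0):
    # Gradient-penalty term of Equation 1, evaluated on the
    # interpolates x_tilde = t * x_fake + (1 - t) * x_real.
    t = torch.rand(x_real.size(0), 1, device=x_real.device)
    x_tilde = (t * x_fake + (1 - t) * x_real).detach().requires_grad_(True)
    d_out = D(x_tilde)
    grads = torch.autograd.grad(outputs=d_out, inputs=x_tilde,
                                grad_outputs=torch.ones_like(d_out),
                                create_graph=True)[0]
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()

def d_loss(D, x_real, x_fake, lam=10.0):
    # Equation 1: Wasserstein critic loss plus the gradient penalty.
    # x_fake should be detached from the generator for this update.
    return (D(x_fake).mean() - D(x_real).mean()
            + gradient_penalty(D, x_real, x_fake, lam))

def g_loss(D, G, z):
    # Equation 2: the generator maximizes the critic score of its samples.
    return -D(G(z)).mean()
```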
2.2.2 Conditional WGAN-GP

We add conditions to the WGAN-GP [5], as shown in Figure 1, extending the input space of the model:

$$G : Z \times Y \to X, \qquad D : X \times Y \to [0, 1]$$

where $Y$ is the set of conditions corresponding to the features in $X$. We conduct two separate experiments with different conditioning variables: one with class labels of the fraud samples obtained using k-means clustering, and a second with the non-fraud samples in the training set. The loss functions for the discriminator (D) and generator (G) modules of the conditional WGAN-GP are:

$$L_D = \frac{1}{n}\sum_{i=1}^{n}\Big( D_{\theta_D}(\hat{x}_{f_i}, y_{f_i}) - D_{\theta_D}(x_{f_i}, y_{f_i}) + \lambda\big(\lVert \nabla_{\tilde{x}} D_{\theta_D}(\tilde{x}_{f_i}, y_{f_i}) \rVert_2 - 1\big)^2 \Big) \tag{3}$$

$$L_G = \frac{1}{n}\sum_{i=1}^{n}\big( -D_{\theta_D}(G_{\theta_G}(z_i, y_{f_i}), y_{f_i}) \big) \tag{4}$$

Figure 1. Conditional GAN.

2.2.3 WGAN-GP with Siamese Network

A Siamese Network [25] uses a contrastive divergence loss to minimize the distance between positive pairs and maximize the distance between negative pairs. We use it on top of the underlying WGAN-GP model, as shown in Figure 2, to ensure that the distribution learned by the generator for the fraud samples does not overlap with the non-fraud samples; both networks are trained in an end-to-end fashion. The Siamese Network consists of two neural networks with shared weights that map the fraud (real and generated) and non-fraud samples into a shared space in which the distance between them is preserved. We pass pairs of generated and real fraud samples as positive pairs, i.e., $(\hat{x}_f, x_f, l = 1)$, and pairs of generated fraud and real non-fraud samples as negative pairs, i.e., $(\hat{x}_f, x_{nf}, l = 0)$, to the Siamese Network $S$ parameterized by $\theta_S$, and train the generator and the Siamese Network on the contrastive divergence loss:

$$L_S = \frac{1}{n}\sum_{i=1}^{n}\Big( l_i\,\frac{1}{2}\, d\big(S_{\theta_S}(\hat{x}_{f_i}), S_{\theta_S}(x_{f_i})\big)^2 + (1-l_i)\,\frac{1}{2}\,\big\{\max\big(0,\, m - d\big(S_{\theta_S}(\hat{x}_{f_i}), S_{\theta_S}(x_{nf_i})\big)\big)\big\}^2 \Big) \tag{5}$$

where $d$ is the Euclidean distance and $m$ is the margin hyperparameter.

2.2.4 WGAN-GP with Triplet Network

The Triplet Network consists of three neural networks with shared weights that map the fraud (real and generated) and non-fraud samples into a shared space in which the distance between them is preserved through the triplet loss. The objective of the triplet loss [32] is to minimize the distance between generated and real fraud samples while simultaneously maximizing the distance between generated fraud samples and real non-fraud samples; hence it is a max-margin framework. We pass triplets of generated fraud, real fraud, and real non-fraud samples, i.e., $(\hat{x}_f, x_f^{+}, x_{nf}^{-})$, to the Triplet Network $T$ parameterized by $\theta_T$, and train the generator and the Triplet Network on the triplet loss:

$$L_T = \frac{1}{n}\sum_{i=1}^{n} \max\big(0,\; m + d\big(T_{\theta_T}(\hat{x}_{f_i}), T_{\theta_T}(x_{f_i})\big) - d\big(T_{\theta_T}(\hat{x}_{f_i}), T_{\theta_T}(x_{nf_i})\big)\big) \tag{6}$$

where $d$ is the Euclidean distance and $m$ is the margin hyperparameter.
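Both auxiliary losses reduce to a few lines of code. Below is an illustrative PyTorch sketch of Equations 5 and 6; the embedding arguments (emb_a, emb_b, emb_anchor, etc.) are assumed to be outputs of the shared-weight networks S and T, and the function names are ours.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, label, margin=1.0):
    # Equation 5: label = 1 for (generated fraud, real fraud) pairs,
    # label = 0 for (generated fraud, real non-fraud) pairs.
    dist = F.pairwise_distance(emb_a, emb_b)
    pos = label * 0.5 * dist.pow(2)
    neg = (1 - label) * 0.5 * torch.clamp(margin - dist, min=0).pow(2)
    return (pos + neg).mean()

def triplet_loss(emb_anchor, emb_pos, emb_neg, margin=1.0):
    # Equation 6: anchor = generated fraud, positive = real fraud,
    # negative = real non-fraud; max-margin on the distance gap.
    d_pos = F.pairwise_distance(emb_anchor, emb_pos)
    d_neg = F.pairwise_distance(emb_anchor, emb_neg)
    return torch.clamp(margin + d_pos - d_neg, min=0).mean()
```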
Figure 2. Different configurations used to train the WGAN-GP architecture.

2.2.5 WGAN-GP with Classifier

We train the WGAN-GP model together with a binary classifier, as shown in Figure 2. We pass the generated fraud samples along with real non-fraud samples into a classifier $C$ parameterized by $\theta_C$ and train the generator on the classification loss. In this configuration, there are two different classifiers in the network: $C$ tries to distinguish samples of the fraudulent (minority) class from the non-fraudulent (majority) class, while the other classifier, the discriminator (critic) $D$, measures how far the learned distribution is from the true distribution. This combination ensures that the generated fraud samples do not overlap with the real non-fraud samples and, simultaneously, follow the distribution of the minority (fraud) class. We use the binary cross-entropy loss to train the classifier and the generator module:

$$L = \sum_{i=1}^{n}\big( -\log(C_{\theta_C}(\hat{x}_{f_i})) - \log(C_{\theta_C}(x_{nf_i})) \big) \tag{7}$$

2.2.6 WGAN-GP with Discriminator Rejection Sampling

With a standard GAN, it is common practice to discard the discriminator after training and to use only the generator for synthetic data generation, on the belief that the trained generator perfectly captures the underlying data distribution. However, recent studies [4, 37] have shown that GANs do not converge to the true data distribution, and the trained generator still produces samples that the discriminator can easily distinguish from real ones. These studies also show that the discriminator captures the data distribution more closely than the generator. Hence, the distributions defined by both the generator and the discriminator should be considered to obtain better-quality samples. We use the Discriminator Rejection Sampling (DRS) method [4] to sample from the distribution learned by the discriminator, $p_d(x)$. DRS is applied as a post-processing step in which the trained discriminator $D^{*}$ filters the synthetic fraud samples produced by the trained generator $G^{*}$.
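The DRS acceptance rule can be sketched as follows. This is a simplified NumPy reconstruction of the scheme in [4], not the authors' implementation; d_logits denotes the trained discriminator's logits on a batch of generated samples, and gamma is the acceptance-rate shift from [4].

```python
import numpy as np

def drs_filter(d_logits, gamma=0.0, eps=1e-6):
    # F(x) = D(x) - D_max - log(1 - exp(D(x) - D_max - eps)) - gamma;
    # a sample is accepted with probability sigmoid(F(x)). Samples scoring
    # near the batch maximum are almost always kept, low scorers dropped.
    d_max = d_logits.max()
    f = d_logits - d_max - np.log(1 - np.exp(d_logits - d_max - eps)) - gamma
    accept_prob = 1.0 / (1.0 + np.exp(-f))
    return np.random.uniform(size=d_logits.shape) < accept_prob

# Usage: mask = drs_filter(logits_of_generated_batch); kept = x_fake[mask]
```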
3 EXPERIMENTAL SETUP

3.1 Dataset description

All experiments use the publicly available Credit-Card dataset [11]. The dataset contains two days of transactions made in September 2013 by European cardholders. There are 284,807 transactions, of which 492 are fraudulent, i.e., frauds account for 0.172% of the total. Each transaction is represented by 31 features, namely 'Amount', 'Time', 'Class', and 28 other numerical features obtained from PCA (V1, V2, ..., V28). The 'Time' feature holds the time elapsed since the first transaction, and 'Class' is 1 for fraudulent transactions and 0 otherwise. There are no missing values in the dataset. We apply a log transform to the 'Amount' values to give a more normal distribution and normalize all features between 0 and 1. We split the dataset such that the training set holds 70% of the transactions (199,364) and the test set holds 30% (85,443); the 344 and 148 fraud samples account for 0.173% of the transactions in the training and test sets, respectively.

3.2 Architecture details

The lift in the performance of an XGBoost classifier [10] is used as a metric to quantify the quality of the generated synthetic samples when used for augmenting the training set. To evaluate the various settings of WGAN-GP, we train it with different loss functions, aiming to generate more realistic fraud data. The architectural details of all configurations are described below; an illustrative sketch of the generator/discriminator pair follows the list.

1. WGAN-GP. The generator module has four fully connected layers with 30, 128, 256, and 512 neurons, respectively, with ReLU activations after every layer except the last; it accepts random noise of dimension 30 as input. The discriminator has a series of five fully connected layers with 30, 512, 256, 128, and 1 neurons, respectively, with ReLU activations in all layers except the last, which uses a Sigmoid activation. The discriminator and generator are trained together on the loss functions defined in Equations 1 and 2, with λ = 10.0 in the gradient-penalty term and five discriminator updates per generator update. We train the model with the Adam optimizer for 10,000 epochs with mini-batches of size 64 and a learning rate of 2e-4, and observe convergence at around epoch 4,000.

2. Conditional WGAN-GP. We use the k-means clustering algorithm to divide the fraud samples into 2 clusters and label them 0 or 1 accordingly. The fraud-sample labels are passed to the WGAN-GP as conditions along with the random noise input, using the same architecture as defined above. To form pairs of fraud and non-fraud samples, we randomly pick samples from the respective classes and pair them. The model is trained on the loss functions defined in Equations 3 and 4 and converges at around epoch 4,000.

3. WGAN-GP with Triplet Network. We form triplets of the synthetic samples obtained from the generator with real fraud samples and real non-fraud samples and pass them to the Triplet Network. The network consists of three neural networks with shared weights, each with three fully connected layers of 30, 30, and 2 neurons, respectively, and ReLU activations in every layer except the last. The triplet loss (Equation 6) simultaneously ensures that the positive pair of generated and real fraud samples is close and that the negative pair of generated fraud and real non-fraud samples is separated by some margin; we set the margin hyperparameter to 1. The Triplet Network and the WGAN-GP model are trained end-to-end with the Adam optimizer for 5,000 epochs, converging at around epoch 3,500.

4. WGAN-GP with Siamese Network. We use the same WGAN-GP architecture as mentioned above. Synthetic samples from the generator are paired with real fraud samples (positive pairs) and real non-fraud samples (negative pairs) and passed to the Siamese Network, which consists of two neural networks with shared weights, each with three fully connected layers of 30, 30, and 2 neurons and ReLU activations in every layer except the last. The contrastive divergence loss (Equation 5) ensures that the positive pairs are close and that a margin separates the negative pairs; we set the margin hyperparameter to 1 and eps to 1e-9. The entire network is trained end-to-end with the Adam optimizer for 5,000 epochs and saturates at around epoch 3,000.

5. WGAN-GP with Classifier. We add a binary classifier module on top of the WGAN-GP model. Generated fraud samples from the generator are passed to the classifier along with real non-fraud samples from the training set, and the classifier distinguishes between them. The classifier has three fully connected layers with 30, 30, and 2 neurons; all layers have ReLU activations except the last, which uses Softmax. The classifier and generator parameters are trained on the loss defined in Equation 7 with the Adam optimizer at a learning rate of 0.001. Initially, we train only the WGAN-GP model for 1,000 epochs; we then train the entire network end-to-end for 5,000 epochs, observing saturation at around epoch 2,500.
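For concreteness, one plausible PyTorch reading of the generator/discriminator widths in item 1 is sketched below. The paper lists neuron counts per layer but not the generator's output width, which we take to be the 30-dimensional feature vector, so treat this as our interpretation.

```python
import torch.nn as nn

generator = nn.Sequential(
    nn.Linear(30, 128), nn.ReLU(),    # input: 30-dim noise
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 30),               # output: synthetic transaction features
)

discriminator = nn.Sequential(
    nn.Linear(30, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),  # Sigmoid per the paper's description
)
```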
4 RESULTS

4.1 Performance metrics

In credit card fraud detection, the class of interest is the fraud (minority) class, and the costs of false positives and false negatives are not equal. An ideal system should precisely identify fraud samples while keeping the number of false positives low. Accuracy, the ratio of correctly classified samples, i.e., (TP+TN)/N, is not an appropriate measure of a classifier's performance on an imbalanced dataset; what matters is the categorical prediction ability. Hence we report Precision, Recall (sensitivity), and F1-Score. Precision is the fraction of predicted positives that are relevant, i.e., TP/(TP+FP). Recall is the fraction of all relevant samples that the algorithm correctly classifies, i.e., TP/(TP+FN). The F1-Score combines the two as the harmonic mean of Precision and Recall.
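These metrics correspond directly to scikit-learn's implementations; a small self-contained sketch (with dummy labels of our own) is:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_test = np.array([0, 0, 1, 1, 0, 1])   # ground truth (1 = fraud)
y_pred = np.array([0, 1, 1, 1, 0, 0])   # classifier output

precision = precision_score(y_test, y_pred)  # TP / (TP + FP)
recall = recall_score(y_test, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_test, y_pred)                # harmonic mean of the two
print(precision, recall, f1)
```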
Augmentation Method                          Precision  Recall  F1-Score
Without Augmentation                           0.90      0.76     0.83
WGAN-GP                                        0.88      0.81     0.84
Conditional WGAN-GP (labels from k-means)      0.88      0.81     0.84
Conditional WGAN-GP (non-fraud samples)        0.86      0.81     0.82
WGAN-GP + Triplet Network                      0.89      0.82     0.85
WGAN-GP + Siamese Network                      0.88      0.82     0.85
WGAN-GP + Classifier                           0.92      0.78     0.84
WGAN-GP + DRS                                  0.90      0.82     0.86
WGAN-GP + Classifier + DRS                     0.93      0.79     0.85

Table 1. Performance of the XGBoost classifier trained on augmented sets obtained from different configurations of the WGAN-GP model.

Table 1 reports the results of all the WGAN-GP configurations employed for credit card fraud detection. First, we train an XGBoost classifier on the training set's transactions and test its performance on the test set. Next, we use a WGAN-GP model to learn the distribution of the fraud samples, use its trained generator to oversample the minority (fraud) class, and augment the training set; an XGBoost classifier trained on this augmented set shows an absolute improvement of 5% in Recall compared to one trained on the original dataset.

We also use the conditional WGAN-GP model to generate fraud samples based on conditions such as class labels or non-fraud samples. Fraud samples are clustered into k classes using k-means clustering (2 clusters in our experiments), and the corresponding cluster IDs are assigned as labels and passed to the conditional WGAN-GP as conditions. From Table 1, the classifier's performance remains the same when trained on the augmented dataset obtained from the WGAN-GP conditioned on labels from k-means clustering. The other setting, in which the conditional WGAN-GP is trained to learn a transformation of non-fraud samples to fraud, did not perform better, leading to an absolute drop of 2% in Precision. Further investigation is required to explain this drop, as it does not conform to our hypothesis that generating fraud from non-fraud samples should perform better.

This study also proposes training GANs with auxiliary loss functions, using the triplet loss or the Siamese network loss, for more effective synthetic data generation. Both loss functions improve Recall by 1%, and yield absolute improvements in Precision of 2% (Triplet) and 1% (Siamese) compared to the plain WGAN-GP model. This confirms the benefit of incorporating an auxiliary loss into the WGAN-GP training.

In WGAN-GP with classifier, the generative module is trained on two loss functions: a classification loss from the classifier that distinguishes fraud from non-fraud samples, and the discriminator loss that distinguishes real from generated fraud samples. Together, these two modules help the generator synthesize well-discriminated fraud samples that follow the fraud class distribution. Table 1 shows an absolute improvement of 3% in Precision and a reduction of 3% in Recall compared to the plain WGAN-GP model; compared to the XGBoost classifier trained on the original dataset, there is an improvement of 2% in both Recall and Precision.

The performance of the XGBoost classifier trained on the augmented dataset depends on the quality of the generated fraud samples: to improve Recall, the generated fraud samples should be well discriminated from the non-fraud samples. Recent studies [4, 37] have shown that samples from a trained generator often differ from the real class samples and would be easily rejected by the discriminator. We therefore employ the discriminator rejection sampling method proposed in [4]: the trained discriminator filters out poor-quality samples from the generator as a post-processing step before they are used for training set augmentation. Table 1 shows an absolute improvement of 2% in Precision and 1% in Recall using DRS with WGAN-GP over the plain WGAN-GP model; compared to the XGBoost classifier trained on the original dataset, Precision is similar while Recall improves by 6% absolute.

A reduction in Precision causes legitimate transactions to be misclassified as fraudulent, penalizing banks in terms of customer trust and comfort. Since adding a classifier module to the WGAN-GP model improves Precision, and to further improve the quality of the samples injected into the augmented set, we applied DRS to all the WGAN-GP configurations discussed above. For the WGAN-GP+Classifier model, DRS yields an absolute improvement of 1% in both Precision and Recall; for the WGAN-GP+Triplet Network and WGAN-GP+Siamese Network models, however, no improvement was observed with DRS.

4.2 Comparison of samples generated by different models

Figure 3 visualizes the distribution of fraud transactions learned by the plain WGAN-GP model and by the WGAN-GP+Classifier model. For this comparison, 10,000 synthetic fraud samples were drawn from each trained model and plotted against the real fraud samples and 10,000 real non-fraud samples from the training set. Figure 3 illustrates that the WGAN-GP model learns a class boundary from the fraud samples and draws synthetic fraud data from within that boundary; these samples are not uniformly distributed but come from the high-density region. In the case of the WGAN-GP+Classifier model, the generated fraud samples are uniformly distributed and more spread out than those of the plain WGAN-GP model.

Figure 3. Samples generated from (a) the WGAN-GP and (b) the WGAN-GP+Classifier model.

4.3 Effect of increasing the number of synthetic samples on the classifier's performance

There are 344 fraud samples in the training set; let us denote this number by N_f. We generate fraud samples in multiples (1/4, 1/2, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512) of N_f to augment the dataset and study the effect on the classifier's performance. Table 2 and Figure 4 show the effect of increasing the number of generated fraud samples (N) in the augmented set. We report the metric values at epoch 4,100 for the WGAN-GP model and epoch 2,000 for the WGAN-GP+Classifier model. From Table 2, the WGAN-GP model performs best when N = N_f, i.e., when the number of generated samples equals the number of real fraud samples, while the WGAN-GP+Classifier model performs best when N = 2N_f. Figure 4 also shows that as the number of generated fraud samples increases, the Recall of the WGAN-GP model increases while its Precision and F1-Score drop; for the WGAN-GP+Classifier model, Precision and Recall drop after 4N_f and N_f, respectively.

Figure 4. Effect of increasing the number of generated fraud samples in the augmented set.

             WGAN-GP                        WGAN-GP+Classifier
N         Precision  Recall  F1-Score    Precision  Recall  F1-Score
86          0.895    0.804    0.847        0.913    0.777    0.839
172         0.892    0.784    0.834        0.914    0.784    0.844
344         0.873    0.791    0.830        0.92     0.777    0.842
688         0.851    0.811    0.830        0.921    0.784    0.847
1376        0.811    0.811    0.811        0.926    0.764    0.837
2752        0.781    0.818    0.799        0.933    0.757    0.836
5504        0.753    0.824    0.787        0.925    0.75     0.828
11008       0.668    0.818    0.736        0.836    0.723    0.775
22016       0.541    0.845    0.660        0.886    0.736    0.804
44032       0.313    0.858    0.458        0.886    0.736    0.804
88064       0.193    0.865    0.316        0.886    0.736    0.804
176128      0.123    0.878    0.215        0.908    0.736    0.813

Table 2. Performance of the XGBoost classifier as the number of generated samples (N) in the augmented set is varied, for the WGAN-GP and WGAN-GP+Classifier models.
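The sweep behind Table 2 has roughly the following shape. This is an illustrative sketch, assuming a trained `generator` as in Section 3.2 and the train/test split of Section 3.1 (`X_train`, `y_train`, `X_test`, `y_test` as NumPy arrays); the helper `sample_frauds` is our own.

```python
import numpy as np
import torch
from xgboost import XGBClassifier
from sklearn.metrics import precision_recall_fscore_support

N_F = 344  # number of real fraud samples in the training set

def sample_frauds(generator, n, noise_dim=30):
    # Draw n synthetic fraud samples from the trained generator.
    with torch.no_grad():
        return generator(torch.randn(n, noise_dim)).numpy()

for mult in [0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
    n = int(mult * N_F)
    X_syn = sample_frauds(generator, n)
    X_aug = np.vstack([X_train, X_syn])
    y_aug = np.concatenate([y_train, np.ones(n)])  # synthetic samples labeled fraud
    clf = XGBClassifier().fit(X_aug, y_aug)
    p, r, f1, _ = precision_recall_fscore_support(
        y_test, clf.predict(X_test), average="binary")
    print(f"N={n}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```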
5 CONCLUSION AND FUTURE WORK

This paper presented a detailed study of the applicability and effectiveness of GANs in the banking domain. Various GAN variants, along with the ones proposed in this study, were compared to evaluate the efficacy of data augmentation for the downstream classification task. Among the different training procedures, WGAN-GP trained with a classifier in an end-to-end fashion performed best, improving both Precision and Recall of the XGBoost-based fraud classifier. We further found that the Discriminator Rejection Sampling technique, when applied to the selection of synthetic samples generated by WGAN-GP with classifier, provided an incremental lift. We also demonstrated the effect on the overall performance of the fraud classifier of increasing the number of synthetic samples used for training data augmentation. We believe the outcomes presented in this study will help readers quickly identify the right GAN settings for the fraud space.

A promising future research direction is to experiment with Reinforcement-Learning-based algorithms that automatically identify the quality and count of the samples to be used for augmenting the training dataset, leading to improved performance.

REFERENCES

[1] Rehan Akbani, Stephen Kwek, and Nathalie Japkowicz, 'Applying support vector machines to imbalanced datasets', in European Conference on Machine Learning, pp. 39–50. Springer, (2004).
[2] Rehan Akbani, Stephen Kwek, and Nathalie Japkowicz, 'Applying support vector machines to imbalanced datasets', in European Conference on Machine Learning, pp. 39–50. Springer, (2004).
[3] Martin Arjovsky, Soumith Chintala, and Léon Bottou, 'Wasserstein GAN', arXiv preprint arXiv:1701.07875, (2017).
[4] Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, and Augustus Odena, 'Discriminator rejection sampling', arXiv preprint arXiv:1810.06758, (2018).
[5] Hung Ba, 'Improving detection of credit card fraudulent transactions using generative adversarial networks', arXiv preprint arXiv:1907.03355, (2019).
[6] Barry G. Becker, 'Using MineSet for knowledge discovery', IEEE Computer Graphics and Applications, 17(4), 75–78, (1997).
[7] Siddhartha Bhattacharyya, Sanjeev Jha, Kurian Tharakunnel, and J. Christopher Westland, 'Data mining for credit card fraud: A comparative study', Decision Support Systems, 50(3), 602–613, (2011).
[8] Chumphol Bunkhumpornpat, Krung Sinapiromsaran, and Chidchanok Lursinsap, 'DBSMOTE: density-based synthetic minority over-sampling technique', Applied Intelligence, 36(3), 664–684, (2012).
[9] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer, 'SMOTE: synthetic minority over-sampling technique', Journal of Artificial Intelligence Research, 16, 321–357, (2002).
[10] Tianqi Chen and Carlos Guestrin, 'XGBoost: A scalable tree boosting system', in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794, (2016).
[11] Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson, and Gianluca Bontempi, 'Calibrating probability with undersampling for unbalanced classification', in 2015 IEEE Symposium Series on Computational Intelligence, pp. 159–166. IEEE, (2015).
[12] Pedro Domingos, 'MetaCost: A general method for making classifiers cost-sensitive', in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 155–164, (1999).
gan’, arXiv preprint arXiv:1701.07875, (2017). [26] Victoria López, Alberto Fernández, Salvador Garcı́a, Vasile Palade, and [4] Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, and Francisco Herrera, ‘An insight into classification with imbalanced data: Augustus Odena, ‘Discriminator rejection sampling’, arXiv preprint Empirical results and current trends on using data intrinsic characteris- arXiv:1810.06758, (2018). tics’, Information sciences, 250, 113–141, (2013). [5] Hung Ba, ‘Improving detection of credit card fraudulent trans- [27] Sam Maes, Karl Tuyls, Bram Vanschoenwinkel, and Bernard Mander- actions using generative adversarial networks’, arXiv preprint ick, ‘Credit card fraud detection using bayesian and neural networks’, arXiv:1907.03355, (2019). in Proceedings of the 1st international naiso congress on neuro fuzzy [6] Barry G Becker, ‘Using mineset for knowledge discovery’, IEEE Com- technologies, pp. 261–270, (2002). puter Graphics and Applications, 17(4), 75–78, (1997). [28] Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, [7] Siddhartha Bhattacharyya, Sanjeev Jha, Kurian Tharakunnel, and and Stephen Paul Smolley, ‘Least squares generative adversarial net- J Christopher Westland, ‘Data mining for credit card fraud: A com- works’, in Proceedings of the IEEE International Conference on Com- parative study’, Decision Support Systems, 50(3), 602–613, (2011). puter Vision, pp. 2794–2802, (2017). [8] Chumphol Bunkhumpornpat, Krung Sinapiromsaran, and Chidchanok [29] Mehdi Mirza and Simon Osindero, ‘Conditional generative adversarial Lursinsap, ‘Dbsmote: density-based synthetic minority over-sampling nets’, arXiv preprint arXiv:1411.1784, (2014). technique’, Applied Intelligence, 36(3), 664–684, (2012). [30] Xuetong Niu, Li Wang, and Xulei Yang, ‘A comparison study of credit [9] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip card fraud detection: Supervised versus unsupervised’, arXiv preprint Kegelmeyer, ‘Smote: synthetic minority over-sampling technique’, arXiv:1904.10604, (2019). Journal of artificial intelligence research, 16, 321–357, (2002). [31] Saharon Rosset, Uzi Murad, Einat Neumann, Yizhak Idan, and Gadi [10] Tianqi Chen and Carlos Guestrin, ‘Xgboost: A scalable tree boosting Pinkas, ‘Discovery of fraud rules for telecommunications—challenges system’, in Proceedings of the 22nd acm sigkdd international confer- and solutions’, in Proceedings of the fifth ACM SIGKDD international ence on knowledge discovery and data mining, pp. 785–794, (2016). conference on Knowledge discovery and data mining, pp. 409–413, [11] Andrea Dal Pozzolo, Olivier Caelen, Reid A Johnson, and Gianluca (1999). Bontempi, ‘Calibrating probability with undersampling for unbalanced [32] Florian Schroff, Dmitry Kalenichenko, and James Philbin, ‘Facenet: A classification’, in 2015 IEEE Symposium Series on Computational In- unified embedding for face recognition and clustering’, in Proceedings telligence, pp. 159–166. IEEE, (2015). of the IEEE conference on computer vision and pattern recognition, pp. [12] Pedro Domingos, ‘Metacost: A general method for making classifiers 815–823, (2015). cost-sensitive’, in Proceedings of the fifth ACM SIGKDD international [33] Akhil Sethia, Raj Patel, and Purva Raut, ‘Data augmentation using gen- conference on Knowledge discovery and data mining, pp. 155–164, erative models for credit card fraud detection’, in 2018 4th International (1999). 
[29] Mehdi Mirza and Simon Osindero, 'Conditional generative adversarial nets', arXiv preprint arXiv:1411.1784, (2014).
[30] Xuetong Niu, Li Wang, and Xulei Yang, 'A comparison study of credit card fraud detection: Supervised versus unsupervised', arXiv preprint arXiv:1904.10604, (2019).
[31] Saharon Rosset, Uzi Murad, Einat Neumann, Yizhak Idan, and Gadi Pinkas, 'Discovery of fraud rules for telecommunications — challenges and solutions', in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 409–413, (1999).
[32] Florian Schroff, Dmitry Kalenichenko, and James Philbin, 'FaceNet: A unified embedding for face recognition and clustering', in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823, (2015).
[33] Akhil Sethia, Raj Patel, and Purva Raut, 'Data augmentation using generative models for credit card fraud detection', in 2018 4th International Conference on Computing Communication and Automation (ICCCA), pp. 1–6. IEEE, (2018).
[34] Hua Shao, Hong Zhao, and Gui-Ran Chang, 'Applying data mining to detect fraud behavior in customs declaration', in Proceedings of the International Conference on Machine Learning and Cybernetics, volume 3, pp. 1241–1244. IEEE, (2002).
[35] Erik Sherman, 'Fighting web fraud', Newsweek, 139(23), 32B, (2002).
[36] Kai Ming Ting, 'An instance-weighting method to induce cost-sensitive trees', IEEE Transactions on Knowledge and Data Engineering, 14(3), 659–665, (2002).
[37] Ryan Turner, Jane Hung, Eric Frank, Yunus Saatci, and Jason Yosinski, 'Metropolis-Hastings generative adversarial networks', arXiv preprint arXiv:1811.11357, (2018).
[38] Wouter Verbeke, Karel Dejaeger, David Martens, Joon Hur, and Bart Baesens, 'New insights into churn prediction in the telecommunication sector: A profit driven data mining approach', European Journal of Operational Research, 218(1), 211–229, (2012).
[39] Xing-Ming Zhao, Xin Li, Luonan Chen, and Kazuyuki Aihara, 'Protein classification with imbalanced data', Proteins: Structure, Function, and Bioinformatics, 70(4), 1125–1132, (2008).
[40] Panpan Zheng, Shuhan Yuan, Xintao Wu, Jun Li, and Aidong Lu, 'One-class adversarial nets for fraud detection', in Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 1286–1293, (2019).