A Novel Trio-Hybrid for Detecting Fraudulent Credit
Card Transactions
Sarika Jain1, Shripriya Dubey1, Namrata Tiwari1, Yashvi Jain1 and Atef Shalan2
1 National Institute of Technology Kurukshetra, Haryana, India
2 Georgia Southern University, Georgia, United States


                                         Abstract
In this era of digitization, credit card fraud shakes the confidence of not only customers but also merchants, and it incurs losses of billions of dollars globally. To combat such fraud, a robust and responsive system is needed that can flag a fraudulent transaction instantly, before it is completed. Existing systems are good at detecting and fighting fraud after it has happened but lag in preventing such crimes. They are weak at optimization and also struggle in terms of response time. The inefficiency of existing systems is attributed to their reliance on either a single machine learning technique or a combination of just two. We present a Trio-Hybrid of K-means, genetic algorithm, and artificial neural network approaches to deal with these problems. The K-means algorithm helps reduce the training time of the neural network, and the genetic algorithm helps in feature selection to prevent the neural network from being over-trained, thereby reducing the cost of the system. We leverage the benefits of these three techniques, combine them into a trio for the first time, and achieve an accuracy of 99.94% in detecting fraudulent credit card transactions.


1. Introduction
The unauthorized and ill-intended use of a credit card to commit a crime and cause monetary harm to its owner is defined as credit card fraud [1]. Credit card fraud makes up a major part of crime committed through digital resources. As digitalization spreads across the globe, the bulk of transactions worldwide take place via credit cards, and for this market to thrive, the credibility of credit cards is essential. As the number of credit card users grows, credit card fraud also rises around the globe. It causes losses of billions of dollars to companies and customers, hampers the growth of businesses, and, if not controlled, may harm a country’s economy [2]. The methods of committing credit card fraud keep evolving because fraudsters are tech-savvy and shrewd; hence, a way must be devised to rein in these ever-evolving techniques and prevent huge economic losses. Credit card fraud can be carried out in many ways, such as skimming, stealing or robbing card details, planting chips in ATMs, cloning, phishing, spying on data in the merchant’s system, and erasing the old data on the chip. The pattern of credit card fraud is dynamic, as tech-savvy fraudsters pose particular challenges in curbing their acts [3]. A good and effective system for detecting fraudulent credit card transactions must be robust and adaptable to a changing environment; otherwise, it may prove useless [4]. The characteristics of a good and effective fraud detection system are (a) accuracy, i.e., the frauds detected by the system must be correct, (b) a fraudulent transaction must be detected while it is being processed (in transit) and not after its completion, and (c) the system must not misclassify non-fraudulent (genuine) transactions as fraud [5]. Here we list the envisioned applications and use cases of such a fraudulent credit card transaction detection system, as well as the challenges involved in modelling it.

ACI’22: Workshop on Advances in Computation Intelligence, its Concepts Applications at ISIC 2022, May 17-19, Savannah, United States
* Corresponding author.
jasarika@nitkkr.ac.in (S. Jain)
https://sites.google.com/view/nitkkrsarikajain/ (S. Jain)
ORCID: 0000-0002-7432-8506 (S. Jain)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
   Applications and Use Cases of a Credit Card Fraud Detection System

   1. Any terminal that provides the electronic hardware used to swipe cards such as the Point
      of Service terminals at retail stores.

   2. Assistance at the merchant’s site; for example, merchants such as insurance companies will be
      able to identify a fraudulent client easily.

   3. Checks at the bank’s end. Banks can take action on detecting fraud associated with an account
      held at their bank.

   4. Real-time software systems for example e-ticketing services.

   5. Websites that provide e-commerce services, such as online shopping.

  Challenges Involved in Modelling a Fraud Detection System

   1. Inaccessible or unavailable data. The data associated with a customer’s credit card and account
      information is highly confidential; hence, banks and companies rarely make their customers’
      datasets available. As a result, the data is not readily available.

   2. Imbalance in the data, i.e., the number of fraudulent transactions is very small compared with
      the genuine ones.

   3. The behavior of a fraudulent profile keeps changing; its nature is dynamic.

   4. The time taken by the system to decide whether a transaction is fraudulent must be very short.

   5. Overlapping transactions, i.e., a genuine transaction can be very similar in nature to a
      fraudulent one.

   6. The number of features and parameters to be processed is very large.

   7. Selecting the optimal parameters or features is a demanding and challenging task.

   8. The noise in the data needs to be managed.


   Fraudulent transactions generally follow similar patterns, so a transaction can be categorized as fraudulent using pattern recognition techniques such as K-Nearest Neighbour (KNN), Artificial Neural Networks, Fuzzy Logic Based Systems, Artificial Immune Systems, Naïve Bayesian Networks, Hidden Markov Models, Support Vector Machines (SVM), Decision Trees, Ensemble Classifiers, and Logistic Regression. Knowing their advantages and disadvantages helps leverage their benefits when a technique is chosen. Table 1 shows some advantages and disadvantages of various techniques for credit card fraud detection.
   This work presents a novel trio-hybrid of an artificial neural network, k-means clustering, and a genetic algorithm (GA) that can accurately detect and prevent fraudulent credit card transactions while they are in transit. The major contributions and objectives of the proposed work are (i) to minimize the time required to train the neural network by using k-means clustering, and (ii) to use a genetic algorithm to prevent the system from being over-trained, thereby reducing the cost of the system. We provide an algorithm for the trio-hybrid of the three approaches and obtain an accuracy of 99.94% and a loss value of 0.561%. The rest of the paper is organized as follows. Section 2 reviews noteworthy existing works that parallel the proposed solution, with a comparative analysis between them, followed by some benchmark systems in credit card fraud detection. Section 3 explains the methodologies used in the proposed model and the algorithms used. Section 4 is the operational analysis, which presents the architecture and flowchart of the system and explains the combined algorithm. Section 5 explains the metrics on which the system is evaluated and summarizes the operational results and findings. Section 6 concludes the paper and outlines the future scope of the proposed solution.


2. Literature Review
Many researchers have experimented with ways to achieve better accuracy and reduce the time needed to detect credit card fraud [6, 7, 8, 9, 10]. [11] reviewed different credit card fraud detection practices. [12] proposed a hybrid of Bayesian Networks and Artificial Neural Networks to detect credit card fraud; their discussion states that the ANN accelerated the Bayesian networks, which gave good results after a short training period. [13] put forward a hybrid of artificial neural networks and decision trees. First, the classification results obtained by a decision tree and a multilayer perceptron are used to generate a new dataset, which is then fed into a multilayer perceptron for the final classification. The model is highly reliable owing to its very low false detection rate. [14] devised a hybrid of K-means clustering with a Hidden Markov Model (HMM) and a Multilayer Perceptron (MLP). The dataset is fed to K-means, and its output is used to train the HMM and the MLP, which classify incoming transactions. In their observations, the combination of MLP with K-means clustering gives higher accuracy; however, with 10-fold cross-validation the result is reversed. [15], in their paper Credit Card Fraud Detection Using Autoencoder Neural Network, proposed a de-noising autoencoder neural network (DEA) algorithm to handle the imbalanced nature of credit card datasets, along with SMOTE (Synthetic Minority Oversampling Technique) and a SoftMax function in the neural network classification. [16] trained their model using an Artificial Neural Network with three different learning mechanisms, namely Gradient Descent Adaptive Learning, Bayesian Regularization (BR), and the Levenberg–Marquardt (LM) algorithm, and found that BR gave the best results. Table 2 shows various systems parallel to the proposed system.

Technique | Advantages | Disadvantages
Artificial Neural Network | Can learn from the past and does not need to be reprogrammed. Capable of detecting fraudulent activity during the transaction. | High processing time for large neural networks. Excessive training required; difficult to set up and operate. Sensitive to data format.
Bayesian Network | High detection and processing speed. High accuracy. | Excessive training needed. Expensive.
Support Vector Machine | Delivers a unique solution by choosing an appropriate generalization code. Robust. | Poor at processing large datasets. Expensive, with a low speed of detection. Medium accuracy; lacks transparency.
Fuzzy Logic | Very fast in detection; accurate. High maintainability. | Low speed of detection. Highly expensive.
Decision Tree | High flexibility; explainable. Easy to understand and implement. Can handle nonlinear data as well. | Cannot detect fraud during the transaction. The algorithm is complex; even a small change in the data can alter the structure.
Hidden Markov Model | Capable of detecting frauds at the time of the transaction. Reduces false positives. | Cannot detect fraud in the initial few transactions. Not scalable to large datasets. Expensive.
K-Nearest Neighbour | No predictive model is required before classification. | Accuracy depends on the distance measure. Cannot detect fraud during the transaction.
Table 1
Advantages and Disadvantages of Various Techniques for Credit Card Fraud Detection
Literature Reference | Approach | Finding | Remarks
[12] | Naïve Bayes and Artificial Neural Network | 83.14% detection rate | ANNs are faster in detecting frauds, though Bayesian networks give better results with a shorter training period; they are comparatively slower.
[13] | Multi-Layer Perceptron Neural Networks and Decision Tree | 99.89% detection rate | The proposed intrusion detection system has a high detection rate and a low false alarm rate; hence, the results produced are reliable.
[17] | Fuzzy Clustering and Neural Network | 93.90% detection rate | Combining a clustering technique with learning aids effective detection of frauds.
[14] | K-means Clustering with Multilayer Perceptron Algorithm and Hidden Markov Model | 80.5% detection rate | Multilayer Perceptron combined with K-means clustering outperformed the combination of Hidden Markov Model with K-means clustering.
[15] | Autoencoder Neural Network | 84% detection rate | The results indicate that the imbalance and noise in the minority class of the dataset could be removed using the autoencoder method.
[16] | Artificial Neural Networks | 95.57% detection rate | Among the various training techniques available for the network, the Bayesian Regularisation technique provided the best accuracy.
[8] | LSTM-RNN | 99.58% detection rate | The desired features were extracted from two different datasets taken from the Kaggle repository using Principal Component Analysis (PCA), then preprocessed using the Arbitrary Assignment Method and the Min–Max scaler algorithm for normalization.
Table 2
Various Systems Parallel to the Proposed System


3. Materials and Methods
Neural networks are powerful at identifying patterns and the correlations and differences among them. The K-means algorithm can detect outliers even in overlapping pattern sets, and the genetic algorithm uses concepts of evolution to generate an optimized system. Together, they form a fast and precise fraud detection system. The advantages of these three techniques are leveraged in this trio-hybrid approach. This section explains how each of them works in detail.


3.1. K-means Clustering
K-means clustering is a clustering technique that separates data points into clusters based on how similar they are to each other and how much they differ from other data points [18]. K-means performs hard clustering, in which each data point belongs to exactly one cluster; in fuzzy (soft) clustering, by contrast, a data point can belong to more than one cluster. Clusters are differentiated based on similarity measures such as intensity, connectivity, and distance. Clustering helps our system achieve higher accuracy and lower false alarm rates. Given $n$ data points $x_1, \dots, x_n$, K-means aims to find $k$ centers $c_1, \dots, c_k$ and assignments $q_1, \dots, q_n$ of the data points to the centers such that the sum of distances is minimized:
$$E(c_1, \dots, c_k, q_1, \dots, q_n) = \sum_{i=1}^{n} \lVert x_i - c_{q_i} \rVert^{p}$$
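As a small worked check of the objective above, the following sketch evaluates $E$ for a given set of centers and assignments. It assumes the norm is Euclidean and treats $p$ as an exponent, so $p = 2$ gives the usual sum of squared distances. The function name kmeans_objective is hypothetical.

```python
import numpy as np

def kmeans_objective(X, centers, assignments, p=2):
    """E(c_1..c_k, q_1..q_n) = sum_i ||x_i - c_{q_i}||^p, Euclidean norm assumed."""
    diffs = X - centers[assignments]        # x_i - c_{q_i} for every point
    dists = np.linalg.norm(diffs, axis=1)   # Euclidean distance per point
    return float(np.sum(dists ** p))

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
centers = np.array([[0.0, 1.0], [10.0, 0.0]])
q = np.array([0, 0, 1])                     # q_i: index of the center of point i
print(kmeans_objective(X, centers, q))      # distances 1, 1, 0 -> 2.0
```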

   The first center $c_1$ is selected at random from the data points $x_1, \dots, x_n$, and the distance $\lVert x_i - c_1 \rVert^{p}$ between this center and every point is calculated. The second center $c_2$ is then selected from the data points with probability proportional to this distance. The remaining centers are obtained by repeating the procedure, using each point's minimum distance to the centers chosen so far. Each data point is then assigned to the cluster whose center is nearest to it. The Euclidean distance is used to calculate the distance of each data point from each centroid. Other measures that can be used are the cosine distance (based on the cosine of the angle between the data points), the Manhattan distance (the sum of absolute differences between the coordinates of two data points), and the Minkowski distance (a generalization of the Euclidean and Manhattan distances). Once each data point has been assigned to a cluster, each centroid is recalculated as the mean of its constituent data points. The data points are then reassigned to their nearest centroids, and this process repeats until no data point shifts between clusters.
The output of clustering is used to find the transactions that are outliers. An outlier is an object that is inconsistent with the rest of the data; outliers do not conform to the normal data and hence need to be evaluated. In credit card fraud detection, the data is highly unbalanced, i.e., the number of fraudulent transactions is minuscule compared with genuine ones. Hence, finding the outlier transactions and sending only those to the neural network for classification makes the system not only better trained but also faster. Outlier detection is done by calculating the distance between each transaction in a cluster and the center of that cluster. Every transaction whose distance exceeds a threshold, calculated as the average of all the distances, is labelled an outlier.
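The seeding, assignment, update, and outlier steps described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- It takes $p = 2$, so seeding samples points with probability proportional to the squared Euclidean distance (k-means++ style).
- It keeps the old centroid if a cluster becomes empty.
- It reads "the average of all the distances" as the mean over all points.

All function names are hypothetical.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """First center uniformly at random; each later center sampled with probability
    proportional to the squared distance to the nearest center chosen so far."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        C = np.array(centers)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def kmeans(X, k, seed=0, max_iter=100):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(X, k, rng)
    labels = None
    for _ in range(max_iter):
        # Euclidean distance of every point to every center, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no point shifted between clusters: converged
        labels = new_labels
        for j in range(k):  # recompute each centroid as the mean of its members
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def flag_outliers(X, centers, labels):
    """Flag points whose distance to their own center exceeds the average distance."""
    d = np.linalg.norm(np.asarray(X, dtype=float) - centers[labels], axis=1)
    return d > d.mean()

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
centers, labels = kmeans(X, 2)
print(labels, flag_outliers(X, centers, labels))
```

In the pipeline described in the text, only the rows flagged by flag_outliers would be forwarded to the neural network.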

3.2. Genetic Algorithm
A genetic algorithm is an evolutionary optimization technique [3]. It is a search algorithm based on the mechanics of natural selection and genetics. Genetic algorithms simulate the process of natural evolution, in which each successive generation is improved by selecting the fittest individuals for reproduction. A genetic algorithm searches for an optimal solution by feeding candidate solutions into the algorithm, computing their fitness, and eliminating the worst-performing members. It then selects from the remaining members and produces offspring from them by applying genetic operators such as crossover and mutation. These offspring become the new population, which is again fed into the algorithm. The entire procedure is repeated until a stopping criterion is met, thus increasing the fitness of the system. The fitness function used in the model is the mean squared error (MSE), used as a cross-validation score in the genetic algorithm:
$$\mathrm{MSE} = \frac{1}{M} \sum_{i=1}^{M} \left(y_{\mathrm{true},i} - y_{\mathrm{pred},i}\right)^2$$
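The generic loop described above can be sketched as follows. This is a minimal, hypothetical sketch: it evolves real-valued genomes on a toy regression problem rather than network parameters. The population size, the rule of dropping the worst half, uniform crossover, Gaussian mutation and the 10% elite fraction are illustrative assumptions, not the paper's settings. Fitness is the MSE defined above, so lower is fitter.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: (1/M) * sum_i (y_true_i - y_pred_i)^2."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def genetic_search(fitness, n_genes, pop_size=30, n_generations=50,
                   elite_frac=0.1, mutation_scale=0.3, seed=0):
    """Minimise `fitness` (lower is fitter) over real-valued genomes."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, n_genes))     # random initial generation
    n_elite = max(1, int(elite_frac * pop_size))
    history = []
    for _ in range(n_generations):
        scores = np.array([fitness(g) for g in pop])
        order = np.argsort(scores)                 # ascending: fittest first
        pop, scores = pop[order], scores[order]
        history.append(scores[0])
        if scores[0] == 0:                         # perfect fitness: stop
            break
        parents = pop[: pop_size // 2]             # eliminate the worst half
        children = [pop[i] for i in range(n_elite)]  # elites survive unchanged
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(n_genes) < 0.5       # uniform crossover
            child = np.where(mask, a, b) + rng.normal(scale=mutation_scale, size=n_genes)  # mutation
            children.append(child)
        pop = np.array(children)
    return pop[0], history                         # pop[0] is the best genome found

# Toy problem (assumption): recover weights w such that X @ w matches y.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5])
best, hist = genetic_search(lambda w: mse(y, X @ w), n_genes=3)
print(hist[0], hist[-1])
```

Because the elites are copied unchanged, the best score can never get worse from one generation to the next.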
  A Genetic Algorithm is used in the system to optimize four parameters which are:

   1. Number of layers (depth),

   2. Neurons in the layer (width),

   3. Dense layer activation function, and

   4. Network Optimizer.
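The four parameters can be encoded as a genome, with one gene per parameter. The sketch below assumes illustrative value ranges, because the paper does not state them. It shows uniform crossover and per-gene mutation. SEARCH_SPACE and the helper names are hypothetical.

```python
import random

# Hypothetical search space for the four parameters; the value ranges are assumptions.
SEARCH_SPACE = {
    "n_layers": [1, 2, 3, 4, 5],                         # number of layers (depth)
    "n_neurons": [8, 16, 32, 64, 128],                   # neurons per layer (width)
    "activation": ["relu", "tanh", "sigmoid", "elu"],    # dense layer activation
    "optimizer": ["adam", "sgd", "rmsprop", "adagrad"],  # network optimizer
}

def random_genome(rng):
    """One candidate network configuration, sampled uniformly per gene."""
    return {gene: rng.choice(values) for gene, values in SEARCH_SPACE.items()}

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is inherited from one parent at random."""
    return {g: (parent_a[g] if rng.random() < 0.5 else parent_b[g]) for g in SEARCH_SPACE}

def mutate(genome, rng, rate=0.25):
    """Resample each gene from its allowed values with probability `rate`."""
    return {g: (rng.choice(SEARCH_SPACE[g]) if rng.random() < rate else v)
            for g, v in genome.items()}

rng = random.Random(42)
population = [random_genome(rng) for _ in range(10)]   # first generation
child = mutate(crossover(population[0], population[1], rng), rng)
print(child)
```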

   These parameters are chosen over the others because they are the most crucial for correct classification; hence, they need to be tuned carefully, as they have a greater say in the output. Too few layers can lead to weak computation and faulty processing, while too many layers slow the neural network down; hence, the depth is optimized using the genetic algorithm. The number of neurons in the input layer is generally decided by the number of variables in the input dataset being processed. Calculating the number of neurons in the hidden layer is trickier. By a common rule of thumb, it should lie between the sizes of the input and output layers, at approximately 2/3 the size of the input layer plus the size of the output layer. A more accurate number is computed by feeding data on the synapses to the genetic algorithm. The genetic algorithm then helps the neural network by discarding unnecessary and insignificant neurons, thus speeding up learning. The dense layer activation function plays a major role in deciding which neurons are activated, which in turn has a significant impact on correct classification. Network optimizers play a significant role in minimizing the loss function, thereby contributing towards error-free output; hence, the optimizer is also among the parameters optimized by the genetic algorithm.
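The rule of thumb and the role of the four parameters can be made concrete as follows. The Kaggle data described in Section 4 has 30 input features (31 columns minus the class label). Assuming a single sigmoid output neuron, the heuristic gives 2/3 × 30 + 1 = 21 hidden neurons. build_model is a hypothetical helper that shows how a genome from the genetic algorithm could be turned into a Keras Sequential network. It uses only standard Keras calls and imports TensorFlow lazily, so the heuristic runs without it.

```python
def rule_of_thumb_hidden(n_inputs, n_outputs):
    """Heuristic: about 2/3 of the input layer size plus the output layer size."""
    return round(2 * n_inputs / 3 + n_outputs)

def build_model(genome, n_inputs):
    """Build a binary classifier from a genome with keys n_layers, n_neurons,
    activation and optimizer (hypothetical encoding)."""
    from tensorflow import keras  # lazy import: only needed when a model is built
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_inputs,)))
    for _ in range(genome["n_layers"]):                       # depth
        model.add(keras.layers.Dense(genome["n_neurons"],     # width
                                     activation=genome["activation"]))
    model.add(keras.layers.Dense(1, activation="sigmoid"))    # fraud probability
    model.compile(optimizer=genome["optimizer"],
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

print(rule_of_thumb_hidden(30, 1))  # 2/3 * 30 + 1 = 21
```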

3.3. Neural Network
A neural network consists of several nodes (neurons) arranged in layers. Each node of a layer is connected to every node in the next layer, and each edge connecting two nodes has a weight assigned to it [19]. The activation of each node in the next layer is computed by applying the sigmoid function to the weighted sum of the activations of the nodes in the previous layer, using the weights of the connecting edges plus a bias. The nodes in layer one activate the nodes in layer two, and the process continues until a node in the final layer is activated, which is taken as the output. The total weighted input to a neuron Y with three input neurons is

$$y = \sum_{i=1}^{3} W_i I_i + b = W_1 I_1 + W_2 I_2 + W_3 I_3 + b$$

where $b$ is the bias, and the activation of neuron Y is given by

$$Y = \frac{1}{1 + e^{-y}}$$
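The two equations above can be checked numerically for a single neuron with three inputs. The weights and inputs below are arbitrary illustrative values.

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def neuron_activation(weights, inputs, bias=0.0):
    y = np.dot(weights, inputs) + bias   # total weighted input: W1*I1 + W2*I2 + W3*I3 + b
    return sigmoid(y)                    # activation: Y = 1 / (1 + e^(-y))

W = np.array([0.5, -1.0, 2.0])
I = np.array([1.0, 1.0, 0.25])
print(neuron_activation(W, I))  # y = 0.5 - 1.0 + 0.5 = 0.0, so Y = 0.5
```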

   Feed-Forward Back-Propagation Learning Algorithm. In this fraud detection system, a five-layer feed-forward back-propagation neural network is used with supervised learning, which is one of the most powerful learning algorithms [20]. The feed-forward pass classifies the transactions, while back-propagation calculates the errors generated and corrects the weights on the edges accordingly, since these weights play the main role in the activation of nodes and thus in the output. Supervised learning means that we are provided with pairs of input and output data values and compare the generated output with the desired output to calculate the error. For a sigmoid output neuron, each weight is updated as

$$W \leftarrow W + \text{Error} \cdot \text{Input} \cdot \text{Output}\,(1 - \text{Output})$$

In a feed-forward network, data flows in one direction, from the input layer to the output layer, and the connections form no loops; only the error signal travels backward during training.
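The weight-update rule can be illustrated on the smallest possible case: a single sigmoid neuron trained on the logical OR function. This is a toy sketch, not the paper's five-layer network. It adds two assumptions of its own: a learning rate lr and a constant bias input. It also applies the update to the whole batch at once.

```python
import numpy as np

def train_sigmoid_neuron(X, t, lr=1.0, epochs=5000):
    """Batch training with W += lr * Error * Input * Output * (1 - Output)."""
    X = np.hstack([X, np.ones((len(X), 1))])   # constant input column acts as the bias
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        out = 1.0 / (1.0 + np.exp(-X @ W))     # feed-forward pass
        error = t - out                        # desired output minus generated output
        W += lr * X.T @ (error * out * (1 - out))  # backward pass: weight correction
    return W

def predict(W, X):
    X = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-X @ W))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)        # logical OR
W = train_sigmoid_neuron(X, t)
print(np.round(predict(W, X)))
```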


4. Operational Analysis
The dataset used in the present study is taken from Kaggle, a subsidiary of Google LLC, and is available at https://www.kaggle.com/mlg-ulb/creditcardfraud/home. It contains credit card transactions made by European cardholders over two days in September 2013, of which 492 out of 284,807 are frauds. The dataset is highly unbalanced, with the positive class (frauds) accounting for 0.172% of all transactions. It has 31 features in total. Of these, 28 correspond to attributes of a customer such as name, age, occupation, location, account balance, and type of card; for security purposes, these have been PCA-transformed. The other three features are time, amount, and class.
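The class imbalance quoted above can be verified once the CSV is downloaded. The sketch assumes the file is saved locally as creditcard.csv and that the label column is named Class, which is its name in the Kaggle file. class_summary is a hypothetical helper.

```python
import pandas as pd

def class_summary(df, label_col="Class"):
    """Count fraud rows (label 1) and their share of all transactions, in percent."""
    n_fraud = int((df[label_col] == 1).sum())
    return n_fraud, 100.0 * n_fraud / len(df)

# With the Kaggle file (local path is an assumption):
# df = pd.read_csv("creditcard.csv")
# print(df.shape)            # (284807, 31) according to the description above
# print(class_summary(df))   # 492 frauds, roughly 0.17% of all rows
```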

4.1. System Architecture
Figure 1 depicts the architecture of our proposed model. The incoming credit card transaction is first matched against the fraud history database. With that information, it is passed to the customer profile analyst and then to the deviation analyst, which calculate the profile score and the deviation score of the incoming transaction, respectively. The transaction is then fed as input to the Fraud Detection System along with those scores for future reference. This Fraud Detection System is a hybrid of three machine learning techniques for fraud detection, namely K-means Clustering, Genetic Algorithm, and Artificial Neural Network. If the transaction is detected to be normal, it follows the regular flow and proceeds to completion. If an anomaly is detected or risks are associated with the transaction, an alarm is raised. Depending on the implementation, the alarm may take the form of an OTP, e-mail, call, or text message to the customer. If the customer owns the transaction, it proceeds to completion; if the customer denies having initiated it, the transaction is aborted immediately.
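The decision flow of Figure 1 can be summarized as a small function. This is a hypothetical sketch with two stand-ins:

- is_suspicious stands in for the whole hybrid model, including the history, profile, and deviation scores.
- customer_confirms stands in for the OTP, e-mail, call, or text-message confirmation step.

```python
from typing import Callable

def process_transaction(txn: dict,
                        is_suspicious: Callable[[dict], bool],
                        customer_confirms: Callable[[dict], bool]) -> str:
    """Normal transactions complete; flagged ones raise an alarm and complete
    only if the customer confirms that they own the transaction."""
    if not is_suspicious(txn):
        return "completed"                     # regular flow
    # alarm raised: ask the customer to confirm the transaction
    return "completed" if customer_confirms(txn) else "aborted"

# Illustrative stand-in for the detector (assumption): flag large amounts.
flag_large = lambda t: t["amount"] > 1000
print(process_transaction({"amount": 5000}, flag_large, lambda t: False))  # aborted
```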


Figure 1: Architecture


4.2. Flowchart
Figure 2 depicts the flowchart of the system. The dataset is partitioned into a training dataset and a test dataset in the ratio 4:1. The training dataset is fed into the K-means clustering algorithm, where clusters are formed and the transactions are categorized as either low risk or high risk. If a transaction is low risk, the bias sent to the neural network is lower; otherwise, it is higher. The bias of each transaction is input to the genetic algorithm, which transfers this information to the neural network. The genetic algorithm (GA) also optimizes the parameters and the initial weight matrix that are input to the neural network for training. The GA first randomly generates an initial generation, evaluates its fitness using fitness scores, and generates further generations through selection, crossover, and mutation. Once the stopping condition is met, the dataset with the bias information and the selected parameters (weight matrix) is sent as input to the neural network. The network is then trained using the feed-forward back-propagation algorithm. After training is complete, the test dataset is run on the trained system to find its accuracy.
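A 4:1 train/test split of such an imbalanced dataset can be sketched as below. Stratifying by class is an assumption, because the text does not say whether the split is stratified. Stratification keeps roughly the same 0.17% share of frauds in both parts, instead of risking a test set with very few frauds. The function name is hypothetical.

```python
import numpy as np

def stratified_split(y, test_frac=0.2, seed=0):
    """Return train/test index arrays, splitting each class separately so the
    rare fraud class keeps its proportion in both parts (4:1 by default)."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))  # shuffled rows of this class
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(sorted(train_idx)), np.array(sorted(test_idx))

y = np.array([1] * 10 + [0] * 90)
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx), y[test_idx].sum())
```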


Figure 2: Flowchart


4.3. Algorithm Development
The dataset is first pre-processed by splitting it into a training dataset and a test dataset in a 70:30 ratio, i.e., 70% of the dataset rows are used for training and 30% for validation (testing). The training dataset contains the rows and columns used to train the network; the test data is then used against the trained network to measure the accuracy and other evaluation metrics. The last column, “class”, is dropped from the input features, as it is the target label rather than an input during training. Removing constant features, which carry no information, helps train the model better. After the clusters are formed, the distance of each data point from its cluster center is calculated; this is the K-means step. Each data point is described by the columns of the dataset, such as the location of the transaction and the amount of the transaction. The closest cluster to each point is predicted iteratively until no point shifts between clusters. This process reduces the dataset for the genetic algorithm, as the assignment of high or low bias is included in each data point’s information. The genetic algorithm then works on these data points to select the best features. These features, or parameters, decide the structure of the neural network. The genetic algorithm forms the initial generation by creating random combinations of parameters; for example, one neural network might have 4 layers and another 5. For each generation of these gene combinations, the genetic algorithm performs mutation on some and crossover on some, and carries some (those with the best scores) directly into the next generation. As a result, the most suitable structure for the network is decided, and the network is then built using the Sequential model class of the Keras library. Once the neural network is formed with the parameters given by the genetic algorithm, the test dataset comes into play. The test data is given to the neural network, and the evaluation metrics are measured as the network indicates which transactions are fraudulent and which are genuine, so that an alarm can be raised.
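The pre-processing described in this paragraph can be sketched with pandas. The sketch reads the text in a particular way, and the following choices are assumptions:

- The class column is kept as the target vector y rather than discarded.
- Columns with a single value in the training rows are removed.
- The remaining columns are standardized with training-set statistics only, to avoid leaking test information.
- The 70:30 split is random and unstratified.

The function name preprocess is hypothetical.

```python
import numpy as np
import pandas as pd

def preprocess(df, label_col="Class", test_frac=0.3, seed=0):
    """Split 70:30, separate the label, drop constant columns, and standardise."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    n_test = int(round(test_frac * len(df)))
    test_df, train_df = df.iloc[idx[:n_test]], df.iloc[idx[n_test:]]
    y_train = train_df[label_col].to_numpy()   # the label is the target, not an input
    y_test = test_df[label_col].to_numpy()
    X_train = train_df.drop(columns=[label_col])
    X_test = test_df.drop(columns=[label_col])
    # constant columns (one unique value in the training rows) carry no information
    keep = [c for c in X_train.columns if X_train[c].nunique() > 1]
    X_train, X_test = X_train[keep], X_test[keep]
    mean, std = X_train.mean(), X_train.std(ddof=0)   # training statistics only
    X_train = ((X_train - mean) / std).to_numpy()
    X_test = ((X_test - mean) / std).to_numpy()
    return X_train, X_test, y_train, y_test, keep

# Usage with the real file (path is an assumption):
# X_tr, X_te, y_tr, y_te, cols = preprocess(pd.read_csv("creditcard.csv"))
```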
   The steps of the algorithm explained above are listed below, with the input and the final output stated before the algorithm begins. The input is a CSV file named creditcard.csv, which contains the PCA-transformed values of various transactions. The columns denote the features used in K-means clustering, and the rows hold the values corresponding to different transactions.
   Algorithm: Trio-Hybrid Algorithm for Detection of Fraudulent Credit Card Transactions
Input: creditcard.csv dataset
Output: Scalar test loss and the values of the evaluation metrics.
1. Pre-process the dataset
    • Split the dataset into the train dataset D and the test dataset.

    • Drop the column “class” (the fraud label) from the features.

    • Standardize the train dataset D, i.e. remove the mean and scale each feature to unit
      variance.

2. Apply the techniques, i.e. K-means clustering, the genetic algorithm, and the neural network.
2a. K-Means Clustering
    • Initialize K clusters and divide the sample data D among them.

    • Calculate the distance of each point in the sample space from every cluster centroid.

    • Predict the closest cluster to which each sample in D belongs.

    • If the sample point is already in its closest cluster
         – Leave it there
      Else
         – Move the sample point to the cluster nearest to it.

    • Recompute each cluster centroid as the mean of its points, and repeat the above steps
      until no sample shifts between the clusters.

    • The dataset has now been reduced, with similar data points grouped together into the
      same clusters.

2b. Genetic Algorithm
    • The data points, together with their cluster information (i.e. high risk or low risk),
      are then used by the genetic algorithm to form random combinations of parameters, such
      as the number of layers in the network and the number of neurons per layer, for
      building the most suitable neural network. These random combinations form the first
      generation of the genetic algorithm.

    • Store these random combinations of parameters in a variable GENES.

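    • Calculate the fitness score of each generated combination in GENES with the mean
      squared error fitness() function,
      $\mathrm{MSE} = \frac{1}{M}\sum_{i=1}^{M}\left(y_{\mathrm{true},i} - y_{\mathrm{pred},i}\right)^2$,
      where M is the number of samples. The lower the score, the fitter the GENES.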

    • Sort the population of these combinations in increasing order of fitness score.

    • If a fitness score of 0 is reached
         – Stop
      Else
         – The fittest 10% of the current population is carried over unchanged to the next
           generation.
         – 50% of the next generation is produced as offspring by crossover, i.e. take two
           combinations and exchange their parameters.
         – For the rest of the GENES, insert a random parameter value into their combination,
           i.e. perform mutation.
         – Repeat the above steps until the best score is achieved for a set of parameters.

    • The most suitable parameters for the neural network are thus obtained through feature
      selection by the genetic algorithm.

2c. Artificial Neural Network
    • Create a Sequential() model object.

    • Use this object to add the layers of the network, from the input layer to the output
      layer, according to the parameters selected by the genetic algorithm.

    • Fit the training data, i.e. train the network with epochs = 200 and batch size = 500.

    • Test the trained network on the test dataset. The output is a list of scalars (the loss
      value and the metric values), and the network's predictions indicate which transactions
      are fraudulent and which are genuine.

End
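Step 2b can be sketched as a small genetic search over network hyperparameters. This is a minimal sketch, not the authors' code. The search space (`SEARCH_SPACE`, including its learning-rate entry) and all function names are hypothetical. Parents for crossover are drawn at random from the fitter half of the population, which is an assumption since the text does not say how parents are chosen. `fitness` is any function returning a score where lower is fitter. In the paper's setting it would be the cross-validated MSE of a network built from the genes, and that network training is omitted here.

```python
import random

# Hypothetical search space; the text names the number of layers and the
# number of neurons per layer as example parameters.
SEARCH_SPACE = {
    "n_layers": [1, 2, 3, 4, 5],
    "n_neurons": [8, 16, 32, 64, 128],
    "learning_rate": [0.1, 0.01, 0.001],
}


def random_genes(rng):
    """One random combination of parameters (a member of the first generation)."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Child takes each parameter from one of its two parents at random."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SEARCH_SPACE}


def mutate(genes, rng):
    """Copy the genes and insert a random value for one randomly chosen parameter."""
    child = dict(genes)
    name = rng.choice(list(SEARCH_SPACE))
    child[name] = rng.choice(SEARCH_SPACE[name])
    return child


def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    """Minimize `fitness` (lower is fitter); return (best_genes, best_score)."""
    rng = random.Random(seed)
    population = [random_genes(rng) for _ in range(pop_size)]
    n_elite = max(1, pop_size // 10)          # fittest 10% carried over unchanged
    n_cross = pop_size // 2                   # 50% produced by crossover
    n_mutate = pop_size - n_elite - n_cross   # the rest produced by mutation
    best, best_score = None, float("inf")
    for _ in range(generations):
        scores = [fitness(g) for g in population]
        order = sorted(range(pop_size), key=lambda i: scores[i])  # increasing score
        ranked = [population[i] for i in order]
        if scores[order[0]] < best_score:
            best, best_score = ranked[0], scores[order[0]]
        if best_score == 0:
            break  # perfect score: stop
        parents = ranked[: max(2, pop_size // 2)]  # assumption: the fitter half breeds
        elite = ranked[:n_elite]
        children = [crossover(*rng.sample(parents, 2), rng) for _ in range(n_cross)]
        mutants = [mutate(g, rng) for g in ranked[n_elite:n_elite + n_mutate]]
        population = elite + children + mutants
    return best, best_score
```

As a toy check, one could pass a fitness that returns the mean squared difference between a candidate's parameter indices and a fixed target combination. Because the outcome depends on the initial seed (Fig. 4(c)), the search could also be run for several seeds, keeping the lowest score.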


Figure 3: K-means (a) with k=10 (b) with k=20 (c) with k=15 (d) Accuracy and FAR at different values


5. Experimental Analysis
The results obtained from the experiments and the metrics used to evaluate them are explained
below. The results are analyzed and discussed through graphs obtained from experiments with
different parameter values. A proposal cannot be turned into a system without experiments to
support it, and the results and outputs reported here were produced by this system under various
inputs and parameters. To find the optimal value of K in the K-means algorithm, we varied the
number of clusters from 10 to 20. Figure 3a shows the clustering for K=10 and Figure 3b for
K=20. Together they suggest that increasing the number of clusters gives a higher chance of
forming better clusters. Running the K-means clustering algorithm with different K values, we
observed that the best accuracy was obtained with K=15. Figure 3c shows the output of the
K-means clustering algorithm for K=15, and Figure 3d shows the accuracy of the algorithm at
different K values.
   Fig. 4(a) and Fig. 4(b) show the cross-validation scores of the best and average genes in each
generation for seed values 108 and 10000, respectively. Fig. 4(c) depicts the relationship between
the initial seed value given to the genetic algorithm and the mean squared error of the
cross-validation score generated for that seed. The mean squared error is minimal when the seed
is around 5000 and increases as the seed value is either decreased or increased from there.
Before selection of the most important features, the mean squared error was 37.13%; feature
selection reduced it to 28.92%. The best and average genes in Fig. 4(a) and Fig. 4(b) take
slightly fewer generations to reduce the fitness score (the lower the score, the better the
generation) than with the seed value of 5008 in Fig. 4(c), but this is compensated by the lower
mean squared error at seed 5008. This score depends strongly on the initial seed value.
   Fig. 5a shows the time the neural network needs to train at different epoch values. Fig. 5b
shows that increasing the batch size reduces the training time of the neural network. As Fig. 5c
shows, the loss value decreases with increasing batch size only up to a point. The lowest loss is
obtained at a batch size of 500. Beyond that, a larger batch size still reduces the training time but
increases the loss value. After training the neural network, we ran the test dataset through the
system and made the following observations on the output: Accuracy = 99.94%, Loss value = 0.561%.

Figure 4: Graphs showing different mean square error after feature selection
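The epoch and batch-size trade-off discussed above can be illustrated with a small NumPy stand-in for the network of step 2c. It uses one hidden ReLU layer, a sigmoid output, binary cross-entropy loss and plain mini-batch gradient descent. This is not the authors' Keras model, and the class name `TinyMLP`, the layer sizes and the learning rate are hypothetical. The sketch makes one mechanism explicit: each epoch performs ceil(n / batch_size) weight updates. A larger batch size therefore means fewer, larger-sample updates per epoch, which typically shortens training. That is one plausible reason why the loss can stop improving beyond some batch size.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class TinyMLP:
    """NumPy stand-in for a small binary classifier: one ReLU hidden layer,
    a sigmoid output unit, binary cross-entropy, mini-batch gradient descent."""

    def __init__(self, n_in, n_hidden, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W1 = self.rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = self.rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def _forward(self, X):
        h = np.maximum(0.0, X @ self.W1 + self.b1)   # hidden activations
        p = sigmoid(h @ self.W2 + self.b2).ravel()    # fraud probability per row
        return h, p

    def fit(self, X, y, epochs, batch_size):
        """Train in place; return the number of weight updates performed,
        which equals epochs * ceil(len(X) / batch_size)."""
        n, updates = len(X), 0
        for _ in range(epochs):
            order = self.rng.permutation(n)           # reshuffle every epoch
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                xb, yb = X[idx], y[idx]
                h, p = self._forward(xb)
                # d(mean BCE)/d(logit) = (p - y) / batch size for a sigmoid output.
                g = ((p - yb) / len(idx))[:, None]
                grad_W2, grad_b2 = h.T @ g, g.sum(axis=0)
                gh = (g @ self.W2.T) * (h > 0)        # back through the ReLU
                grad_W1, grad_b1 = xb.T @ gh, gh.sum(axis=0)
                self.W1 -= self.lr * grad_W1
                self.b1 -= self.lr * grad_b1
                self.W2 -= self.lr * grad_W2
                self.b2 -= self.lr * grad_b2
                updates += 1
        return updates

    def evaluate(self, X, y):
        """Return [loss, accuracy], mirroring Keras model.evaluate with metrics=["accuracy"]."""
        _, p = self._forward(X)
        p = np.clip(p, 1e-7, 1.0 - 1e-7)
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
        accuracy = np.mean((p >= 0.5) == y)
        return [float(loss), float(accuracy)]
```

In Keras the corresponding calls would be `model.fit(X_train, y_train, epochs=200, batch_size=500)` followed by `model.evaluate(X_test, y_test)`, which returns the loss followed by the compiled metrics.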


6. Conclusion
The system described here is faster than existing systems because the k-means algorithm reduces
the training time of the neural network and the genetic algorithm increases its efficiency. The
genetic algorithm selected the best parameters and removed the redundant ones. 100 epochs are
ideal for this fraction of the dataset: fewer epochs cause the system to under-fit, while more
epochs over-train it on this particular dataset. Hence, the results indicate that the hybrid of
these three techniques gives a faster and more optimized system, which is what the present global
scenario needs.

Figure 5: (a) Time vs. Epoch (b) Time vs. batch size (c) Loss value vs. Batch size

  The loophole in existing systems is that they cannot adapt quickly to a changing environment;
the proposed system compensates for this through k-means clustering. Its only current drawback is
the cost of implementation, since its hybrid nature makes it complex to implement. Fraud detection
is a field that will never be dormant, as new strategies for committing fraud keep emerging. As
future work, other hybrid techniques could be explored for better results.


References
 [1] M. Zanin, M. Romance, S. Moral, R. Criado, Credit card fraud detection through parenclitic
     network analysis, Complexity 2018 (2018).
 [2] M. S. Kumar, V. Soundarya, S. Kavitha, E. Keerthika, E. Aswini, Credit card fraud detection
     using random forest algorithm, 2019 3rd International Conference on Computing and
     Communications Technologies (ICCCT) (2019) 149–153.
 [3] S. Bhattacharyya, S. Jha, K. Tharakunnel, J. C. Westland, Data mining for credit card fraud:
     A comparative study, Decision support systems 50 (2011) 602–613.


                                                  350
 [4] R. Patidar, L. Sharma, et al., Credit card fraud detection using neural network, International
     Journal of Soft Computing and Engineering (IJSCE) 1 (2011).
 [5] M. Zareapoor, K. Seeja, M. A. Alam, Analysis on credit card fraud detection techniques:
     based on certain design criteria, International journal of computer applications 52 (2012).
 [6] M. Seera, C. P. Lim, A. Kumar, L. Dhamotharan, K. H. Tan, An intelligent payment card
     fraud detection system, Annals of operations research (2021) 1–23.
 [7] N. K. Trivedi, S. Simaiya, U. K. Lilhore, S. K. Sharma, An efficient credit card fraud detection
     model based on machine learning methods, International Journal of Advanced Science
     and Technology 29 (2020) 3414–3424.
 [8] O. Owolafe, O. B. Ogunrinde, A. F.-B. Thompson, A long short term memory model for
     credit card fraud detection, in: Artificial Intelligence for Cyber Security: Methods, Issues
     and Possible Horizons or Opportunities, Springer, 2021, pp. 369–391.
 [9] N. Boutaher, A. Elomri, N. Abghour, K. Moussaid, M. Rida, A review of credit card fraud
     detection using machine learning techniques, in: 2020 5th International Conference on
     Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech),
     IEEE, 2020, pp. 1–5.
[10] R. Dash, R. Rautray, R. Dash, A legendre neural network for credit card fraud detection,
     in: Intelligent and Cloud Computing, Springer, 2021, pp. 411–418.
[11] Y. Jain, N. Tiwari, S. Dubey, S. Jain, A comparative analysis of various credit card fraud
     detection techniques, Int J Recent Technol Eng 7 (2019) 402–407.
[12] S. Maes, K. Tuyls, B. Vanschoenwinkel, B. Manderick, Credit card fraud detection using
     bayesian and neural networks, in: Proceedings of the 1st international naiso congress on
     neuro fuzzy technologies, volume 261, 2002, p. 270.
[13] J. Esmaily, R. Moradinezhad, J. Ghasemi, Intrusion detection system based on multi-layer
     perceptron neural networks and decision tree, in: 2015 7th Conference on Information
     and Knowledge Technology (IKT), IEEE, 2015, pp. 1–5.
[14] S. G. Fashoto, O. Owolabi, O. Adeleye, J. Wandera, Hybrid methods for credit card fraud
     detection using k-means clustering with hidden markov model and multilayer perceptron
     algorithm (2016).
[15] J. Zou, J. Zhang, P. Jiang, Credit card fraud detection using autoencoder neural network,
     arXiv preprint arXiv:1908.11553 (2019).
[16] C. Mishra, B. Gupta, R. Singh, Credit card fraud identification using artificial neural
     networks, International Journal of Computer Systems 4 (2017) 151–159.
[17] T. K. Behera, S. Panigrahi, Credit card fraud detection: a hybrid approach using fuzzy
     clustering & neural network, in: 2015 second international conference on advances in
     computing and communication engineering, IEEE, 2015, pp. 494–499.
[18] P. Chougule, A. Thakare, P. Kale, M. Gole, P. Nanekar, Genetic k-means algorithm for
     credit card fraud detection, International Journal of Computer Science and Information
     Technologies (IJCSIT) 6 (2015) 1724–1727.
[19] T. Razooqi, P. Khurana, K. Raahemifar, A. Abhari, Credit card fraud detection using fuzzy
     logic and neural network, in: Proceedings of the 19th Communications & Networking
     Symposium, 2016, pp. 1–5.
[20] F. Amato, N. Mazzocca, F. Moscato, E. Vivenzio, Multilayer perceptron: an intelligent
     model for classification and intrusion detection, in: 2017 31st International Conference on


                                                351
Advanced Information Networking and Applications Workshops (WAINA), IEEE, 2017, pp.
686–691.


                                     352