A Cost-Sensitive Cosine Similarity K-Nearest
          Neighbor for Credit Card Fraud Detection
                                Sara Makki∗§ , Rafiqul Haque† , Yehia Taher‡ , Zainab Assaghir§ ,
                                       Mohand-Saı̈d Hacid∗ and Hassan Zeineddine§
                                     ∗ Laboratoire LIRIS, Université de Lyon, Villeurbanne, France
                                       † Cognitus Centre for Big Data Science R&D, Paris, France
                                   ‡ Laboratoire DAVID, Université de Versailles, Versailles, France
                                                   § Lebanese University, Beirut, Lebanon


   Abstract—Credit card fraud commonly happens in financial                 prevent frauds specifically credit card fraud (e.g., [1], [2], [3]).
institutes such as banks. Fraud results in a huge financial damage          However, the major challenge is the skewed data distribution
that may reach to billions of dollars every year. Detecting and             also known as the class imbalance. In this problem, the
preventing credit card fraud manually is a labor intensive and
relatively ineffective approach. Therefore, a significant effort            dataset is extremely imbalanced and highly skewed [4]. This
was made to develop automated solutions for fraud detection.                problem can be summarized as having legitimate transactions
Researchers dedicated their works on designing and developing               examples in the training set a lot more than fraudulent ones.
models and systems in particular, the fraud anlaysis systems                This phenomenon makes a balanced training very difficult
that enable to detect different types of fraud in different sectors         and detection of fraud will be more challenging due to
including insurance, telecommunication, financial audit, financial
markets, money laundering, credit card, etc. However, some                  inadequate training examples. In the case of credit card fraud,
problems remains unsolved. Of all, the most prevalent one is the            the imbalance is often extreme (the fraudulent transactions in
extreme class imbalance. In this paper, we aimed at addressing              the training set are less than 10%). Several machine learning
this problem. We focused on the K-Nearest Neighbor (KNN)                    methods were investigated for the credit card fraud detection
classifier and investigated the cost-sensitive approaches used for          and for class imbalance problem. A significant research works
KNN. Also, we presented a novel cost-sensitive KNN approach
that we developed using Cosine Similarity (CoS). We compared                have been done concerning this problem (e.g., [5], [6], [7]).
our model with the other methods to verify its efficiency, and              However, an efficient solution is still missing in state of the
we proved using several performance measures that it’s a better             art. The available imbalance classification approaches that are
approach than other KNN algorithms.                                         used, often increase the detection of the minority group at the
   Index Terms—Fraud detection, K-Nearest Neighbor, Imbal-                  cost of generating false predictions for the other class, which
anced classification, Cosine similarity, Cost-sensitive learning.
                                                                            leads to an overall decrease of the model’s accuracy.
                         I. I NTRODUCTION                                   Our Contribution: In this paper, we focused on the K-Nearest
                                                                            Neighbor (KNN) classifier. We investigated the cost-sensitive
   Credit card fraud has always been a huge interest to financial           approaches used for KNN and presented a new cost-sensitive
institutes and credit card users due to its detrimental effect              KNN approach that we developed using Cosine Similarity
in case of fraudulent event. Credit card fraud is defined as                (CoS). We compared our model with the other methods to
an unauthorized use of a credit card account. Fraudster uses                verify its efficiency, and we proved using four performance
credit card information, often remotely, without the knowledge              measures (accuracy, sensitivity, PR curve and the F1 score)
of the card owner or the card issuer. There are different types             that it’s a better approach than other KNN algorithms.
of frauds. Simple Theft (offline fraud) occurs if the card is                  The remainder of the paper is organized as follows. In
stolen; application fraud occurs when credit card applicants                Section II, we review the related work to credit card fraud
obtain new credit cards using fake identity or information;                 using KNN and imbalanced classification specially the cost-
bankruptcy fraud sometimes is a result of application fraud;                sensitive approaches. In Section III, we describe briefly the
finally, counterfeit fraud occurs when only the details of                  KNN classifier and the methods compared, then in Section IV
a legitimate card are stolen (skimming or shoulder surfing)                 we will describe the approach we developed. In Section V,
and used remotely (mobile sales, online, etc.). Detecting and               we describe the data provided and we present the results and
preventing these frauds are challenging for financial institutes.           finally a conclusion and future work.
Human-driven fraud detection approaches are labor intensive.
   Over the years, researchers have been working on devel-                                        II. R ELATED W ORK
oping an automated fraud detection system to reduce and                        Many studies investigated the skewed data distribution in
  This work partially benefited from the support of the Lebanese National   classification, and introduced ways to tackle this problem.
Council for Scientific Research (CNRS-L).                                   There are three different ways to address the class imbalance


                                                                                                                                                   42
issue [8]. The first one is on data-level, as a preprocessing step   The norm commonly used to measure the distance between
to balance classes. Usually over- or under-sampling are done         two observations p and q ∈ Rn is the euclidean distance:
before applying a machine learning algorithm. The second                                        v
                                                                                                u n
one is an algorithm-level approach, like cost-sensitive learning                                uX
                                                                                      d(p, q) = t (pi − qi )2
or one-class classification, that alter the original algorithm                                      i=1
to focus on the minority group. The basic idea behind cost-
sensitive models is to give higher weights to the small class.       The KNN algorithm also depends on a distance rule for the
It’s equivalent to assigning higher costs to false negatives.        classification using the k nearest neighbors of a new sample.
However, in the one-class classification, the training is done          1) Simple Voting: The most straightforward rule used for
using only one class, usually the minority group. The third          the classification is the simple voting. The new observation is
approach consists of a combination of the two previous               assigned to the class of the majority of the k nearest points.
approaches.                                                          The classification is done according to the following formula:
As for credit card fraud detection, researchers used different
                                                                                                   X
                                                                                      ŷ = argmax      δ(c, class(k))
machine learning algorithms to classify credit card transactions                           c∈Y
                                                                                                  k∈K
as normal or fraudulent. A survey of these methods and their
                                                                     Where ŷ represents the foretasted value of a new observation
application to credit card datasets is presented by Tripathi and
                                                                     y, c represents the possible classification categories, and K is
Pavaskar [9] and SamanehSorournejad [10].
                                                                     the subset of the chosen nearest neighbors of y. The function
   KNN was rarely investigated for credit card fraud detection.
                                                                     δ is defined as follows:
Ganji and Mannem [11] proposed a credit card fraud detection                                     
system using a data stream outlier detection algorithm which                                        0 if x = y
                                                                                       δ(x, y) =
is based on Reverse k-nearest neighbors. Whitrow et al.                                             1 if x 6= y
[12] considered transaction aggregation and showed that it is          The function argmax returns the value of the category 0 or
effective, and compared multiple methods including K-nearest         1 at which the maximum is reached, i.e. the category of the
neighbors, support vector machines and logistic regression,          majority of the neighbors.
and they concluded that random forest outperforms all of them.         2) Distance Weighted: Another method is a distance
   Credit card fraud detection is a common example of im-            weighted voting. The idea behind this approach is to take into
balanced data classification problem. Sahin et al. [13] and          account the distance of the neighbors and to assign a higher
Bahnsen et al. [14] proposed a cost-sensitive decision tree          weight to the closest ones. The prediction is done as follows:
approach that splits the data by minimizing the sum of                                          X
                                                                                   ŷ = argmax      wk δ(c, class(k))
misclassification costs. The authors compared their approach                              c∈Y
                                                                                                 k∈K
to well-known methods using a real credit card fraud dataset.
   Kamaruddin and Vadlamani [15] proposed using one-class                                                1
                                                                                                 wk =
classification approach to solve the imbalance problem. They                                             d2k
proposed a hybrid system of Particle Swarm Optimization and          Where wk is the assigned weight and dk is the euclidean
Auto-Associative Neural Network (PSOAANN), and imple-                distance between y and the considered neighbor.
mented it in a Spark computational framework.
                                                                     B. Cost Sensitive KNN
   Qibei et al. [16] proposed Imbalance Class Weighted Sup-
port Vector Machine (ICW-SVM) to detect credit card fraud,              In [17], the authors introduced two cost-sensitive approaches
according to the imbalance characteristics of the data, after        for KNN. The first one called Direct Cost-Sensitive KNN
reducing its dimension using Principal Component Analysis            (DirectCS-KNN) is based on calculating class probabilities
(PCA). They applied their model and proved its efficiency on         from the original KNN algorithm using the following formula:
a real bank dataset.                                                                                   ki
                                                                                                 Pi =
                                                                                                       k
                        III. M ETHODS
                                                                     Where Pi is the probability that the class of a new observation
   In this section, we will describe the KNN classifier algo-        y is i, and ki is the number of neighbors of class i.
rithm using simple voting or distance weighted KNN and a             The other approach was distance weighted called Distance-CS-
cost-sensitive KNN approach that was introduced by Qin et al.        KNN. Considering two categories 1 and 0, the idea consists
[17]. We will also compare these methods with a cost sensitive       of calculating costs for both classes C1 and C0 as follows:
decision tree algorithm and a one class classification support
vector machine.                                                                 C1 = f p × P0      and         C 0 = f n × P1

A. K-Nearest Neighbour (KNN)                                         f p and f n represents respectively the costs of false positives
                                                                     and false negatives. The new sample is assigned to the class
   KNN is a data mining method widely used for classification        with lower Ci . In this case, the Pi are calculated in a cost-
and regression. It is a simple algorithm that consists of using      sensitive manner different than the one in DirectCS-KNN.
the k nearest points to the one we aim to predict [18]. A                                w1                        w0
specific norm is used to measure the distance between points.                   P1 =               and P0 =
                                                                                      w1 + w0                   w1 + w0


                                                                                                                                        43
Where wi are the cost sensitive weights calculated as shown        angle’s cosine between two vectors p and q representing two
here:                                                              observations:
                              ki                                                                  v
                                                                                                  u n
                             X                                                                    uX
                       wi =      wj                                                               t      pi qi
                                j=1
                                                                                                            i=1
                                                                                  CoS (p, q) = v      v
C. Cost-Sensitive Decision Tree (C5.0)                                                         u n 2 uX
                                                                                               uX     u n
                                                                                               t   pi t   qi2
   C5.0 is one of the most common decision tree algorithms                                            i=1         i=1
using cross entropy and information gain to create the partition
and the splits of the tree. The costs are implemented to           We replaced euclidean distance used in KNN with this metric
the decision boundaries, not in the training algorithm [19],       when calculating the distance between observations in order
so the new decision boundary considered for classifying an         to find the nearest neighbors.
observation into class 1 is:                                          Note that, this metric ranges between -1 and 1. If CoS
                                                                   is close to 1, it indicates that the angle between the two
                          p1   C1/0 π0                             observations is close to zero and therefore they are similar
                             >
                          p0   C0/1 π1                             (neighbors).
                                                                      The advantage of using CoS instead of euclidean distance
Where πi is the prior probability of an observation to be          is highlighted in the coming section when we compared KNN
in class i. pi and Cj/i are respectively the estimator of the      (whether simple voting or with weighted distance) with both
probability and the cost of wrongly classifying an observation     metrics and we found that CoS is better in terms of sensitivity.
of class i as j.
                                                                   Step 2: Introducing the score Sy
D. One Class Classification Support Vector Machine                    The second step of our approach, after finding the neighbors,
   Using only the minority cases, a one-class classification       is to introduce an imbalanced classification approach while
SVM is applied with the aim of learning only the charac-           using CoS. The idea is to evaluate the similarity of an
teristics of this class [20].                                      observation to its neighbors of the minority class, taking into
The purpose of this method is to find a “small” region             account the other class as well. This was done by calculating
that groups most of the minority training observations. This       the following score Sy for each sample y:
requires defining a function f that returns 1 if a point belongs                               k
                                                                                               X
in this region and -1 elsewhere [21]. This function is found                                         Ci . CoSi
by optimizing the following problem:                                                           i=1
                                                                                        Sy =     k
                                          l                                                      X
                           1            1 X                                                            CoSi
            minimize         ||w||2 +          ζi − ρ
                           2           νl i=1                                                    i=1
                            T                                     Where:
                              w Φ(xi ) ≥ ρ − ζi
             subject to                                            • k is the number of neighbors considered for y.
                              ζi ≥ 0, i = 1, . . . , l
                                                                   • C is a vector of length k, Ci ∈ {0, 1} that represents the
Where Φ : X → F is a mapping function. This problem is             classes of the neighbors. 0 is used to denote the class of the
solved by separating the points from the origin with maximum       majority group and 1 is used for the minority group.
           ρ
margin ( ||w|| ) using a hyperplane of equation: wT Φ(xi ) = ρ.    • CoS is another vector representing the Cosine similarity
                                                                   between y and its neighbors.
    IV. C OST- SENSITIVE KNN APPROACH (C O SKNN)                   This score ranges from 0 to 1. It works similarly to a
                                                                   probability, that describes the likelihood of an observation to
   We developed a Cost-Sensitive KNN approach. Our aim was         be in the minority group. When it’s close (or equal) to zero, it
to tackle the imbalance problem by using cosine similarity         indicates that the neighbors are mostly (or all) of the majority
as a distance metric and by introducing a score for the            group, which would lead to a majority group classification.
classification. In order to improve the method’s performance in    However, when at least one of the neighbors of y is of the
terms of imbalance, we also studied the choice of the score’s      minority class, this score will be higher than zero; and closer
thresholds and the number of neighbors to consider. These          to one, the more the neighbors are of the minority group.
steps are described in the following:
                                                                   A. The classification and the score thresholds
Step 1: The use of Cosine Similarity                                 The classification is done according to a certain threshold
                                                                   α ∈ [0, 1].
   The first step of our new approach is the change of the
                                                                                           
                                                                                             0 if Sy ≤ α
distance metric used. This metric consists of calculating the                         ŷ =
                                                                                             1 if Sy > α


                                                                                                                                      44
   The choice of α is not straightforward. A very low value          high accuracy. The choice of other performance measures will
will lead to a large number of false positives (observations         be discussed later. This data is divided between train (3999
of the majority group classed as minority). However, a high          transactions) and test (2001 transactions) with similar ratio
threshold value will lead to very low sensitivity rate. Therefore,   of imbalance. For all KNN methods, 10 nearest neighbors
α should have a slightly low value. Taking into account the          are considered to classify the new samples, and data is first
imbalance ratio of ≈ 5%, α was later chosen according to             normalized using the mean and standard deviation of the
the 95th percentile of Sy , and optimized by comparing the           variables, to avoid bias towards variables with large ranges
sensitivity according to the values of α.                            [18], because the explanatory variables are on widely different
                                                                     scales.
B. The choice of k
   The choice of k has an effect on many aspects of the              B. Results and Discussion
approach. The number of neighbors should be large enough               In this section, we show the results of the comparison of
to be informative about the sample’s neighborhood.                   our approach with the other existing in the state of the art.
On the other hand, due to the imbalance, a high number               The compared methods are given in Table I.
of neighbors will make the classification biased towards the
majority group and time consuming.
                                                                                                TABLE I
After trying several possible values of k, we found that                         C OMPARED METHODS AND THEIR DESCRIPTION
considering 10 neighbors is the best for our case.
                                                                         Abbreviation                      Method’s Description
                  V. E XPERIMENT AND R ESULTS                              EucKNN            Simple voting KNN using the euclidean distance
                                                                            CKNN                Simple voting KNN using cosine similarity
   To prove the efficiency of our approach, we compared it to             DEucKNN            Distance weighted KNN using euclidean distance
the original KNN classifier using both euclidean distance and              DCKNN             Distance weighted KNN using cosine similarity
cosine similarity, with simple voting and distance weighted                                     The distance based cost sensitive approach
                                                                       Distance-CS-KNN
                                                                                                       introduced by Qin et al. [17]
approach. We also compare it with cost sensitive C5.0, one                CS - C5.0                  Cost sensitive C5.0 decision tree
class classification SVM and Distance-CS-KNN [17].                       OCC - SVM           One Class classification Support Vector Machine
                                                                          CoSKNN                Cost sensitive cosine similarity based KNN
A. Data Description
   The data we used is extracted from a credit card fraud               The use of the accuracy alone as a performance measure
labeled data1 . The data contains 10 million observations (in        is misleading. Other measures should be considered as well,
our case, the observations are financial transactions) and 8         like the sensitivity, also known as recall. The challenge is that
variables described in the following.                                most imbalanced classification methods focus on increasing
The explanatory variables are: gender representing the client’s      the sensitivity, which will lead to a slight decrease in accuracy.
gender, state is a categorical variable denoting the state where     This decrease that can sometimes be just 1% may seem
the client lives, cardholder representing the number of cards        insignificant, but in fact, it hides a high number of false alarms.
that the client have, balance is a continuous variable that          Thus, we need to rely also on measures that finds a trade-off
indicates the balance on the credit card, the number of transac-     between the high accuracy and the sensitivity; the Area Under
tions and international transactions made to date, respectively      Precision-Recall Curve (AUPRC), is used for that purpose.
denoted by numTrans and numIntTrans, and creditLine which            However, since we may not always be able to plot this curve
is the credit limit of the client, and the variable to predict       like the case of CS - C5.0 and OCC - SVM, we will also
fraudRisk indicating if each observation is fraud (denoted           use the F1 score, also known as F-measure. To evaluate these
1) and legitimate (denoted 0). 596,014 (representing 5.96%)          measures, a confusion matrix (Table II) is calculated using the
are fraud cases and 9,403,986 (representing 94.03%) are              test set.
legitimate.
   In this paper, we extracted a part of the original data due to                                    TABLE II
the time consumption of the KNN method when calculating                                     F ORM OF CONFUSION MATRIX
the distance to all observations in the training set. The new
                                                                                                                Actual
data consists of 6000 credit card transactions (observations)                Predicted
                                                                                                Legitimate (0)           Fraud (1)
in which we have 5657 Legitimate(0) cases, and 343 fraud(1).               Legitimate (0)     True Negative (TN)    False Negative (FN)
This dataset takes into account the imbalance ratio and have                 Fraud (1)        False Positive (FP)    True Positive (TP)
the same characteristics as the original one.
These proportions exemplify the extreme imbalance. The fraud                                        TP + TN
cases represents 5.7% of the dataset. The fraud detection in                 Accuracy =                          × 100
                                                                                               TN + FN + TP + FP
this case is very challenging. In fact an accuracy rate less
                                                                     The accuracy shows the percentage of the legitimate and fraud
than 94.3% is not acceptable, because simply an algorithm
                                                                     transactions correctly predicted.
that classify all data points as legitimate will give us this
                                                                                      TP                                       TP
  1 Available at http://packages.revolutionanalytics.com/datasets/      Recall =                     and     P recision =
                                                                                    TP + FN                                  TP + FP


                                                                                                                                               45
However, the recall or sensitivity focus only on the fraud
detection and the true positive rate, and the precision measures
the fraction of examples classified as positive that are truly
positives.
   The Precision-Recall (PR) curve is obtained by plotting the
precision over recall rate through different class probabilities
thresholds. The closer the curve is to the upper-right-hand
corner the better the model is. It is not always straightforward
to find the class probabilities, so we will also use the F1 score.
The higher this score the better.


                                P recision × Recall
            F1 score = 2 ×                                                            Fig. 1. PR curves for all methods
                                P recision + Recall

                                                                                VI. C ONCLUSION AND F UTUR W ORK

  Table III shows the performance measures (the accuracy, the           Credit card fraud detection is one of the most challenging
sensitivity, the AUPRC and the F1 score) for all methods and         problems for financial institutes. The financial institutes such
our new approach. The PR curves are shown in the Figure 1.           as banks are losing billions of dollars every year due to
                                                                     fraudulent activities committed by the fraudsters. Over the
                                                                     years, a number of solutions has been been proposed in
                         TABLE III
                                                                     large bodies of literature. However, some problems are still
        TABLE SUMMARIZING THE PERFORMANCE MEASURES                   open. The class imbalance problem is one of the most critical
                                                                     problems that is yet to be solved.
       Method        Accuracy    Sensitivity   AUPRC   F1 score
       EucKNN         95.8%         0.30        0.39     0.45           In this paper, we aimed at addressing this problem. We
        CKNN          95.8%         0.42        0.53     0.53        investigated the use of KNN in fraud detection. We proposed a
      DEucKNN         95.6%         0.32        0.22     0.46        cost-sensitive KNN approach to tackle the imbalance problem.
       DCKNN          95.4%         0.34        0.54     0.46
  Distance-CS-KNN     94.5%         0.53          -      0.52        We provided a comprehensive detail of our new approach. In
      CS - C5.0       93.3%         0.65          -      0.53        our approach we used cosine similarity instead of the euclidean
    OCC - SVM         88.9%         0.22          -      0.18        distance to find the neighbors, and then calculated a certain
      CoSKNN          95.5%         0.51        0.54     0.56
                                                                     score to evaluate the probability of fraud risk.
                                                                        We also presented a comparative study in this paper. In our
                                                                     study, we compared the performance of simple voting KNN
                                                                     and distance weighted KNN using both euclidean distance
This table shows that the accuracy is higher than 94.3%              and cosine similarity, with another cost sensitive KNN, de-
for all models except the CS - C5.0 and OCC - SVM.                   cision tree approach, one class classification SVM and our
The slight differences of the accuracy between all the other         new approach. The comparison was done by applying these
methods shows how much information this measure hides                methods to a credit card fraud dataset with imbalance, using
when the imbalance is extreme. We can conclude from the              multiple performance measures, mostly relying on AUPRC
table when comparing the performance measures of EuCKNN              and F1 score. This experiment shows that our approach is
with CKNN and DEuCKNN with DCKNN that the use of                     outperforming all the other methods.
cosine similarity is improving the classification according             We encountered several challenges in our study. The most
to the sensitivity, AUPRC and F1 score, with a reasonable            prevalent one is the elapsed time that restricted us the use of
decrease in accuracy.                                                number of observations for the training set which could not
  Our approach CoSKNN is outperforming all the methods               exceed 3999. This is an obvious limitation of our experiment.
according to the AUPRC and F1 score. It is considerably                 Several works have been lined up to extend our current
improving the sensitivity when compared to the simple KNN.           work. We planned to work on implementing our approach in
The other cost-sensitive models are performing better in terms       a big data environment in order to use the massive amount
of sensitivity, but at the cost of raising false alarms and          of data. Another interesting task which we planned is finding
decreasing the accuracy sometimes to a less than acceptable          an optimized threshold α that can be selected automatically
value, like the case of CS-C5.0 and OCC-SVM.                         instead of letting user to investigate and find it.


                                                                                                                                        46
                               R EFERENCES
 [1] P. Richhariya and P. K. Singh, “Evaluating and emerging payment card
     fraud challenges and resolution,” International Journal of Computer
     Applications, vol. 107, no. 14, 2014.
 [2] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland, “Data
     mining for credit card fraud: A comparative study,” Decision Support
     Systems, vol. 50, no. 3, pp. 602–613, 2011.
 [3] A. Dal Pozzolo, O. Caelen, Y.-A. Le Borgne, S. Waterschoot, and
     G. Bontempi, “Learned lessons in credit card fraud detection from
     a practitioner perspective,” Expert systems with applications, vol. 41,
     no. 10, pp. 4915–4928, 2014.
 [4] C. Phua, D. Alahakoon, and V. Lee, “Minority report in fraud detection:
     classification of skewed data,” ACM SIGKDD explorations newsletter,
     vol. 6, no. 1, pp. 50–59, 2004.
 [5] Z.-H. Zhou and X.-Y. Liu, “Training cost-sensitive neural networks with
     methods addressing the class imbalance problem,” IEEE Transactions on
     Knowledge and Data Engineering, vol. 18, no. 1, pp. 63–77, 2006.
 [6] S. Ertekin, J. Huang, and C. L. Giles, “Active learning for class
     imbalance problem,” in Proceedings of the 30th annual international
     ACM SIGIR conference on Research and Development in Information
     Retrieval. ACM, 2007, pp. 823–824.
 [7] M. Wasikowski and X.-w. Chen, “Combating the small sample class
     imbalance problem using feature selection,” IEEE Transactions on
     knowledge and data engineering, vol. 22, no. 10, pp. 1388–1400, 2010.
 [8] B. Krawczyk, “Learning from imbalanced data: open challenges and
     future directions,” Progress in Artificial Intelligence, vol. 5, no. 4, pp.
     221–232, 2016.
 [9] K. K. Tripathi and M. A. Pavaskar, “Survey on credit card fraud
     detection methods,” International Journal of Emerging Technology and
     Advanced Engineering, vol. 2, no. 11, pp. 721–726, 2012.
[10] S. Sorournejad, Z. Zojaji, R. E. Atani, and A. H. Monadjemi, “A survey
     of credit card fraud detection techniques: Data and technique oriented
     perspective,” CoRR, 2016.
[11] V. R. Ganji and S. N. P. Mannem, “Credit card fraud detection using
     anti-k nearest neighbor algorithm,” International Journal on Computer
     Science and Engineering (IJCSE), vol. 4, no. 6, pp. 1035–1039, 2012.
[12] C. Whitrow, D. Hand, J. Juszczak, D. Weston, and N. M. Adams,
     “Transaction aggregation as a strategy for credit card fraud detection,”
     Data Mining and Knowledge Discovery, pp. 30–55, 2009.
[13] Y. Sahin, S. Bulkan, and E. Duman, “ A cost-sensitive decision tree
     approach for fraud detection,” Expert Systems with Applications, vol. 40,
     pp. 5916–5918, 2013.
[14] A. C. Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Cost
     sensitive credit card fraud detection using bayes minimum risk,” in
     Proceedings of the 2013 12th International Conference on Machine
     Learning and Applications, vol. 1, 2013, pp. 333–338.
[15] S. Kamaruddin and V. Ravi, “Credit Card Fraud Detection using Big
     Data Analytics : Use of PSOAANN based One-Class Classification,”
     Proceedings of the International Conference on Informatics and Ana-
     lytics, pp. 33:1–33:8, 2016.
[16] Q. Lu and C. Ju, “Research on credit card fraud detection model based
     on class weighted support vector machine,” Journal of Convergence
     Information Technology, vol. 6, pp. 62–68, 2011.
[17] Z. Qin, A. T. Wang, C. Zhang, and S. Zhang, “Cost-sensitive classifica-
     tion with k-nearest neighbors,” in Knowledge Science, Engineering and
     Management. Springer Berlin Heidelberg, 2013, pp. 112–131.
[18] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques.
     Morgan Kaufman Publishers, 2003.
[19] M. Kuhn and K. Johnson, Applied Predictive Modeling. Springer, 2013.
[20] A. Nicholas and R. Daniel, “One-class support vector machines: Meth-
     ods and applications,” 2008.
[21] B. Schölkopf, J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C.
     Williamson, “Estimating the support of a high-dimensional distribution,”
     Neural Computation, vol. 13, no. 7, pp. 1443–1471, Jul. 2001.


                                                                                   47