Rule Learning from Time-Dependent Data
             Applied to Fraud Detection

                                   Marine Collery 1,2

                                   1 IBM France Lab
                                   2 Inria Saclay Ile-de-France


        Abstract. In financial environments, fraud detection is a challenging
        problem with tremendous financial impact, where data is highly
        unbalanced, sequential and timestamped. An additional constraint is
        that common machine learning methods cannot be used alone for fraud
        detection, because every decision to label a transaction as fraudulent
        must be explainable and the complete model must be understandable.
        The use of a symbolic language, such as understandable classification
        rules, is therefore preferred or even required.

        Keywords: Rule Learning · Fraud Detection · Time-Dependent Data ·
        Business Rules and Interpretability.


1     Introduction

For a few decades now, rule systems have been widely adopted in different
industrial fields. Business Rule Management Systems (BRMS) offer an intuitive,
human-readable and comprehensible way to define business rules and hide the
computational aspects from the business user.
    With the growth of machine learning in recent years, driven by newly
available computational power and a growing number of accessible datasets,
improving the quality of learned predictive models has been a major research
interest. Today, impressive models are learned, but they can lack the
transparency, interpretability and understandability that are required for
numerous application fields. Those models, especially the ones based on neural
networks, are commonly referred to as “black boxes”. Focus is progressively
shifting towards explaining the decisions made by a learned model, as well as
towards building interpretable, understandable and transparent models from
scratch.
    The approach we focus on in this research project is to combine the
comprehensibility of business rules with the power of machine learning in order
to tackle this lack of transparency.
    This strategy is considered in the context of fraud detection, which combines
a complex learning problem with a full transparency requirement.
    Copyright © 2021 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).

2   Related work
Interpretability, interpretation, and explainability. With the growth of
high-performance, non-interpretable black-box models, an important question is
raised: to what extent can a model be considered trustworthy, especially for
high-stakes decision making? Different terms are commonly used when referring to
this problem; we clarify their meaning here for further use. Model interpretability
is the ability (of the model) to explain or to present in understandable terms to
a human [15, 8]. Model rationale is how the model makes decisions. Interpretation
and explanation (methods) are considered equivalent in this paper (subtle
differences are ignored). They both refer to methods that explain or
translate the model rationale.

The context of fraud detection. There are multiple types of financial fraud,
from credit card fraud to insurance fraud, which call for different detection
solutions, as described by J. West et al. [23]. Credit card fraud detection was,
for example, studied with sequential and non-sequential learning methods by J.
Jurgovsky et al. [13]; the two approaches led to the detection of different
types of fraud. A spatio-temporal attention-based neural network for credit
card fraud detection was recently introduced by D. Cheng et al. [5] and brought
promising results for detecting ‘suspicious transactions and mining fraud
patterns’. However, as pointed out by J. Guo et al. [11], allowing for longer-range
dependencies than common machine learning models do can help identify repeated
or cyclical appearances of fraudulent events, which seem to be the hardest to
catch. Very recently, tensor networks were used for anomaly detection [22]; the
model outperformed deep and classical algorithms on tabular datasets and
achieved competitive results on image datasets.

Rule learning. Another approach to detecting anomalies, applied to runtime
process logs by K. Böhmer et al., is rule mining [3]. It comes with some specific
benefits, especially explainability. In contrast to most machine learning models,
rules are symbolic and are key to building understandable artificial intelligence.
    According to C. Rudin [20], interpretable models should even be preferred to
explaining uninterpretable models a posteriori for any high-stakes decision.
However, in some contexts rule-based models are not considered fully
interpretable. Indeed, as presented by Z. C. Lipton in [16], given the limited
capacity of human cognition, a rule-based model of sufficiently high dimension
could be considered less interpretable than a simple, compact neural network.
    Combining logic rules and deep neural networks is proposed by Z. Hu et al.
[12] to enhance the capabilities of the neural network. This approach could
also be used for rule learning. We can also mention recent work from I. Kraiem,
who applied rule learning to multiple anomaly detection [14], and M. Guillame-Bert,
who presented an association rule learning approach for noisy temporal data [10].
    More global approaches are proposed in [21], to induce if-then-else rules that
explain predictions of supervised learning models, and in [18], to learn
compositional rules from very little data. As explained in [9], there are two main
families of methods to induce a rule set from training data: extracting rules from
a decision tree (examples: CART [4] and C4.5 [19]), or sequential covering, which
learns rules directly from data (examples: CN2 [6] and RIPPER [7]).
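
    The sequential covering principle can be illustrated with a minimal sketch. It is a deliberately simplified loop, not CN2 or RIPPER themselves: the threshold-only conditions, the precision-based scoring and all names (`learn_one_rule`, `sequential_covering`, ...) are assumptions made for illustration. One rule is learned at a time for the positive class, the examples it covers are removed, and the process repeats on what remains.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, float], int]   # (attribute values, label: 1 = positive)
Condition = Tuple[str, str, float]       # (attribute, "<" or ">=", threshold)
Rule = List[Condition]                   # a rule is a conjunction of conditions


def holds(cond: Condition, x: Dict[str, float]) -> bool:
    attr, op, threshold = cond
    return x[attr] < threshold if op == "<" else x[attr] >= threshold


def covers(rule: Rule, x: Dict[str, float]) -> bool:
    return all(holds(c, x) for c in rule)


def learn_one_rule(examples: List[Example], attributes: List[str]) -> Rule:
    """Greedily add conditions until the rule covers no negative example."""
    rule: Rule = []
    covered = list(examples)
    while any(y == 0 for _, y in covered):
        best, best_score = None, (-1.0, 0)
        for attr in attributes:
            for threshold in sorted({x[attr] for x, _ in covered}):
                for op in ("<", ">="):
                    cand = (attr, op, threshold)
                    subset = [(x, y) for x, y in covered if holds(cand, x)]
                    positives = sum(y for _, y in subset)
                    if positives == 0 or len(subset) == len(covered):
                        continue  # no positive left, or nothing excluded
                    # Assumed heuristic: precision first, then positive coverage.
                    score = (positives / len(subset), positives)
                    if score > best_score:
                        best, best_score = cand, score
        if best is None:
            return []  # negatives cannot be separated with these conditions
        rule.append(best)
        covered = [(x, y) for x, y in covered if holds(best, x)]
    return rule


def sequential_covering(examples: List[Example], attributes: List[str]) -> List[Rule]:
    """Learn rules one by one, removing the examples each new rule covers."""
    rules: List[Rule] = []
    remaining = list(examples)
    while any(y == 1 for _, y in remaining):
        rule = learn_one_rule(remaining, attributes)
        if not rule:
            break  # no discriminating rule can be found for what is left
        rules.append(rule)
        remaining = [(x, y) for x, y in remaining if not covers(rule, x)]
    return rules
```

Real algorithms add pruning and stopping criteria on top of this loop; the sketch only shows why the learned conditions are restricted to comparisons between one original attribute and a constant.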
    We can also refer to Inductive Logic Programming (ILP), introduced by S.
Muggleton in 1991 [17], where an ILP system is a program that combines positive
and negative examples with background knowledge and outputs a correct logical
hypothesis. ILP systems proceed in two main steps: searching for hypotheses and
then selecting the best one.


3   Problem, goals and method
Problem statement. Modeling data in an interpretable and understandable
way is very challenging when working with large-scale, real-world datasets.
Interpretable models are commonly simple and have difficulty learning complex
patterns. Rule-based approaches tend to overfit complex patterns because the
available rule language (operators, aggregates, ...) is too simple. The high
dimensionality of overfitted models makes them much harder for humans to
understand. In the context of fraud detection, with imbalanced datasets,
evolving patterns and time dependency, those limitations are particularly
apparent.

Problem. How can we learn accurate, understandable and time-dependent rules
for decision making and in particular for fraud detection problems?

Hypotheses. The hypotheses on which the project rests are:
 – rule-based models are fully interpretable, or at least more interpretable than
   other models;
 – machine learning models bring relevant statistical information to learn rules
   from;
 – sequential models (Hidden Markov Models, Matrix Product State based
   models, ...) can bring interesting statistical information to learn rules from;
 – fraud detection is a relevant application domain to illustrate the problem;
 – an ideal trade-off between bias and variance can be found to generate rules
   out of different fraud patterns (the more complex the patterns are, the harder
   it is to learn rules and generalize).

Purpose. The purpose of this project is to induce sets of accurate and
understandable rules, with or from machine learning models, in time-dependent
data. It will help achieve fraud detection and prediction in the challenging
context of finance and banking environments, where full interpretability is
required. A longer-term objective is to integrate the induction solutions found
into IBM products (Operational Decision Manager (ODM) and Automation Decision
Services (ADS)).

Goals. The goal of the project is to build, tune, test and validate one or more
robust models and rule learning solutions to detect fraudulent patterns and the
events that lead to a fraudulent event. This main goal can be divided into
several sub-goals:
  – Acquiring expertise in fraud detection, rule induction and machine learning
    models.
  – Building one or more models and rule learning solutions as well as an eval-
    uation process to answer the stated problem.
  – Experimenting and validating proposed solutions with synthetic and real
    data.
  – Sharing results.

Tasks. The following tasks are part of this project:
 – Write a state-of-the-art analysis of fraud detection models and solutions,
    as well as an inventory of known fraud patterns.
 – Write a state-of-the-art analysis of rule learning algorithms, as well as of
    existing solutions to optimize parameter values.
 – Propose a mathematical model of the problem by specifying inputs and
    outputs.
 – Analyze available open source datasets applicable to the stated problem.
 – Experiment with different supervised and unsupervised models found in
    state-of-the-art papers (reproduce when possible).
 – Define an evaluation and test protocol.
 – Investigate different approaches to the problem in depth to improve results.
 – Experiment on external synthetic data before experimenting in vivo on real
    data.
 – Present proofs of concept and make them available.
 – Write papers for conferences, workshops and journals (attend when possible).
 – Write the final thesis.
   The project will use empirical methods [2]. The work will be based on
experiments with specific datasets, and performance metrics will be defined in
order to evaluate the results and draw conclusions.
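
   As an indication of what such metrics can look like, the sketch below computes the five metrics reported in Table 1 (accuracy, balanced accuracy, F1, precision, recall) from binary labels, using their standard definitions. The function name and the convention of returning 0 when a denominator is zero are assumptions made for this example, not part of the project's evaluation protocol.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Standard binary metrics; label 1 is the (rare) fraudulent class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Assumed convention: a ratio with an empty denominator is reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


# 1000 transactions, 10 frauds, and a model that never predicts fraud:
# accuracy is high although no fraud is detected.
y_true = [1] * 10 + [0] * 990
always_legit = classification_metrics(y_true, [0] * 1000)
print(always_legit["accuracy"], always_legit["balanced_accuracy"], always_legit["recall"])
```

On highly unbalanced data, accuracy alone is therefore uninformative, which is why balanced accuracy, F1, precision and recall are reported alongside it.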

4     Preliminary findings
4.1   Fraud detection data
This research project benefits from an IBM partner in the financial sector whose
use case fits the project well: detection of fraudulent events in bank transfers
and credit card transactions. Experiments with real data will be feasible, but
without direct access to the dataset: only the resulting metrics will be shared.
This provides a good final testing environment but is not satisfactory at the
research level.
    Because fraud detection data is difficult to generate or collect, for obvious
confidentiality reasons, we have not found an existing reference dataset that
meets all of the following conditions:
  – Data should be composed of events which are financial transactions (ideally
    not just credit card payment transactions).
  – User profiles should be extractable: we need the historical data of a client
    in order to predict fraudulent behavior.
  – As a consequence, data should include a notion of time.
    However, we can still use existing datasets that do not satisfy all of these
conditions. For example, we can mention the Kaggle dataset Synthetic Financial
Datasets For Fraud Detection [1]. Working with this dataset taught us the
importance of feature preprocessing, as shown later in Section 4.3.
    We are currently searching for appropriate datasets to work on. If we are not
able to find viable fraud detection data, the alternative we selected is to
start with anomaly detection data, which has comparable characteristics:
temporal, unbalanced and with evolving patterns (not known when they appear).
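
    The three dataset conditions listed above (transaction events, extractable user profiles, a notion of time) can be made concrete with a small data-structure sketch. All field and function names below are hypothetical and do not describe any actual dataset used in the project.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Transaction:
    """One financial event (hypothetical schema)."""
    client_id: str        # allows a per-client profile to be extracted
    timestamp: datetime   # the notion of time
    amount: float
    kind: str             # e.g. "transfer" or "card_payment"
    is_fraud: bool = False


def history_by_client(events: Iterable[Transaction]) -> Dict[str, List[Transaction]]:
    """Group events per client, each history sorted chronologically."""
    histories: Dict[str, List[Transaction]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        histories[event.client_id].append(event)
    return dict(histories)
```

A dataset that cannot be reshaped into such per-client, time-ordered histories fails at least one of the three conditions.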

4.2    Rule language
The state-of-the-art analysis of rule learning highlighted that existing
learning algorithms are strongly limited in the rule conditions they can
express. Algorithms like RIPPER [7] or CN2 [6], for example, do not scale beyond
basic condition operators. This comes from the fact that they have a weak
internal data representation that is based only on the original attributes. With
these conclusions in mind, we list below increasingly complex rule structures.
They reflect the rules we want to be able to learn in order to describe complex
models such as fraud detection. In the following rules, x1, x2, x3 are data
attributes, {a, b, c, d, e, f} are fixed values (numerical or categorical,
according to the operator used in the condition) and ypred is the target class.
 1. Base rule structure. CN2- and RIPPER-like rules.
      if x1 < a and x2 > b and x3 = c
      then ypred = d

 2. Simple feature comparisons.
      if x1 < a and x2 > x1 and x3 = c
      then ypred = d

 3. Linear combinations.
      if x1 < a and b1 ∗ x2 > b and x3 / c1 = c2
      then ypred = d

 4. Adding aggregates. For example sum, count, min, max, average, ... that are
    applied to a set of data. This is particularly useful when working with
    time-dependent data. We define α, a set of aggregation functions that can
    have parameters.
      if α1 < a and b1 ∗ x2 > x1 + b2 and α2(c1) = c2
      then ypred = d

 5. Complex structures with aggregates.
      if sum(a ∗ e.x2 − e.x1) > d
      for e ∈ events where e.x1 > b over timewindow(c)
      then ypred = d

 6. Complex temporal expression between events e1 and e2.
      if ∃e1 : e1.x1 > 10 and ∃e2 : e2.x1 = e1.x2
      where e1.time ∈ [e2.time, now]
      then ypred = d

 7. Program induction extension. That is, increasing the complexity of the right
    part of the rule, by adding chaining or symbolic regression for example. A
    new variable var is defined.
     – Chaining
          if x1 < a and x2 > b and x3 = c
          then var = x2 + d
          if var = e
          then ypred = f
     – Symbolic regression
          if x1 < a and x2 > b and x3 = c
          then ypred += x2 + d
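
To make rule structure 5 more concrete, the sketch below evaluates its condition on a list of events. It only evaluates such a rule; it does not learn one. Two points are assumptions: `timewindow(c)` is read as "the last c time units before now", and the comparison value is passed as a separate `threshold` parameter so that it is not confused with the target class. The `Event` class and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    time: float  # timestamp, in arbitrary time units
    x1: float
    x2: float


def aggregate_rule_fires(events: Iterable[Event], now: float,
                         a: float, b: float, c: float, threshold: float) -> bool:
    """Condition of rule structure 5:
    sum(a * e.x2 - e.x1) > threshold
    for e in events where e.x1 > b over timewindow(c).
    """
    # Assumed reading of timewindow(c): events in the interval [now - c, now].
    in_window = (e for e in events if now - c <= e.time <= now)
    total = sum(a * e.x2 - e.x1 for e in in_window if e.x1 > b)
    return total > threshold


events = [
    Event(time=0, x1=5, x2=10),    # outside a 20-unit window ending at 100
    Event(time=90, x1=20, x2=30),  # contributes 2 * 30 - 20 = 40
    Event(time=95, x1=15, x2=25),  # contributes 2 * 25 - 15 = 35
    Event(time=99, x1=1, x2=100),  # filtered out because x1 <= b
]
print(aggregate_rule_fires(events, now=100, a=2, b=10, c=20, threshold=70))  # True (75 > 70)
```

Even this single rule has four free parameters (a, b, c and the threshold) in addition to the choice of aggregate and attributes, which gives an idea of the search space a learner has to explore for such structures.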


4.3     First approach
The first approach taken to learn rules with linear combinations (rule structure
3 in Section 4.2) is a data-driven preprocessing approach. As pointed out by Li
et al. [15], data preprocessing such as augmentation or regularization can impact
interpretability considerably. Very few preprocessing techniques can be used
without loss of interpretability; therefore, a simple linear approach is chosen.
It consists in adding new features to the data provided for the learning step.
Those new features are linear combinations of the original features. This
approach was chosen following first experiments with the Synthetic Financial
Datasets For Fraud Detection dataset [1], which showed that the RIPPER and CN2
algorithms have difficulty modeling data that is not governed by the original
features taken individually. With the manual introduction of a new feature,
results improved considerably, as shown in Table 1. An automated feature
generation process has been created with sum and difference operations.
Interpretability is maintained thanks to a dimensional consistency filter.
However, this approach does not scale to more complex operations and can affect
some learning algorithms (for example the RIPPER stopping criterion, which depends
on data dimensions).
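
A minimal sketch of what such a feature generation step could look like is given below. It is not the implementation used in the experiments. In particular, the dimensional consistency filter is assumed here to mean that only features expressed in the same unit may be added or subtracted, and the column names and the `units` mapping are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

Row = Dict[str, float]


def generate_features(rows: List[Row], units: Dict[str, str]) -> List[Row]:
    """Return copies of the rows extended with pairwise sums and differences.

    `units` maps each numeric column to its unit. Pairs of columns with
    different units are skipped (assumed dimensional consistency filter).
    """
    extended = [dict(row) for row in rows]
    for left, right in combinations(sorted(units), 2):
        if units[left] != units[right]:
            continue  # e.g. never add a currency amount to a time of day
        for row in extended:
            row[f"{left}+{right}"] = row[left] + row[right]
            row[f"{left}-{right}"] = row[left] - row[right]
    return extended


rows = [{"amount": 100.0, "balance_before": 100.0, "hour": 3.0}]
units = {"amount": "currency", "balance_before": "currency", "hour": "time"}
print(generate_features(rows, units)[0])
```

A standard learner can then produce a condition such as `amount-balance_before >= 0` on the generated column, which corresponds to a comparison between two original features. The number of generated columns grows quadratically with the number of same-unit features, which is one way to see the scalability limit mentioned above.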

Table 1. Experiments on the Synthetic Financial Datasets For Fraud Detection
dataset [1], with (processed) and without (raw) manual preprocessing, using the
CN2 [6] and RIPPER [7] algorithms.

              Dataset     Model    Accuracy   Balanced accuracy   F1      Precision   Recall

              raw         CN2      0.999      0.798               0.691   0.822       0.596
              processed   CN2      1          0.994               0.993   0.997       0.988

              raw         RIPPER   1          0.873               0.839   0.956       0.748
              processed   RIPPER   1          0.998               0.996   0.997       0.996


4.4     Future work and ideas
An approach that we would like to develop is the use of intermediary models.
Rather than working on the dataset directly, we want to try modeling the data
first with an intermediary model (tensor networks, Bayesian models, etc.), before
learning rules from that new representation of the data. Additionally, further
work is needed on how to approach the temporal aspect of the data. With a fraud
detection dataset, it would be interesting to apply anomaly detection
strategies (supervised and unsupervised), as both domains share data
characteristics (unbalanced, temporal).


5   Conclusion
In this paper, we presented our doctoral research project. There is a growing need
for understandable AI models. Rule-based approaches are one potential solution,
but they no longer attract the same research interest as black-box models do. We
believe that such approaches are a solution for many different kinds of
applications, especially financial applications. Modeling a time-dependent
dataset with rules requires a rule language whose complexity cannot currently be
learned with available methods. This research project aims to move in that
direction.

Acknowledgements. This thesis project is supported by PSPC AIDA
2019-PSPC-09. It is supervised by Philippe Bonnard at IBM France Lab and François
Fages at Inria Saclay.


References
 1. Synthetic Financial Datasets For Fraud Detection.
    https://kaggle.com/ntnu-testimon/paysim1
 2. Bock, P.: Getting It Right: R&D Methods for Science and Engineering. Elsevier
    Science (Apr 2020)
 3. Böhmer, K., Rinderle-Ma, S.: Mining association rules for anomaly detection in
    dynamic process runtime behavior and explaining the root cause to users. Infor-
    mation Systems 90, 101438 (May 2020). https://doi.org/10.1016/j.is.2019.101438
 4. Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression
    Trees. Taylor & Francis (Jan 1984)
 5. Cheng, D., Xiang, S., Shang, C., Zhang, Y., Yang, F., Zhang, L.: Spatio-Temporal
    Attention-Based Neural Network for Credit Card Fraud Detection. Proceedings
    of the AAAI Conference on Artificial Intelligence 34(01), 362–369 (Apr 2020).
    https://doi.org/10.1609/aaai.v34i01.5371
 6. Clark, P., Niblett, T.: The CN2 Induction Algorithm. Machine Learning 3(4), 261–
    283 (Mar 1989). https://doi.org/10.1023/A:1022641700528
 7. Cohen, W.W.: Fast Effective Rule Induction. In: Proceedings of the Twelfth
    International Conference on Machine Learning. pp. 115–123. Morgan Kaufmann
    (1995)
 8. Doshi-Velez, F., Kim, B.: Towards A Rigorous Science of Interpretable Machine
    Learning. arXiv:1702.08608 [cs, stat] (Mar 2017)
 9. Fürnkranz, J., Gamberger, D., Lavrač, N.: Foundations of Rule Learning. Springer
    Science & Business Media (Nov 2012)
10. Guillame-Bert, M.: Apprentissage de règles associatives temporelles pour les
    séquences temporelles de symboles p. 158
11. Guo, J., Liu, G., Zuo, Y., Wu, J.: Learning Sequential Behavior Representations for
    Fraud Detection. In: 2018 IEEE International Conference on Data Mining (ICDM).
    pp. 127–136 (Nov 2018). https://doi.org/10.1109/ICDM.2018.00028
12. Hu, Z., Ma, X., Liu, Z., Hovy, E., Xing, E.: Harnessing Deep Neural Net-
    works with Logic Rules. In: Proceedings of the 54th Annual Meeting of the
    Association for Computational Linguistics (Volume 1: Long Papers). pp. 2410–
    2420. Association for Computational Linguistics, Berlin, Germany (Aug 2016).
    https://doi.org/10.18653/v1/P16-1228
13. Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P.E.,
    He-Guelton, L., Caelen, O.: Sequence classification for credit-card fraud
    detection. Expert Systems with Applications 100, 234–245 (Jun 2018).
    https://doi.org/10.1016/j.eswa.2018.01.037
14. Kraiem, I.B.: Détection d’Anomalies Multiples par Apprentissage Automatique
    de Règles dans les Séries Temporelles. Ph.D. thesis, Université de Toulouse-Jean
    Jaurès (Jan 2021)
15. Li, X., Xiong, H., Li, X., Wu, X., Zhang, X., Liu, J., Bian, J., Dou, D.: Inter-
    pretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Be-
    yond. arXiv:2103.10689 [cs] (May 2021)
16. Lipton, Z.C.: The mythos of model interpretability: In machine learning, the con-
    cept of interpretability is both important and slippery. Queue 16(3), 31–57 (Jun
    2018). https://doi.org/10.1145/3236386.3241340
17. Muggleton, S.: Inductive logic programming. New Generation Computing 8(4),
    295–318 (Feb 1991). https://doi.org/10.1007/BF03037089
18. Nye, M.I., Solar-Lezama, A., Tenenbaum, J.B., Lake, B.M.: Learning Composi-
    tional Rules via Neural Program Synthesis. arXiv:2003.05562 [cs] (Mar 2020)
19. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publish-
    ers Inc., San Francisco, CA, USA (1993)
20. Rudin, C.: Stop Explaining Black Box Machine Learning Models for High Stakes
    Decisions and Use Interpretable Models Instead. arXiv:1811.10154 [cs, stat] (Sep
    2019)
21. Sushil, M., Šuster, S., Daelemans, W.: Rule induction for global explanation of
    trained models. arXiv:1808.09744 [cs, stat] (Aug 2018)
22. Wang, J., Roberts, C., Vidal, G., Leichenauer, S.: Anomaly Detection with Tensor
    Networks. arXiv:2006.02516 [quant-ph, stat] (Jun 2020)
23. West, J., Bhattacharya, M., Islam, R.: Intelligent Financial Fraud Detection Prac-
    tices: An Investigation. arXiv:1510.07165 [cs] (Oct 2015)