Unraveling Anomalies: Explaining Outliers with DTOR⋆

Riccardo Crupi1,*, Daniele Regoli1, Alessandro Damiano Sabatino1, Immacolata Marano2, Massimiliano Brinis2, Luca Albertazzi2, Andrea Cirillo2 and Andrea Claudio Cosentini1
1 Data Science & Artificial Intelligence, Intesa Sanpaolo, Italy
2 Audit Data & Advanced Analytics, Intesa Sanpaolo, Italy

Abstract
Explaining the occurrence and mechanisms of outliers is crucial across various domains, as malfunctions, frauds, and threats require valid explanations for effective countermeasures. With the increasing use of sophisticated Machine Learning techniques to identify anomalies, explaining their presence becomes more challenging. Our proposed Decision Tree Outlier Regressor (DTOR) addresses this challenge by providing rule-based explanations for individual data points based on the anomaly scores produced by a detection model. By leveraging a Decision Tree Regressor to approximate the anomaly score and extracting the corresponding decision path, DTOR proves effective across different anomaly detectors and diverse datasets, including those with numerous features.

Keywords
Outlier detection, Explainability, Decision Tree

1. Introduction
Internal audit in the banking sector is crucial for evaluating operational integrity and efficiency, assessing internal controls, risk management processes, and regulatory compliance. Anomaly detection techniques play a vital role in identifying atypical patterns and outliers within data populations analyzed for audit purposes, assisting in risk mitigation and fraud detection. However, the effective utilization of these techniques requires the ability to explain why certain records are considered anomalous, particularly for internal auditors with limited data analytics expertise [1, 2]. Among various anomaly detection techniques, Isolation Forest [3], One-Class SVM [4], and Gaussian Mixture Models [5] are widely employed in practical applications [6, 7].
Late-breaking work, Demos and Doctoral Consortium, co-located with The 2nd World Conference on eXplainable Artificial Intelligence: July 17–19, 2024, Valletta, Malta.
⋆ The views and opinions expressed are those of the authors and do not necessarily reflect the views of Intesa Sanpaolo, its affiliates or its employees.
* Corresponding author.
riccardo.crupi@intesasanpaolo.com (R. Crupi)
ORCID: 0009-0005-6714-5161 (R. Crupi); 0000-0003-2711-8343 (D. Regoli); 0000-0002-1336-2057 (A. D. Sabatino)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

These methods leverage diverse mathematical principles to detect anomalies efficiently. However, their interpretability may be limited, necessitating explainable artificial intelligence (XAI) techniques to elucidate model decisions, ensure transparency, and enhance trust in AI-driven decisions [8, 9, 10]. To meet this requirement, we introduce a novel model-agnostic XAI framework specifically designed for anomaly detection in the banking sector. Unlike conventional XAI methods that primarily focus on feature importance (e.g., SHAP and DIFFI [8, 9]), our framework generates easily understandable rules to elucidate model predictions, thereby enhancing transparency and fostering trust in AI-driven decisions. Notable techniques such as LORE, RuleXAI, and Anchors [11, 12, 13] exemplify this approach. Our approach aims to bridge the divide between interpretability and effectiveness in anomaly detection by offering human-understandable rules that clarify the rationales behind anomalous predictions. Relevant works in this domain include [14] and [15], focusing on online anomaly explanation and providing a survey of explainable anomaly detection methods, respectively.
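As a concrete reference for how such detectors assign anomaly scores (the raw material that rule-based explainers consume), the three models above can be queried with scikit-learn roughly as follows. This is a sketch on synthetic data, not the configuration used in our experiments.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.mixture import GaussianMixture
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))  # illustrative data

# Isolation Forest: lower decision_function values indicate outliers
if_scores = IsolationForest(random_state=0).fit(X).decision_function(X)

# One-Class SVM (RBF kernel by default): negative scores fall outside the learned region
svm_scores = OneClassSVM().fit(X).decision_function(X)

# Gaussian Mixture: the per-sample log-likelihood serves as an anomaly score
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
gmm_scores = gmm.score_samples(X)

# e.g., flag the 5% least likely points under the GMM as anomalies
threshold = np.quantile(gmm_scores, 0.05)
anomalies = gmm_scores < threshold
```

Each detector thus yields a continuous per-sample score, which a surrogate explainer can then approximate locally.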
By harnessing rule-based explanations, our XAI framework ensures transparency and accessibility in the decision-making process of anomaly detection models for data scientists, domain experts, and colleagues in the banking industry.

2. Method
Our novel XAI method, inspired by the principles of the Isolation Forest algorithm, takes advantage of the concept of isolating anomalies with minimal cuts in the feature space. To provide clear explanations for anomaly detection decisions, we use decision tree regressors. In our approach, a decision tree regressor is trained to learn the anomaly scores assigned to each data point by the Anomaly Detector. Notably, during training, we introduce a weighted loss function that gives a significantly higher weight to the data point under consideration. This weighting scheme ensures that the decision tree regressor prioritizes accurate estimation of the anomaly score for the target data point, thereby improving the interpretability and reliability of the local explanation. After training the decision tree, extracting the path of the data point provides an interpretable rule for the anomaly score (Algorithm 1). The implementation of DTOR can be accessed online.1

3. Experiments
This section delineates the configurations of three Anomaly Detector models trained on two public datasets and one private dataset from Intesa Sanpaolo (see Table 1), offering explanations using both Anchors and DTOR. The DTOR method and the experiments conducted on the public datasets are available in the GitHub repository https://github.com/rcrupiISP/DTOR.

Algorithm 1: The DTOR approach generates explanations for a given instance.
def explain_instance:
    input:  (x_e, ŷ_e): the sample to be explained, along with its score from the AD;
            (X_t, ŷ_t): a training set and its corresponding scores from the AD;
            β: the training weight associated with x_e;
            h: the list of parameters of the decision tree;
    output: a list of rules explaining the instance (x_e, ŷ_e)

    N ← len(X_t);
    model ← DecisionTreeRegressor(h);
    /* append the sample x_e to the training set */
    X̂ ← concat X_t with x_e;
    ŷ ← concat ŷ_t with ŷ_e;
    /* build the array of weights that gives more importance in the loss function to the sample x_e */
    ω ← concat 1_N with β;
    /* fit the DT in the weighted configuration */
    model.fit((X̂, ŷ), sample_weight = ω);
    /* retrieve the path taken by x_e in the decision tree */
    rule ← extract_path(model, x_e);
    return rule

3.1. Datasets and AD models
We apply the novel XAI technique across various datasets to assess its effectiveness in explaining different types of anomalies learned by unsupervised Machine Learning models. The chosen Anomaly Detector models are IF, One-Class SVM, and GMM [17]. Default parameters were used, since the primary objective of this study is to understand the explanations rather than to optimize a performance metric specific to each dataset problem; the three distinct models were chosen precisely because they reason in different ways. Each dataset was partitioned into training and testing sets. Specifically, the test set comprises 50 samples from each dataset, containing both anomalies and normal data points. For the GMM, anomalies are defined to represent 5% of the training set, and likewise for the Isolation Forest via the contamination hyperparameter set to 0.05. Default hyperparameters were retained for the SVM (kernel: radial basis function, ν = 0.5, the upper bound on the fraction of training errors), resulting in anomalies representing about 50% of the training set.

1 https://github.com/rcrupiISP/DTOR

Table 1
Summary of Dataset Characteristics: each entry reports the dataset identifier (Dataset), its size (# instances), its number of variables (# columns), and a brief description (Description). The public datasets were collected from the UCI Machine Learning Repository [16].

Dataset                   | # instances | # columns | Description
Banking (B)               | 100,000     | 26        | Dataset obtained from Intesa Sanpaolo Bank, used for anomaly identification and improved client analysis to discover probable instances of fraud or criminal conduct.
Glass Identification (GI) | 214         | 9         | Data from the USA Forensic Science Service, comprising six different glass kinds, each distinguished by its oxide composition.
Lymphography (L)          | 148         | 19        | Obtained from the University Medical Center, Institute of Oncology, Ljubljana, Yugoslavia.

3.2. Rule-based XAI
We explore various explainability techniques, focusing on rule-based explanations due to the challenges of interpreting feature-importance methods like SHAP and DIFFI, especially on high-dimensional datasets. Initially, Anchors were used to explain the banking dataset, but we found limitations, such as the inability to reason on regression tasks and constraints in model implementation, leading to the development of DTOR. In addition to Anchors and DTOR, we considered LORE and RuleXAI. However, LORE requires extensive hyperparameter tuning, increasing implementation complexity, and RuleXAI is not actively maintained, with outdated Python library requirements. For future work, we plan to compare DTOR with other explainability techniques. We adopt the perspective of providing rule-based explanations to Data Scientists, summarizing examples in Table 2 with three key metrics: execution time, coverage, and rule length. For DTOR, we set specific hyperparameters tailored to the banking dataset, ensuring both quantity and quality of explanations.
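To make Algorithm 1 concrete, a minimal Python sketch using scikit-learn is given below. The function and variable names are ours for illustration and do not necessarily match the DTOR repository; the default β = 0.1·N and the tree hyperparameters follow the values reported in this section.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def explain_instance(x_e, y_e, X_t, y_t, beta=None):
    """Fit a decision tree regressor on the AD scores, up-weighting the
    sample to be explained, and return its root-to-leaf rule."""
    N = len(X_t)
    if beta is None:
        beta = 0.1 * N  # default weight for x_e (value from Section 3.2)
    # append the sample to be explained to the training set
    X = np.vstack([X_t, np.atleast_2d(x_e)])
    y = np.append(y_t, y_e)
    # weights: 1 for each training point, beta for the target sample
    w = np.append(np.ones(N), beta)
    model = DecisionTreeRegressor(max_depth=8, min_impurity_decrease=1e-5)
    model.fit(X, y, sample_weight=w)
    # retrieve the path taken by x_e and turn each split into a condition
    tree = model.tree_
    path_nodes = model.decision_path(np.atleast_2d(x_e)).indices
    rule = []
    for node in path_nodes:
        if tree.children_left[node] == tree.children_right[node]:
            continue  # leaf reached: no split condition here
        feat, thr = tree.feature[node], tree.threshold[node]
        op = "<=" if x_e[feat] <= thr else ">"
        rule.append(f"x[{feat}] {op} {thr:.3f}")
    # also return the surrogate's estimate of the anomaly score
    return rule, float(model.predict(np.atleast_2d(x_e))[0])
```

Given a training set X_t with scores ŷ_t from any detector, the call returns a conjunction of threshold conditions together with the tree's estimate of the anomaly score, mirroring the surrogate-regressor behaviour described above.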
However, a dataset-specific approach is crucial to identify the optimal anomaly detector and to evaluate explanation quality effectively. The hyperparameters for DTOR are carefully chosen, with the max depth set to 8, the min impurity decrease to 10^-5, and the weight β for learning the rule to 0.1 · N, suitable for unbalanced datasets with anomalies. DTOR estimates the anomaly score rather than a binary output, and the same threshold used in the anomaly detection models is applied to determine anomalies. While not detailed here, each rule output by DTOR provides both its precision and its average anomaly score, enhancing informativeness.

Table 2
Examples of anomaly detection explanations on different datasets. The table reports the dataset name, example ID, anomaly detection (AD) model used, AD score, whether the AD model predicts the instance as an anomaly, coverage percentage, length of the detection rule, the detection rule itself, and the execution time in seconds.

Dataset | Example ID | AD model | AD score | Anomaly | Coverage (%) | Rule length | Rule | Execution time (s)
GI | 1 | SVM | 0.32 | True | 19 | 2 | Mg ≤ 3 AND K > 0 | 2.1
GI | 1 | IF | -0.69 | True | 1.8 | 3 | Si ≤ 71.3 AND Na ≤ 13.4 AND Na > 12 | 3.9
GI | 1 | GMM | -650 | True | 0.6 | 2 | K > 7 AND Al > 3.37 | 2.4
B | 2 | IF | -0.53 | True | 0.4 | 2 | ‘Appraisal time’ > 47 AND ‘Flag proposal’ = True | 16
L | 3 | IF | -0.49 | False | 8.1 | 7 | ‘bl. of lymph. s’ ≤ 3.5 AND ‘lym.nodes enlar’ > 2 AND ‘regeneration of’ ≤ 1.5 AND ‘dislocation of’ > 1.5 AND ‘changes in stru’ > 1.5 AND ‘by pass’ > 1.5 AND ‘special forms’ > 1.5 | 1.5

4. Discussion and conclusion
The findings derived from the DTOR algorithm provide significant insights into both anomaly detection and explainability methodologies. Notably, we observed a consistent trend towards shorter explanations for anomalies across the various anomaly detection (AD) models and datasets, as evidenced by the examples in Table 2, particularly the instances with IDs 1 and 2. Conversely, instance ID 3 presents a lengthier explanation.
This observation may align with the strategy employed by the Isolation Forest, which aims to isolate anomalies in a minimal number of steps. DTOR, by design, follows a similar path, leveraging the locally trained decision tree to isolate the sample: an outlier can be separated in fewer steps, whereas non-outliers may require a more complex separation. It is worth noting that our comparison was conducted against a surrogate classifier model, while our contribution introduces a surrogate regressor model. This distinction allows us not only to provide the rule but also to estimate the anomaly detection (AD) score, offering nuanced insights beyond binary classification. Instance ID 1 showcases three distinct explanations, underscoring the variability introduced by different AD models and potential feature correlations. This phenomenon illustrates the Rashomon effect in explainability [18], where multiple plausible explanations coexist. Although the execution time for generating explanations typically falls within seconds, it increases slightly for the banking dataset due to its larger sample size, which requires additional computational resources. Looking ahead, further analysis is warranted to examine these explanations in depth and compare them with state-of-the-art rule-based explainability techniques. Key metrics such as precision, coverage, and stability will be evaluated to assess the effectiveness of DTOR and its potential advantages over existing methods. For a more detailed analysis of the state of the art, performance, and comparison experiments with Anchors, please refer to [19].

References
[1] J. Nonnenmacher, J. M. Gómez, Unsupervised anomaly detection for internal auditing: Literature review and research agenda, International Journal of Digital Accounting Research 21 (2021).
[2] A. Basile, R. Crupi, M. Grasso, A. Mercanti, D. Regoli, S. Scarsi, S. Yang, A. C. Cosentini, Disambiguation of company names via deep recurrent networks, Expert Systems with Applications 238 (2024) 122035. doi:10.1016/j.eswa.2023.122035.
[3] F. T. Liu, K. M. Ting, Z.-H. Zhou, Isolation forest, in: 2008 Eighth IEEE International Conference on Data Mining, IEEE, 2008, pp. 413–422.
[4] B. Schölkopf, R. C. Williamson, A. Smola, J. Shawe-Taylor, J. Platt, Support vector method for novelty detection, Advances in Neural Information Processing Systems 12 (1999).
[5] D. A. Reynolds, et al., Gaussian mixture models, Encyclopedia of Biometrics 741 (2009).
[6] Y. Zhao, Z. Nasrullah, Z. Li, PyOD: A Python toolbox for scalable outlier detection, Journal of Machine Learning Research 20 (2019) 1–7. URL: http://jmlr.org/papers/v20/19-011.html.
[7] N. Kumar, D. Venugopal, L. Qiu, S. Kumar, Detecting anomalous online reviewers: An unsupervised approach using mixture models, Journal of Management Information Systems 36 (2019) 1313–1346.
[8] S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, S.-I. Lee, From local explanations to global understanding with explainable AI for trees, Nature Machine Intelligence 2 (2020) 56–67. doi:10.1038/s42256-019-0138-9.
[9] M. Carletti, M. Terzi, G. A. Susto, Interpretable anomaly detection with DIFFI: Depth-based feature importance of Isolation Forest, Engineering Applications of Artificial Intelligence 119 (2023) 105730. doi:10.1016/j.engappai.2022.105730.
[10] R. Crupi, A. Castelnovo, D. Regoli, B. San Miguel Gonzalez, Counterfactual explanations as interventions in latent space, Data Mining and Knowledge Discovery (2022) 1–37. doi:10.1007/s10618-022-00889-2.
[11] R. Guidotti, A. Monreale, S. Ruggieri, D. Pedreschi, F. Turini, F. Giannotti, Local rule-based explanations of black box decision systems, arXiv preprint arXiv:1805.10820 (2018). URL: https://arxiv.org/abs/1805.10820.
[12] D. Macha, M. Kozielski, Ł. Wróbel, M. Sikora, RuleXAI — A package for rule-based explanations of machine learning model, SoftwareX 20 (2022) 101209.
[13] M. T. Ribeiro, S. Singh, C. Guestrin, Anchors: High-precision model-agnostic explanations, Proceedings of the AAAI Conference on Artificial Intelligence 32 (2018). doi:10.1609/aaai.v32i1.11491.
[14] R. P. Ribeiro, S. M. Mastelini, N. Davari, E. Aminian, B. Veloso, J. Gama, Online anomaly explanation: A case study on predictive maintenance, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2022, pp. 383–399.
[15] Z. Li, Y. Zhu, M. Van Leeuwen, A survey on explainable anomaly detection, ACM Transactions on Knowledge Discovery from Data 18 (2023) 1–54.
[16] A. Frank, UCI Machine Learning Repository, http://archive.ics.uci.edu/ml (2010).
[17] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
[18] M. G. M. M. Hasan, D. Talbert, Mitigating the Rashomon effect in counterfactual explanation: A game-theoretic approach, in: The International FLAIRS Conference Proceedings, volume 35, 2022.
[19] R. Crupi, A. D. Sabatino, I. Marano, M. Brinis, L. Albertazzi, A. Cirillo, A. C. Cosentini, DTOR: Decision Tree Outlier Regressor to explain anomalies, arXiv preprint arXiv:2403.10903 (2024).