Unveiling Opaque Predictors via Explainable Clustering: The CReEPy Algorithm⋆

Federico Sabbatini1,*, Roberta Calegari2

1 Department of Pure and Applied Sciences, University of Urbino Carlo Bo
2 Department of Computer Science and Engineering (DISI), Alma Mater Studiorum–University of Bologna

Abstract
Machine learning black boxes, such as deep neural networks, are often hard to explain because their predictions depend on complicated relationships involving a huge number of internal parameters and input features. This opaqueness from the human perspective makes their predictions untrustworthy, especially in critical applications. In this paper we tackle this issue by introducing the design and implementation of CReEPy, an algorithm performing symbolic knowledge extraction based on explainable clustering. In particular, CReEPy relies on the underlying clustering performed by the ExACT or CREAM procedures to provide human-interpretable Prolog rules mimicking the behaviour of the opaque model. Experiments to assess both the human readability and the predictive performance of the proposed algorithm are discussed here, using existing state-of-the-art techniques as benchmarks for the comparison.

Keywords
Explainable clustering, Explainable artificial intelligence, Symbolic knowledge extraction, PSyKE

1. Introduction

In recent years, there has been a growing demand for transparency, particularly in critical domains [1, 2]. This demand stems from a lack of trust among humans in predictions obtained from machine learning (ML) models that lack interpretability. Such models are often referred to as opaque or black boxes (BBs) due to their inscrutability. While complex ML models tend to offer superior predictive performance, they pose challenges when it comes to human inspection. Consequently, the use of opaque models for high-stakes decisions necessitates the derivation of human-intelligible knowledge to ensure accountability and understanding.
So as not to renounce the impressive predictive capabilities of ML models, many strategies to obtain explainable behaviours have been proposed in the literature [3, 4], for instance, the adoption of interpretable ML predictors [5] or mechanisms to reverse-engineer the predictors’ behaviour [6]. Symbolic knowledge-extraction (SKE) techniques are exploited to this end, acting in a post-processing phase to extract interpretable knowledge out of a BB predictor.

2nd Workshop on Bias, Ethical AI, Explainability and the role of Logic and Logic Programming, BEWARE-23, co-located with AIxIA 2023, Roma Tre University, Roma, Italy, November 6, 2023
⋆ Original research paper.
* Corresponding author.
f.sabbatini1@campus.uniurb.it (F. Sabbatini); roberta.calegari@unibo.it (R. Calegari)
0000-0002-0532-6777 (F. Sabbatini); 0000-0003-3794-2942 (R. Calegari)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

Following the current developments in the SKE field [7], we developed CReEPy, a new general-purpose knowledge-extraction procedure based on interpretable clustering and applicable to any kind of BB predictor. CReEPy is built upon ExACT and CREAM and pedagogically explains BBs performing classification or regression tasks and operating on continuous input features. CReEPy proves the effectiveness of exploiting explainable clustering to achieve the interpretability of BBs. Indeed, it enables the extraction of more concise and accurate explanations compared to analogous state-of-the-art techniques.

The paper is organised as follows: Section 2 introduces background information on the topics discussed here and related works present in the literature. Section 3 describes the CReEPy algorithm.
Experiments and benchmark comparisons are discussed in Section 4. Finally, conclusions are drawn in Section 5.

2. Related Works

2.1. Symbolic Knowledge Extraction

SKE consists of obtaining human-interpretable rules out of BB predictors by means of a surrogate, explainable model that is able to mimic the BB, which in this context is named the underlying model. The underlying model may be a classifier, a regressor, a clustering technique, or any other opaque predictor. The mimicking capabilities of the surrogate model are assessed by comparing the outputs provided by the underlying and surrogate models w.r.t. the same inputs. SKE techniques are currently applied in a wide range of contexts [8, 9, 10, 11, 12, 13, 14].

The construction of the surrogate predictor may be performed in a decompositional or pedagogical way [15]. In the former case, the BB kind and internal structure are taken into account, so these algorithms are not general and can be applied only to a subset of BBs; e.g., RefAnn [16] accepts as underlying predictors only neural networks having a single hidden layer. On the other hand, pedagogical techniques only consider the input/output relationship of the underlying BB, and thus they are more general and pose no constraints on the BB type and complexity. In the following we provide a brief description of the SKE algorithms chosen as benchmarks for the experiments presented in this work.

2.1.1. Iter

Iter [17] is a pedagogical knowledge-extraction algorithm explicitly designed for black-box regressors. It extracts knowledge in the form of rule lists while imposing no constraint on the nature, structure, or training of the underlying opaque model. To extract rules, the Iter algorithm steps through the creation and iterative expansion of several disjoint hypercubes, covering the whole input space the regressor has been trained upon.
In other words, Iter accepts as input a regressor and the data set used for its training, then iteratively partitions the input feature space following a bottom-up strategy. At the end of the process, each partition is converted into a human-interpretable rule associated with a constant output value.

2.1.2. GridEx and GridREx

The GridEx algorithm [18] is a pedagogical technique performing symbolic knowledge extraction from BBs designed for regression tasks. Thanks to the generalisation proposed in [19, 20] it can also be applied to explain classifiers. In both cases, data sets have to be described by continuous features. It is inspired by Iter, with the aim of removing the issues deriving from its possibly slow convergence as well as from the small input space coverage and fidelity when dealing with high-dimensional data sets. GridEx satisfies this goal by relying on a top-down partitioning strategy, thus achieving good results in terms of both the number of extracted rules and the corresponding predictive performance w.r.t. the underlying BB and the data.

The partitioning strategy adopted by GridEx consists of recursively splitting the input feature space into smaller subregions according to a similarity threshold. At the end of the partitioning, each region is translated into a human-readable rule, having preconditions describing the region and a postcondition representing the associated output value, which is a constant obtained by averaging the underlying model predictions for the samples included in the region. Unfortunately, in some real-world applications, the undesired discretisation introduced by the constant outputs of GridEx may hinder the predictive performance of the extractor. GridREx [21] overcomes this issue by training a linear model inside each identified hypercubic region.
Linear models are fitted on the instances contained in the corresponding hypercubes, and each cube is associated with a rule having a set of conditions on the input variables as antecedent part, as in Iter and GridEx, and a linear combination of the input variables (given by the linear model) as consequent part. As a result, the output predictions given by the extracted rules are no longer averaged output values of the samples contained in the corresponding hypercubes, but more accurate linear equations.

A disadvantage shared by GridEx and GridREx is that they perform a symmetric partitioning, i.e., during a given iteration they split each input dimension into a given number of congruent partitions. This strategy may lead to suboptimal solutions when applied to real-world data sets.

2.1.3. Cart

Cart [22] is not, strictly speaking, a SKE technique, since it is based on the induction of binary decision trees on data set instances. However, it may be applied as well to the output of a BB predictor to obtain a decision tree representing the BB behaviour. Starting from the tree, it is straightforward to extract human-comprehensible rules by converting each possible path from the tree root to the leaves into a logic rule. Cart can be applied to both BB classifiers and regressors; however, also in this case the output value is a constant, so predictions suffer from an undesired discretisation when used in regression tasks.

2.2. Explainable Clustering via ExACT and CREAM

ExACT [23] is an algorithm performing explainable clustering. It combines the aggregation strategies typical of traditional clustering techniques with the cluster assignment via decision trees performed by other explainable or interpretable clustering procedures. For this reason, with ExACT it is possible to obtain explainable clusters by inducing a top-down decision tree over the training data according to a strictly hierarchical strategy.
Indeed, identified clusters have the peculiarity of being concentric. The strategy adopted for the tree’s internal nodes is to use hypercubic splits to separate whole clusters of data while avoiding the presence of instances from multiple clusters inside the same hypercubic region. Explainability is obtained by approximating each identified cluster with a hypercube. The concentric nature of ExACT’s hierarchical approximations enables the creation of a global interpretable clustering in the form of a rule list, where each cluster is simply expressed through a rule having a single hypercube inclusion constraint, from the innermost cluster to the outermost. The same structure may be used to provide local explanations for single clustering assignments. CREAM [24] extends ExACT by providing a more complex splitting strategy based on the iterative greedy minimisation of the predictive error measured for each possible split.

3. Symbolic Knowledge Extraction via Explainable Clustering

In this section we propose the design and implementation of a new knowledge-extraction technique, named CReEPy, that is able to obtain human-interpretable rules in Prolog syntax out of BB models of any sort, applicable to both classification and regression tasks.

Algorithm 1 CReEPy pseudocode
Require: underlying clustering parameters Π
Require: input feature importance set Φ
Require: feature importance threshold Θ
 1: function CReEPy(𝑃, 𝐷)
 2:   𝐷′ ← CreateDataset(𝑃, 𝐷)
 3:   𝑟𝑒𝑔𝑖𝑜𝑛𝑠 ← clustering(Π, 𝐷′)
 4:   return {RegionToRule(𝑟) : 𝑟 ∈ 𝑟𝑒𝑔𝑖𝑜𝑛𝑠}
 5: function RegionToRule(𝑟𝑒𝑔𝑖𝑜𝑛)
 6:   return a Prolog rule describing 𝑟𝑒𝑔𝑖𝑜𝑛 in terms of its relevant features, by comparing Φ and Θ
 7: function CreateDataset(𝑃, 𝐷)
 8:   return data set 𝐷 with the output feature predicted by 𝑃
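The top-level flow of Algorithm 1 can be sketched in Python as follows. All names here (create_dataset, region_to_rule, the stub clustering) are hypothetical illustrations, not the PSyKE API; a region is modelled as per-feature (lower, upper) bounds plus an output value.

```python
from typing import Callable

def create_dataset(predict: Callable, X: list) -> list:
    # Pedagogical step: relabel the training samples with the BB's predictions.
    return [(x, predict(x)) for x in X]

def region_to_rule(bounds: dict, output) -> str:
    # Translate a hypercubic region into a simplified Prolog-like clause.
    body = ", ".join(f"{f} in [{lo}, {hi}]" for f, (lo, hi) in bounds.items())
    return f"out({output}) :- {body}."

def creepy(predict: Callable, X: list, clustering: Callable) -> list:
    relabelled = create_dataset(predict, X)   # D' <- CreateDataset(P, D)
    regions = clustering(relabelled)          # ExACT or CREAM would run here
    return [region_to_rule(b, o) for b, o in regions]

# Toy run with a stub clustering that returns a single region:
stub = lambda data: [({"X1": (0.0, 1.0)}, "a")]
print(creepy(lambda x: "a", [{"X1": 0.5}], stub))
# → ['out(a) :- X1 in [0.0, 1.0].']
```

The stub stands in for the underlying clustering; plugging in any procedure that yields hypercubic regions preserves the rest of the pipeline.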
Following the idea proposed in [25], CReEPy performs the knowledge extraction by applying a preliminary interpretable clustering technique (i.e., ExACT or CREAM) to the training data, where the output feature is substituted with the opaque predictions. In the experiments reported here the ExACT procedure has been employed.

3.1. The CReEPy Algorithm

CReEPy (Clustering-based REcursive Extraction as a PYramid) is a general-purpose pedagogical SKE technique applicable to any kind of BB predictor performing classification or regression tasks. It relies on the cluster approximation performed by ExACT or CREAM, with the goal of providing human readability for the underlying BB predictions, and it is summarised in Algorithm 1.

CReEPy has been designed to be agnostic w.r.t. the underlying clustering. For this reason, it may be executed together with different clustering techniques, provided that they produce hypercubic input space approximations. Furthermore, CReEPy may be extended in the future to become compliant with other tree-based clustering approaches, since these basically slice the input feature space with cuts that are perpendicular to the axes, and each path from the tree root to a leaf may be translated into a hypercube.

Being explicitly designed to work in synergy with ExACT and CREAM, CReEPy produces logic knowledge in the form of a theory of Prolog clauses (examples are shown in the experiment section). The theory mimics the decisions of the underlying BB model and each clause is related to an approximated cluster identified with ExACT or CREAM, enhancing its readability. Since these clusters are hierarchical cubes and difference cubes, thus defined as interval inclusions and exclusions, Prolog theories are particularly well suited, given that their clauses are ordered. As a consequence, it is possible to associate each clause only with preconditions referring to the inclusion in a hypercube, assuming as true the exclusion from all the cubes described by the preceding clauses.
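This ordered-clause semantics can be sketched as follows: a sample is assigned by the first clause whose hypercube contains it, scanning from the innermost cube outward, so exclusion from the earlier cubes never needs to be stated explicitly. The intervals below follow the rules CReEPy extracts for the Iris case study (Section 4); the Python names are hypothetical.

```python
# Ordered rule list, innermost hypercube first (Iris, threshold = 0.99).
rules = [
    ({"PetalLength": (4.75, 6.90)}, "virginica"),
    ({"PetalLength": (2.90, 6.90)}, "versicolor"),
    ({"PetalLength": (1.10, 6.90)}, "setosa"),
]

def classify(sample):
    # First matching clause wins: exclusion from the preceding (inner) cubes
    # is implicit in the ordering, exactly as in an ordered Prolog theory.
    for bounds, label in rules:
        if all(lo <= sample[f] <= hi for f, (lo, hi) in bounds.items()):
            return label
    return None  # sample outside the outermost (surrounding) cube

print(classify({"PetalLength": 5.0}))  # virginica
print(classify({"PetalLength": 3.5}))  # versicolor
print(classify({"PetalLength": 1.4}))  # setosa
```

Note that a sample with PetalLength = 5.0 also satisfies the second and third intervals; only the ordering makes the innermost clause win.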
The expressiveness of this semantics is thus fully exploited by ordering the Prolog rules starting from the one associated with the innermost hypercubic region and then following the hierarchy up to the outermost region, which is equivalent to the surrounding cube of the data set at hand.

3.2. User-Defined Parameters

The readability of CReEPy’s theories depends on the number of extracted rules and may be adjusted by tuning the parameters of the underlying clustering instance. Readability also depends on the number of preconditions per clause. ExACT and CREAM assign a precondition to each input dimension, i.e., an interval inclusion constraint for each input feature. Therefore, in the default version of CReEPy each Prolog clause has 𝑛 preconditions for 𝑛-dimensional data sets. This may be limiting when dealing with high-dimensional data sets. For this reason, users can provide CReEPy with the input feature relevance and a corresponding threshold, to limit the rule preconditions to only those features whose relevance exceeds the threshold. We highlight here that the input feature relevance is calculated outside CReEPy, so users are not bound to a specific method, as long as they provide the feature relevance set normalised in the [0, 1] interval. A relevance score is required for each input feature. A suitable and fast solution to obtain these scores can be found within the Python Scikit-Learn library.1

It is worthwhile to point out that the feature relevance threshold does not affect the underlying clustering, but only the translation into Prolog rules performed by CReEPy starting from the tree provided by ExACT or CREAM (cf. the RegionToRule procedure in Algorithm 1).
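For instance, a relevance set normalised in [0, 1] can be obtained with Scikit-Learn as sketched below. Mutual information is just one possible scorer among the sklearn.feature_selection tools, and the max-rescaling choice is an assumption of this sketch, not mandated by CReEPy.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

# Score each input feature, then rescale into [0, 1] as CReEPy expects.
iris = load_iris()
scores = mutual_info_classif(iris.data, iris.target, random_state=0)
relevance = scores / scores.max()  # top-ranked feature gets relevance 1.0

for name, r in zip(iris.feature_names, relevance):
    print(f"{name}: {r:.2f}")

# Rule preconditions are then kept only for features above a chosen threshold:
threshold = 0.80
kept = [n for n, r in zip(iris.feature_names, relevance) if r > threshold]
```

Any other scorer (e.g., permutation importance) works equally well, as long as the resulting scores are rescaled into [0, 1] before being passed to CReEPy.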
The translation into Prolog rules is executed according to the following criteria:

• for each leaf of the tree identified via the underlying clustering technique, a rule is created;
• individual rules are if-then logic rules where the conditional part is a conjunction of interval inclusion constraints on the input features and the corresponding action is a constant value (e.g., a class label or a number) or a linear combination of the input variables;
• constraints are defined in the internal nodes of the tree;
• actions are described in the leaves of the tree;
• all variables having relevance smaller than the user-defined threshold are removed from the conditional part of the logic rules;
• the resulting rules are converted into a theory in Prolog format, both human- and agent-interpretable.

1 cf. https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection

From a predictive perspective, the quality of CReEPy’s rules can be assessed via standard scores generally adopted for ML classification and regression tasks, e.g., accuracy and F1 score for the former and mean absolute/squared error and R2 score for the latter. Dedicated scoring metrics for symbolic knowledge, such as FiRe, may also be used [26].

4. Experiments

Experiments assessing the capabilities of CReEPy applied to classification and regression tasks, in comparison with state-of-the-art clustering and other ML and SKE techniques, are reported in the following. The adopted ExACT and CReEPy implementations are included in the PSyKE framework2 [27, 28, 29, 30].

4.1. Predictive Performance and Readability Assessments

To assess the capabilities of CReEPy in explaining opaque ML predictors we carried out several experiments involving real-world data sets. We selected the Iris data set3 [31] as a case study for classification and 6 data sets from real use cases taken from the StairwAI EU Project4 as case studies for regression.

4.1.1.
Classification: The Iris Data Set Case Study

The Iris data set represents a simple classification task with 4 continuous input features expressing as many characteristics of iris flowers. The target is the species of the flowers, which in this specific context may assume 3 possible distinct values. The data set is reported in Figure 1a. Only the 2 most relevant features are shown, i.e., petal length and width expressed in cm.

Our experiments on this data set are based on a 𝑘-nearest neighbours (𝑘-NN) opaque predictor parametrised with 𝑘 = 7. The corresponding decision boundaries are reported in Figure 1b. Decision boundaries identified by CReEPy and other SKE techniques are reported in the other panels of Figure 1. The parameters used for the SKE techniques are reported in the following list.

GridEx We adopted 2 different instances of GridEx. The one reported in Figure 1c produces 3 output rules (one per possible output class) and performs 14 slices only along input features having importance greater than 0.99, i.e., only along the most important feature, the petal length. Conversely, the GridEx instance shown in Figure 1d performs 5 slices along each input dimension having importance greater than 0.80, i.e., petal length and width. This results in the 5 output rules depicted in the figure.

Cart The decision boundaries reported in Figure 1e are obtained by growing an unbounded decision tree (no constraints on the tree depth, nor on the leaf amount).

2 Code available at https://github.com/psykei/psyke-python
3 https://archive.ics.uci.edu/ml/datasets/iris
4 https://cordis.europa.eu/project/id/101017142; data sets are publicly available at https://zenodo.org/record/5838437

Figure 1: Symbolic knowledge extraction performed on the Iris data set. Panels: (a) Iris data set; (b) 7-NN; (c) GridEx, 3 rules; (d) GridEx, 5 rules; (e) Cart; (f) Iter; (g) CReEPy, feature relevance threshold = 0.99; (h) CReEPy, feature relevance threshold = 0.80.
Iter The Iter instance producing the input space partitioning reported in Figure 1f has been parametrised with a minimum cube update of 0.15 and an error threshold of 0.1. The maximum number of allowed iterations and the minimum amount of samples to consider in each cube have been fixed to 600 and 150, respectively. The algorithm started from a single random cube.

CReEPy Figures 1g and 1h correspond to CReEPy instances with feature relevance thresholds equal to 0.99 and 0.80, respectively. The former implies considering only the most relevant feature when performing the knowledge extraction. By relaxing the threshold to 0.80, the second most important input feature is considered as well, as for GridEx. The input space partitioning reported in Figure 1g is equivalent to the Prolog theory shown in Listing 1. The Prolog theory corresponding to the decision boundaries reported in Figure 1h is shown in Listing 2.

Listing 1 Rules extracted with CReEPy for the Iris data set. Feature relevance threshold = 0.99.

iris(PetalLength, PetalWidth, SepalLength, SepalWidth, virginica) :-
    PetalLength in [4.75, 6.90].
iris(PetalLength, PetalWidth, SepalLength, SepalWidth, versicolor) :-
    PetalLength in [2.90, 6.90].
iris(PetalLength, PetalWidth, SepalLength, SepalWidth, setosa) :-
    PetalLength in [1.10, 6.90].

Listing 2 Rules extracted with CReEPy for the Iris data set. Feature relevance threshold = 0.80.

iris(PetalLength, PetalWidth, SepalLength, SepalWidth, virginica) :-
    PetalLength in [4.75, 6.90], PetalWidth in [1.55, 2.60].
iris(PetalLength, PetalWidth, SepalLength, SepalWidth, versicolor) :-
    PetalLength in [2.90, 6.90], PetalWidth in [0.90, 2.60].
iris(PetalLength, PetalWidth, SepalLength, SepalWidth, setosa) :-
    PetalLength in [1.10, 6.90], PetalWidth in [0.00, 2.60].

Table 1 summarises the predictive performance measured for each SKE technique. The number of extracted rules is also reported as an index of the human-interpretability extent.
From the results reported in the table, it is evident that CReEPy is able to achieve comparable or slightly better predictive performance than other state-of-the-art analogous techniques. As for the amount of extracted rules, CReEPy provides 3 rules, the optimal result given that it is applied to a classification task having 3 possible outcomes.

Table 1 Assessments for the SKE techniques applied on a 7-NN performing classification on the Iris data set.

Model                                        Extracted rules   F1 score (data)   F1 score (BB)
7-NN                                         –                 0.95              –
GridEx                                       3                 0.97              0.94
GridEx                                       5                 0.94              0.92
Cart                                         3                 0.95              0.97
Iter                                         3                 0.94              0.97
CReEPy, feature relevance threshold = 0.99   3                 0.95              0.97
CReEPy, feature relevance threshold = 0.80   3                 0.94              0.96

4.1.2. Regression: The StairwAI EU Project Case Study

Thanks to the versatility of the underlying clustering constituting the core of CReEPy, it is possible to apply the latter to regression tasks as well. All the data sets used as case studies are composed of continuous features; 2 of them have 5 input features, and the remaining have 1 input feature. A different BB has been applied to each data set to draw predictions.

Table 2 Regression data sets used to test CReEPy and compare it to analogous techniques. For each data set the following are reported: a unique identifier, the name of the data set, the number of considered input features, the name of the considered output feature, and the mean absolute error measured for the BB trained on the data set.

ID   Name          Input variables   Output variable   BB MAE
#1   Anticipate    5                 cost              0.4
#2   Anticipate    1                 memory            4.1
#3   Anticipate    1                 time              8.3
#4   Contingency   5                 cost              1.5
#5   Contingency   1                 memory            3.6
#6   Contingency   1                 time              0.8

A comparison between the knowledge extraction performed on the aforementioned data sets by CReEPy and other state-of-the-art analogous methods (namely, GridEx, GridREx and Cart) is reported in Table 3. Each measurement has been averaged over 5 different executions run under analogous conditions. Results provided by different executions are almost identical or very close, so we omitted the results’ standard deviation in the table. For each data set, the number of input variables and the mean absolute error (MAE) of the corresponding BB model are reported. For each extractor, the number of output rules (R) and the mean absolute error w.r.t. the actual data (D) and w.r.t. the BB predictions are reported.

For all these experiments we chose local linear combinations of the input variables as outputs for the regions approximated by the underlying ExACT instances. Indeed, the adoption of constant outputs resulted in more concise output rules having, however, far worse predictive performance. It is important to note that only the mean absolute error is reported as a measure of predictive performance, even though other metrics are available, such as the mean squared error or the R2 score. The number of extracted rules is taken as a readability measure, since readability for humans decreases as the amount of rules increases. Another index used to assess and compare the quality of extractors is the completeness of the extracted knowledge [32], but in this particular case study it is not relevant, since all the procedures achieve a level of completeness above 99%.

CReEPy proved to be superior to Cart from a predictive performance perspective, since local linear combinations of input variables better approximate the data set/BB outputs than constant values. Furthermore, a readability comparison between the two extractors shows that CReEPy is able to halve the extracted rule amount in 50% of the experiments.
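The two error columns reported for each extractor, i.e. the MAE w.r.t. the actual data (D) and w.r.t. the BB predictions (BB, a fidelity measure), can be computed as in the sketch below; the vectors are toy values, not the paper's data.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])   # ground-truth outputs
y_bb = np.array([1.1, 2.1, 2.9, 4.2])     # opaque (BB) predictions
y_rules = np.array([1.0, 2.2, 3.0, 4.1])  # extracted-rule predictions

mae_data = float(np.mean(np.abs(y_rules - y_true)))  # the D column
mae_bb = float(np.mean(np.abs(y_rules - y_bb)))      # the BB column (fidelity)
print(mae_data, mae_bb)
```

A low MAE (BB) indicates that the rules faithfully mimic the opaque model, while a low MAE (D) indicates that they predict the actual data well; the two need not coincide.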
Analogous considerations hold for the comparison with GridEx, with an even more evident readability enhancement when considering CReEPy. The most interesting comparison is with GridREx, which is also able to provide local approximations in the form of linear input variable combinations. By exploiting CReEPy it is possible to achieve approximately the same predictive performance shown by GridREx with far better readability (for instance, 2 output rules instead of 5 or 4, considering the experiments on data sets #2, #5 and #6). In conclusion, our proposed knowledge extractor performing an upstream interpretable clustering via ExACT is absolutely competitive with state-of-the-art SKE algorithms.

Table 3 Results of CReEPy applied to the 6 data sets described in Table 2. For each data set the number of extracted rules (R) and the MAE w.r.t. the data (D) and the underlying BB model are provided. Results are compared with those of GridREx, GridEx, and Cart applied to the same data sets.

           CReEPy               GridREx              GridEx               Cart
Data set   R   D      BB        R   D      BB        R   D      BB       R   D      BB
#1         3   1.5    1.5       5   1.9    2.0       5   14.6   14.6     4   14.7   14.7
#2         2   4.9    3.3       5   4.9    3.2       5   15.0   14.7     4   17.4   17.0
#3         3   11.5   7.9       4   11.1   6.6       5   17.7   15.0     4   16.7   12.9
#4         4   26.6   26.8      4   24.4   24.6      5   28.5   28.6     4   25.1   25.1
#5         2   4.7    2.3       4   4.5    2.3       4   4.7    2.3      4   4.5    2.3
#6         2   1.0    0.7       5   1.0    0.7       5   3.1    3.1      4   3.9    3.8

4.2. Computational Time Assessments

Our experiments are completed by a quantitative assessment of the computational time required by CReEPy to perform the knowledge extraction. Tests consider data set #1, performing both row and column slicing on it. In particular, a comparison of the computational time required to handle the data set with different amounts of input features and instances has been performed. Results are reported in Figure 2. Measurements have been averaged over 100 executions. From Figure 2a it is clear that the execution time grows as the number of training instances increases.
Clues on the independence of the required time w.r.t. the number of input features may be found in the same figure. Such independence is clearly noticeable in Figure 2b, showing that the computational time is always smaller than 2, 1 and 0.5 seconds for 10 000, 7 000 and 4 000 instances, respectively, regardless of the number of input features. In conclusion, we suggest speeding up CReEPy, when necessary, by reducing the amount of training data points rather than the number of input features.

Figure 2: Execution time (s) of CReEPy w.r.t. the number of input features of the domain and the number of instances adopted for the training. (a) Feature amount as parameter; (b) instance amount as parameter.

5. Conclusions

In this paper we present a SKE technique named CReEPy, applicable to any kind of opaque ML classifier or regressor working upon data sets described by continuous input features. CReEPy is able to outperform existing techniques from both the predictive performance and human-readability perspectives. CReEPy is a two-phase algorithm, since it performs an explainable clustering technique on the training data before the proper knowledge-extraction phase. The human readability of the extracted knowledge is ensured since it is provided to users in the form of a logic theory adhering to the Prolog syntax. The upstream clustering techniques designed for CReEPy and also described here are ExACT and CREAM. These algorithms take advantage of GMMs and DBSCAN to detect clusters and approximate them with human-interpretable hypercubic regions described in terms of interval inclusion constraints on the input features.
ExACT and CREAM may also be used as stand-alone explainable clustering procedures, to perform clustering rather than classification or regression. Our future work will focus on enhancing the rationale behind ExACT’s and CREAM’s region approximation, and possibly on the adoption of deep clustering techniques instead of the GMMs and DBSCAN used in the current versions. On the other hand, CReEPy may benefit from an automatic technique enabling parameter auto-tuning; in particular, we plan to implement a procedure aimed at highlighting the best values for the maximum depth and the predictive error threshold parameters.

Acknowledgments

This work has been supported by the EU ICT-48 2020 project TAILOR (No. 952215).

References

[1] European Commission, Directorate-General for Communications Networks, Content and Technology, Ethics guidelines for trustworthy AI, Publications Office, 2019. doi:10.2759/346720.
[2] European Commission, AI Act – Proposal for a regulation of the European Parliament and the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain union legislative acts, https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52021PC0206, 2021.
[3] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, D. Pedreschi, A survey of methods for explaining black box models, ACM Computing Surveys 51 (2018) 1–42. doi:10.1145/3236009.
[4] S. Ayache, R. Eyraud, N. Goudian, Explaining black boxes on sequential data using weighted automata, in: International Conference on Grammatical Inference, PMLR, 2019, pp. 81–103.
[5] C. Rudin, Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, Nature Machine Intelligence 1 (2019) 206–215. doi:10.1038/s42256-019-0048-x.
[6] E. M. Kenny, C. Ford, M. Quinn, M. T.
Keane, Explaining black-box classifiers using post-hoc explanations-by-example: The effect of explanations and error-rates in XAI user studies, Artificial Intelligence 294 (2021) 103459. doi:10.1016/j.artint.2021.103459.
[7] R. Calegari, G. Ciatto, A. Omicini, On the integration of symbolic and sub-symbolic techniques for XAI: A survey, Intelligenza Artificiale 14 (2020) 7–32. doi:10.3233/IA-190036.
[8] G. Bologna, C. Pellegrini, Three medical examples in neural network rule extraction, Physica Medica 13 (1997) 183–187. URL: https://archive-ouverte.unige.ch/unige:121360.
[9] Y. Hayashi, R. Setiono, K. Yoshida, A comparison between two neural network rule extraction techniques for the diagnosis of hepatobiliary disorders, Artificial Intelligence in Medicine 20 (2000) 205–216. doi:10.1016/s0933-3657(00)00064-6.
[10] B. Baesens, R. Setiono, C. Mues, J. Vanthienen, Using neural network rule extraction and decision tables for credit-risk evaluation, Management Science 49 (2003) 312–329. doi:10.1287/mnsc.49.3.312.12739.
[11] A. Hofmann, C. Schmitz, B. Sick, Rule extraction from neural networks for intrusion detection in computer networks, in: 2003 IEEE International Conference on Systems, Man and Cybernetics, volume 2, IEEE, 2003, pp. 1259–1265. doi:10.1109/ICSMC.2003.1244584.
[12] M. T. A. Steiner, P. J. Steiner Neto, N. Y. Soma, T. Shimizu, J. C. Nievola, Using neural network rule extraction for credit-risk evaluation, International Journal of Computer Science and Network Security 6 (2006) 6–16. URL: http://paper.ijcsns.org/07_book/200605/200605A02.pdf.
[13] L. Franco, J. L. Subirats, I. Molina, E. Alba, J. M. Jerez, Early breast cancer prognosis prediction and rule extraction using a new constructive neural network algorithm, in: Computational and Ambient Intelligence (IWANN 2007), volume 4507 of LNCS, Springer, 2007, pp. 1004–1011. doi:10.1007/978-3-540-73007-1_121.
[14] F. Sabbatini, C.
Grimani, Symbolic knowledge extraction from opaque predictors applied to cosmic-ray data gathered with LISA Pathfinder, Aeronautics and Aerospace Open Access Journal 6 (2022) 90–95. doi:10.15406/aaoaj.2022.06.00145.
[15] R. Andrews, J. Diederich, A. B. Tickle, Survey and critique of techniques for extracting rules from trained artificial neural networks, Knowledge-Based Systems 8 (1995) 373–389. doi:10.1016/0950-7051(96)81920-4.
[16] R. Setiono, W. K. Leow, J. M. Zurada, Extraction of rules from artificial neural networks for nonlinear regression, IEEE Transactions on Neural Networks 13 (2002) 564–577. doi:10.1109/TNN.2002.1000125.
[17] J. Huysmans, B. Baesens, J. Vanthienen, ITER: An algorithm for predictive regression rule extraction, in: Data Warehousing and Knowledge Discovery (DaWaK 2006), Springer, 2006, pp. 270–279. doi:10.1007/11823728_26.
[18] F. Sabbatini, G. Ciatto, A. Omicini, GridEx: An algorithm for knowledge extraction from black-box regressors, in: D. Calvaresi, A. Najjar, M. Winikoff, K. Främling (Eds.), Explainable and Transparent AI and Multi-Agent Systems. Third International Workshop, EXTRAAMAS 2021, Virtual Event, May 3–7, 2021, Revised Selected Papers, volume 12688 of LNCS, Springer Nature, Basel, Switzerland, 2021, pp. 18–38. doi:10.1007/978-3-030-82017-6_2.
[19] F. Sabbatini, G. Ciatto, R. Calegari, A. Omicini, Hypercube-based methods for symbolic knowledge extraction: Towards a unified model, in: A. Ferrando, V. Mascardi (Eds.), WOA 2022 – 23rd Workshop “From Objects to Agents”, volume 3261 of CEUR Workshop Proceedings, Sun SITE Central Europe, RWTH Aachen University, 2022, pp. 48–60. URL: http://ceur-ws.org/Vol-3261/paper4.pdf.
[20] F. Sabbatini, G. Ciatto, R. Calegari, A. Omicini, Towards a unified model for symbolic knowledge extraction with hypercube-based methods, Intelligenza Artificiale 17 (2023) 63–75. doi:10.3233/IA-230001.
[21] F.
Sabbatini, R. Calegari, Symbolic knowledge extraction from opaque machine learning predictors: GridREx & PEDRO, in: G. Kern-Isberner, G. Lakemeyer, T. Meyer (Eds.), Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning, KR 2022, Haifa, Israel, July 31 – August 5, 2022, 2022. doi:10.24963/kr.2022/57.
[22] L. Breiman, J. Friedman, C. J. Stone, R. A. Olshen, Classification and Regression Trees, CRC Press, 1984.
[23] F. Sabbatini, R. Calegari, ExACT explainable clustering: Unravelling the intricacies of cluster formation, in: Proceedings of the 2nd International Workshop on Knowledge Diversity, KoDis 2023, Rhodes, Greece, September 2–8, 2023 (to appear), 2023.
[24] F. Sabbatini, R. Calegari, Explainable clustering with CREAM, in: Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning, KR 2023, 2023, pp. 593–603. doi:10.24963/kr.2023/58.
[25] F. Sabbatini, R. Calegari, Bottom-up and top-down workflows for hypercube- and clustering-based knowledge extractors, in: D. Calvaresi, A. Najjar, A. Omicini, R. Aydogan, R. Carli, G. Ciatto, K. Främling (Eds.), Explainable and Transparent AI and Multi-Agent Systems. Fifth International Workshop, EXTRAAMAS 2023, London, UK, May 29, 2023, Revised Selected Papers, volume 14127 of LNCS, Springer Cham, Basel, Switzerland, 2023, pp. 116–129. doi:10.1007/978-3-031-40878-6_7.
[26] F. Sabbatini, R. Calegari, Symbolic knowledge-extraction evaluation metrics: The FiRe score, in: K. Gal, A. Nowé, G. J. Nalepa, R. Fairstein, R. Rădulescu (Eds.), Proceedings of the 26th European Conference on Artificial Intelligence, ECAI 2023, Kraków, Poland, September 30 – October 4, 2023, 2023. doi:10.3233/FAIA230496.
[27] F. Sabbatini, G. Ciatto, R. Calegari, A.
Omicini, On the design of PSyKE: A platform for symbolic knowledge extraction, in: R. Calegari, G. Ciatto, E. Denti, A. Omicini, G. Sartor (Eds.), WOA 2021 – 22nd Workshop “From Objects to Agents”, Bologna, Italy, 1–3 September 2021, volume 2963 of CEUR Workshop Proceedings, Sun SITE Central Europe, RWTH Aachen University, 2021, pp. 29–48.
[28] F. Sabbatini, G. Ciatto, R. Calegari, A. Omicini, Symbolic knowledge extraction from opaque ML predictors in PSyKE: Platform design & experiments, Intelligenza Artificiale 16 (2022) 27–48. doi:10.3233/IA-210120.
[29] F. Sabbatini, G. Ciatto, A. Omicini, Semantic Web-based interoperability for intelligent agents with PSyKE, in: D. Calvaresi, A. Najjar, M. Winikoff, K. Främling (Eds.), Explainable and Transparent AI and Multi-Agent Systems, volume 13283 of LNCS, Springer, 2022, pp. 124–142. doi:10.1007/978-3-031-15565-9_8.
[30] R. Calegari, F. Sabbatini, The PSyKE technology for trustworthy artificial intelligence, in: XXI International Conference of the Italian Association for Artificial Intelligence, AIxIA 2022, Udine, Italy, November 28 – December 2, 2022, Proceedings, volume 13796 of LNCS, 2023, pp. 3–16. doi:10.1007/978-3-031-27181-6_1.
[31] R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics 7 (1936) 179–188. doi:10.1111/j.1469-1809.1936.tb02137.x.
[32] F. Sabbatini, R. Calegari, On the evaluation of the symbolic knowledge extracted from black boxes, in: AAAI 2023 Spring Symposium Series (to appear), San Francisco, California, 2023.