<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>The Computer Journal 15 (1972) 326-332</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1093/comjnl/15.4.326</article-id>
      <title-group>
        <article-title>ExACT Explainable Clustering: Unravelling the Intricacies of Cluster Formation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Federico Sabbatini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberta Calegari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Alma Mater Studiorum-University of Bologna</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Pure and Applied Sciences, University of Urbino Carlo Bo</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>97</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Cluster assignments, in particular those produced by deep clustering, are often hard to explain, partly because they depend on all the features of the data in a complicated way, making it difficult to determine why a particular row of data is assigned to a particular bucket. This opaqueness makes their predictions untrustworthy, as with many predictors based on black boxes. This paper tackles these issues by introducing the design and implementation of ExACT, a new explainable clustering algorithm based on the induction of decision trees, which performs hypercubic approximations of the input feature space in order to output human-interpretable clusters. Furthermore, ExACT is versatile enough to perform explainable classification and regression as well, as demonstrated in this work, proving to be a competitive alternative to existing analogous algorithms.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable clustering</kwd>
        <kwd>Explainable artificial intelligence</kwd>
        <kwd>PSyKE</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Clustering is one of the most fundamental optimisation techniques at the heart of many applications in machine learning (ML) and data mining. However, in the past few years, due to the increasing need for transparency [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] – in particular in critical domains related to human health, safety, and wealth – people have stopped trusting clusterings, or more generally learning models that are not interpretable by humans. Models lacking interpretability are called opaque models or black boxes (BBs), regardless of their nature (e.g., supervised neural network classifiers as well as unsupervised deep clustering techniques). The ML models able to achieve the best predictive performance are generally the most complex and thus difficult for humans to inspect; therefore, the adoption of opaque models for high-stakes decisions is mandatorily subject to the derivation of some kind of human-intelligible knowledge.
      </p>
      <p>
        In order not to renounce the impressive predictive capabilities of ML models, many strategies to obtain explainable behaviours have been proposed in the literature [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. When possible, interpretable ML predictors such as decision trees are exploited [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Conversely, when these interpretable models do not achieve satisfactory performance (e.g., shallow decision trees) or their complexity hinders their actual readability (e.g., deep decision trees), it is possible to reverse-engineer the predictors’ behaviour [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Symbolic knowledge-extraction (SKE) techniques are exploited to this end, acting in a post-processing phase to extract interpretable knowledge out of a BB predictor.
      </p>
      <p>
        Inspired by the works on explainable clustering [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] and those on SKE [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref9">9, 10, 11, 12</xref>
        ], in this paper we provide a human-interpretable model blending both topics and resulting in the development of ExACT, a new explainable clustering technique also suitable for performing explainable classification and regression tasks. ExACT may be applied to continuous input features and to both categorical and numerical output data. The technique is able to describe clusters of instances in terms of human-comprehensible rules derived from a binary tree built according to a hierarchical hypercubic partitioning of the input feature space.
      </p>
      <p>The paper is organised as follows: Section 2 introduces background information on the
topics discussed here and related works present in the literature. Section 3 describes the
ExACT algorithm. Experiments and benchmark comparisons are discussed in Section 4. Finally,
conclusions are drawn in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <sec id="sec-2-1">
        <title>2.1. Traditional Clustering</title>
        <p>
          A large number of different clustering techniques have been proposed in the literature over the decades, each providing some peculiar advantages but at the same time bound to specific limitations, usually regarding the properties of the clusters they can identify (e.g., shape, density). For this reason, there is no widely acknowledged optimum technique achieving the best predictive performance in every possible application. Amongst the most prediction-effective techniques, it is worth mentioning the following: Gaussian mixture models (GMMs; [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]), DBSCAN and DBSCAN++ [14, 15, 16], OPTICS [17], BIRCH [18], k-means [19], Mean shift [20] and spectral clustering [21, 22]. The main drawback of these techniques in terms of explainability is that they rely on an opaque model.
        </p>
        <p>In the following, further details are provided for the traditional clustering techniques exploited within the ExACT algorithm.</p>
        <sec id="sec-2-1-1">
          <title>2.1.1. Gaussian Mixture Models</title>
          <p>GMMs can be applied to perform (soft) clustering since they are probabilistic models assuming that all data points have been generated by a mixture of a finite number of Gaussian distributions whose parameters are to be determined. GMMs are more flexible for clustering than, for instance, k-means, since they can find clusters of data that are not only spherical. In addition, soft clustering is provided, since each GMM prediction is associated with a corresponding probability.</p>
          <p>The performance of a GMM is strongly impacted by the number of Gaussian components considered during model training. The tuning of this parameter can be automated by using the Bayesian information criterion (BIC). It is sufficient to train several instances of a GMM with a different number of components, calculate the BIC score for every instance, and finally pick the one with the lowest associated BIC score.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. DBSCAN</title>
          <p>The DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise; [14, 15]) is an unsupervised clustering technique to find subsets of data having arbitrary shapes. It is based on a density criterion, so clusters are created by aggregating data samples w.r.t. a user-defined parameter, usually called ε, representing the maximum distance between two data points inside the same cluster. For this reason, ε is the most important parameter to tune for DBSCAN. An automated procedure for this purpose has been proposed in [23]. The peculiarities of DBSCAN make it a good choice to perform outlier removal from clusters found by other clustering techniques.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Explainable Clustering</title>
        <p>During the last decades, a number of researchers have focused their attention on the explanation of clusters, in particular in the field of medical applications, one of the most critical areas. Several works in the literature are related to tree-based clustering, such as the IMM algorithm [24] and others [25, 26, 27]. There are two main approaches for inducing a decision tree: top-down and bottom-up. The top-down approach is the most used and is also adopted in our proposal. It starts by building a root node, which contains all the objects of a training database. After that, the root node is split into partitions (usually named child nodes) and this splitting is recursively repeated over the child nodes until a stopping criterion is met. It is worth noting that these approaches share a common trait, i.e., they partition the input feature space with cutting hyperplanes perpendicular to the most relevant features, taking into account one feature for each cut.</p>
        <p>Cluster explanation via rectangular input space partitioning has been proposed in [28],
whereas a less human-readable density-based clustering has been recently proposed in [29] for
the CLASSIX algorithm. The former enables higher degrees of human interpretability since
it describes clusters in terms of only 2 interval inclusion preconditions. As a drawback, it
may combine input features in the output description to create new composite features, thus
hindering interpretability from the human perspective.</p>
        <p>Other examples of explainable clustering techniques, in particular applied to image data sets
and medical time series, have been presented in [30].</p>
        <p>Compared with the aforementioned techniques, ExACT is able to consider all the input features – similarly to tree-based clustering methods such as IMM – to create a density-based partitioning, as DBSCAN and CLASSIX do. The main difference from all the existing methods is the highly human-readable format of the identified clusters, which are approximated via hypercubic regions. Thus, ExACT offers a more compact and human-readable representation than existing techniques, even though the hypercubic approximation may hinder its performance when applied to overlapping clusters.</p>
        <p>ExACT provides global explanations about the partitioning of the input feature space into disjoint clusters, which may be used to obtain local explanations about single cluster assignments.</p>
        <p>In the following we provide an overview of the two explainable clustering techniques used as benchmarks in the experiments presented here, namely CLASSIX and IMM.</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. CLASSIX</title>
          <p>CLASSIX (a contrived acronym defined by the authors as “CLustering by Aggregation with Sorting-based Indexing”, with the letter “X” standing for “eXplainability”) has been recently proposed in [29] as an explainable clustering procedure based on two phases and characterised by small computational time requirements. During the first phase, a greedy aggregation is performed in order to create groups of training instances having small distances from each other, where “small” is defined via an input parameter. A preceding sorting step is required to complete the aggregation. The second phase consists of merging the groups into definitive clusters. The merging phase may be density- or distance-based and is described in detail in [29].</p>
          <p>Users adopting CLASSIX need to provide a pair of parameters defining the minimum size of the clusters, intended as the number of instances, and the maximum distance between training samples belonging to the same group (with reference to the aggregation phase).</p>
          <p>CLASSIX is able to provide both local and global explanations. The global explanation is based on the coordinates of the starting points of each group created at the end of the first phase. Local explanations may describe the reason behind the cluster assignment of a single instance, as well as why two instances are or are not assigned to the same cluster. Local explanations are provided by listing the operations performed during CLASSIX’s merging phase.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. IMM</title>
          <p>In [24] the IMM (Iterative Mistake Minimization) clustering procedure is presented as an accurate, efficient, and interpretable method based on the induction of decision trees. The induced decision trees are binary and their internal nodes are associated with training data partitions. Node splits involve single features. The algorithm grows k leaves to identify as many clusters, trying to keep the tree size as small as possible. During the tree construction, cluster fragmentation – i.e., the spreading of instances from a single cluster over multiple subtrees – is minimised.</p>
          <p>To provide explanations for a cluster assignment it is sufficient to describe the complete path from the tree root to the leaf associated with that assignment. As for the tree growth complexity, a clustering identifying k clusters may be described, in the worst case, by a tree with depth equal to k − 1. This implies describing any clustering assignment with at most k − 1 constraints on the input features.</p>
        </sec>
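        <p>The root-to-leaf reading of tree-based explanations can be illustrated with a scikit-learn decision tree. This is a generic illustration of path-based explanations, not the IMM implementation.</p>
        <preformat>
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

def path_constraints(tree, x):
    """Describe a prediction by the feature/threshold tests on its root-to-leaf path."""
    t = tree.tree_
    node, tests = 0, []
    while t.children_left[node] != -1:  # -1 marks a leaf
        f, thr = t.feature[node], t.threshold[node]
        if x[f] > thr:  # scikit-learn sends x to the right child when x[f] exceeds thr
            tests.append(f"x[{f}] above {thr:.2f}")
            node = t.children_right[node]
        else:
            tests.append(f"x[{f}] at most {thr:.2f}")
            node = t.children_left[node]
    return tests
```
        </preformat>
        <p>With a tree of depth at most 2, every assignment is explained by at most 2 such constraints, mirroring the k − 1 bound discussed above.</p>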
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Explainable Clustering with ExACT</title>
      <p>In this section we propose the design and implementation of a new explainable clustering technique. ExACT (EXplainable Automated Clustering Technique) is a hierarchical clustering algorithm based on the induction of binary trees where each node represents an input space region approximating a cluster and having the shape of a hypercube (or of a difference of two hypercubes). ExACT is a supervised technique since it distinguishes between input and output features. It is suitable to be applied to continuous input attributes and continuous, discrete, or categorical output variables.</p>
      <p>A key peculiarity of the algorithm is that it keeps memory of the output associated with the clusters, beyond the mere membership of instances to clusters. In the case of regression data sets, the clusters’ outputs are real values or regression laws involving the input variables, instead of the discrete outputs adopted for classification and clustering data sets. More in detail, we stick to the generalisation proposed in [31, 32], associating: (i) the most common label to regions containing classification or clustering data points; (ii) the mean value calculated on the data points’ outputs to regions containing regression data points, if there is a high degree of similarity between these output values; (iii) a linear combination of the input variables to regions containing regression data points, otherwise. For this reason ExACT has predictive capabilities that go beyond the usual clustering assignments, and its identified clusters may be evaluated with classical clustering scores but also with metrics borrowed from classification and regression tasks (e.g., predictive error can be evaluated through the F1 or R2 scores).</p>
      <p>In the following we use the notion of predictive error for the regions detected at the end of the procedure. The predictive error is evaluated through the difference between the outputs associated with ExACT’s clusters and the corresponding expected predictions (i.e., the ground truth). The predictive error assessment differs based on the kind of output feature at hand: it is defined as the complement of the accuracy score for data sets having discrete outputs and as the mean absolute error for real-valued outputs.</p>
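      <p>A small sketch consistent with this definition (assuming the discrete-output error is the complement of accuracy, as suggested by the accuracy threshold used later):</p>
      <preformat>
```python
import numpy as np

def predictive_error(y_true, y_pred, discrete):
    """Region error: 1 - accuracy for discrete outputs, MAE for real-valued ones."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    if discrete:
        return 1.0 - np.mean(y_true == y_pred)
    return np.mean(np.abs(y_true - y_pred))
```
      </preformat>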
      <sec id="sec-3-1">
        <title>3.1. Properties of ExACT</title>
        <p>The algorithm relies on two well-known clustering techniques, namely GMMs and DBSCAN. In a nutshell, GMMs are exploited for detecting the clusters inside an input space region, then DBSCAN is applied to the detected clusters for the deletion of possible outliers. Finally, each cluster is approximated with a hypercube that can “explain” the cluster in a human-interpretable format. The goal of ExACT is to approximate the set of clusters detected by the GMMs (and cleaned by DBSCAN) with as many regions (cubes), according to the following criteria:</p>
        <list list-type="bullet">
          <list-item><p>exhaustivity of the approximations: the whole input feature space is covered by regions, i.e., every input instance belongs to at least one region;</p></list-item>
          <list-item><p>disjointness of the regions: each input instance inside the input feature space belongs to at most one region;</p></list-item>
          <list-item><p>strict hierarchy of the regions: each found hypercubic region lies within a wider region having the same shape.</p></list-item>
        </list>
        <p>The resulting decision tree provides an explanation of the clusters, achieving explainability via interpretability [33]. The strict hierarchy of the regions trivially implies that instances belonging to an inner region also belong to the outer, enclosing ones, apparently violating the disjointness property. However, when calculating the membership of data points to regions, ExACT assigns every instance to the smallest region containing that point; predictions are thus completely unambiguous and easily human-understandable, following the induced tree structure from the rightmost leaf towards the root node, and disjointness is preserved.</p>
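        <p>A minimal sketch of this smallest-region membership rule, with hypercubes represented as (lows, highs) tuples (a representation assumed here for illustration):</p>
        <preformat>
```python
def contains(cube, x):
    """True if point x lies inside the hypercube, given as a (lows, highs) pair."""
    lows, highs = cube
    return all(xi >= lo and hi >= xi for xi, lo, hi in zip(x, lows, highs))

def volume(cube):
    """Product of the edge lengths of the hypercube."""
    lows, highs = cube
    v = 1.0
    for lo, hi in zip(lows, highs):
        v = v * (hi - lo)
    return v

def assign(cubes, x):
    """Assign x to the smallest (by volume) containing cube; None if no cube contains it."""
    candidates = [c for c in cubes if contains(c, x)]
    return min(candidates, key=volume) if candidates else None
```
        </preformat>
        <p>An instance inside a nested cube is thereby attributed to the inner region, preserving disjointness despite the hierarchy.</p>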
        <p>ExACT is a recursive algorithm, starting from the surrounding cube – i.e., the minimal hypercube enclosing all the data set samples – and iteratively building smaller, inner hypercubes, inducing a binary tree structure. The rationale behind our method is to create at every iteration a difference cube enclosing data points belonging to a single cluster, with the goal of minimising the total amount of cubes and the cluster fragmentation. As detailed in the following, the difference cube is the one obtained by subtracting the best cube (right child) from the starting hypercube (parent node).</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Algorithm and Parameters</title>
        <p>The algorithm details are summarised in Algorithm 1 and an example of the performed
partitioning on an artificial data set described by 3 concentric clusters is reported in Figure 1.
In particular, Figure 1a depicts the input data set, having 2 continuous input features and 1
continuous output feature. The approximation provided by ExACT is reported in Figure 1b,
whereas the binary tree induced by the algorithm during the recursive input space partitioning
is reported in Figure 1c.</p>
        <p>Binary trees built by ExACT are obtained by partitioning the input feature space into hypercubes. ExACT starts by considering the whole input space, which represents the initial hypercube. To build the tree, the following steps are recursively executed:</p>
        <preformat>
Algorithm 1 ExACT pseudocode
Require: maximum depth δ, default value: 2
Require: predictive error threshold θ, default value: 0.1
Require: maximum amount of clusters γ, default value: 2
Provide: the root node of the induced tree
 1: function ExACT(D)                         ▷ D: the training data set
 2:   c0 ← SurroundingCube(D)
 3:   n0 ← NewNode(c0, D)
 4:   Split(n0, 1)
 5:   return n0
 6: function SurroundingCube(S)
 7:   return the minimal cube enclosing all the points of S
 8: function Split(n, h)                      ▷ h: current depth
 9:   K ← CreateClusters(n.data)
10:   N ← ClustersToNodes(K, n.cube)
11:   if N = ∅ then return
12:   b ← arg max over m ∈ N of { Volume(m.cube) }
13:   n.right ← b
14:   n.left ← NewNode(n.cube, n.data ∖ b.data)
15:   e ← PredictiveError(n.right)
16:   if (e &gt; θ) ∧ (δ &gt; h) then Split(n.right, h + 1)   ◁ Recursion
17: function NewNode(c, S)
18:   n ← new Node()
19:   n.cube ← c, n.data ← S
20:   n.right ← ∅, n.left ← ∅
21:   return n
22: function CreateClusters(S)
23:   return at most γ clusters containing the data of S
24: function ClustersToNodes(K, c)
25:   N ← ∅
26:   for all k ∈ K do
27:     k ← k ∖ { p ∈ k | p is an outlier }
28:     c′ ← SurroundingCube(k)
29:     if c′ ≠ c then N ← N ∪ { NewNode(c′, k) }
30:   return N
31: function Volume(c)
32:   return the volume of hypercube c
33: function PredictiveError(n)
34:   return the average predictive error for n
        </preformat>
        <list list-type="order">
          <list-item><p>given a hypercubic portion of the data set, associated with a tree node (a.k.a. the current node), apply GMMs to determine both the optimal number of clusters and the clusters themselves;</p></list-item>
          <list-item><p>for each found cluster, apply DBSCAN to remove its outliers (to avoid creating dirty, overly large hypercubes);</p></list-item>
          <list-item><p>construct the minimal surrounding cube enclosing the data points of each cluster (i.e., approximate clusters with hypercubes): each cluster is associated with the smallest hypercube containing all the points within it;</p></list-item>
          <list-item><p>select the best cube amongst all the created hypercubes, that is, the one having the biggest volume, excluding hypercubes equivalent to the current node’s cube (i.e., the one enclosing the whole data set during the first algorithm recursion);</p></list-item>
          <list-item><p>assign the best cube, with all the contained data points, to the right child of the current node;</p></list-item>
          <list-item><p>assign the difference cube to the left child of the current node; the difference cube is the one obtained by subtracting the best cube (right child) from the starting hypercube (parent node);</p></list-item>
          <list-item><p>repeat all the steps above considering the right child as the current node if the predictive error measured for the right child’s cube is greater than the user-defined threshold θ.</p></list-item>
        </list>
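        <p>The recursion can be condensed into a short, simplified sketch. This is not the PSyKE implementation: it uses scikit-learn GMMs with BIC-based component selection, omits the DBSCAN outlier-removal step, and assumes discrete outputs with the error taken as the complement of the majority-label accuracy.</p>
        <preformat>
```python
import numpy as np
from sklearn.mixture import GaussianMixture

class Node:
    """A tree node: a hypercube (lows, highs) plus the enclosed samples."""
    def __init__(self, cube, X, y):
        self.cube, self.X, self.y = cube, X, y
        self.right = self.left = None

def surrounding_cube(X):
    return X.min(axis=0), X.max(axis=0)  # minimal enclosing hypercube

def error(y):
    """1 - accuracy of the majority label: predictive error for discrete outputs."""
    _, counts = np.unique(y, return_counts=True)
    return 1.0 - counts.max() / y.size

def split(node, h, depth, threshold, max_clusters):
    # step 1: detect clusters with GMMs, choosing the component count by BIC
    best, best_bic = None, np.inf
    for k in range(1, min(max_clusters, node.X.shape[0]) + 1):
        g = GaussianMixture(n_components=k, random_state=0).fit(node.X)
        if best_bic > g.bic(node.X):
            best, best_bic = g, g.bic(node.X)
    labels = best.predict(node.X)
    # steps 2-3: approximate each cluster with its minimal surrounding cube
    # (the DBSCAN outlier-removal step is omitted in this sketch)
    children = []
    for c in np.unique(labels):
        m = labels == c
        cube = surrounding_cube(node.X[m])
        same = (np.array_equal(cube[0], node.cube[0])
                and np.array_equal(cube[1], node.cube[1]))
        if not same:  # exclude cubes equivalent to the current node's cube
            children.append(Node(cube, node.X[m], node.y[m]))
    if not children:
        return
    # steps 4-5: the biggest-volume cube becomes the right child
    node.right = max(children, key=lambda n: float(np.prod(n.cube[1] - n.cube[0])))
    # step 6: the difference cube (parent minus best cube) becomes the left child
    inside = np.all(np.logical_and(node.X >= node.right.cube[0],
                                   node.right.cube[1] >= node.X), axis=1)
    node.left = Node(node.cube, node.X[~inside], node.y[~inside])
    # step 7: recurse on the right child while its error exceeds the threshold
    if error(node.right.y) > threshold and depth > h:
        split(node.right, h + 1, depth, threshold, max_clusters)

def exact(X, y, depth=2, threshold=0.1, max_clusters=2):
    root = Node(surrounding_cube(X), X, y)
    split(root, 1, depth, threshold, max_clusters)
    return root
```
        </preformat>
        <p>On a labelled two-blob data set, for instance, the root’s right child encloses one blob while the left child holds the difference cube with the remaining samples.</p>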
        <p>Note that the algorithm does not split the left child, since the difference hypercube is assumed to be sufficiently precise. The difference hypercube, in fact, typically approximates a single cluster, or a cluster portion if ExACT is not able to avoid fragmentation via hypercubic approximation.</p>
        <p>The algorithm terminates when the node assigned to the right child contains a cube whose predictive error is smaller than the θ threshold or, otherwise, after a number of recursive iterations equal to the maximum user-defined depth δ. The set of hyper-parameters to tune for ExACT is completed by γ, which represents the maximum number of clusters that can be found with the GMMs. We stress that γ is only an upper bound, since the optimal number of clusters to be identified is automatically assessed through the BIC score during the execution of ExACT. It is recommended to keep γ larger than the actual cluster amount, if known, in order not to perform a wrong clustering. Very large values of γ do not affect the performance of ExACT, since they do not imply selecting a large number of clusters with the automated BIC-based procedure. Analogously, the ε parameter of DBSCAN is automatically set as suggested in [23], so it is not a parameter to be chosen by users executing ExACT. The δ parameter should be set according to the consideration that a depth equal to δ produces at most δ + 1 explainable clusters. It is important to notice that only one explainable cluster (the innermost in the hierarchy) is a hypercube; all the others are difference cubes. Finally, θ strongly depends on the task at hand. When dealing with categorical output features, as for ExACT applied to clustering or classification data sets, θ needs to be defined as a predictive accuracy threshold. In this case, a cube is further partitioned only if its predictive accuracy is lower than the given threshold. On the other hand, when performing clustering on regression data sets, the threshold represents the maximum mean absolute error allowed for individual cubes. Consequently, hypercubes whose predictive error exceeds the θ threshold are further partitioned.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>Experiments to assess the capabilities of ExACT applied to clustering, classification, and
regression tasks in comparison with state-of-the-art clustering and other predictors are
reported in the following. The adopted ExACT implementation is included in the PSyKE
framework1 [34, 35, 36, 37].</p>
      <sec id="sec-4-1">
        <title>4.1. ExACT for Explainable Clustering</title>
        <p>[Figure 2: ground truth (leftmost column) and the clustering assignments of spectral clustering, Gaussian mixture, Ward, BIRCH, CLASSIX, IMM, and ExACT on the selected data sets.]</p>
        <p>1 Code available at https://github.com/psykei/psyke-python</p>
        <p>Listing 1: Clustering rules provided by ExACT for the Iris data set.</p>
        <p>Cluster 1 if PetalWidth in [1.6, 2.5] and PetalLength in [4.8, 6.9] and SepalWidth in [2.5, 3.8] and SepalLength in [5.7, 7.9].</p>
        <p>Cluster 2 if PetalWidth in [1.0, 2.5] and PetalLength in [3.0, 6.9] and SepalWidth in [2.2, 3.8] and SepalLength in [4.9, 7.9].</p>
        <p>Cluster 3 otherwise.</p>
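        <p>The rules of Listing 1 read as ordered interval checks, the first matching rule winning; a hypothetical direct transcription in Python:</p>
        <preformat>
```python
def iris_cluster(petal_width, petal_length, sepal_width, sepal_length):
    """Listing 1 rules as ordered interval-inclusion checks (first match wins)."""
    def within(v, lo, hi):
        return v >= lo and hi >= v
    if (within(petal_width, 1.6, 2.5) and within(petal_length, 4.8, 6.9)
            and within(sepal_width, 2.5, 3.8) and within(sepal_length, 5.7, 7.9)):
        return 1
    if (within(petal_width, 1.0, 2.5) and within(petal_length, 3.0, 6.9)
            and within(sepal_width, 2.2, 3.8) and within(sepal_length, 4.9, 7.9)):
        return 2
    return 3  # the "otherwise" difference region
```
        </preformat>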
        <p>The capabilities of ExACT in clustering labelled data have been assessed on six different data sets. Three of them are synthetic clustering data sets included in the Scikit-Learn library2. These data sets are described by 2 continuous input features and have 2 or 3 clusters to be identified. The other three are real-world classification data sets:</p>
        <list list-type="bullet">
          <list-item><p>the Iris data set [38], composed of 4 continuous input features and a categorical output feature assuming 3 different values; only the petal length and width are reported in the figures shown in this section;</p></list-item>
          <list-item><p>the Wine data set [39], having 13 real-valued features and 3 possible discrete output values representing as many classes; only the alcohol and proline input features are shown in the figures reported here;</p></list-item>
          <list-item><p>the Wisconsin breast cancer (WBC) data set [40], a binary classification task described by 30 continuous input features; worst smoothness and worst symmetry are the input features reported in the figures.</p></list-item>
        </list>
        <p>
          Figure 2 depicts our experiments involving clustering. All the data sets are represented in the leftmost column of Figure 2a. The other columns report the results of traditional and explainable clustering techniques applied to the same data sets. We selected spectral clustering, Ward, BIRCH, and GMMs as traditional clustering benchmarks, whereas CLASSIX and IMM are the explainable alternatives. The clustering assignments performed by ExACT are reported in the rightmost column. Figure 2b summarises the performance assessments for all the aforementioned clustering techniques applied to all the selected data sets. The adopted metrics are the following: (i) adjusted Rand index (ARI; [41]); (ii) adjusted mutual information (AMI; [42]); (iii) Fowlkes–Mallows index (FMI; [43]); (iv) V-measure (V; [44]); (v) computational time, reported in seconds and averaged over 100 runs. The 4 indices (i)–(iv) are defined in the [0, 1] interval, with values close to 1 identifying good clustering assignments. Being metrics explicitly designed for clustering tasks, they are not susceptible to permutations and/or renaming of the clusters’ labels. For this reason, they are not applicable to evaluate the accuracy score of clustering techniques applied to perform classification tasks.
        </p>
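        <p>The permutation invariance of these indices can be checked directly; the sketch below uses the scikit-learn implementations (an assumption, as the paper does not state which metric implementation was adopted):</p>
        <preformat>
```python
from sklearn.metrics import (adjusted_rand_score, adjusted_mutual_info_score,
                             fowlkes_mallows_score, v_measure_score)

truth = [0, 0, 1, 1, 2, 2]
found = [1, 1, 0, 0, 2, 2]  # the same partition with permuted labels

scores = {
    "ARI": adjusted_rand_score(truth, found),
    "AMI": adjusted_mutual_info_score(truth, found),
    "FMI": fowlkes_mallows_score(truth, found),
    "V": v_measure_score(truth, found),
}
# every score equals 1.0: the metrics ignore label renaming
```
        </preformat>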
        <p>Figure 2a enables a qualitative assessment of the performance achieved by the clustering techniques. ExACT, CLASSIX, and spectral clustering appear to be the best procedures. A quantitative assessment can be carried out on the results in Figure 2b, highlighting the superiority of these 3 algorithms. Unfortunately, none of these techniques achieves the best performance on all the data sets. A drawback of ExACT is that it may be slower than the others; however, it completes its task in less than 1 second in all the case studies.</p>
        <p>2 https://scikit-learn.org/stable/modules/clustering.html</p>
        <p>Clustering rules extracted via ExACT for the Iris data set are exemplified in Listing 1.</p>
        <p>Amongst the 3 explainable clustering techniques, IMM is the one having the lowest associated scores. On the other hand, CLASSIX seems to be equivalent or only slightly worse than ExACT from the clustering performance standpoint. The interpretability of these techniques cannot be easily assessed, since they do not provide the same representation. IMM and ExACT are comparable since they follow a tree structure, and therefore their clusters can be described by reading the tree paths from the root to the leaves. CLASSIX, conversely, provides a different sort of explainability. For instance, when queried about an individual assignment, CLASSIX provides the numeric code associated with the corresponding output cluster and that of the group identified during its first grouping phase. No further information about the clusters or groups is provided unless CLASSIX is queried for a global explanation. In this case, the centroid coordinates of each group are listed, together with an indication of the final cluster into which each group has flowed. This kind of explanation is not straightforward for humans to understand, since it involves a chain of concepts encoded with numbers and coordinates, such as the radius of the CLASSIX groups. Furthermore, if CLASSIX is executed on normalised data, its response will also contain normalised coordinates. Conversely, ExACT may be provided with the normalisation schema in order to obtain cluster explanations that do not require further manipulation to enable human analysis. Finally, ExACT is more general-purpose than the other techniques, as discussed in the following.</p>
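        <p>Mapping interval explanations back to original units is straightforward once the normalisation schema is known; a sketch assuming min-max normalisation (the helper name is illustrative, not part of the ExACT/PSyKE API):</p>
        <preformat>
```python
def denormalise_interval(lo, hi, feat_min, feat_max):
    """Map a [lo, hi] interval learned on min-max-normalised data back to original units."""
    span = feat_max - feat_min
    return feat_min + lo * span, feat_min + hi * span

# e.g., an interval [0.25, 0.75] on a feature originally ranging over [4.3, 7.9]
```
        </preformat>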
      </sec>
      <sec id="sec-4-2">
        <title>4.2. ExACT for Explainable Classification</title>
        <p>Given its ability to output (explainable) predictions when queried with samples to be classified,
ExACT is also suitable to carry out classification tasks. Figure 3 depicts the results of ExACT
applied to perform classification on the Iris, Wine, and WBC data sets. Its predictions are
compared with those of state-of-the-art machine learning predictors, namely: a k-nearest
neighbours (k-NN), a decision tree (DT), and a random forest (RF) model. The performance of
these models is assessed and compared for each data set through the classification accuracy
score, representing the rate of correct predictions over all the predictions. Predictions and
measured accuracy scores are shown in Figures 3a and 3b, respectively.</p>
        <p>ExACT achieves a comparable or better predictive performance w.r.t. the other models. In addition, its predictions are more valuable, since they are human-interpretable. We exemplify in Table 1 the clusters obtained for the Iris data set and in Figure 4 the corresponding explainable tree. The ExACT instance used to obtain these results has been parametrised with a maximum depth δ = 2, an error threshold θ = 0.1 and a maximum amount of clusters γ = 3. The selected parametrisation enables the creation of up to 4 clusters. Being a classification data set, θ = 0.1 implies that hypercubic regions having an accuracy score smaller than 1 − θ = 0.9 are further split during the recursive iterations of ExACT. The provided clustering is human-interpretable since for each possible Iris output class an associated hypercubic input space region is provided and such regions are described in terms of interval inclusion constraints over the input variables.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. ExACT for Explainable Regression</title>
        <p>[Figure 3: (a) output predictions for classification tasks; (b) classification performance assessments (accuracy scores) of k-NN, DT, RF, and ExACT on the Iris, Wine, and WBC data sets.]</p>
        <p>ExACT has been applied to several regression data sets, namely:</p>
        <p>Combined Cycle Power Plant (CCPP) data set [45], having 4 continuous input features. Only
the ambient temperature and exhaust vacuum input features are reported in the following
figures;
Istanbul Stock Exchange (ISE) data set [46], described by 7 continuous input features. Only
the stock market return index of the UK and the MSCI European index are shown in the
figures;</p>
        <p>Diabetes data set [47], containing 10 input variables. In the figures the S1 and S5 features are
reported.</p>
        <p>We applied ExACT and a pool of ML regressors (a k-NN, a DT, and a RF) to these data sets, as
in the classification case study. Ground truth and predictions are reported in Figure 5a. The
predictive performance of the algorithms has been assessed through the R2 score; the corresponding
measurements are shown in Figure 5b. Once again, ExACT exhibits a predictive performance
comparable or even superior to that of the ML models, and its predictions are human-interpretable
thanks to its hypercubic approximation strategy.</p>
        <p>An example of ExACT’s explainable clustering applied to a regression task is reported in
Table 2 for the CCPP data set, whose clusters are described in terms of the ambient temperature
(AT) and exhaust vacuum (EV) features. The corresponding ExACT instance has been parametrised
with a maximum depth of 3, an error threshold of 0.02, and a maximum number of clusters of 2.
We stress here that when ExACT is applied to regression, the recursive refinement of hypercubic
approximations is performed only for cubes having a mean absolute predictive error greater
than the chosen threshold. It is worth noticing that the outputs shown in Table 2 are linear
combinations of the input variables. It is also possible to obtain constant outputs, to the benefit
of human interpretability but at the expense of predictive performance.</p>
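        <p>The distinction between linear and constant outputs per hypercube can be illustrated on
synthetic data: fitting a linear model inside each region recovers a piecewise-linear target
exactly, while replacing each region’s output with its mean trades R2 for simpler explanations.
The data, split threshold, and coefficients below are invented for illustration and unrelated to
the CCPP data set.</p>

```python
# Per-hypercube outputs for regression: linear combination vs. constant.
# Synthetic piecewise-linear target; one interval constraint defines two cubes.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))
mask = X[:, 0] < 5.0  # interval constraint splitting the input space in two
y = np.where(mask, 2 * X[:, 0] + X[:, 1], 50.0 - X[:, 0] + 3 * X[:, 1])

pred_linear = np.empty_like(y)
pred_constant = np.empty_like(y)
for region in (mask, ~mask):
    # linear combination of the inputs, fitted per cube
    pred_linear[region] = LinearRegression().fit(X[region], y[region]).predict(X[region])
    # constant output per cube: simpler but less accurate
    pred_constant[region] = y[region].mean()

print(r2_score(y, pred_linear), r2_score(y, pred_constant))
```

        <p>The linear variant corresponds in spirit to the outputs of Table 2; the constant variant is
the simpler, less accurate alternative mentioned above.</p>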
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this paper we present an explainable clustering technique named ExACT, applicable to
any kind of task described by a data set having continuous input features. No constraints are
assumed on the output feature. The algorithm takes advantage of GMMs and DBSCAN to
detect clusters and approximate them with human-interpretable hypercubic regions described
in terms of interval inclusion constraints on the input features. Our experiments prove the
effectiveness of ExACT in performing explainable clustering, classification, and regression in
comparison to other state-of-the-art traditional and explainable clustering techniques, as well
as w.r.t. ML classifiers and regressors.</p>
      <p>Our future work will focus on enhancing the rationale behind ExACT’s region
approximation and possibly on the adoption of deep clustering techniques in place of the GMMs
and DBSCAN used in the current version. Furthermore, ExACT may benefit from automatic
parameter tuning: in particular, we plan to implement a procedure aimed at identifying the best
values for the maximum depth and the predictive error threshold parameters.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has been supported by the European Union’s Horizon Europe AEQUITAS research
and innovation programme under grant number 101070363.</p>
    </sec>
  </body>
  <back>
  </back>
</article>