<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>WOA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
<article-title>Symbolic Knowledge Comparison: Metrics and Methodologies for Multi-Agent Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Federico Sabbatini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christel Sirocchi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberta Calegari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering (DISI), Alma Mater Studiorum-University of Bologna</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Pure and Applied Sciences, University of Urbino Carlo Bo</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>25</volume>
      <fpage>8</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>In multi-agent systems, understanding the similarities and differences in agents' knowledge is essential for effective decision-making, coordination, and knowledge sharing. Current similarity metrics like cosine similarity, Jaccard similarity, and BERTScore are often too generic for comparing knowledge bases, overlooking critical aspects such as overlapping and fragmented boundaries, and varying domain densities. This paper introduces new specific similarity metrics for comparing knowledge bases, represented via symbolic knowledge. Our method compares local explanations of individual instances, preserving computational resources and providing a comprehensive evaluation of knowledge similarity. This approach addresses the limitations of existing metrics, enhancing the functionality and efficiency of multi-agent systems.</p>
      </abstract>
      <kwd-group>
<kwd>Multi-agent systems</kwd>
        <kwd>Knowledge similarity</kwd>
        <kwd>Symbolic knowledge</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In multi-agent systems, the knowledge owned by each agent is the key to enabling autonomous
decision-making [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Agents rely on their individual and collective knowledge to navigate and react
to complex environments, making the understanding of similarities and differences in their knowledge
a fundamental aspect of their interaction. This understanding is pivotal for several
applications within the system, including collaborative decision-making, knowledge sharing, and
coordination [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        To effectively compare the knowledge of different agents, similarity measurement metrics and
techniques are needed, allowing agents to assess the extent to which their knowledge overlaps,
identify areas of agreement and disagreement, and optimise their interactions accordingly [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Examples include collaborative decision-making, where agents can use similarity metrics to
evaluate the compatibility of their internal knowledge or beliefs. By comparing their knowledge
representations, agents can pinpoint areas of alignment and conflict, facilitating negotiations and
enabling collaborative decisions that capitalise on the strengths and complementary aspects of
each agent’s knowledge. As for knowledge sharing, similarity metrics help determine which
pieces of knowledge are most relevant to share amongst agents. High similarity scores indicate
significant overlap, suggesting that certain knowledge may be redundant if shared. In resource
allocation and task assignment, especially in multi-agent systems with diverse expertise and
capabilities, similarity metrics can match agents to tasks based on the similarity between their
knowledge and the task requirements. Similarity metrics also guide adaptive learning and
knowledge transfer processes by helping agents identify peers with similar knowledge profiles.
Also, similarity metrics can help identify common ground when agents encounter conflicting
beliefs or preferences. By focusing on areas of high similarity, agents can find mutually acceptable
solutions and build consensus, facilitating smoother negotiation and collaboration.
      </p>
      <p>
        State-of-the-art metrics to measure the similarity between two objects, e.g., vectors, exist,
such as cosine similarity [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Jaccard similarity [5], and semantic similarity using contextual
embeddings like BERTScore [6]. However, these metrics are often generic distances, not tailored
to comparing knowledge bases, and thus they do not account for fundamental aspects such as
overlapping or fragmented boundaries, or regions with varying domain densities. These aspects
should be considered to effectively measure similarity between knowledge bases. Accordingly, in
this paper we define new similarity metrics designed to overcome these limitations.
      </p>
      <p>Building on the premise that symbolic knowledge can be expressed through logical rules, e.g.,
approximated as hypercubes [7, 8], we propose similarity metrics for comparing two symbolic
knowledge bases in this form. Instead of comparing each rule with all possible others, which
would waste computational resources, our approach compares the local explanations provided for
the same instances by two specific knowledge pieces. Since our approach considers explanations at
a local level and aggregates similarity measurements across instances, it offers a comprehensive
evaluation of knowledge similarity, addressing several limitations of existing metrics.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Explanation-Based Similarity Metrics</title>
      <p>In this section, we formally present four formulations of metrics designed to express the similarity
between pairs of knowledge pieces, highlighting their differences. These metrics are based on the
following assumption: the similarity between two distinct knowledge pieces can be assessed by
measuring the pairwise similarity of their local explanations when queried with the same instance.
The core idea is to gather these similarity measurements across a sufficiently large set of instances
and then aggregate them, for example, by averaging.</p>
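        <p>The aggregation scheme just described can be sketched in a few lines; the following is a minimal illustration only (the function names and the toy interval-shaped explanations are ours, not from the paper):</p>

```python
from statistics import mean

def knowledge_similarity(explain1, explain2, instances, pairwise_sim):
    """Aggregate pairwise local-explanation similarity over a set of instances."""
    scores = [pairwise_sim(explain1(x), explain2(x)) for x in instances]
    return mean(scores) if scores else 0.0

# toy usage: explanations as intervals (lo, hi); two pieces of knowledge that
# agree on the lower half of the domain and disagree on the upper half
identical = lambda e1, e2: 1.0 if e1 == e2 else 0.0
explain_a = lambda x: (0, 5) if x < 5 else (5, 10)
explain_b = lambda x: (0, 5) if x < 5 else (4, 10)
print(knowledge_similarity(explain_a, explain_b, [1, 2, 7, 8], identical))  # 0.5
```

        <p>Any pairwise similarity in [0, 1] can be plugged in for the placeholder <code>pairwise_sim</code>; the four metrics defined below are exactly such plug-ins.</p>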
      <p>Leveraging local explanations simplifies the similarity assessment for several reasons. For
instance, when comparing knowledge bases composed of predictive rules, it is necessary to develop
a strategy for comparing these rules. Comparing all possible pairs of rules is computationally
infeasible and often uninformative, as a rule may be similar to one rule but very different from all
others in the knowledge base. This issue can be mitigated by considering only pairs of overlapping,
adjacent, or close rules, though defining formal notions for these concepts is complex. Exploiting
local explanations bypasses these challenges and provides reliable proxies for knowledge similarity,
enabling the creation of computationally feasible metrics. We point out here that we use the terms
“instance”, “sample”, and “individual” as synonyms, each representing a single entry of the data
set at hand.</p>
      <sec id="sec-2-1">
        <title>2.1. Notation</title>
        <p>Let us briefly introduce the notation used in the following sections. Let D represent a data set
consisting of n pairs (x, y), where x is an instance described by k input attributes (x1, x2, ..., xk),
and y is the corresponding outcome. Formally:
D = {(x1, y1), (x2, y2), ..., (xn, yn)}, with x = (x1, x2, ..., xk).</p>
        <p>Let X and Y be the domains of the instances’ inputs and outputs, respectively:
(xi ∈ X) ∧ (yi ∈ Y), ∀i = 1, 2, ..., n.</p>
        <p>Let us assume the existence of a predictive function f defined as follows:
f : X → Y, f(x) = ŷ,
where ŷ is the value predicted by f for the instance x. Any entity with predictive capabilities,
such as a machine learning model or symbolically represented knowledge, can be modelled
with f. In this paper, we consider without loss of generality the comparison between two
symbolic knowledge bases, e.g., knowledge encoded by domain experts or obtained with symbolic
knowledge-extraction (SKE) techniques applied to black-box machine learning predictors [9].
Nonetheless, our approach can be employed to compare any pair of objects providing local
explanations.</p>
        <p>Finally, let us assume the existence of an explaining function ε, which maps data set instances
xi to their corresponding local explanations Ei, derived by analysing f:
ε : X → E, ε(xi) = Ei, with xi ∈ Ei ⊆ X, ∀i = 1, 2, ..., n.</p>
        <p>Specifically, Ei defines a subregion within the domain X that encloses xi and provides a local
explanation for the prediction f(xi).</p>
        <p>The similarity metrics introduced in the following subsections are represented by the operators
≈^{D}, ∼^{D}, ≈⊂^{D}, and ∼⊂^{D}. These metrics are functions evaluated on the instances of a data set D,
mapping pairs of predictive functions f1 and f2 to the [0, 1] interval:
≈^{D}, ∼^{D}, ≈⊂^{D}, ∼⊂^{D} : (X → Y) × (X → Y) → [0, 1].</p>
        <p>The symbol ≈ signifies approximate equality between operands, while ∼ denotes similarity that
is not necessarily bound to equality. Additionally, the subscript ⊂ indicates asymmetric relations,
while its absence indicates symmetric ones. These symbols are used as infix binary operators
(e.g., f1 ∼^{D} f2).</p>
        <p>The specific task at hand determines the nuances of similarity assessment, which must satisfy
specific sets of properties. For example, when comparing a knowledge base encoded by human
experts with symbolic knowledge extracted via SKE, one may desire the latter to precisely align
with the expert knowledge, focusing on the exact coincidence between corresponding decision
boundaries. Conversely, a different scenario may involve replacing a knowledge base with a
newer, more accurate one without altering the decision boundaries that led to correct predictions
in the past (backward compatibility; [10]). In this case, the similarity assessment should only
consider the subset of data set instances that received the expected outcomes from the current
knowledge piece. Accordingly, we define two subsets of D suitable for similarity assessment
based on user needs: D1=2 and Df.</p>
        <p>Definition 1 (Congruence set). The congruence set D1=2 is defined as the subset of D instances
receiving the same predictions from both f1 and f2, regardless of the prediction correctness:
D1=2 = {x | (x, y) ∈ D ∧ f1(x) = f2(x)} .</p>
        <p>Definition 2 (Backward compatibility set). The backward compatibility set Df is defined as the
subset of D instances receiving correct predictions from f:
Df = {x | (x, y) ∈ D ∧ f(x) = y} .</p>
        <p>If necessary, the intersection of the two sets can be taken to fulfil both definitions.</p>
        <p>Definition 3 (Backward-compatible congruence set). The backward-compatible congruence set
D1,2 is defined as the subset of D instances receiving correct predictions from both f1 and f2:
D1,2 = {x | (x, y) ∈ D ∧ f1(x) = f2(x) = y} .</p>
        <p>In the following we use the symbol D⋆ as a generic notation for any of these subsets.</p>
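        <p>The three subsets follow directly from their definitions; the sketch below (function and variable names are ours) assumes predictions and labels are plain Python values:</p>

```python
def congruence_set(data, f1, f2):
    # instances receiving the same prediction from both f1 and f2
    return [x for x, y in data if f1(x) == f2(x)]

def backward_compatibility_set(data, f):
    # instances receiving the correct prediction from f
    return [x for x, y in data if f(x) == y]

def backward_compatible_congruence_set(data, f1, f2):
    # instances correctly predicted by both f1 and f2
    return [x for x, y in data if f1(x) == f2(x) == y]

# toy data set of (instance, label) pairs
data = [(1, "a"), (2, "b"), (3, "a")]
f1 = lambda x: "a" if x != 2 else "b"   # predicts everything correctly
f2 = lambda x: "a"                      # always predicts "a"
print(congruence_set(data, f1, f2))                      # [1, 3]
print(backward_compatibility_set(data, f2))              # [1, 3]
print(backward_compatible_congruence_set(data, f1, f2))  # [1, 3]
```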
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Perfect Overlapping</title>
        <p>Two knowledge bases are considered perfectly overlapping if they provide identical local
explanations for the same queries. This definition can be extended to produce not just a Boolean indicator
of similarity, but also a numerical evaluation of the degree of overlap between the explanations
provided for a set of instances.</p>
        <p>Definition 4 (Perfect overlapping). Let f1, f2 be two functions representing the predictive
process of two symbolic knowledge bases and ε1, ε2 be the explaining functions for f1 and f2,
respectively. The knowledge bases are perfectly overlapping if the explanations provided by them
for each instance x of D⋆ are perfectly overlapping, i.e., if f1 ≈^{D⋆} f2 = 1:
f1 ≈^{D⋆} f2 = (1 / |D⋆|) Σ_{x ∈ D⋆} V(ε1(x) ∩ ε2(x)) / V(ε1(x) ∪ ε2(x)),
where V(·) denotes the volume of a region of the input space.</p>
        <p>Figure 1: Computation of the proposed metrics: (a) perfect overlapping (≈^{D}); (b) sensitive
overlapping (∼^{D}); (c) perfect fragmentation (≈⊂^{D}); (d) sensitive fragmentation (∼⊂^{D}).</p>
        <p>Figure 2: (a) Explanations with non-uniform density; (b) fragmented explanations.</p>
        <p>Definition 4 essentially represents an intersection-over-union operation, also known as Jaccard
similarity, applied to pairs of local explanations for each instance in a given data set. This
metric reaches a score of exactly 1 only if the explanations are perfectly equivalent, meaning that
the intersection of the explanations matches their union. The calculation of ≈^{D} is illustrated in
Figure 1a.</p>
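        <p>Assuming, as in the hypercube-based representation of [7, 8], that local explanations are axis-aligned boxes, the volume-based intersection-over-union of Definition 4 can be sketched as follows (the box encoding and helper names are illustrative, not from the paper):</p>

```python
def volume(box):
    # box: list of (lo, hi) intervals, one per input dimension
    v = 1.0
    for lo, hi in box:
        v *= max(0.0, hi - lo)
    return v

def intersection(box1, box2):
    # dimension-wise intersection of two axis-aligned boxes
    return [(max(l1, l2), min(h1, h2)) for (l1, h1), (l2, h2) in zip(box1, box2)]

def perfect_overlap(explain1, explain2, instances):
    # mean volume-based intersection-over-union of paired explanations
    total = 0.0
    for x in instances:
        e1, e2 = explain1(x), explain2(x)
        inter = volume(intersection(e1, e2))
        union = volume(e1) + volume(e2) - inter
        total += inter / union if union > 0 else 1.0
    return total / len(instances)

# identical explanations score 1; partially overlapping ones score lower
explain_a = lambda x: [(0, 2), (0, 2)]
explain_b = lambda x: [(0, 2), (1, 3)]
print(perfect_overlap(explain_a, explain_a, [0]))  # 1.0
print(perfect_overlap(explain_a, explain_b, [0]))  # 2 / 6 ≈ 0.333
```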
        <p>While this definition serves as an ideal proxy for detecting perfect overlap between distinct
knowledge bases, it presents two potential issues. The first issue arises when applying the
knowledge bases to a data set with a non-homogeneous sample distribution, as illustrated in
Figure 2a. In the figure, the explanations ε1(x) and ε2(x) for the instance x ∈ D overlap but are not
identical. The intersection-over-union measurement between ε1(x) and ε2(x) would yield a low score,
indicating a low perfect overlap for f1 and f2. However, the intersection of the two explanations
includes the most relevant portion of the explanation union, specifically the region with the
highest density of samples x ∈ D. Thus, even if the explanations are not identical, the regions of
the input space within the union but outside the intersection may be considered less significant,
and therefore less penalising in the similarity assessment. This concept is formalised into a
scoring metric in Definition 5.</p>
        <p>The second issue arises when comparing a knowledge base providing many distinct
explanations with small coverage (f1) to another generating fewer explanations with larger coverage
(f2), as illustrated in Figure 2b. In the figure, two instances x and x′ ∈ D have different local
explanations ε1(x) and ε1(x′), respectively, according to knowledge base f1, while they share the
same explanation ε2⋆ according to knowledge base f2. However, ε2⋆ essentially represents the
union of ε1(x) and ε1(x′). Thus, the perfect overlap score for f1 and f2 would be low, even though the
knowledge bases are nearly identical except for the fragmentation in the explanations provided
by f1. Since this fragmentation may not be a significant factor in assessing knowledge similarity
in some applications, we formalise a scoring metric accordingly in Definition 6.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Sensitive Overlapping</title>
        <p>The definition of perfect overlap can be relaxed to a notion of sensitive overlap, which measures
the extent to which a pair of knowledge bases produce overlapping explanations in the most
relevant subregions of the input space when queried with a set of instances.</p>
        <p>Definition 5 (Sensitive overlapping). Let f1, f2 be two functions representing the predictive
process of two symbolic knowledge bases and ε1, ε2 be the explaining functions for f1 and f2,
respectively. The knowledge bases are sensitively overlapping if the explanations provided by
them for each instance x of D⋆ are overlapping in the input space regions characterised by high
sample density, i.e., if f1 ∼^{D⋆} f2 = 1:
f1 ∼^{D⋆} f2 = (1 / |D⋆|) Σ_{x ∈ D⋆} |{z ∈ D | z ∈ ε1(x) ∩ ε2(x)}| / |{z ∈ D | z ∈ ε1(x) ∪ ε2(x)}| .</p>
        <p>Definition 5 employs an intersection-over-union operation applied to pairs of local explanations,
evaluated based on the number of enclosed data set samples rather than the explanation volumes.
The metric reaches a score of 1 only if all data set samples are consistently enclosed within the
intersection of the examined explanations, even if their union is larger. The calculation of ∼^{D} is
illustrated in Figure 1b.</p>
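        <p>Under the same illustrative box encoding of explanations, Definition 5 replaces volumes with counts of enclosed data set samples; a sketch (names ours):</p>

```python
def encloses(box, point):
    # True if the point lies inside the axis-aligned box in every dimension
    return all(lo <= v <= hi for v, (lo, hi) in zip(point, box))

def sensitive_overlap(explain1, explain2, instances, samples):
    # mean ratio: samples enclosed in the intersection of the two explanations
    # over samples enclosed in their union
    total = 0.0
    for x in instances:
        e1, e2 = explain1(x), explain2(x)
        in_inter = sum(encloses(e1, s) and encloses(e2, s) for s in samples)
        in_union = sum(encloses(e1, s) or encloses(e2, s) for s in samples)
        total += in_inter / in_union if in_union else 1.0
    return total / len(instances)

# the boxes only partially overlap, but every sample lies in the overlap,
# so the sensitive-overlapping score is still 1
explain_a = lambda x: [(0, 4)]
explain_b = lambda x: [(1, 6)]
samples = [(2,), (3,)]
print(sensitive_overlap(explain_a, explain_b, [0], samples))  # 1.0
```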
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Perfect Fragmentation</title>
        <p>The definition of perfect overlap can also be relaxed into a notion of perfect fragmentation,
representing the extent to which a knowledge base provides explanations that are perfectly
enclosed within broader explanations generated by a different knowledge base when both are
queried with the same instances.</p>
        <p>Definition 6 (Perfect fragmentation). Let f1, f2 be two functions representing the predictive
process of two symbolic knowledge bases and ε1, ε2 be the explaining functions for f1 and
f2, respectively. The knowledge base f1 is perfectly fragmented with respect to the knowledge
f2 if the explanation provided by f1 for each instance x of D⋆ is entirely enclosed within the
explanation of f2 for the same instance, i.e., if f1 ≈⊂^{D⋆} f2 = 1:
f1 ≈⊂^{D⋆} f2 = (1 / |D⋆|) Σ_{x ∈ D⋆} V(ε1(x) ∩ ε2(x)) / V(ε1(x)),
where V(·) has the same meaning as in Definition 4.</p>
        <p>Definition 6 is consequently more adaptable than Definition 4, as it compares the intersection
between explanations with a single explanation rather than with their union.</p>
        <p>The computation of ≈⊂^{D}, depicted in Figure 1c, is asymmetrical and results in a score of 1 only
if all the data set samples under consideration receive an explanation from the first operand that is
entirely enclosed within the broader explanation generated by the second operand.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Sensitive Fragmentation</title>
        <p>The concepts of sensitive overlap and perfect fragmentation can be combined into the definition
of sensitive fragmentation, which assesses the similarity by considering the extent to which
knowledge generates explanations whose most relevant subregion is perfectly enclosed within the
broader explanation provided by a different knowledge.</p>
        <p>Definition 7 (Sensitive fragmentation). Let f1, f2 be two functions representing the predictive
process of two symbolic knowledge bases and ε1, ε2 be the explaining functions for f1 and f2,
respectively. The knowledge base f1 is sensitively fragmented with respect to the knowledge f2 if
the explanation provided by f1 for each instance x of D⋆ is overlapping with the corresponding
explanation provided by f2 in the input space regions characterised by high sample density, i.e.,
if f1 ∼⊂^{D⋆} f2 = 1:
f1 ∼⊂^{D⋆} f2 = (1 / |D⋆|) Σ_{x ∈ D⋆} |{z ∈ D | z ∈ ε1(x) ∩ ε2(x)}| / |{z ∈ D | z ∈ ε1(x)}| .</p>
        <p>Definition 7 represents the least rigid metric amongst those proposed in this study, as it
compares the intersection of explanations with a single explanation (rather than their union) and
evaluates based on the number of enclosed data instances (rather than volumes).</p>
        <p>The sensitive fragmentation metric is asymmetrical and yields a score of 1 only when all the
data set samples enclosed within the explanation provided by the first operand are also enclosed
within the explanation provided by the second operand, for every pair of explanations produced
for the considered data set instances. The calculation of ∼⊂^{D} is illustrated in Figure 1d.</p>
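        <p>The two asymmetric metrics differ from their symmetric counterparts only in the denominator, which refers to the first operand's explanation alone. A combined sketch of Definitions 6 and 7 under the same illustrative box encoding (all names ours):</p>

```python
def _volume(box):
    # product of side lengths of an axis-aligned box
    v = 1.0
    for lo, hi in box:
        v *= max(0.0, hi - lo)
    return v

def _inter(box1, box2):
    return [(max(l1, l2), min(h1, h2)) for (l1, h1), (l2, h2) in zip(box1, box2)]

def _inside(box, point):
    return all(lo <= p <= hi for p, (lo, hi) in zip(point, box))

def perfect_fragmentation(explain1, explain2, instances):
    # Definition 6 sketch: intersection volume over the first explanation's volume
    return sum(
        _volume(_inter(explain1(x), explain2(x))) / _volume(explain1(x))
        for x in instances
    ) / len(instances)

def sensitive_fragmentation(explain1, explain2, instances, samples):
    # Definition 7 sketch: samples shared with the second explanation,
    # out of those enclosed by the first
    total = 0.0
    for x in instances:
        box1, box2 = explain1(x), explain2(x)
        in_first = [s for s in samples if _inside(box1, s)]
        if not in_first:
            total += 1.0
            continue
        total += sum(_inside(box2, s) for s in in_first) / len(in_first)
    return total / len(instances)

# two small boxes fully enclosed in one large box: fragmentation scores are 1
# in one direction, lower in the other (asymmetry)
explain_a = lambda x: [(0, 2)] if x < 2 else [(2, 4)]
explain_b = lambda x: [(0, 4)]
xs, samples = [1, 3], [(0.5,), (3.5,)]
print(perfect_fragmentation(explain_a, explain_b, xs))             # 1.0
print(perfect_fragmentation(explain_b, explain_a, xs))             # 0.5
print(sensitive_fragmentation(explain_a, explain_b, xs, samples))  # 1.0
```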
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>We conducted experiments using the presented similarity metrics on the original Wisconsin Breast
Cancer data set [11], which comprises nine input features ranging from 1 to 10 and represents
cases of breast cancer. The data set’s binary output class indicates whether the tumour is benign
or malignant.</p>
      <p>Our objective was to compare existing knowledge bases in the literature with alternatives
obtained by applying SKE techniques. We compared these based on accuracy, readability, and
completeness of the knowledge pieces against the data set ground truth, through both individual
evaluations and the FiRe aggregating scoring metric [12]. Additionally, we measured the
similarity between the extracted knowledge bases and those existing in the literature, taken as
reference, as described in this study.</p>
      <p>The knowledge bases used as reference are from the works of Duch et al. [13] and Hayashi
and Nakano [14]. The corresponding classification rules are displayed in Listings 1 and 2,
respectively. Notably, both sets of rules base predictions on two input features: “Bare Nuclei”
(BN) and “Clump Thickness” (CT). Decision boundaries identified by these knowledge bases are
illustrated in Figure 3 as coloured rectangles.</p>
      <p>The GRIDEX SKE algorithm [15] was utilised to extract symbolic knowledge from a gradient
boosting (GB) classifier trained on the WBC data set. The hyper-parameters of the GB, namely
the number of estimators and the learning rate, were tuned via a grid search and set to 133 and
0.2033, respectively. The GB was trained by using 75% of the WBC data instances as the training
set, while the remaining samples were reserved for the test set, resulting in an observed test
accuracy of 0.99 for the trained model. Several GRIDEX instances with varying parameters
were applied to the trained GB. We leveraged the implementation provided within the PSYKE
framework [16, 17, 18]. Three instances, employing an adaptive splitting strategy with a recursion
depth of 1, were chosen to generate knowledge bases containing 3, 6, and 12 items, respectively,
for subsequent analysis and comparison with reference knowledge pieces. These instances are
henceforth referred to as GRIDEX3, GRIDEX6, and GRIDEX12, respectively. The corresponding
classification rules are summarised in Listings 3 to 5, and their decision boundaries are depicted
in Figure 3 as hatched regions superimposed onto the reference knowledge boundaries.</p>
      <p>Figure 3: Decision boundaries over the “Bare Nuclei” and “Clump Thickness” features:
(a) Duch, Adamczak, and Grabczewski + GRIDEX3; (b) Hayashi and Nakano + GRIDEX3;
(c) Duch, Adamczak, and Grabczewski + GRIDEX6; (d) Hayashi and Nakano + GRIDEX6;
(e) Duch, Adamczak, and Grabczewski + GRIDEX12; (f) Hayashi and Nakano + GRIDEX12.</p>
      <p>Listing 1: Knowledge base adapted from Duch, Adamczak, and Grabczewski (2001) for the WBC data set.
Malignant if BN &gt; 5.5
Malignant if CT &gt; 6.5
Benign if BN &lt;= 5.5 ∧ CT &lt;= 6.5</p>
      <p>Listing 2: Knowledge base adapted from Hayashi and Nakano (2015) for the WBC data set.
Benign if BN &lt;= 1.5
Malignant if BN &gt; 1.5 ∧ CT &gt; 4.5
Malignant if BN &gt; 6.5 ∧ CT &lt;= 4.5
Benign if 1.5 &lt; BN &lt;= 6.5 ∧ CT &lt;= 4.5</p>
      <p>Listing 3: Knowledge base extracted with GRIDEX3.
Benign if BN &lt;= 4.0 ∧ CT &lt;= 5.5
Malignant if BN &gt; 4.0 ∧ CT &lt;= 5.5
Malignant if CT &gt; 5.5</p>
      <p>Listing 4: Knowledge base extracted with GRIDEX6.
Benign if BN &lt;= 4.0 ∧ 4.0 &lt; CT &lt;= 7.0
Malignant if BN &lt;= 4.0 ∧ CT &gt; 7.0
Malignant if BN &gt; 7.0 ∧ CT &gt; 7.0
Benign if BN &lt;= 7.0 ∧ CT &lt;= 4.0
Malignant if BN &gt; 7.0 ∧ CT &lt;= 7.0
Malignant if 4.0 &lt; BN &lt;= 7.0 ∧ CT &gt; 4.0</p>
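      <p>To make the rule format concrete, the reference rules of Listing 1 and the extracted rules of Listing 3 can be encoded as simple predicates over the two features (a sketch under our own naming; the thresholds are taken verbatim from the listings):</p>

```python
def duch(bn, ct):
    # Listing 1 (adapted from Duch, Adamczak, and Grabczewski, 2001)
    return "malignant" if bn > 5.5 or ct > 6.5 else "benign"

def gridex3(bn, ct):
    # Listing 3 (extracted with GRIDEX3)
    if ct > 5.5:
        return "malignant"
    return "benign" if bn <= 4.0 else "malignant"

# the two rule sets agree on clearly benign and clearly malignant regions...
print(duch(1.0, 1.0), gridex3(1.0, 1.0))  # benign benign
print(duch(9.0, 9.0), gridex3(9.0, 9.0))  # malignant malignant
# ...but disagree near the shifted thresholds (e.g. BN = 5.0, CT = 6.0)
print(duch(5.0, 6.0), gridex3(5.0, 6.0))  # benign malignant
```

      <p>Instances on which such predicates agree form exactly the congruence set of Definition 1, on which the similarity assessments below are evaluated.</p>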
      <p>Listing 5: Knowledge base extracted with GRIDEX12.
Benign if BN &lt;= 2.3 ∧ 4.6 &lt; CT &lt;= 6.4
Benign if 6.1 &lt; BN &lt;= 7.4 ∧ 6.4 &lt; CT &lt;= 8.2
Malignant if 6.1 &lt; BN &lt;= 7.4 ∧ CT &gt; 8.2
Malignant if BN &gt; 8.7 ∧ CT &gt; 8.2
Malignant if 6.1 &lt; BN &lt;= 8.7 ∧ CT &lt;= 2.8
Benign if BN &lt;= 3.6 ∧ 2.8 &lt; CT &lt;= 4.6
Malignant if 3.6 &lt; BN &lt;= 6.1 ∧ 2.8 &lt; CT &lt;= 4.6
Benign if BN &lt;= 6.1 ∧ CT &lt;= 2.8
Malignant if BN &gt; 8.7 ∧ CT &lt;= 8.2
Malignant if 7.4 &lt; BN &lt;= 8.7 ∧ CT &gt; 2.8
Malignant if 2.3 &lt; BN &lt;= 7.4 ∧ 4.6 &lt; CT &lt;= 6.4
Malignant if BN &lt;= 6.1 ∧ CT &gt; 6.4</p>
      <p>Quality evaluations regarding classification accuracy, human readability (quantified as knowledge
size), and completeness (represented by the percentage of covered test samples; [19]) for all
knowledge pieces utilised in the experiments are detailed in Table 1 for the test set (with the best
values highlighted in bold). Additionally, we introduced the FiRe metric with a
fidelity/readability trade-off parameter set to 6 (6-FiRe) to offer a more concise quality assessment. It is
worth noting that high-quality knowledge corresponds to low FiRe scores, and that higher values
of the trade-off parameter prioritise predictive accuracy while disregarding readability. Furthermore,
similarity assessments between the extracted and reference knowledge bases, evaluated on the
backward-compatible congruence set (Definition 3), are summarised in Table 2. It is important to
mention that the robustness of the results presented in Tables 1 and 2 has been confirmed through
10-fold cross-validation repeated 10 times. Given the low variability of the results, we provide here
only the average values without their corresponding standard deviations.</p>
      <p>Amongst the reference knowledge bases considered, the one presented by Duch et al. exhibits
identical accuracy and coverage compared to that offered by Hayashi and Nakano. However, the
former comprises fewer items, leading to knowledge that is more readable and consequently of
higher quality. This is evidenced by a lower 6-FiRe score, as anticipated.</p>
      <p>Similarly, amongst the extracted knowledge bases, the one provided by GRIDEX6 emerges as
the best according to the FiRe score, boasting the highest accuracy and completeness, despite
its suboptimal size. Indeed, any loss in human readability is offset by the gain in accuracy. This
knowledge piece also outperforms the reference ones for the same reasons. Upon comparing the
knowledge bases of GRIDEX6 (f1) and Duch et al. (f2), it becomes apparent that their decision
boundaries are not identical. This enables GRIDEX to achieve superior predictive performance.
However, the diversity in boundaries is reflected in a low perfect-overlapping score:
f1 ≈^{D1,2} f2 = 0.34. Given the distribution of data set samples across the classification rules, the
sensitive-overlapping score is also low: f1 ∼^{D1,2} f2 = 0.50. From these perspectives, the knowledge bases
are dissimilar, and substituting the reference knowledge with the extracted one does not ensure
coherence and continuity.</p>
      <p>Table 2: Similarity measurements between the GRIDEX outputs (f1) and the reference
knowledge bases (f2).</p>
      <p>A more thorough examination of the disparities between the two knowledge bases reveals that
their boundaries differ because GRIDEX6 comprises more, albeit smaller, classification rules.
Consequently, the perfect-fragmentation score is elevated, as the GRIDEX6 rules are
predominantly enclosed within the reference ones: f1 ≈⊂^{D1,2} f2 = 0.84. Given that non-overlapping rule
regions typically exhibit a sparse sample density, it follows that the sensitive-fragmentation score
is even higher: f1 ∼⊂^{D1,2} f2 = 0.99. Hence, continuity and coherence are maintained if preserving
boundaries is not imperative.</p>
      <p>In summary, aligning with the visual examination, it can be concluded that the knowledge base
of GRIDEX6 does not exhibit perfect or sensitive overlap with the reference provided by Duch
et al. However, it is nearly perfectly fragmented with respect to the latter, earning an optimal score
when assessing its degree of sensitive fragmentation.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and conclusion</title>
      <p>Our study introduces novel similarity metrics tailored for comparing knowledge bases within
multi-agent systems, addressing the critical need for effective knowledge assessment in
autonomous decision-making environments. Through our experiments and analyses, we have
demonstrated the practical applicability and significance of these metrics in enhancing various
aspects of multi-agent systems’ functionality and efficiency.</p>
      <p>The results of our experiments highlight the utility of our proposed similarity metrics in
facilitating collaborative decision-making, knowledge sharing, coordination, and resource allocation
within multi-agent systems. Specifically, our metrics enable agents to assess the compatibility of
their internal knowledge, identify areas of agreement and conflict, and optimise their interactions
accordingly. Moreover, they aid in determining which pieces of knowledge are most relevant
to share amongst agents and match agents to tasks based on their knowledge profiles and task
requirements.</p>
      <p>In conclusion, our study underscores the importance of effective knowledge assessment in
multi-agent systems and introduces novel similarity metrics designed to meet this need. By
leveraging these metrics, agents can better understand the landscape of their collective knowledge,
leading to more effective collaboration, decision-making, and coordination within the system. We
believe that our proposed metrics represent a significant step forward in advancing the capabilities
of multi-agent systems and pave the way for further research and development in this field.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work has been supported by the EU ICT-48 2020 project TAILOR (No. 952215) and the
European Union’s Horizon Europe AEQUITAS research and innovation programme under grant
number 101070363.</p>
    </sec>
    <sec id="sec-6">
      <title>References</title>
      <p>[5] A. H. Murphy, The Finley affair: A signal event in the history of forecast verification,
Weather and Forecasting 11 (1996) 3–20.</p>
      <p>[6] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, BERTScore: Evaluating text
generation with BERT, in: 8th International Conference on Learning Representations, ICLR 2020,
Addis Ababa, Ethiopia, April 26–30, 2020, OpenReview.net, 2020. URL:
https://openreview.net/forum?id=SkeHuCVFDr.</p>
      <p>[7] F. Sabbatini, G. Ciatto, R. Calegari, A. Omicini, Hypercube-based methods for symbolic
knowledge extraction: Towards a unified model, in: A. Ferrando, V. Mascardi (Eds.),
WOA 2022 – 23rd Workshop “From Objects to Agents”, volume 3261 of CEUR Workshop
Proceedings, Sun SITE Central Europe, RWTH Aachen University, 2022, pp. 48–60. URL:
http://ceur-ws.org/Vol-3261/paper4.pdf.</p>
      <p>[8] F. Sabbatini, G. Ciatto, R. Calegari, A. Omicini, Towards a unified model for symbolic
knowledge extraction with hypercube-based methods, Intelligenza Artificiale 17 (2023)
63–75. URL: https://doi.org/10.3233/IA-230001. doi:10.3233/IA-230001.</p>
      <p>[9] G. Ciatto, F. Sabbatini, A. Agiollo, M. Magnini, A. Omicini, Symbolic knowledge extraction
and injection with sub-symbolic predictors: A systematic literature review, ACM Computing
Surveys 56 (2024) 161:1–161:35. URL: https://doi.org/10.1145/3645103. doi:10.1145/3645103.</p>
      <p>[10] J. L. Haggerty, R. J. Reid, G. K. Freeman, B. H. Starfield, C. E. Adair, R. McKendry,
Continuity of care: A multidisciplinary review, BMJ 327 (2003) 1219–1221.</p>
      <p>[11] W. H. Wolberg, O. L. Mangasarian, Multisurface method of pattern separation for medical
diagnosis applied to breast cytology, Proceedings of the National Academy of Sciences 87
(1990) 9193–9196.</p>
      <p>[12] F. Sabbatini, R. Calegari, Symbolic knowledge-extraction evaluation metrics: The FiRe
score, in: K. Gal, A. Nowé, G. J. Nalepa, R. Fairstein, R. Rădulescu (Eds.),
Proceedings of the 26th European Conference on Artificial Intelligence, ECAI 2023, Kraków,
Poland, September 30 – October 4, 2023, 2023. URL:
https://ebooks.iospress.nl/doi/10.3233/FAIA230496. doi:10.3233/FAIA230496.</p>
      <p>[13] W. Duch, R. Adamczak, K. Grabczewski, A new methodology of extraction, optimization
and application of crisp and fuzzy logical rules, IEEE Transactions on Neural Networks 12
(2001) 277–306. URL: https://doi.org/10.1109/72.914524. doi:10.1109/72.914524.</p>
      <p>[14] Y. Hayashi, S. Nakano, Use of a recursive-rule extraction algorithm with J48graft to achieve
highly accurate and concise rule extraction from a large breast cancer dataset, Informatics
in Medicine Unlocked 1 (2015) 9–16.</p>
      <p>[15] F. Sabbatini, G. Ciatto, A. Omicini, GridEx: An algorithm for knowledge extraction
from black-box regressors, in: D. Calvaresi, A. Najjar, M. Winikoff, K. Främling (Eds.),
Explainable and Transparent AI and Multi-Agent Systems. Third International Workshop,
EXTRAAMAS 2021, Virtual Event, May 3–7, 2021, Revised Selected Papers, volume
12688 of LNCS, Springer Nature, Basel, Switzerland, 2021, pp. 18–38.
doi:10.1007/978-3-030-82017-6_2.</p>
      <p>[16] F. Sabbatini, G. Ciatto, R. Calegari, A. Omicini, On the design of PSyKE: A platform
for symbolic knowledge extraction, in: R. Calegari, G. Ciatto, E. Denti, A. Omicini,
G. Sartor (Eds.), WOA 2021 – 22nd Workshop “From Objects to Agents”, volume 2963
of CEUR Workshop Proceedings, Sun SITE Central Europe, RWTH Aachen University,
2021, pp. 29–48.</p>
      <p>[17] R. Calegari, F. Sabbatini, The PSyKE technology for trustworthy artificial intelligence,
in: XXI International Conference of the Italian Association for Artificial Intelligence,
AIxIA 2022, Udine, Italy, November 28 – December 2, 2022, Proceedings, volume 13796,
2023, pp. 3–16. URL: https://doi.org/10.1007/978-3-031-27181-6_1.
doi:10.1007/978-3-031-27181-6_1.</p>
      <p>[18] F. Sabbatini, G. Ciatto, R. Calegari, A. Omicini, Symbolic knowledge extraction from
opaque ML predictors in PSyKE: Platform design &amp; experiments, Intelligenza Artificiale
16 (2022) 27–48. URL: https://doi.org/10.3233/IA-210120. doi:10.3233/IA-210120.</p>
      <p>[19] F. Sabbatini, R. Calegari, On the evaluation of the symbolic knowledge extracted from
black boxes, AI and Ethics 4 (2024) 65–74. doi:10.1007/s43681-023-00406-1.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Marik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pechoucek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Štepánková</surname>
          </string-name>
          ,
          <article-title>Social knowledge in multi-agent systems</article-title>
          ,
          <source>MultiAgent Systems and Applications: 9th ECCAI Advanced Course, ACAI 2001 and Agent Link's 3rd European Agent Systems Summer School</source>
          , EASSS 2001 Prague, Czech Republic,
          <source>July 2-13, 2001 Selected Tutorial Papers</source>
          <volume>9</volume>
          (
          <year>2001</year>
          )
          <fpage>211</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Panzarasa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. R.</given-names>
            <surname>Jennings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Norman</surname>
          </string-name>
          ,
          <article-title>Formalizing collaborative decision-making and practical reasoning in multi-agent systems</article-title>
          ,
          <source>Journal of logic and computation 12</source>
          (
          <year>2002</year>
          )
          <fpage>55</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Harispe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ranwez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Janaqi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Montmain</surname>
          </string-name>
          ,
          <article-title>Semantic measures for the comparison of units of language, concepts or instances from text and knowledge base analysis</article-title>
          ,
          <source>arXiv preprint arXiv:1310.1285</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          ,
          <source>Introduction to information retrieval</source>
          , Cambridge University Press,
          <year>2008</year>
          . URL: https://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf. doi:10.1017/CBO9780511809071.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>