Measuring and controlling knowledge diversity
Yasser Bourahla¹, Jérôme David¹, Jérôme Euzenat¹ and Meryem Naciri¹
¹ Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, 38000 Grenoble, France


Abstract
Assessing knowledge diversity may be useful for many purposes. In particular, it is necessary to measure
diversity in order to understand how it arises or is preserved; it is also necessary to control it in order
to measure its effects. Here we consider measuring knowledge diversity using two components: (a)
a diversity measure taking advantage of (b) a knowledge difference measure. We present the general
principles and various candidates for such components. We discuss how these measures may be used to
generate populations of agents with controlled levels of knowledge diversity.

Keywords
Knowledge diversity, Diversity measure, Ontology dissimilarity, Diversity control, Entropy


1. Introduction
Agents may hold different knowledge. This characterises the knowledge diversity of an agent
population. In general, diversity is an important asset. It has been shown, in different contexts,
that groups of agents with diverse abilities have better problem-solving skills than groups of
high-ability agents [1, 2, 3]. In an evolutionary context, diversity is considered to influence species
resilience [4].
   However, knowing that agents in a population are diverse does not tell how diverse the
population’s knowledge is, nor which of two populations has the more diverse knowledge. There are
various reasons to assess the knowledge diversity of agent populations, in particular in social
modelling studies. Our own work aims at performing experiments for which we have to measure
knowledge diversity [5], because we want to characterise the factors that promote or inhibit
diversity, and to control knowledge diversity, because we want to characterise its influence on other
factors, e.g. robustness. Of course, the diversity that is measured should be the same as the one
that is controlled.
   Hence we need formal knowledge diversity models applicable to formal knowledge
representations [6]. In this paper, we focus on both measuring and controlling knowledge diversity
in a coherent way. Measuring knowledge diversity may be split into two components: (a) taking
advantage of proven diversity measures and (b) relying on knowledge-specific structures, in
particular ontology dissimilarities. We show how this can be achieved in an integrated way. We
do not provide the single best measure, but instead an ordered set of measures and methods which
balance accuracy and complexity. Controlling knowledge diversity may be grounded on such measures.

The Eighth Joint Ontology Workshops (JOWO’22), August 15-19, 2022, Jönköping University, Sweden
Yasser.Bourahla@inria.fr (Y. Bourahla); Jerome.David@univ-grenoble-alpes.fr (J. David);
Jerome.Euzenat@inria.fr (J. Euzenat); Meryem.Naciri@inria.fr (M. Naciri)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
   After stating the problem to solve (§2), we discuss related work addressing it (§3). We first
provide a concrete example of ontology dissimilarities based on the semantics of the representation
language and one specific way to categorise ontologies (§4). We then introduce a simple way to
measure knowledge diversity among agents based on such measures (§5). This view is refined
by segmenting the knowledge space into a priori categories and considering the distribution of
agents with respect to these (§6). Finally, we discuss simple ways to control knowledge diversity
among a population of agents with respect to the proposed measures (§7).


2. Problem statement
The problem considered here is to associate a number with the knowledge diversity of a
population of agents (measure) and to assign knowledge to a population of agents so that its
diversity is close to such a number (control). We only consider agent populations of the same
size.
   In principle, a population 𝐴 ∈ 𝒜 of agents is characterised by a set of features 𝐹 . Each 𝑎 ∈ 𝐴
may be considered a point in the space induced by ×𝐹 . In our case, there is a single, but complex,
feature 𝑓 which associates to each agent its knowledge (in this paper, we assume that it is an
ontology 𝑜 ∈ 𝒪 expressed in a description logic [7]).
   Given an agent population 𝐴, 𝑂𝐴 = {𝑓(𝑎); 𝑎 ∈ 𝐴} is the multiset of their ontologies,
representing the distribution of these ontologies in the population. If some of these ontologies
are replaced by others, this multiset still contains zero or more occurrences of each ontology.
The number of different ontologies used by the population is noted |𝑂𝐴| and the number of
occurrences of ontology 𝑜 in 𝑂𝐴 is #𝑜 (also called the abundance of 𝑜). By extension, the cardinal
of a distribution 𝑆 is noted #𝑆, e.g.

$$\#O_A = \sum_{o \in O_A} \#o = |A|$$
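In code, the multiset 𝑂𝐴 can be represented with Python's `collections.Counter`, which maps each ontology to its abundance #𝑜 and returns 0 for ontologies that no agent holds. This is a minimal sketch: the agent names and ontology labels below are hypothetical placeholders standing for 𝑓(𝑎).

```python
from collections import Counter

# Hypothetical population: each agent a is mapped to (a label of) its ontology f(a).
f = {"a1": "A", "a2": "A", "a3": "B", "a4": "C", "a5": "A"}

O_A = Counter(f.values())   # the multiset O_A: ontology -> abundance #o
distinct = len(O_A)         # |O_A|: number of different ontologies
total = sum(O_A.values())   # #O_A: cardinal of the distribution

assert total == len(f)      # #O_A = |A|
print(distinct, O_A["A"], O_A["Z"], total)  # 3 3 0 5
```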
   Accounting for diversity may be achieved by defining either an index 𝛿 : 𝒜 → ℝ of diversity
within a population, or a partial order ⪯ ⊆ 𝒜 × 𝒜 comparing the diversity of populations.
Such a diversity measure (𝛿) is an absolute measure within a population. It is also possible to
consider diversity across populations. For that purpose, it is sufficient to define an order relation
⪯ ⊆ 𝒜 × 𝒜 (𝐴 ⪯ 𝐴′ meaning that the knowledge of 𝐴 is less diverse than that of 𝐴′). When a
diversity measure 𝛿 has been defined, it is easy to test whether 𝛿(𝐴) ≤ 𝛿(𝐴′) and then decide that
𝐴 ⪯ 𝐴′. It may also be possible to define such an order directly.


3. Related work
With respect to this problem, three lines of related works may be considered: how this is
approached in biology, how this is approached in social sciences, and how knowledge difference
is measured in knowledge representation.
   Measuring biological diversity. In genetics, diversity is usually obtained by measuring a
distance between observed biological objects [8, 9]: (DNA) sequences (blast), genes or proteomes.
This measures the distance between individual characteristics or possibly species (represented
by one representative object or specimen). It has to be generalised to populations. In ecology,
many measures of diversity exist [10]. One important family of these measures is based on the
probability that a random individual belongs to a category, and relies on entropy [10, 11].
[Figure 1 diagram: rows Variety, Balance and Disparity (diversity properties); columns Low, Moderate and High (amount of diversity); panels: Variety (a), (c), (d); Balance (d), (f), (g); Disparity (a), (b), (e).]
Figure 1: Presentation of the diversity properties of [1], inspired by Figure 1 of [14]. Ten objects are
distributed among 5 categories. Variety considers the number of instantiated categories,
balance the evenness of the distribution, and disparity the distance between the categories
(here taken as their linear position). These dimensions are independent.


   Measuring social diversity. Measuring social and human diversity may be a more controversial
issue [12]. Because the number of distinguishing features can be large, it is necessary to reduce
these to identifiable categories. This may be achieved by clustering [13] or by using predefined
questionnaires that place people in a predefined partition of the space. [1] identifies three properties
to take into account in measuring diversity (see Figure 1), which may be summarised as:
    • Variety: how many categories are represented?
    • Balance: how many representatives of each category are there?
    • Disparity: how different are these categories?
[14] provides a different interpretation of disparity as the way an amount of resources is shared
among a population depending on the feature values. Although the latter is important in social
sciences, knowledge diversity may be better measured in terms of the three properties above.
   Knowledge measures. Concerning knowledge representation, agent knowledge can be accessed
directly, like DNA sequences. Hence, it should be possible to measure knowledge diversity [16].
[15] considers how to compute diversity by extracting data from graphs called ‘heterogeneous
information networks’. This could be applied directly to RDF graphs. It consists of specifying
random walks along which data are collected, and applying an entropy measure to the probability
distribution of the collected data. Beyond data, various measures have been developed to compute
distances or dissimilarities between knowledge representations. These may be used to assess
disparity. In particular, lexical (4.2 of [17]), syntactic [17, 18], vector-space (4.1 of [17]),
alignment-based (using explicit relations between ontologies [19]), instance-based [16] and semantic
(based on models and closure) [20] ontology dissimilarities have been designed. Once measures
between individual knowledge representations are available, classical descriptive statistics can be
used to aggregate them at the population level.

[Figure 2 diagrams: the named classes of ontologies (A) to (E), each drawn between ⊤ (top) and ⊥ (bottom); the classes shown include 𝑝, 𝑞, ¬𝑝, ¬𝑞, 𝑝 ⊓ 𝑞 and ¬𝑝 ⊓ 𝑞.]
Figure 2: Five ontologies to be compared. Each concept presented in them corresponds
to a named class of the ontology.
   We will build on such ontology dissimilarities to assess and control knowledge diversity in a
set of agents.


4. Knowledge dissimilarity and categories
In order to provide precise examples, this section introduces specific ways to measure how far
away knowledge representations can be and how they can be grouped into categories. These
techniques have been used to measure diversity in the experiments of [5]. However, this example
applies to a specific use case, in which knowledge is expressed as description logic ontologies and
diversity is based on the semantics of named classes. This is not supposed to be the only answer
to measuring knowledge differences. It is an example showing that knowledge can be taken
seriously, through its semantics, when measuring diversity. All techniques briefly mentioned in
Section 3, from the simplest to the most sophisticated, may be used instead.

4.1. Ontologies
Knowledge is expressed through a very limited description logic [7] based on a finite set of
properties 𝑃 ̸= ∅. For simplicity, the properties are considered Boolean, i.e. an object either has
a property 𝑝 ∈ 𝑃 or it does not. The grammar of class descriptions is:

                       𝐶 := ⊤ | ⊥ | ∃𝑝.⊤ | ∀𝑝.⊥ | 𝐶 ⊓ 𝐷 | 𝐶 ⊔ 𝐷 | ¬𝐶

Constraints on properties may be ∃𝑝.⊤ (noted 𝑝: the class of objects having property 𝑝) or ∀𝑝.⊥
(noted ¬𝑝, the class of objects not having property 𝑝). Named classes are defined through axioms
of the form ‘name ≡ description’. The set 𝐶(𝑜) of the named classes of ontology 𝑜 contains ⊤
(the class of all objects) and ⊥ (the empty class). We assume that ontologies follow the unique
name assumption: two equivalent classes cannot have different names.
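Since properties are Boolean and 𝑃 is finite, an object is entirely characterised by the subset of 𝑃 it has, so a class description can be evaluated to the set of such object types it contains. Assuming, as above, that named classes are only introduced by definitions, two descriptions are then equivalent exactly when these sets coincide. The following is a minimal sketch under that assumption; the tuple encoding and the names `ext` and `equivalent` are hypothetical.

```python
from itertools import product

P = ("p", "q")  # hypothetical finite set of Boolean properties

# An object is characterised by the properties it has: 2^|P| object types.
TYPES = [frozenset(x for x, bit in zip(P, bits) if bit)
         for bits in product((False, True), repeat=len(P))]


def ext(c):
    """Set of object types in class description c, encoded as nested tuples:
    ("top",), ("bot",), ("some", p) for ∃p.⊤, ("none", p) for ∀p.⊥,
    ("and", C, D), ("or", C, D), ("not", C)."""
    op = c[0]
    if op == "top":
        return frozenset(TYPES)
    if op == "bot":
        return frozenset()
    if op == "some":
        return frozenset(t for t in TYPES if c[1] in t)
    if op == "none":
        return frozenset(t for t in TYPES if c[1] not in t)
    if op == "and":
        return ext(c[1]) & ext(c[2])
    if op == "or":
        return ext(c[1]) | ext(c[2])
    if op == "not":
        return frozenset(TYPES) - ext(c[1])
    raise ValueError(f"unknown constructor {op!r}")


def equivalent(c, d):
    return ext(c) == ext(d)


# ¬p ⊓ q and ¬(p ⊔ ¬q) describe the same class
not_p_and_q = ("and", ("none", "p"), ("some", "q"))
rewritten = ("not", ("or", ("some", "p"), ("none", "q")))
print(equivalent(not_p_and_q, rewritten))  # True
```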
   Figure 2 presents five simple such ontologies. We use them to illustrate dissimilarity between
ontologies and its use to compute diversity among agents.
Graph-based dissimilarity (left):
       A     B     C     D     E
  A    0    1/3   2/3    1     1
  B          0    1/3    1     1
  C                0    1/3   2/3
  D                      0    1/3
  E                            0

Transitive reduction of ⟨𝑂, ⊑⟩ (middle): C at the top; B and D below it; A below B and E below D.

Named-class-based dissimilarity (right):
       A     B     C     D     E
  A    0    4/7   2/7   3/7   5/7
  B          0    2/4   2/4   2/6
  C                0    2/4   2/6
  D                      0    4/6
  E                            0

Figure 3: Semantic dissimilarity measures between the ontologies of Figure 2: graph-based
(left), named-class-based (right), and the transitive reduction of the subsumption graph ⟨𝑂, ⊑⟩
(middle).


4.2. Semantic dissimilarity between ontologies
In [5], the dissimilarity 𝑑 between two ontologies 𝑜 and 𝑜′ is defined as:

$$d(o, o') = 1 - \frac{|\{(c, c') \in C(o) \times C(o') \mid o \cup o' \models c \equiv c'\}|}{\max(|C(o)|, |C(o')|)}$$

i.e. the proportion of named classes in the largest ontology with no equivalent class in the other.
   If two ontologies are semantically equivalent, from the standpoint of their named classes, then
their dissimilarity is 0. For instance, if ¬𝑝 ⊓ 𝑞 is expressed as ¬(𝑝 ⊔ ¬𝑞) in ontology 𝐸 of
Figure 2, the resulting ontology has dissimilarity 0 to 𝐸.
   The use of this dissimilarity on the ontologies of Figure 2 is given in Figure 3 (right).
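The measure can be computed once classes are compared by their meaning. This is a minimal sketch, assuming the hypothetical encoding of a class as its set of object types (so that equivalent classes have equal encodings) and representing an ontology by the set of its named classes, ⊤ and ⊥ included. The ontologies in the usage lines are illustrative, not those of Figure 2.

```python
from fractions import Fraction
from itertools import product

P = ("p", "q")  # hypothetical finite set of Boolean properties
TYPES = [frozenset(x for x, bit in zip(P, bits) if bit)
         for bits in product((False, True), repeat=len(P))]


def cls(member):
    """Encode a class by its extension: the object types satisfying member."""
    return frozenset(t for t in TYPES if member(t))


TOP, BOT = cls(lambda t: True), cls(lambda t: False)


def d_named(o1, o2):
    """1 - |{(c, c') equivalent}| / max(|C(o1)|, |C(o2)|).

    o1, o2: sets of encoded named classes, ⊤ and ⊥ included. Equivalent classes have
    equal encodings and, by the unique name assumption, each class of one ontology has
    at most one equivalent in the other, so the matching pairs form the intersection.
    """
    return 1 - Fraction(len(o1 & o2), max(len(o1), len(o2)))


Q = cls(lambda t: "q" in t)
o = {TOP, BOT, Q, cls(lambda t: "p" not in t and "q" in t)}                 # ..., ¬p ⊓ q
o_rewritten = {TOP, BOT, Q, cls(lambda t: not ("p" in t or "q" not in t))}  # ..., ¬(p ⊔ ¬q)
o_p = {TOP, BOT, cls(lambda t: "p" in t), cls(lambda t: "p" not in t)}      # ⊤, ⊥, p, ¬p
print(d_named(o, o_rewritten), d_named(o, o_p))  # 0 1/2
```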

4.3. Graph-based dissimilarity between ontologies
It is also possible to derive a dissimilarity from the graph of a subsumption relation across
ontologies. The usual way to define subsumption across ontologies would be 𝑜 ⊑ 𝑜′ iff 𝑜′ ≡ 𝑜 ∪ 𝑜′
(∪ here denoting the union of the axioms). Following the previous section, we may also base it
on named class equivalence, i.e. an ontology subsumes another if it contains at least one
equivalent class for each class of the other. Formally:

$$o \sqsubseteq o' \text{ if } \forall c' \in o', \exists c \in o;\ o \cup o' \models c \equiv c'$$

   Such a relation can structure 𝑂 in a graph ⟨𝑂, ⊑⟩, such as the one featured at the centre of Figure 3.
   From this graph, the dissimilarity between two ontologies may be defined as the length of the
shortest monotonic subsumption chain between them within the transitive reduction of ⊑,
normalised by the length of the longest such chain plus 1 (the measure is set to 1 when no such
chain exists). The result for the ontologies of Figure 2 is given in Figure 3 (left).
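A minimal sketch of this graph-based dissimilarity, taking the subsumption pairs as given (how they are obtained, e.g. with the named-class criterion above, is left out). The function names are hypothetical, and the order used in the usage lines is an assumption chosen to have the shape of Figure 3 (middle); with it, the function returns the values of Figure 3 (left).

```python
from collections import deque
from fractions import Fraction
from functools import lru_cache


def graph_dissimilarity(nodes, pairs):
    """Build d from a strict subsumption order given as pairs (x, y) meaning x ⊑ y."""
    nodes = list(nodes)
    # Transitive closure (Warshall's algorithm).
    below = set(pairs)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if (i, k) in below and (k, j) in below:
                    below.add((i, j))
    # Transitive reduction: keep x ⊑ y only when no z lies strictly between them.
    up = {x: [y for y in nodes
              if x != y and (x, y) in below
              and not any(z not in (x, y) and (x, z) in below and (z, y) in below
                          for z in nodes)]
          for x in nodes}

    @lru_cache(maxsize=None)
    def height(x):
        """Number of edges in the longest upward chain starting at x."""
        return max((1 + height(y) for y in up[x]), default=0)

    longest = max(height(x) for x in nodes)

    def upward_distance(x, y):
        """Length of the shortest upward (monotonic) chain from x to y, or None."""
        dist, queue = {x: 0}, deque([x])
        while queue:
            u = queue.popleft()
            if u == y:
                return dist[u]
            for v in up[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return None

    def d(x, y):
        steps = upward_distance(x, y)
        if steps is None:
            steps = upward_distance(y, x)
        return Fraction(1) if steps is None else Fraction(steps, longest + 1)

    return d


# Hypothetical order shaped like Figure 3 (middle): A ⊑ B ⊑ C and E ⊑ D ⊑ C.
d = graph_dissimilarity("ABCDE", [("A", "B"), ("B", "C"), ("E", "D"), ("D", "C")])
print(d("A", "C"), d("C", "E"), d("A", "E"))  # 2/3 2/3 1
```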

4.4. Semantic categories of ontology
A way to define categories from a set of ontologies is to group equivalent ontologies into one single
category. Different predicates (≡) may be used to describe what counts as the same ontology. This
may be based on strict syntactic equivalence, on named class equivalence, on equality of the sets
of models, etc. This predicate allows us to group ontologies, but may also be used for other purposes.
   Finally, the dissimilarity defined between ontologies may be applied to the categories themselves.
If it is compatible with the way ontologies are grouped into categories, i.e. ∀𝑜, 𝑜′, 𝑜′′ ∈ 𝑂,
𝑜 ≡ 𝑜′ ⇒ 𝑑(𝑜, 𝑜′′) = 𝑑(𝑜′, 𝑜′′), then the dissimilarity between two categories is the
dissimilarity between any members of these categories.
   This will be useful to integrate the dissimilarity 𝑑 between ontologies in more sophisticated
diversity measures (Section 6).
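Grouping and the compatibility condition can be checked mechanically. A minimal sketch with hypothetical names; the toy ontologies are tuples of named-class labels, grouped when they declare the same classes in a different order.

```python
from itertools import product


def categories(ontologies, equiv):
    """Partition ontologies into categories using an equivalence predicate equiv."""
    cats = []
    for o in ontologies:
        for cat in cats:
            if equiv(cat[0], o):  # one representative suffices: equiv is transitive
                cat.append(o)
                break
        else:
            cats.append([o])
    return cats


def compatible(ontologies, equiv, d):
    """Check ∀o, o', o'': o ≡ o' ⇒ d(o, o'') = d(o', o'')."""
    return all(d(o, o2) == d(o1, o2)
               for o, o1, o2 in product(ontologies, repeat=3) if equiv(o, o1))


# Hypothetical toy ontologies given as tuples of named-class labels.
onts = [("top", "bot", "p"), ("bot", "p", "top"), ("top", "bot", "q")]


def same_classes(a, b):  # ignores the order in which classes are declared
    return set(a) == set(b)


def d_classes(a, b):     # share of unmatched classes in the larger ontology
    return 1 - len(set(a) & set(b)) / max(len(set(a)), len(set(b)))


def d_syntax(a, b):      # any syntactic difference counts as maximal
    return 0 if a == b else 1


print(len(categories(onts, same_classes)))        # 2
print(compatible(onts, same_classes, d_classes))  # True
print(compatible(onts, same_classes, d_syntax))   # False
```

With `d_syntax`, two ontologies of the same category are at dissimilarity 1 from each other, so the dissimilarity between categories would not be well defined.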
   So far, we have defined simple semantic measures between ontologies and have provided a
simple way to group ontologies. This does not tell us what the diversity of a set of agents is.


5. Measuring diversity as a dissimilarity
It is possible to consider that diversity can be measured by how far apart the agents’ knowledge
is. The simplest measure would be to consider whether ontologies are different or not.
This can be generalised by introducing a dissimilarity between these ontologies, which introduces
gradation in the differences.

5.1. Knowledge diversity as different ontologies
The most basic way to assess the diversity of a population is to count the different ontologies
agents have. This requires being able to identify when two ontologies are different.
   This is achieved with an equivalence predicate (≡), which can be a simple predicate expressing
that two ontologies are the same (𝑜 ≡ 𝑜′) or different (¬(𝑜 ≡ 𝑜′)). This equality may be a
syntactic or semantic equality, as discussed in Section 4.4. In fact, it can be any equivalence
relation.
   Predicates based on a dissimilarity 𝑑 between ontologies and a threshold 𝜃, such that any
pair of ontologies below the threshold is considered the same (𝑜 ≡ 𝑜′ iff 𝑑(𝑜, 𝑜′) ≤ 𝜃), are not
equivalence relations in general, because they need not be transitive. These may be used a
posteriori to determine clusters of ontologies.
   In order to aggregate this difference at the population level, it is possible to simply compute
the number of different ontologies with respect to the number of agents:

$$\delta_\equiv(A) = \frac{|\{f(a);\ a \in A\}/{\equiv}|}{|A|}$$

In case of syntactic equivalence, the measure is:

$$\delta_=(A) = \frac{|O_A|}{|A|}$$
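Both indicators only need the equivalence predicate. A minimal sketch with hypothetical names; `delta_equiv` keeps one representative per equivalence class, which is valid because ≡ is an equivalence relation (it would not be for a thresholded dissimilarity).

```python
from fractions import Fraction


def delta_equiv(population, equiv):
    """|{f(a); a ∈ A}/≡| / |A|: number of equivalence classes per agent."""
    representatives = []
    for o in population:
        if not any(equiv(r, o) for r in representatives):
            representatives.append(o)
    return Fraction(len(representatives), len(population))


def delta_eq(population):
    """|O_A| / |A| under syntactic equality (ontologies must be hashable)."""
    return Fraction(len(set(population)), len(population))


# Hypothetical population of four agents; ontologies are plain labels here.
population = ["A", "A", "B", "C"]


def same_up_to_bc(x, y):  # hypothetical equivalence that merges B and C
    return x == y or {x, y} <= {"B", "C"}


print(delta_eq(population))                    # 3/4
print(delta_equiv(population, same_up_to_bc))  # 1/2
```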

   Such an indicator 𝛿= provides the highest diversity measure: as soon as two ontologies are
different, even slightly, their difference is counted maximally. A finer view may be obtained by
computing a dissimilarity between ontologies instead of using a Boolean predicate.
5.2. Knowledge diversity as how ontologies are different
It is possible to define a similarity or dissimilarity between agents’ ontologies measuring how
alike or different they are. Here we consider a dissimilarity 𝑑 : 𝒪 × 𝒪 → ℝ because it is
compatible with the Boolean measure: it returns 0 when the ontologies are the same.
   Any dissimilarity (or similarity) between ontologies can be used, such as those presented in
Section 4 and those mentioned in Section 3.
   With respect to a population 𝐴, diversity may be assessed by aggregating the dissimilarities
between the agents’ ontologies. This may be achieved with:

    • the average dissimilarity 𝛿^𝑑_𝛼(𝐴) in the population (this also works with predicates);
    • the median 𝛿^𝑑_𝜇(𝐴) of the dissimilarities (the set of dissimilarities is a multiset);
    • the span or diameter of the population, i.e. the largest dissimilarity, 𝛿^𝑑_∅(𝐴).
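These aggregates, together with the standard deviation mentioned in the next paragraph, can be computed from the pairwise dissimilarities. A minimal sketch, assuming a symmetric dissimilarity `d` and averaging over unordered pairs of distinct agents (which gives the same average as the ordered pairs used in Section 6.1); the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, median, pstdev


def pairwise(population, d):
    """Multiset of dissimilarities between the ontologies of all pairs of distinct agents."""
    return [d(o1, o2) for o1, o2 in combinations(population, 2)]


def aggregates(population, d):
    values = pairwise(population, d)
    return {
        "average": mean(values),   # δ^d_α
        "median": median(values),  # δ^d_μ
        "diameter": max(values),   # δ^d_∅
        "stdev": pstdev(values),   # how unevenly the dissimilarity is shared
    }


def unit(x, y):  # a Boolean predicate seen as a 0/1 dissimilarity
    return 0 if x == y else 1


r = aggregates(["A", "A", "B"], unit)
print(r)
```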


   Table 1 shows results for the distributions of Figure 1. It can be observed that the diameter and
median do not discriminate well between the different cases. Similarly, the average dissimilarity may
return the same value for quite different cases: there are many ways to distribute ontologies with
the same average dissimilarity. Hence, it is useful to consider that, for the same average
dissimilarity, a lower standard deviation means that this dissimilarity is more evenly shared and
the population is more diverse.
   We extend this line of reasoning by considering the distribution of knowledge in different
categories instead of considering ontologies one by one.


6. Measuring diversity on a distribution
So far, we only considered dissimilarities between individual knowledge (or ontologies). When
the set of possible objects becomes larger, it is customary to group them into categories. Indeed,
each individual is usually different from the others, but what counts as diversity is how specific
categories of individuals are represented. These categories group objects (ontologies) whose
differences are considered too small for them to count as different (hence diverse).
   There are two ways to determine such categories:

a priori diversity is measured based on a set of predefined categories that can be assigned to
      individuals. These categories are independent from the data.

a posteriori the categories are determined through how individuals may be grouped together.
     This may be achieved through clustering or by applying factor analysis to the ontologies.
     Such categories are usually relative to the considered data.

We will work on a priori categories because they allow us to define what kind of diversity is to be
measured (knowing what to look for), rather than a diversity that has to be discovered (returning
what is the most diverse). For that purpose we will use a set 𝐾 of categories. Each ontology belongs
to one and only one category. The number of agents 𝑎 ∈ 𝐴 whose ontology 𝑓(𝑎) belongs to a
category 𝑘 ∈ 𝐾 is denoted by #𝑘. Figure 1 shows the diversity of possible distributions of 10
objects in 5 categories presented along a linear order.
   For the sake of simplicity, we will consider the categories of Section 4.4. They have the
advantage that any ontology in such a category may be taken as its representative and, as already
discussed, its dissimilarity to another category is the same as its dissimilarity to any element
of that other category. Hence, the set of categories 𝐾 will be a set of ontologies 𝑂 and the
distribution of agents’ knowledge in this set will simply be 𝑂𝐴.
   A measure of diversity in categories must take into account:

    • the number of filled categories, which corresponds to the variety of [14] (Figure 1),
    • the distribution of objects in these categories, which corresponds to separation (or intensity),
    • how far apart such categories are, which corresponds to the disparity (or spread), and
      which can be measured by a dissimilarity.

We consider two families of measures in this context.

6.1. Weighted average dissimilarity in a distribution
A non-structured distribution is one in which the set of categories is simply a set with no additional
structure. In such a case, two ontologies within the same category are counted as the same (0)
and two ontologies in different categories are counted as different (1). The diversity measure may
be computed with respect to the abundance of each category as:

$$\delta_\equiv(A) = \frac{\sum_{o \in O_A} \#o \times (|A| - \#o)}{|A| \times (|A| - 1)}$$
   A structured distribution is defined when there exists a structure among the categories. This is
given by providing a dissimilarity 𝑑 between categories representing this structure.
   The diversity is then the average dissimilarity between the objects’ categories:

$$\delta^d_\alpha(A) = \frac{\sum_{o \in O_A} \#o \times \bigl(\sum_{o' \in O_A,\ o' \neq o} \#o' \times d(o, o')\bigr)}{|A| \times (|A| - 1)}$$

   This is the same measure as in Section 5.2 but applied to categories instead of individual
ontologies.
  The structure may have various interpretations:

    • The non-structured case may be obtained by taking as dissimilarity the complement of the
      identity matrix (0 on the diagonal, 1 otherwise).
    • The linear structure corresponds to the alignment of the five categories in Figure 1.
      The dissimilarity is the number of positions separating the two categories.
    • Any dissimilarity between categories may be used, based on syntactic or semantic cues.
      We will use the two semantic dissimilarities presented in Sections 4.2 and
      4.3.
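Both formulas only depend on the abundances #𝑜, which a `Counter` provides. A minimal sketch with hypothetical names; the usage lines check that the unit dissimilarity (the complement of the identity) gives back the non-structured measure, and use the linear dissimilarity over the categories A to E.

```python
from collections import Counter


def delta_unstructured(population):
    """Σ_o #o (|A| - #o) / (|A| (|A| - 1)), from the abundances #o."""
    n, counts = len(population), Counter(population)
    return sum(c * (n - c) for c in counts.values()) / (n * (n - 1))


def delta_average(population, d):
    """Σ_o #o Σ_{o' ≠ o} #o' d(o, o') / (|A| (|A| - 1)), d being given between categories."""
    n, counts = len(population), Counter(population)
    total = sum(counts[o] * counts[o2] * d(o, o2)
                for o in counts for o2 in counts if o2 != o)
    return total / (n * (n - 1))


def unit(x, y):    # non-structured case: complement of the identity matrix
    return 0 if x == y else 1


def linear(x, y):  # linear structure A-B-C-D-E: number of positions apart
    return abs("ABCDE".index(x) - "ABCDE".index(y))


pop = ["A"] * 5 + ["C"] * 5
print(delta_unstructured(pop), delta_average(pop, unit), delta_average(pop, linear))
```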
6.2. Entropy of a distribution
The distribution among categories can be considered as the probability #𝑜/|𝐴| of obtaining an
ontology of category 𝑜 when drawing an agent randomly from 𝐴, also called relative abundance. The
entropy measures how random this distribution is, i.e. how difficult it is to predict the category of
a random individual. This makes it a very good candidate to measure diversity, and it has been used
for that purpose [11].
   On such a distribution, a parametric measure of diversity can be defined [10, 21]:

$$\delta_q(A) = \Bigl(\sum_{o \in O_A,\ \#o \neq 0} \Bigl(\frac{\#o}{|A|}\Bigr)^{q}\Bigr)^{\frac{1}{1-q}}$$

It is the exponential of a general entropy measure. This measure has been extended to structured
distributions in which a similarity exists [22]. Such a similarity 𝑠 may be obtained from a
dissimilarity 𝑑 by taking 𝑠(𝑜, 𝑜′) = 𝑒^{−𝑑(𝑜,𝑜′)}:

$$\delta^d_q(A) = \Biggl(\sum_{o \in O_A,\ \#o \neq 0} \frac{\#o}{|A|} \times \Bigl(\sum_{o' \in O_A,\ \#o' \neq 0} \frac{\#o'}{|A|} \times e^{-d(o,o')}\Bigr)^{q-1}\Biggr)^{\frac{1}{1-q}}$$

The same dissimilarities between categories as above may be retained.
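Both 𝛿_𝑞 and 𝛿^𝑑_𝑞 can be computed directly from the abundances. A minimal sketch with hypothetical names: without `d` it computes 𝛿_𝑞, with `d` it uses 𝑠 = 𝑒^{−𝑑} (assuming 𝑑(𝑜, 𝑜) = 0). Since the formula is undefined at 𝑞 = 1, that case returns its limit exp(−∑ 𝑝(𝑜) log ∑ 𝑝(𝑜′)𝑠(𝑜, 𝑜′)), which for 𝛿_𝑞 is the exponential of the Shannon entropy.

```python
import math
from collections import Counter


def diversity(population, q, d=None):
    """Order-q diversity of the categories in population.

    Without d: the plain measure δ_q; with d: δ^d_q, using s(o, o') = exp(-d(o, o')).
    At q = 1 the formula is undefined, so its limit is returned instead.
    """
    n, counts = len(population), Counter(population)
    p = {o: c / n for o, c in counts.items()}  # relative abundances, all > 0

    def s(o, o2):
        if d is None:
            return 1.0 if o == o2 else 0.0     # identity similarity gives back δ_q
        return math.exp(-d(o, o2))

    # zp[o] = Σ_o' p(o') s(o, o'): expected similarity of o to a random individual
    zp = {o: sum(p[o2] * s(o, o2) for o2 in p) for o in p}
    if q == 1:
        return math.exp(-sum(p[o] * math.log(zp[o]) for o in p))
    return sum(p[o] * zp[o] ** (q - 1) for o in p) ** (1 / (1 - q))


def unit(x, y):
    return 0 if x == y else 1


pop = ["A"] * 5 + ["B"] * 5
print(diversity(pop, 2), diversity(pop, 2, unit))
```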
   Depending on the parameter 𝑞, called the order of diversity, a different measure is obtained,
ranging from assigning more weight to the rarest categories (low 𝑞) to assigning more weight to the
commonest ones (high 𝑞), with a more balanced weighting around 𝑞 = 1. Here we use 𝑞 = 2, which,
for 𝛿_𝑞, corresponds to the inverse Simpson index 1/∑(#𝑜/|𝐴|)², i.e. 1/(1 − Gini-Simpson). The
notion of ‘diversity profile’ (the evolution of diversity with 𝑞) may be useful in defining a partial
order between populations: a population 𝐴 is more diverse than another 𝐴′, written 𝐴′ ⪯ 𝐴, if
𝛿^𝑑_𝑞(𝐴) ≥ 𝛿^𝑑_𝑞(𝐴′) for any 𝑞.
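Since the profile-based order quantifies over every 𝑞, a program can only check it approximately. A minimal sketch, assuming a finite grid of orders (𝑞 = 1 excluded) and hypothetical names:

```python
import math
from collections import Counter


def profile(population, d, qs):
    """Values of δ^d_q for each order q in qs (q = 1 must not be in qs)."""
    n, counts = len(population), Counter(population)
    p = {o: c / n for o, c in counts.items()}
    zp = {o: sum(p[o2] * math.exp(-d(o, o2)) for o2 in p) for o in p}
    return [sum(p[o] * zp[o] ** (q - 1) for o in p) ** (1 / (1 - q)) for q in qs]


def at_most_as_diverse(pop1, pop2, d, qs=(0, 0.5, 2, 4, 8, 16)):
    """Approximate check of pop1 ⪯ pop2: δ^d_q(pop1) ≤ δ^d_q(pop2) on every q of the grid."""
    return all(x <= y + 1e-12
               for x, y in zip(profile(pop1, d, qs), profile(pop2, d, qs)))


def unit(x, y):
    return 0 if x == y else 1


concentrated = ["A"] * 10
spread = list("AABBCCDDEE")
print(at_most_as_diverse(concentrated, spread, unit),  # True
      at_most_as_diverse(spread, concentrated, unit))  # False
```

When the two profiles cross, the function returns False in both directions, reflecting that the order is only partial.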
   This diversity measure is not normalised: it ranges within [1, +∞). When comparing a finite
set of distributions, it is possible to normalise it as:

$$\bar{\delta}^d_q(A) = \frac{\delta^d_q(A) - 1}{\dot{\delta}^d_q - 1}$$

with 𝛿̇^𝑑_𝑞 the maximum value among the set.
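The normalisation only needs the values of the compared populations. A minimal sketch (hypothetical name), guarding against the degenerate case where every value equals 1:

```python
def normalise(values):
    """Map δ^d_q values (all ≥ 1) of a finite set of populations onto [0, 1].

    values: population label -> δ^d_q; the maximum over the set plays the role of δ̇^d_q.
    """
    top = max(values.values())
    if top == 1:  # all populations at the minimal diversity: nothing to spread
        return {k: 0.0 for k in values}
    return {k: (v - 1) / (top - 1) for k, v in values.items()}


print(normalise({"a": 1.0, "b": 2.0, "g": 5.0}))  # {'a': 0.0, 'b': 0.25, 'g': 1.0}
```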
   Contrary to average dissimilarities, diameter and entropy are independent of scale (adding
agents in the same proportions returns the same measure).
   Table 1 provides the measure values for these different distributions. Concerning the semantic
measures, the five categories are considered as corresponding from left to right to the ontologies
(A–E) of Figure 2.
   As expected, the diversity is minimal for distribution (a) for all measures. We also expect it
to be maximal for distribution (g). But this did not happen with the linear distance, for which (e)
obtains the highest median and average dissimilarity. Actually, (e) is polarised into two groups
very far apart; however, it is not very diverse as only two categories are filled.
Distribution                   (a)    (b)    (c)    (d)    (e)    (f)    (g)
|𝐴|                             10     10     10     10     10     10     10
Stats         |𝑂𝐴|               1      2      3      5      2      5      5
              |𝑂𝐴|/|𝐴|          .1     .2     .3     .5     .2     .5     .5
Non struct.   ∅                  0      1      1      1      1      1      1
              𝛿^𝑑_𝜇            0.0    1.0    1.0    1.0    1.0    1.0    1.0
              𝛿^𝑑_𝛼              0    .56    .62    .67    .56    .82    .89
              𝛿̄^𝑑_2            0.0    .45    .54    .60    .45    .86    1.0
Linear        ∅                  0      2      2      4      4      4      4
              𝛿^𝑑_𝜇            0.0    2.0    1.0    1.0    4.0    1.0    1.0
              𝛿^𝑑_𝛼            0.0   1.11    .71   1.11   2.22   1.33   1.78
              𝛿̄^𝑑_2            0.0    .43    .33    .48    .54    .70    1.0
Graph-B.      ∅                  0      1      1      1      1      1      1
              𝛿^𝑑_𝜇            0.0    1.0    .33    .33    1.0    .33    .33
              𝛿^𝑑_𝛼            0.0    .56    .27    .37    .56    .47    .59
              𝛿̄^𝑑_2            0.0    .78    .39    .56    .78    .74    1.0
Name-b.       ∅                  0     .5     .5     .5    .71    .29    .71
              𝛿^𝑑_𝜇            0.0     .5     .5     .5    .29     .5     .5
              𝛿^𝑑_𝛼            0.0    .28    .31    .38    .16    .44    .46
              𝛿̄^𝑑_2            0.0    .52    .61    .74    .30    .94    1.0

Table 1: Measures corresponding to the distributions of Figure 1 with non-structured, linearly structured
(with the A-B-C-D-E order), graph-based and named-class-based semantic categories. In each
case, the table displays the diameter (∅), the median (𝛿^𝑑_𝜇), the average dissimilarity (𝛿^𝑑_𝛼) and
the normalised entropy-based diversity (𝛿̄^𝑑_2, i.e. 𝑞 = 2).


   The entropy-based diversity measures seem to behave better. They are also very sensitive
to the dissimilarity used: there is no consensus between the orders induced by the
measures except for the extrema. For instance, the graph-based semantic diversity has the same
high value for (b) and (e) because, in both cases, the two filled categories are at the same
dissimilarity. These measures also do not account very well for empty categories: they have been
designed for applications in which most categories have some representatives. Our example is likely
too small.
   In conclusion, it seems that entropy-based measures, by putting emphasis on an even dispersion
of objects over all categories, are good measures of diversity, and that a semantic dissimilarity
structures knowledge categories appropriately. However, agents’ ontologies do not always present
themselves as distributions. In such cases, a semantic dissimilarity alone should provide a good
idea of diversity.


7. Controlling diversity
When one wants to observe experimentally the consequences of diversity, it is necessary to be
able to control it.
  The conceptually most straightforward way to achieve this is to determine the diversity measure
to be used and to design agent populations and their knowledge which comply with different values
of these measures. This may be quite difficult because there are already two factors (agents and
knowledge) and knowledge itself may be diverse in a variety of ways. Synthesising artificial
ontologies is possible, but may lead to unrepresentative results.

[Figure 4 diagram: for each population size, the diversity levels linked by substitution transitions; |𝐴| = 3: 1, 4/6, 0; |𝐴| = 4: 1, 10/12, 8/12, 6/12, 0; |𝐴| = 5: 1, 18/20, 16/20, 14/20, 12/20, 8/20, 0.]
Figure 4: Patterns of ontology substitution for |𝐴| = 3, 4, 5. Dashed transitions do not decrease
diversity.
   In our experiments [5], we observed that some factors, e.g. the number of object descriptors, have
an influence on the diversity of the resulting knowledge. It would thus be tempting to control
these factors in order to obtain the desired diversity. However, this is not a good way to control
diversity because it would be very difficult to determine whether the observed effects are due to
diversity or to these factors themselves.
   One possible way to improve on this is to perform such an experiment and to stop the simulation.
What is obtained is a population of agents 𝐴 with a ‘roof’ diversity 𝛿̇ = 𝛿(𝐴). At that stage, if
each agent's knowledge is replaced by the ontology of one specific agent, taken at random or as
the fittest knowledge for some measure, yielding distribution 𝐴′, then 𝛿(𝐴′) will be minimal.
A further advantage of this procedure is that it compares distributions over the same set of
agents and the same set of ontologies. Hence all distributional measures may be used and may be
normalised without inconvenience. It is thus possible to obtain different distributions with diversity
ranging in [0..𝛿̇] by distributing not 1 but 𝑛 ∈ [1..|𝐴|] different ontologies among the agents.
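The substitution procedure can be sketched as follows, under assumptions that the text leaves open: 𝛿 is taken to be the average dissimilarity over pairs of agents, the 𝑛 kept ontologies are drawn at random among the distinct ontologies of the ‘roof’ population, and each remaining agent receives one of them uniformly at random. The function names are hypothetical.

```python
import random
from itertools import combinations


def average_dissimilarity(assignment, d):
    """Assumed delta: average dissimilarity over all pairs of distinct agents."""
    pairs = list(combinations(assignment, 2))
    return sum(d(x, y) for x, y in pairs) / len(pairs)


def substitute(roof, n, rng):
    """Redistribute n of the distinct ontologies of the roof population among all agents."""
    distinct = list(dict.fromkeys(roof))  # distinct ontologies, in order of first appearance
    if not 1 <= n <= len(distinct):
        raise ValueError("n must lie between 1 and the number of distinct ontologies")
    kept = rng.sample(distinct, n)
    # every kept ontology is held by at least one agent; the other agents draw one at random
    assignment = kept + [rng.choice(kept) for _ in range(len(roof) - n)]
    rng.shuffle(assignment)
    return assignment


def diversity_levels(roof, d, seed=0):
    """Diversity obtained for each n, from 1 (minimal) to the number of distinct ontologies."""
    rng = random.Random(seed)
    return {n: average_dissimilarity(substitute(roof, n, rng), d)
            for n in range(1, len(set(roof)) + 1)}
```

With a boolean dissimilarity and a roof population in which all agents hold distinct ontologies, 𝑛 = 1 gives 0 and 𝑛 = |𝐴| gives back the roof diversity.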

7.1. Boolean approach
In the case of a boolean distance function, the natural approach is to replace an ontology by
another (already used by one agent).
  The diversity of a (multi)set of such ontologies is computed by 𝛿(𝐴) (with any of the 𝛿 defined
before).
   We consider a simple operation 𝑟 which consists of replacing one ontology by another ontology
that is already present in the multiset. Applied iteratively, this operation can progressively reduce
the diversity of the multiset. These successive applications define a path in the graphs of Figure 4.
If one takes care never to apply an operation that increases diversity, then this path goes from the
top to (one of) the bottom(s).

Figure 5: The different distributions of three ontologies in three categories and the possible
transitions for 𝑟 (diversity levels 3/3, 2.4/3, 2/3, 1.6/3 and 0/3). Dashed transitions do not
decrease diversity.
   Figure 4 presents multisets of ontologies of the same cardinality, up to category permutation,
related by arrows representing the application of 𝑟; dashed arrows are those which do not decrease
diversity. This makes it possible to define different levels of diversity that can be used in experiments.
For the sake of simplicity, diversity is here measured by average dissimilarity over a non-structured
set of categories (i.e. with a boolean dissimilarity). The more ontologies are initially in the set, the
more levels can be defined.
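The boolean case of Figure 4 can be reproduced with a short sketch. Up to category permutation, a multiset of |𝐴| ontologies is an integer partition of |𝐴| (the number of agents sharing each ontology). We assume that diversity is the proportion of ordered pairs of distinct agents holding different ontologies, which yields the levels of Figure 4 (e.g. 10/12 for the partition 2+1+1). The function names are illustrative.

```python
from fractions import Fraction


def partitions(n, largest=None):
    """Integer partitions of n as non-increasing tuples (multisets up to category permutation)."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def boolean_diversity(counts):
    """Assumed diversity: share of ordered pairs of distinct agents holding different ontologies."""
    n = sum(counts)
    if n < 2:
        return Fraction(0)
    pairs = n * (n - 1)
    same = sum(c * (c - 1) for c in counts)
    return Fraction(pairs - same, pairs)


def replace_once(counts):
    """Multisets reachable by r: one agent adopts an ontology already held by another agent."""
    reachable = set()
    for i in range(len(counts)):
        for j in range(len(counts)):
            if i != j:
                new = list(counts)
                new[i] -= 1  # the agent gives up its ontology...
                new[j] += 1  # ...and adopts one already present
                reachable.add(tuple(sorted((c for c in new if c > 0), reverse=True)))
    return reachable


def substitution_graph(n):
    """Edges (source, target, decreasing) of the substitution pattern for n agents."""
    return [(p, q, boolean_diversity(q) < boolean_diversity(p))
            for p in partitions(n) for q in replace_once(p)]
```

The dashed transitions of Figure 4 correspond to the edges whose last component is `False`, for instance the move from (3, 1) back to (2, 2).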

7.2. Simple dissimilarity-based approach
It may be more interesting to control diversity based on non-boolean dissimilarities between
ontologies. The same type of reasoning may be applied with 𝛿_𝑑.
   In this case, contrary to what appears in Figure 4, the levels no longer depend only on the number
of occurrences of each ontology. The dissimilarities between ontologies also play an important
role and lead to a larger variety of diversity levels.
   Consider three ontologies 𝐴, 𝐵, and 𝐶 such that 𝑑(𝐴, 𝐵) = .8, 𝑑(𝐵, 𝐶) = 1. and 𝑑(𝐴, 𝐶) =
1.2¹. These may be considered as ontologies 𝑜 ∈ 𝑂 or categories 𝑘 ∈ 𝐾. Developing the
pattern above on these three ontologies yields the result of Figure 5.
   The options are very limited here, because only three ontologies are distributed among only
three agents. With four agents, the diagram becomes much more complex, with more paths from
top to bottom.
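The levels of Figure 5 can be checked with the following sketch. It assumes that 𝛿_𝑑 is the average dissimilarity over pairs of agents and encodes the dissimilarities given above as exact fractions; the names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement

# Dissimilarities of the example: d(A,B) = .8, d(B,C) = 1., d(A,C) = 1.2
D = {frozenset("AB"): Fraction(4, 5), frozenset("BC"): Fraction(1), frozenset("AC"): Fraction(6, 5)}


def d(x, y):
    return Fraction(0) if x == y else D[frozenset((x, y))]


def delta_d(multiset):
    """Average dissimilarity over all pairs of agents (an assumed form of delta_d)."""
    pairs = list(combinations(multiset, 2))
    return sum(d(x, y) for x, y in pairs) / len(pairs)


# Every distribution of the three ontologies among three agents, up to agent permutation
levels = {"".join(m): delta_d(m) for m in combinations_with_replacement("ABC", 3)}
for multiset, level in sorted(levels.items(), key=lambda item: -item[1]):
    print(multiset, level, float(level))
```

The distinct values obtained are 1, 4/5 (= 2.4/3), 2/3, 8/15 (= 1.6/3) and 0, i.e. the five levels of Figure 5.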
   Figures 4 and 5 show that the diversity levels are fixed and relatively denser above .5 than
below it. This may be a problem when one wants finer control over a given level. However, this is
also very dependent on the diversity measure; hence, unless one knows the meaning of these levels
very precisely, it may be better to rely only on the order between distributions. Moreover, on a
statistical basis, if the goal is to establish the relation between a similarity measure and another
measure, it is not strictly necessary to have a regularly spaced sample.

   ¹ These dissimilarities average to 1., but this is not required. Moreover, they still provide a high
level of diversity (all non-zero levels are above 0.5).
   A simple algorithm for generating new sets of ontologies consists of computing the average
dissimilarity between each ontology and all the others, then replacing the one with the highest
average dissimilarity by either (a) the most central one (the closest to the barycentre), or (b) the
most central one with respect to the remaining ones. Such an algorithm would again define a path
within the lattice of possible ontology distributions.
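A minimal sketch of this algorithm follows. Several details are our assumptions rather than the paper's: averages are taken over agents (so an ontology held by several agents counts several times), the ontology with the lowest average dissimilarity is used as a proxy for "closest to the barycentre", it is picked among the agents other than the one being replaced, ties follow agent order, and the loop stops when diversity reaches 0 or nothing changes. The example dissimilarities are those of Figure 5.

```python
from fractions import Fraction
from itertools import combinations

# Example dissimilarities (those of Figure 5); any symmetric d with d(x, x) = 0 would do
D = {frozenset("AB"): Fraction(4, 5), frozenset("BC"): Fraction(1), frozenset("AC"): Fraction(6, 5)}


def d(x, y):
    return Fraction(0) if x == y else D[frozenset((x, y))]


def diversity(onts):
    """Assumed diversity: average dissimilarity over pairs of agents."""
    pairs = list(combinations(onts, 2))
    return sum(d(x, y) for x, y in pairs) / len(pairs)


def avg_to_others(i, onts, pool):
    """Average dissimilarity between agent i's ontology and the other agents of pool."""
    others = [j for j in pool if j != i]
    if not others:
        return Fraction(0)
    return sum(d(onts[i], onts[j]) for j in others) / len(others)


def greedy_path(onts, variant="a", max_steps=100):
    """Repeatedly replace the least central ontology; return the final assignment and diversities."""
    onts = list(onts)
    everyone = list(range(len(onts)))
    path = [diversity(onts)]
    for _ in range(max_steps):
        if path[-1] == 0:
            break
        worst = max(everyone, key=lambda i: avg_to_others(i, onts, everyone))
        remaining = [j for j in everyone if j != worst]
        # (a) centrality with respect to all agents, (b) with respect to the remaining agents only
        pool = everyone if variant == "a" else remaining
        central = min(remaining, key=lambda j: avg_to_others(j, onts, pool))
        if onts[central] == onts[worst]:
            break  # no change possible: stop rather than loop
        onts[worst] = onts[central]
        path.append(diversity(onts))
    return onts, path
```

On the three ontologies of Figure 5, variant (a) first replaces 𝐶 by 𝐵 and then 𝐴 by 𝐵, giving the path 1, 8/15, 0.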
   Increasing the ratio of agents to ontologies and using more sophisticated measures, such as the
entropy-based ones, provides a reasonable way to collect samples at various diversity levels.

7.3. Finely controlling diversity
The algorithm sketched in the previous section still does not allow fine control of the obtained
levels. If one would like to obtain sets of ontologies with diversity 1., .75, .5, .25 and 0., it will
not return exactly these levels. Worse, it will not necessarily return the best possible solution. It
is possible to express this as an optimisation problem. Indeed, the goal is to find, for each diversity
level 𝑙, the assignment # that provides the multiset 𝑂* whose diversity 𝛿(𝑂) is the closest to 𝑙, i.e.

$$O^* = \operatorname*{argmin}_{\#} \, |l - \delta(O)|$$

under the constraints that:

$$\forall o \in O,\ \#o \geq 0 \quad \text{and} \quad \sum_{o \in O} \#o = |A|$$

   A very expensive algorithm for achieving this would be to enumerate all the combinations with
replacement (there are C(|𝑂| + |𝐴| − 1, |𝐴|) multisets of |𝐴| ontologies drawn from 𝑂), to measure
their diversity, and to retain those which are the closest to the expected levels.
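The exhaustive algorithm can be sketched as follows: it enumerates every assignment # (each multiset of |𝐴| ontologies drawn from 𝑂), computes its diversity, and keeps, for each target level 𝑙, one multiset minimising |𝑙 − 𝛿(𝑂)|. We again assume that 𝛿 is the average pairwise dissimilarity and reuse the dissimilarities of Figure 5 as a hypothetical example; ties are broken by enumeration order.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement

# Hypothetical example: the dissimilarities of Figure 5
D = {frozenset("AB"): Fraction(4, 5), frozenset("BC"): Fraction(1), frozenset("AC"): Fraction(6, 5)}


def d(x, y):
    return Fraction(0) if x == y else D[frozenset((x, y))]


def delta(multiset):
    """Assumed diversity: average dissimilarity over pairs of agents."""
    pairs = list(combinations(multiset, 2))
    return sum(d(x, y) for x, y in pairs) / len(pairs)


def closest_assignments(ontologies, n_agents, levels):
    """For each level, a multiset O* of n_agents ontologies minimising |level - delta(O*)|."""
    # Exhaustive enumeration of C(|O| + |A| - 1, |A|) multisets, hence very expensive
    candidates = [(m, delta(m)) for m in combinations_with_replacement(ontologies, n_agents)]
    return {level: min(candidates, key=lambda c: abs(level - c[1])) for level in levels}


targets = [Fraction(1), Fraction(3, 4), Fraction(1, 2), Fraction(1, 4), Fraction(0)]
for level, (multiset, value) in closest_assignments("ABC", 3, targets).items():
    print(level, "".join(multiset), value)
```

With these dissimilarities the closest achievable levels are 1, 4/5, 8/15, 0 and 0: the targets .25 and 0 both map to a single-ontology distribution, which illustrates why the requested levels cannot always be obtained exactly.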


8. Conclusions
This work was motivated by the need to measure and control knowledge diversity. It seems to us
that the approach consisting of taking advantage of measures already defined to assess diversity
and combining them with measures already defined to assess knowledge difference is a promising
one. However, there are so many such measures to be combined that a careful examination of their
properties, and of those of their combinations, is an interesting perspective. This would benefit from
carefully studying the interactions between the axioms governing both components, in the spirit of
[15, 11]. We briefly discussed one practical way to control knowledge diversity for the purpose
of experiments. This is a critical issue that has no simple solution. However, the problem has to
be tied to the knowledge diversity measure.
Code and data availability
All measures and algorithms have been implemented in Python and can be retrieved from
https://moex.inria.fr/software/kdiv/.


Acknowledgments
This work has been partially supported by the MIAI Knowledge communication and evolution
chair (ANR-19-P3IA-0003). We thank the reviewers for their careful reading and for helping
clarify some formulations.


References
 [1] A. Stirling, A general framework for analysing diversity in science, technology and society,
     Journal of the Royal Society Interface 4 (2007) 707–719.
 [2] L. Hong, S. Page, Groups of diverse problem solvers can outperform groups of high-
     ability problem solvers, Proceedings of the national academy of sciences 101 (46) (2004)
     16385–16389.
 [3] D. Noble, M. Prates, D. Bossle, L. Lamb, Collaboration in Social Problem-Solving: When
     Diversity Trumps Network Efficiency, in: Proceedings of the Twenty-Ninth AAAI Confer-
     ence on Artificial Intelligence, AAAI’15, AAAI Press, 1277–1283, 2015.
 [4] D. Reed, R. Frankham, Population fitness is correlated with genetic diversity, Conservation
     biology 17 (2003) 230–237.
 [5] Y. Bourahla, M. Atencia, J. Euzenat, Knowledge improvement and diversity under
     interaction-driven adaptation of learned ontologies, in: U. Endriss, A. Nowé, F. Dignum,
     A. Lomuscio (Eds.), Proc. 20th ACM international conference on Autonomous Agents and
     Multi-Agent Systems (AAMAS), London (UK), 242–250, URL
     http://www.ifaamas.org/Proceedings/aamas2021/pdfs/p242.pdf, 2021.
 [6] J. Goguen, Support for ontological diversity and evolution, URL
     https://cseweb.ucsd.edu/~goguen/papers/onto-intgn.html, 2005.
 [7] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. Patel-Schneider, The description
     logic handbook: Theory, implementation and applications, Cambridge University Press, 2
     edn., 2007.
 [8] C. Aremu, Exploring statistical tools in measuring genetic diversity for crop improvement,
     in: M. Çalişkan (Ed.), Genetic diversity in plants, chap. 17, IntechOpen, 339–348, 2012.
 [9] M. Avolio, J. Beaulieu, E. Lo, M. Smith, Measuring genetic diversity in ecological studies,
     Plant ecology 213 (17) (2012) 1105–1115.
[10] L. Jost, Entropy and diversity, Oikos 113 (2) (2006) 363–375.
[11] T. Leinster, Entropy and diversity: the axiomatic approach, Cambridge University Press,
     ISBN 9781108965576, URL https://arxiv.org/pdf/2012.02113.pdf, 2021.
[12] E. Cheng, How to measure diversity: mathematical theory gives some rigor to discussion
     of a sensitive social and political issue, Wall Street Journal, 2017.
[13] H. Cooke, I. Keppo, S. Wolf, Diversity in theory and practice: a review with application to
     the evolution of renewable energy generation in the UK, Energy Policy 61 (2013) 88–95.
[14] D. Harrison, K. Klein, What’s the difference? Diversity constructs as separation, variety, or
     disparity in organizations, Academy of management review 32 (2007) 1199–1228.
[15] P. Ramaciotti Morales, R. Lamarche-Perrin, R. Fournier-S’niehotta, R. Poulain, L. Tabourier,
     F. Tarissan, Measuring diversity in heterogeneous information networks, Theoretical
     computer science 859 (2021) 80–115, URL
     https://pedroramaciotti.github.io/files/publications/2021_TCS.pdf.
[16] F. Giunchiglia, M. Fumagalli, On knowledge diversity, in: Proc. 4th International Workshop
     on Ontology Modularity, Contextuality, and Evolution (WOMoCoE), Graz (AT), URL
     http://ceur-ws.org/Vol-2518/paper-WOMOCOE2.pdf, 2019.
[17] J. David, J. Euzenat, Comparison between ontology distances (preliminary results), in: Proc.
     7th international semantic web conference (ISWC), Karlsruhe (DE), vol. 5318 of Lecture
     notes in computer science, 245–260, URL
     http://www.springerlink.com/content/cj22428300485784/, 2008.
[18] A. Mädche, S. Staab, Measuring Similarity between Ontologies, in: Proc. 13th International
     Conference on Knowledge Engineering and Knowledge Management (EKAW), vol. 2473
     of Lecture notes in computer science, Siguenza (ES), 251–263, 2002.
[19] J. David, J. Euzenat, O. Sváb-Zamazal, Ontology similarity in the alignment space, in:
     Proc. 9th international semantic web conference (ISWC), Shanghai (CN), 129–144, URL
     https://exmo.inria.fr/files/publications/david2010b.pdf, 2010.
[20] J. Euzenat, C. Allocca, J. David, M. d’Aquin, C. Le Duc, O. Sváb-Zamazal, Ontology
     distances for contextualisation, Tech. Rep. 3.3.4, NeOn, URL
     https://exmo.inria.fr/files/reports/neon-334.pdf, 2009.
[21] M. Hill, Diversity and evenness: a unifying notation and its consequences, Ecology 54 (2)
     (1973) 427–432.
[22] T. Leinster, C. Cobbold, Measuring diversity: the importance of species similarity, Ecology
     93 (3) (2012) 477–489, URL https://www.maths.gla.ac.uk/~cc/pdf/Leinster2011.pdf.