<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>DL</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Towards Conceptual Clustering in EL with Simulation Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ruud van Bakel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Cochez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick Koopmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vrije Universiteit Amsterdam</institution>
          ,
          <addr-line>De Boelelaan 1105, 1081 HV Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>38</volume>
      <fpage>3</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>We introduce ℰℒ clustering, a type of conceptual clustering in which each cluster is described by an ℰℒ concept. The cluster then contains all instances of that concept. This contribution can be seen as a form of unsupervised learning of ℰℒ concepts from data, useful for analyzing graph-based data as well as for ontology engineering. Towards a practical method for ℰℒ clustering, we introduce complete simulation graphs, a structure from which all possible ℰℒ clusterings of the given data can be extracted. From this structure, a good ℰℒ clustering is then selected based on a utility function. We evaluate a first prototypical implementation of this idea on ABoxes of ontologies from a known benchmark, and show that bounded simulation graphs and ℰℒ clusterings can often be computed in practice.</p>
      </abstract>
      <kwd-group>
<kwd>unsupervised learning</kwd>
        <kwd>conceptual clustering</kwd>
        <kwd>summarization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>One of the central functions of an ontology is to specify the meaning of concepts from some domain of
interest. In the case of ontologies formalized in description logics (DLs), these concepts are specified
through axioms, usually producing a subsumption hierarchy of concepts from more general to more
specific ones. Such an ontology can then be used in combination with a dataset (an ABox), and a
reasoner can infer which objects in the data belong to which concepts defined in the ontology, thus
organizing these objects into different categories of differing specificity. This makes ontologies useful
for organizing and accessing data for many use cases. For example, consider a digital health record
containing information about patients and what they are treated for in a hospital. Using a medical
ontology, we can group patients into more high-level categories and e.g. find out how many patients
are treated for heart diseases.</p>
      <p>A major downside of ontologies is that they need to be formalized first, which requires the right
expertise and can be a laborious process. As a solution, we propose to use conceptual clustering as an
approach to mine DL concepts directly from data. This may be useful for different reasons: 1) the
obtained concepts could be relevant concepts that should be added to an ontology, and thus support
ontology engineering, and 2) they can help analyze and explore data that are organized in graphs
(e.g. knowledge graphs), by organizing the objects in the data into categories that can also be described
in an intuitive way.</p>
      <p>
A plethora of research considers learning of DL descriptions, though not much considers the precise
setting we have in mind. Approaches for active learning, e.g. following Angluin’s exact learning
approach, assume that the learning system is interacting with a teacher (e.g. a domain expert) to whom
it can send queries and current formalizations [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Another line of research considers concept learning:
here, we start with a set of individuals from the data grouped into positive and negative examples, and
try to learn a DL concept that describes all of the positive and none of the negative examples [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6 ref7">3, 4, 5, 6, 7</xref>
        ].
Both types of learning are supervised in the sense that additional information from a domain expert is
required.
      </p>
      <p>
        In unsupervised learning, the goal is to learn axioms or concepts with less user input from the data.
Unsupervised learning of GCIs is investigated in [
        <xref ref-type="bibr" rid="ref8">8</xref>
] using an induction-based approach. A limitation is
that a set of interesting concepts has to be known beforehand, which arguably makes
the method less unsupervised. D’Amato et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] developed a method for conceptual clustering to
automatically generate hierarchies of 𝒜ℒ𝒞 concepts from data. The idea of conceptual clustering goes
back to [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. Early systems such as COBWEB [12] cluster data represented as vectors with values
for different features into a hierarchy of clusters that can be described based on their features. Clusters
can then be described by “concepts” that associate to the different features a probability distribution
of the weights. This is similar to classical clustering, where objects are typically clustered based on
distance measures. However, instead of distance measures, usually a utility criterion is used that takes
the quality of the cluster description and its predictive power into account. Moreover, the expressivity
of the description language determines which clusters are possible, since every entity in the cluster
must be described by the concept. D’Amato et al. follow this idea using the more expressive 𝒜ℒ𝒞 to
describe the clusters. Unfortunately, the method does not scale well on larger data sets.
      </p>
      <p>
Moving to the more scalable DL ℰℒ, a line of research considers unsupervised learning following the
idea of formal concept analysis (FCA) [13, 14, 15, 16]. Those works focus on finding a finite base of axioms
that fully characterizes a given interpretation, i.e. covers all ℰℒ axioms that hold in this interpretation.
ℰℒ provides nice model-theoretic properties that allow the use of techniques such as products and
simulations to more efficiently determine concepts that are relevant in a given interpretation. [17] uses
the ideas from these FCA methods to apply a form of conceptual clustering for ℰℒ with bounded role
depth. The method was able to cluster 15 objects on a knowledge graph with 1,216 triples into ℰℒ
concepts with role depth two, but could not scale beyond. In this paper, we want to better understand
the general feasibility of such an approach, following more closely the original idea of conceptual clustering
used in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], using an ad-hoc, but well-motivated, first utility measure.
      </p>
<p>Different from classical clustering approaches on graphs that first translate nodes into feature vectors,
e.g. through embeddings [18, 19, 20], our conceptual clustering approach works directly on the graph
structure, which adds to the transparency of the clustering result. The central idea is to use simulation
graphs, a compressed representation of all possible clusterings of a given ABox. Similar structures
were also used in works on FCA with ℰℒ. To obtain scalability, we use an optimized algorithm to
summarize ABoxes based on simulation, identifying objects that cannot be distinguished by ℰℒ concepts.
An extended version of the paper with additional details is provided together with the experimental
data on Zenodo [21].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <sec id="sec-2-1">
<title>2.1. The Description Logic ℰℒ</title>
<p>Fix three countably infinite, pair-wise disjoint sets NC, NR and NI of respectively concept names, roles
and individuals. Concept names and roles correspond to unary and binary predicates in first-order
logic, and individuals to constants. An ABox 𝒜 is a set of ground atoms called assertions, which are
of the forms B(a) and r(a, b), where B ∈ NC, r ∈ NR and a, b ∈ NI. We use NI(𝒜)/NC(𝒜)/NR(𝒜) to
refer to the individuals/concept names/roles in 𝒜. ℰℒ concepts C are built based on the syntax rule
C ::= ⊤ | B | C ⊓ C | ∃r.C, where B ∈ NC and r ∈ NR. The depth of an ℰℒ concept is the maximal
nesting depth of role restrictions ∃r.C occurring in it. The semantics of DLs is defined using first-order
interpretations, i.e. tuples ℐ = ⟨Δℐ, ·ℐ⟩ with a non-empty domain Δℐ and an interpretation function ·ℐ
assigning to every individual a ∈ NI an element aℐ ∈ Δℐ, to every concept name B ∈ NC a set Bℐ ⊆ Δℐ,
and to every role r ∈ NR a relation rℐ ⊆ Δℐ × Δℐ. The interpretation function is lifted to ℰℒ concepts
as follows: ⊤ℐ = Δℐ, (C ⊓ D)ℐ = Cℐ ∩ Dℐ, and (∃r.C)ℐ = {d ∈ Δℐ | ∃⟨d, e⟩ ∈ rℐ s.t. e ∈ Cℐ}.
An individual a ∈ NI satisfies a concept C in ℐ, in symbols ℐ |= C(a), if aℐ ∈ Cℐ. ℐ is a model of an
ABox 𝒜, in symbols ℐ |= 𝒜, if aℐ ∈ Bℐ for every B(a) ∈ 𝒜, and ⟨aℐ, bℐ⟩ ∈ rℐ for every r(a, b) ∈ 𝒜.
TBoxes or ontologies put additional constraints on an interpretation; an interpretation that satisfies
these constraints is then a model of this TBox. Their syntax is not relevant in this paper. A knowledge
base (KB) is the union 𝒯 ∪ 𝒜 of a TBox and an ABox. A model of such a KB is an interpretation that is
a model of both 𝒯 and 𝒜. Note that a TBox or an ABox on its own is already a KB. If an assertion
α is satisfied in every model of a KB 𝒦, we write 𝒦 |= α and say α is entailed by 𝒦. In the special case
where α = C(a), we say that a is an instance of C in 𝒦.</p>
        <p>Every ABox has a unique minimal model that has one domain element per individual in the ABox
which can be embedded into every other model of the ABox. For convenience, we may sometimes
identify ABoxes with their minimal models.</p>
        <sec id="sec-2-1-1">
          <title>2.2. Model Theory</title>
<p>Relevant to this work are also the notions of simulations, bisimulations and products of interpretations,
which characterize the expressivity of DLs.</p>
<p>A pointed interpretation is a tuple ⟨ℐ, d⟩ of an interpretation ℐ and a domain element d ∈ Δℐ.
Given two pointed interpretations ⟨ℐ, d⟩ and ⟨𝒥, e⟩, a simulation from ⟨ℐ, d⟩ into ⟨𝒥, e⟩ is a relation
⪯ ⊆ Δℐ × Δ𝒥 that satisfies:
1. d ⪯ e,
2. if d′ ∈ Bℐ and d′ ⪯ e′, then e′ ∈ B𝒥,
3. if ⟨d1, d2⟩ ∈ rℐ and d1 ⪯ e1, then there must be e2 ∈ Δ𝒥 s.t. d2 ⪯ e2 and ⟨e1, e2⟩ ∈ r𝒥.
We sometimes just write ⟨ℐ, d⟩ ⪯ ⟨𝒥, e⟩ to indicate that there exists such a simulation, without
indicating the precise ⪯. If Conditions 2 and 3 also hold for the inverse relation, then ⪯ is called
a bisimulation. Simulations characterize the expressivity of ℰℒ model-theoretically: for countable
interpretations, if there is a simulation from ⟨ℐ, d⟩ into ⟨𝒥, e⟩, then for every ℰℒ concept C, d ∈ Cℐ
implies e ∈ C𝒥. In the same way, bisimulations characterize the expressivity of the classical DL 𝒜ℒ𝒞,
which plays only a minor role in this paper and is therefore not introduced beyond this fact. For
elements d, e in an interpretation ℐ, we say that d is (bi-)similar to e if there exists a (bi-)simulation
⪯ s.t. ⟨ℐ, d⟩ ⪯ ⟨ℐ, e⟩. (Note that similarity is in general not symmetric, contrary to what the name
suggests. Bisimilarity on the other hand forms an equivalence relation.)</p>
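<p>As an illustration (not part of the formal development), the greatest simulation between two finite interpretations can be computed by starting from all pairs allowed by Condition 2 and repeatedly removing pairs violating Condition 3 until a fixpoint is reached. The following Python sketch uses an assumed, illustrative encoding: labels maps each domain element to its set of concept names, and edges maps a pair (element, role) to the set of role-successors.</p>

```python
# Hedged sketch (illustrative encoding, not the authors' implementation).
def greatest_simulation(I, J):
    labels_I, edges_I = I
    labels_J, edges_J = J
    # Start from all pairs permitted by Condition 2 (concept names carry over).
    rel = {(d, e) for d in labels_I for e in labels_J
           if labels_I[d].issubset(labels_J[e])}

    def ok(d, e):
        # Condition 3: every r-successor of d is simulated by some r-successor of e.
        for (x, r), succs in edges_I.items():
            if x == d:
                for d2 in succs:
                    if not any((d2, e2) in rel for e2 in edges_J.get((e, r), ())):
                        return False
        return True

    # Greatest-fixpoint refinement: drop pairs until all conditions hold.
    while True:
        removed = {p for p in rel if not ok(*p)}
        if not removed:
            return rel
        rel -= removed

# ⟨ℐ, d⟩: a Person with a parentOf-successor labelled Man;
# ⟨𝒥, e⟩: a Person whose parentOf-successor carries even more labels.
I = ({"d": {"Person"}, "d2": {"Man"}}, {("d", "parentOf"): {"d2"}})
J = ({"e": {"Person"}, "e2": {"Man", "Doctor"}}, {("e", "parentOf"): {"e2"}})
sim = greatest_simulation(I, J)
print(("d", "e") in sim)  # True: ⟨ℐ, d⟩ can be simulated in ⟨𝒥, e⟩
```

<p>On this toy data, ⟨ℐ, d⟩ is simulated by ⟨𝒥, e⟩ because the successor of e carries at least the labels of the successor of d; the reverse direction fails.</p>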
<p>The product of two pointed interpretations ⟨ℐ1, d⟩ and ⟨ℐ2, e⟩ is a pointed interpretation ⟨𝒫, ⟨d, e⟩⟩
defined inductively as the smallest interpretation that satisfies the following:
1. ⟨d, e⟩ ∈ Δ𝒫,
2. for every ⟨d′, e′⟩ ∈ Δ𝒫 s.t. d′ ∈ Bℐ1 and e′ ∈ Bℐ2, we have ⟨d′, e′⟩ ∈ B𝒫,
3. for every ⟨d1, e1⟩ ∈ Δ𝒫 s.t. ⟨d1, d2⟩ ∈ rℐ1 and ⟨e1, e2⟩ ∈ rℐ2, there is ⟨d2, e2⟩ ∈ Δ𝒫 and
⟨⟨d1, e1⟩, ⟨d2, e2⟩⟩ ∈ r𝒫.</p>
<p>The construction of ⟨𝒫, ⟨d, e⟩⟩ ensures that ⟨𝒫, ⟨d, e⟩⟩ is a maximal interpretation such that
⟨𝒫, ⟨d, e⟩⟩ ⪯ ⟨ℐ1, d⟩ and ⟨𝒫, ⟨d, e⟩⟩ ⪯ ⟨ℐ2, e⟩. As a consequence, ⟨d, e⟩ satisfies exactly those ℰℒ
concepts that both d and e satisfy in their respective interpretations. If Δℐ1 and Δℐ2 are both finite,
then |Δ𝒫| ≤ |Δℐ1| · |Δℐ2|.</p>
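<p>For illustration, the product construction above can be sketched in Python under the same assumed (labels, edges) encoding as before, exploring reachable pairs from ⟨d, e⟩ and intersecting labels per Condition 2 and matching role-successors per Condition 3; all names are illustrative.</p>

```python
# Hedged sketch of the product of two finite pointed interpretations.
def product(I1, d, I2, e):
    (lab1, ed1), (lab2, ed2) = I1, I2
    labels, edges = {}, {}
    todo, seen = [(d, e)], {(d, e)}
    while todo:
        p = todo.pop()
        x, y = p
        labels[p] = lab1[x].intersection(lab2[y])   # Condition 2: shared names
        for (x1, r), succs1 in ed1.items():
            if x1 != x:
                continue
            for x2 in succs1:
                for y2 in ed2.get((y, r), set()):   # Condition 3: paired successors
                    q = (x2, y2)
                    edges.setdefault((p, r), set()).add(q)
                    if q not in seen:
                        seen.add(q)
                        todo.append(q)
    return labels, edges

lab1 = {"a": {"Person", "Male"}, "b": {"Person"}}
ed1 = {("a", "knows"): {"b"}}
lab2 = {"c": {"Person", "Female"}, "d": {"Person", "Male"}}
ed2 = {("c", "knows"): {"d"}}
labels, edges = product((lab1, ed1), "a", (lab2, ed2), "c")
print(labels[("a", "c")])  # {'Person'}: exactly the names both a and c satisfy
```

<p>The pair ⟨a, c⟩ keeps only the concept names shared by a and c, matching the statement that the product point satisfies exactly the ℰℒ concepts both arguments satisfy.</p>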
<p>Lemma 1. Let ⟨𝒫, ⟨d, e⟩⟩ be the product of two pointed interpretations ⟨ℐ1, d⟩ and ⟨ℐ2, e⟩. Then, for
every ℰℒ-concept C such that d ∈ Cℐ1 and e ∈ Cℐ2, ⟨d, e⟩ ∈ C𝒫. Moreover, ⟨d, e⟩ satisfies no other
concepts in 𝒫.
3. ℰℒ Clusterings and Simulation Graphs
This paper is about computing ℰℒ clusterings:
Definition 1. Let 𝒦 be a KB. An ℰℒ clustering for 𝒦 is a finite set of pairs ⟨C, A⟩ where C is an ℰℒ
concept and A is the set of instances of C in 𝒦. We call the pairs ⟨C, A⟩ ℰℒ clusters.</p>
<p>Note that this definition is deliberately unspecific, and even allows empty clusters or clusters that only
differ in the concept but not in the set of individuals. For example, ℰℒ clusters may be pair-wise
disjoint, or they may form a (subsumption) hierarchy. We also do not require the clustering to cover all
individuals in the ABox, to allow for robustness against outliers. We will come to the question of what
constitutes a good ℰℒ clustering later.</p>
<p>At the center of our approach is the simulation graph, a structure that abstracts ABoxes into
different sets of individuals in a way that preserves simulations. Fix an ABox 𝒜. Given sets A, B ⊆ NI(𝒜),
we write A →r B if for every a ∈ A, there exists b ∈ B s.t. r(a, b) ∈ 𝒜.</p>
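<p>The relation A →r B is straightforward to check. A minimal Python sketch, assuming role assertions are stored as triples (a, r, b) (an illustrative representation, not the authors' code):</p>

```python
# Check A →r B: every a in A has an r-successor in B.
def points_to(role_assertions, A, r, B):
    succ = {}
    for (x, role, y) in role_assertions:
        if role == r:
            succ.setdefault(x, set()).add(y)
    # all() requires each a in A to have at least one r-successor inside B
    return all(succ.get(a, set()).intersection(B) for a in A)

roles = [("alice", "parentOf", "bob"), ("carol", "parentOf", "dave")]
print(points_to(roles, {"alice", "carol"}, "parentOf", {"bob", "dave"}))  # True
print(points_to(roles, {"alice", "bob"}, "parentOf", {"bob", "dave"}))    # False
```

<p>The second call fails because bob has no parentOf-successor at all.</p>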
<p>Definition 2 (Simulation graphs). Let 𝒜 be an ABox. A simulation graph for 𝒜 is a directed labeled
graph 𝒢 = ⟨V, E, LV, LE⟩ with vertex labeling LV and edge labeling LE s.t. V ⊆ 2^NI(𝒜), LV : V → 2^NC
and LE : E → 2^NR, and which satisfies the following conditions:
1. LV(A) = {B ∈ NC | B(a) ∈ 𝒜 for all a ∈ A},
2. if A →r B for some r ∈ NR, then ⟨A, B⟩ ∈ E with LE(⟨A, B⟩) = {r ∈ NR | A →r B},
3. for every a ∈ NI(𝒜) and A ∈ V, if B(a) ∈ 𝒜 for every B ∈ LV(A) and {a} →r B for every
⟨A, B⟩ ∈ E s.t. r ∈ LE(⟨A, B⟩), then a ∈ A.</p>
<p>The vertices in the simulation graph correspond to sets of individuals from the ABox, with edges
between vertices if there is a role connection between the corresponding individuals. The last condition
ensures that the graph, intuitively, groups individuals based on simulations. Since simulation graphs
label vertices with concept names and edges with roles, we can identify them with corresponding
interpretations, and lift the definition of products and simulations to also include simulation graphs.
Our definition then ensures that for all A ∈ V, A contains exactly those individuals a ∈ NI(𝒜) for
which ⟨𝒜, a⟩ ⪯ ⟨𝒢, A⟩. This allows us to establish a connection between simulation graphs and ℰℒ
clusterings:
Lemma 2. One can extract from every simulation graph 𝒢 for 𝒜 an ℰℒ clustering 𝒞 for 𝒜 that has an
ℰℒ cluster ⟨C, A⟩ for every node A in 𝒢 and no other clusters. Moreover, for every ℰℒ clustering 𝒞 for 𝒜,
there exists a simulation graph 𝒢 for 𝒜 that has a node A for every cluster ⟨C, A⟩ ∈ 𝒞.</p>
          <p>Note that the simulation graph corresponding to an ℰ ℒ clustering may have more vertices than the
clustering has clusters, in order to deal with role-successors.</p>
<p>We define some desirable properties of simulation graphs for when we want to use them for computing
ℰℒ clusterings.</p>
<p>Definition 3 (Properties). A simulation graph 𝒢 = ⟨V, E, LV, LE⟩ for 𝒜 is
• complete if for every ℰℒ concept C, there exists some A ∈ V that contains exactly the instances of
C in 𝒜,
• n-complete, where n ≥ 0, if for every ℰℒ concept C of depth at most n, there exists some A ∈ V
that contains exactly the instances of C in 𝒜,
• summarizing if for every a ∈ NI(𝒜) there is some vertex A ∈ V that contains exactly the individuals
in NI(𝒜) that are similar to a,
• n-summarizing, where n ≥ 0, if for every a ∈ NI(𝒜), there is some vertex A ∈ V s.t. A = {b ∈
NI(𝒜) | 𝒜 |= C(b) for every ℰℒ-concept C of depth at most n s.t. 𝒜 |= C(a)}.</p>
<p>Completeness implies all the other properties in this definition. Being n-summarizing is an
approximation to being summarizing in the same way as n-completeness approximates completeness: for
this, it is sufficient to recall that, if a is similar to b, then b is an instance of every ℰℒ concept that a
is an instance of. For n ≥ |NI(𝒜)|, the notions n-complete and n-summarizing are equivalent to the
non-approximative versions complete and summarizing.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>3.1. Summarizing Simulation Graphs</title>
<p>As we will see later, summarizing simulation graphs can be computed efficiently. They have the potential
to be useful not only for conceptual clustering in ℰℒ, but also to improve reasoning times in general
and to be used for other learning problems in DLs up to the expressivity of 𝒜ℒ𝒞. The key property is
that summarizing simulation graphs preserve all the relevant information about the individuals in an
ABox, even if a TBox is added that is expressed in the more expressive DL 𝒜ℒ𝒞. For a given simulation
graph 𝒢 = ⟨V, E, LV, LE⟩, introduce a distinct individual aA for every A ∈ V and define the induced
ABox 𝒜𝒢 as
𝒜𝒢 = {B(aA) | B ∈ LV(A)} ∪ {r(aA, aB) | ⟨A, B⟩ ∈ E and r ∈ LE(⟨A, B⟩)}.</p>
<p>Theorem 1. Let 𝒢 = ⟨V, E, LV, LE⟩ be a summarizing simulation graph for an ABox 𝒜 and 𝒯 an 𝒜ℒ𝒞
TBox. Let a ∈ NI(𝒜) and let A ∈ V be the smallest node in 𝒢 (wrt. ⊆) s.t. a ∈ A. Then, for every 𝒜ℒ𝒞
concept C, 𝒜 ∪ 𝒯 |= C(a) iff 𝒜𝒢 ∪ 𝒯 |= C(aA).</p>
<p>Proof. The definition of summarizing simulation graphs makes sure that aA is not only similar to a, but
indeed bisimilar. Consequently, given a model ℐ of 𝒜 ∪ 𝒯, we can find a model 𝒥 of 𝒜𝒢 ∪ 𝒯 s.t. ⟨ℐ, aℐ⟩
and ⟨𝒥, aA𝒥⟩ are bisimilar, which implies that they satisfy the same 𝒜ℒ𝒞 concepts, and the same holds
for the other direction.</p>
<p>Theorem 1 justifies why we define and compute simulation graphs only for ABoxes: adding a TBox,
even if it is more expressive than ℰℒ, cannot lead to new clusters being introduced. At the same time,
reasoning is usually less computationally expensive without a TBox. As we will see later, we are able to
compute summarizing simulation graphs even for large ABoxes in very short time, and the number
of nodes in such a graph is often significantly smaller than the size of the ABox. This indicates that
simulation graphs also have the potential to speed up reasoning for KBs with large ABoxes. Theorem 1
also indicates potential for using simulation graphs for learning in more expressive logics:
Corollary 1. Let 𝒦 = 𝒯 ∪ 𝒜 be an 𝒜ℒ𝒞 KB and 𝒢 = ⟨V, E, LV, LE⟩ a summarizing simulation graph
for 𝒜. Then, for every 𝒜ℒ𝒞 concept C, the set of instances of C in 𝒦 is the union over some subset V′ ⊆ V.</p>
<p>This means that even for supervised learning problems, which try to find a concept based on a given
set of positive and negative examples, solutions can be obtained from summarizing simulation graphs.
In fact, together with Lemma 2, we obtain that all solutions to concept learning problems in 𝒜ℒ𝒞 can
be expressed as disjunctions over ℰℒ concepts. (Complement and value restrictions may however lead
to more concise concepts.)</p>
        </sec>
        <sec id="sec-2-1-3">
          <title>3.2. Complete Simulation Graphs</title>
<p>It is an easy consequence of our definitions that complete simulation graphs are unique.
Corollary 2. For any ABox 𝒜, there is exactly one complete simulation graph.</p>
<p>We can get to this conclusion also by observing that complete simulation graphs are simply
summarizing simulation graphs that are closed under products. For this, we first need to adapt the definition
of products to simulation graphs. Given a simulation graph 𝒢 = ⟨V, E, LV, LE⟩ and two elements
A, B ∈ V, we can define the product of A and B as a node A × B in a new simulation graph
𝒢A×B = ⟨V′, E′, LV′, LE′⟩, defined inductively as the smallest extension of 𝒢 with new vertices A′ × B′
that satisfies the following conditions:
1. A × B ∈ V′,
2. for any A′ × B′ ∈ V′, A′ ∪ B′ ⊆ A′ × B′,
3. for any A′ × B′ ∈ V′, LV′(A′ × B′) = LV(A′) ∩ LV(B′),
4. for any A1 × B1 ∈ V′ and ⟨A1, A2⟩, ⟨B1, B2⟩ ∈ E, there is A2 × B2 ∈ V′ and ⟨A1 × B1, A2 × B2⟩ ∈
E′ with LE′(⟨A1 × B1, A2 × B2⟩) = LE(⟨A1, A2⟩) ∩ LE(⟨B1, B2⟩).</p>
<p>Note that as a consequence of Item 3 in Definition 2, A′ × B′ may contain more elements than A ∪ B;
we require it to be the smallest extension that satisfies these criteria. The product is well-defined, since
we can compute it using a fixpoint construction. We now say that a simulation graph is closed under
products if applying the product to any pair of vertices produces again the same simulation graph. By
using Lemma 1 and relating products in simulation graphs to products of pointed interpretations, we
obtain the following theorem:
Theorem 2. For all n ≥ 0, any simulation graph that is (n-)summarizing and closed under products is
(n-)complete.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
<title>3.3. Extracting ℰℒ-Clusterings</title>
<p>Because every ℰℒ concept is represented with its set of instances, complete simulation graphs give an
upper bound on what can be included in an ℰℒ clustering. If we are only interested in ℰℒ concepts of
bounded depth, n-complete simulation graphs are sufficient. Indeed, we can extract for each node in
an n-complete simulation graph the corresponding ℰℒ concept of depth at most n using an inductive
procedure. Fix a simulation graph 𝒢 = ⟨V, E, LV, LE⟩. We associate to every A ∈ V and n ≥ 0 the ℰℒ
concept CA(n) defined inductively as follows:
1. CA(0) = ⨅ LV(A),
2. for n &gt; 0, CA(n) = ⨅ LV(A) ⊓ ⨅⟨A,B⟩∈E ⨅r∈LE(⟨A,B⟩) ∃r.CB(n−1).
It follows directly from our definitions that 𝒜 |= CA(n)(a) for every a ∈ A, and that for an n-complete
simulation graph, we can obtain all possible ℰℒ clusters with concepts of depth at most n. We can use
those clusters also together with an ℰℒ or 𝒜ℒ𝒞 TBox. It can then be that we need fewer clusters: concepts
for different vertices may become equivalent because of the TBox, or may capture the same set of
individuals even if they are not equivalent. The TBox can also be used to obtain more concise descriptions of
the clusters: the method in [22] can compute for a given ℰℒ concept a concept of minimal length that is
equivalent wrt. the TBox. We may use this in future work to obtain more user-friendly representations
of conceptual clusterings.</p>
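<p>The inductive extraction of CA(n) can be sketched as a short recursive procedure. The Python sketch below renders concepts as strings; the dictionary encoding of LV and LE is an assumption made for the example, not the authors' data structures.</p>

```python
# Hedged sketch: build the concept C_A(n) of depth at most n as a string.
def concept_of(node, n, LV, LE):
    parts = sorted(LV[node])                        # conjunction of concept names
    if n > 0:
        for (src, tgt), roles in sorted(LE.items(), key=lambda kv: kv[0]):
            if src == node:
                for r in sorted(roles):             # one ∃r.C_B(n-1) per edge label
                    parts.append("∃" + r + ".(" + concept_of(tgt, n - 1, LV, LE) + ")")
    return " ⊓ ".join(parts) if parts else "⊤"

LV = {"A": {"Person", "Parent"}, "B": {"Person"}}
LE = {("A", "B"): {"parentOf"}}
print(concept_of("A", 1, LV, LE))  # Parent ⊓ Person ⊓ ∃parentOf.(Person)
```

<p>The depth bound n guarantees termination even when the simulation graph contains cycles.</p>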
        <p>To determine which subset  ′ ⊆  constitutes a good ℰ ℒ-clustering, we introduce quality measures
for ℰ ℒ-clusterings, which we here define in the most general way.</p>
<p>Definition 4. An ℰℒ-cluster quality measure is a function mapping pairs of KBs and ℰℒ clusters to real
numbers. An ℰℒ-clustering quality measure is a function that assigns to every KB and set of ℰℒ clusters
a real number.</p>
<p>Quality measures may take into account different aspects such as the number of individuals in the
cluster, the length of the concept, or try to minimize the overlap between different clusters. What
constitutes a good quality measure will ultimately depend on the use case: for instance, if we want
to use clusterings to support ontology engineering, we may also want to take into account whether
introducing a concept name for a cluster will allow us to describe individuals more concisely. We keep a
deeper investigation of quality measures as future work, and use for now a quality measure based on the
idea of category utility already used in the conceptual clustering algorithm COBWEB [12]. Intuitively,
the utility of a cluster captures its predictive power: how does knowing that an individual belongs to a
cluster help us in predicting which concepts it satisfies? This is in particular helpful if we want to
use ℰℒ clusterings to organize and explore the individuals in a KB: for this, we may organize the ℰℒ
clustering in a subsumption hierarchy, and navigate it starting from the most general concept. Assume
we want to explore a dataset about employees in a company and the gender of an employee has no
correlation to any of the other properties in the dataset. Then, organizing the data by gender does not
bring any advantage, since it does not have any predictive power. Rather, we would like to organize the
employees based on concepts that let us predict additional information.</p>
<p>Simulation graphs also help with defining this predictive power. Fix a KB 𝒦 = 𝒯 ∪ 𝒜. Given two
sets A, B ⊆ NI(𝒜), the conditional probability of A given B is defined as P(A | B) = |A ∩ B| / |B|, and
the probability of A as P(A) = P(A | NI(𝒜)). Given a concept name B, we use B also to refer to the
instances of B in 𝒦. By abuse of notation, given A ⊆ NI(𝒜) and r ∈ NR, we use ∃r.A to denote the
individuals that have an r-successor in A:
∃r.A = {a ∈ NI(𝒜) | r(a, b) ∈ 𝒜, b ∈ A}.</p>
<p>Definition 5. Let 𝒦 = 𝒯 ∪ 𝒜 be a KB and 𝒢 = ⟨V, E, LV, LE⟩ a simulation graph for 𝒜. The utility of a
cluster ⟨C, A⟩ in context X ⊆ NI(𝒜), relative to 𝒢, is defined as
U(A | X) = P(A | X) · ( ∑B∈NC ( P(B | A ∩ X)² − P(B | X)² ) + ∑r∈NR ∑B∈V ( P(∃r.B | A ∩ X)² − P(∃r.B | X)² ) ).
The utility of ⟨C, A⟩ ∈ 𝒞 is defined as U(A | NI(𝒜)).</p>
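<p>As an illustration, the measure of Definition 5 amounts to a sum of squared conditional probabilities over concept-name extensions and ∃r.B sets. A minimal Python sketch under assumed inputs (extensions given directly as sets; names are illustrative):</p>

```python
# Hedged sketch of U(A | X) with P(A | B) = |A ∩ B| / |B|.
def P(A, B):
    return len(A.intersection(B)) / len(B) if B else 0.0

def utility(A, X, name_ext, succ_ext):
    # name_ext: concept name -> instance set; succ_ext: (r, node) -> the set ∃r.B
    extensions = list(name_ext.values()) + list(succ_ext.values())
    AX = A.intersection(X)
    total = sum(P(ext, AX) ** 2 - P(ext, X) ** 2 for ext in extensions)
    return P(A, X) * total

inds = {1, 2, 3, 4}
# "Man" holds exactly on the cluster {1, 2}: membership predicts it perfectly.
u = utility({1, 2}, inds, {"Man": {1, 2}}, {})
print(u)  # 0.375 = P(A | X) * (1^2 - 0.5^2)
```

<p>Knowing that an individual lies in {1, 2} raises the probability of guessing "Man" from 0.5 to 1, which the squared-probability gain rewards.</p>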
<p>Note that the concept C used to describe the cluster is not relevant here. 𝒢 determines which subsets
of NI(𝒜) can be distinguished by the measure. In the case where 𝒢 is complete, these are indeed the
subsets of NI(𝒜) that can be distinguished using ℰℒ concepts. Individuals can be described based on
which concept names they satisfy, and which ℰℒ concepts their r-successors satisfy. The intuitive idea
of the utility of a vertex A is that guessing whether an individual a belongs to A gives us an advantage
in guessing any of these characteristics.</p>
<p>Looking at the utility of vertices in isolation is not sufficient for extracting a good ℰℒ clustering,
because there may be vertices with high utility that cover almost the same set of individuals. We therefore
consider also the relative utility of one vertex to another: assuming we know that an individual a
belongs to A or B, does guessing which one it belongs to give any advantage in guessing its other
properties? This corresponds to U(A | A ∪ B). If A and B help predicting the same things, then
this value is going to be lower than if they predict different things, so that minimizing this value will
lead to a more useful set of ℰℒ clusters. We can thus define the relative utility of a cluster ⟨C, A⟩ in a
given ℰℒ clustering 𝒞 as
RU(A | 𝒞) = min{ U(A | A ∪ B) | ⟨D, B⟩ ∈ 𝒞, B ≠ A }.</p>
<p>Intuitively, if the relative utility of a cluster is low, then dropping it from the clustering is not going to
affect the overall predictive power, as the predictive power of that cluster is anyway low in the presence of
other clusters.</p>
<p>We now define the utility of an ℰℒ clustering 𝒞 (relative to a simulation graph 𝒢) as
U(𝒞) = ∑⟨C,A⟩∈𝒞 RU(A | 𝒞).
4. Computing ℰℒ-Clusterings in Practice
We first compute a summarizing simulation graph 𝒢 = ⟨V, E, LV, LE⟩ from the given ABox 𝒜. For ease
of presentation, we assume in the following that every concept assertion B(a) is represented as a role
assertion rB(a, a*) with a* some specific fresh individual. The backbone of our algorithm is the process
of partition refinement [23]. The intuition here is that we cluster many individuals together at first; then,
as we consider deeper ℰℒ concepts, we further refine the clusters into smaller ones. We use the following
notation to refer to our partitions on the individuals:
Definition 6 (Partition on the individuals). P(i) is the partition on the set of individuals NI(𝒜),
generated by ℰℒ concepts of depth i.</p>
<p>[Figure 1: the extracted ℰℒ clusters organized in a subsumption hierarchy, with nodes Thing, Man, Grandson, Married Man, Married Woman, Married Parent, Married Father, Married Mother, Married Grandfather, and Married Grandmother.]</p>
<p>We will now explain what it means for ℰℒ concepts to generate a partition. The first partition always
clusters all individuals together in one set: P(0) = {NI(𝒜)}. For any subsequent partition we have the
following rule:
Definition 7 (Refinement). Any two individuals a and b are clustered together in P(i) (with i ≥ 1) if all of
the following conditions hold:
• a and b are clustered together in P(i−1),
• for every r(a, c) ∈ 𝒜 there exists an r(b, d) ∈ 𝒜, such that c and d are clustered together in P(i−1),
• for every r(b, d) ∈ 𝒜 there exists an r(a, c) ∈ 𝒜, such that c and d are clustered together in P(i−1).</p>
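<p>The refinement rule above can be implemented by splitting each block according to the outgoing (role, target-block) signatures of its members: two individuals share a signature exactly when the two successor conditions hold. The Python sketch below assumes role assertions as triples (a, r, b); per the text, concept assertions can be encoded as roles to a fresh individual.</p>

```python
# Hedged sketch of partition refinement (one round plus fixpoint iteration).
def refine(partition, roles):
    block_of = {a: i for i, block in enumerate(partition) for a in block}

    def signature(a):
        # outgoing (role, block-of-successor) pairs of a
        return frozenset((r, block_of[b]) for (x, r, b) in roles if x == a)

    new = {}
    for i, block in enumerate(partition):
        for a in block:
            new.setdefault((i, signature(a)), set()).add(a)
    return list(new.values())

def partitions(individuals, roles):
    part = [set(individuals)]          # P(0): all individuals in one set
    while True:
        nxt = refine(part, roles)
        if len(nxt) == len(part):      # refinement only splits, so equal size
            return part                # means the fixed point P(m+1) = P(m)
        part = nxt

roles = [("a", "parentOf", "b"), ("c", "parentOf", "d")]
final = partitions({"a", "b", "c", "d"}, roles)
print(sorted(sorted(block) for block in final))  # [['a', 'c'], ['b', 'd']]
```

<p>Here a and c end up together (both have a parentOf-successor in the same block), and likewise b and d (no outgoing roles).</p>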
<p>If we assume a finite ABox, then when computing the partitions, at some point the algorithm will
find a fixed point m, i.e. P(m+1) = P(m). From this point onward all the partitions will stay the same.
Therefore, when the fixed point is found, our algorithm stops. We can now obtain V as the union of all the
partitions: V = ⋃0≤i≤m P(i).</p>
<p>For the edges E we have the following rule:
Definition 8 (Simulation graph edges). Any triple ⟨S, r, O⟩ is part of E if and only if there exists at least
one level i that satisfies the following:
• S ∈ P(i),
• O ∈ P(i−1),
• there exist individuals a ∈ S and b ∈ O, such that r(a, b) ∈ 𝒜.</p>
<p>With the vertices and edges generated, we can now simply take the labelings as defined in Definition 2
to create 𝒢. It is easy to verify that this results in a summarizing simulation graph. Moreover, if we
replace V by V(n) = ⋃0≤i≤n P(i), we obtain an n-summarizing simulation graph.</p>
<p>To extend the summarizing simulation graph to a complete one, which at the same time serves as the
initial clustering, we add products of pairs of vertices in rounds until a fixpoint is reached. To optimize
the utility, we follow a greedy approach, and remove the worst cluster according to U(A | A ∪ B)
until a termination criterion is met. In our experiments, we used the desired number of clusters as the
criterion.</p>
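<p>The greedy selection step can be sketched generically: repeatedly drop the cluster with the lowest relative utility until the desired number k remains. In the sketch below, pairwise_u is a callable standing in for U(A | A ∪ B) from Definition 5; the toy measure used to exercise it is NOT the paper's utility, just an assumption for the example.</p>

```python
# Hedged sketch of the greedy cluster-selection loop.
def greedy_select(clusters, pairwise_u, k):
    clusters = list(clusters)
    while len(clusters) > k:
        def rel_utility(A):
            # relative utility: minimum over the other clusters B of U(A | A ∪ B)
            return min(pairwise_u(A, A.union(B)) for B in clusters if B is not A)
        worst = min(clusters, key=rel_utility)
        clusters.remove(worst)
    return clusters

# Toy stand-in measure: fraction of the context covered by the cluster.
toy_u = lambda A, ctx: len(A) / len(ctx)
result = greedy_select([{1, 2, 3}, {1, 2}, {4, 5}], toy_u, 2)
print(len(result))  # 2
```

<p>With a faithful implementation of Definition 5 plugged in for pairwise_u, this reproduces the termination criterion used in the experiments (a fixed number of clusters).</p>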
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Evaluation</title>
      <p>
        We applied the implementation on the family-history ontology that was used in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to evaluate concept
learning, which is an ontology with a very simple TBox (no complex concepts), and whose ABox contains
family relations as roles and concepts for different types of family members. The ABox contained 1780
assertions with 202 individuals, which were summarized into 244 nodes in the summarizing simulation
graph, and 30 in the 1-summarizing graph. We computed the 1-complete graph after 4 rounds of
building products of all nodes, thus obtaining all relevant ℰ ℒ concepts without nested role restrictions.
This graph contained 53 nodes, from which we extracted 10 ℰ ℒ clusters using the greedy algorithm.
      </p>
      <p>When computing the utility (here and in later experiments), we used the complete summarizing graph,
and not just the 1-summarizing one. While the corresponding concepts were relatively complex, they
mostly correspond to simple concepts in human language, e.g. “Person ⊓ Male” corresponding to
“Man”. Figure 1 shows the clusters organized into the subsumption hierarchy, where we represent each
ℰℒ concept with the simple concept in natural language (the full list of concepts is in the appendix). We
find more or less the concepts one would expect as central concepts in a family tree, with the exception
of “Woman”, which one would have expected together with “Man”, and the fact that “Grandson” is the
only cluster representing children. This might indicate a bias in the data, but also shows that more
research is needed into utility measures and alternative methods for computing the clustering. Note that
the most general cluster “Thing” is needed to allow for higher relative utility of the other clusters.</p>
      <p>The next experiment aimed at getting a first idea of the feasibility of ℰℒ clustering, and used
ontologies from the OWL EL materialization track of the OWL reasoner competition 2015 [24, 25]. This
corpus contains 110 ontologies in the OWL EL profile with non-empty ABoxes. We removed ABox
axioms that were not supported by our method (e.g. equivalence assertions or assertions involving
complex concepts), which left 16 ontologies with an empty ABox and thus 94 ontologies to
be used in the experiment. For each ontology, we computed the summarizing simulation graphs and
attempted to compute 2-complete simulation graphs, from which we then extracted ℰℒ clusterings of size
10 using the greedy algorithm. We used a time limit of 10 minutes for each ontology, which resulted in
12 timeouts. The detailed results are shown in the appendix. Section 5 visualizes how the ABox sizes
relate to the sizes of the 2-complete graphs and to the computation times. The number of nodes in
the summarizing simulation graph is often much smaller than the number of individuals, and the
computation of clusters, even though so far only the summarizing step is really optimized, is possible in
a short time in most cases. Quite remarkably, many very large ABoxes resulted in summarizing simulation
graphs with under 10 nodes, including the largest one with almost 750,000 individuals, which had only
5 nodes in the summarizing graph. This indicates the potential of summarizing simulation graphs also
for speeding up ABox reasoning.</p>
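      <p>Why such collapses occur can be illustrated with a small bisimulation-style refinement sketch. The paper's summarizing simulation graphs are based on simulations and differ in detail; the function <monospace>summarize</monospace>, its interface, and the toy data below are illustrative only. Individuals start grouped by their concept labels, and groups are split by their (role, target-group) edges until the partition is stable, so highly regular ABoxes end up with very few summary nodes.</p>

```python
def summarize(labels, edges):
    # Bisimulation-style summarization sketch (illustrative, not the paper's
    # exact construction).  `labels` maps each individual to a frozenset of
    # concept names; `edges` is a list of (source, role, target) role
    # assertions.  Returns a mapping from individuals to block ids of the
    # stable partition.
    colour = dict(labels)
    while True:
        # signature of a node: its current block plus its outgoing
        # (role, target-block) edges
        signature = {
            v: (colour[v],
                frozenset((r, colour[t]) for (s, r, t) in edges if s == v))
            for v in labels
        }
        # renumber the distinct signatures into fresh block ids
        ids, new = {}, {}
        for v in sorted(labels):
            if signature[v] not in ids:
                ids[signature[v]] = len(ids)
            new[v] = ids[signature[v]]
        if len(ids) == len(set(colour.values())):
            return new  # no block was split: partition is stable
        colour = new
```

      <p>On a toy ABox with two parents each having one child, the two parents land in one summary node and the two children in another, regardless of how many such families the ABox contains.</p>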
    </sec>
    <sec id="sec-4">
      <title>6. Conclusion</title>
      <p>
        While our summarization method is already quite efficient, we want to investigate more dedicated
algorithms for computing the clusterings. One option is to integrate the computation of products
directly into the summarization implementation, to be able to compute products more efficiently on the
different levels of the simulation graphs. Another way could be to use an incremental approach as done
in the conceptual clustering algorithm DL-Link [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Our selection of good clusterings is currently done
in a greedy fashion; we want to investigate whether encodings into answer set programming might
allow us to compute globally optimal solutions. Future evaluations should look into different corpora
than the ORE corpus, for instance knowledge graphs without an ontology. To understand whether our
utility function makes sense and to compare it with alternatives, user studies will be required.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
      <p>[12] D. H. Fisher, Knowledge acquisition via incremental conceptual clustering, Machine Learning 2 (1987) 139–172.
[13] F. Distel, Learning description logic knowledge bases from data using methods from formal concept analysis, Ph.D. thesis, Dresden University of Technology, 2011. URL: https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa-70199.
[14] D. Borchmann, F. Distel, F. Kriegel, Axiomatisation of general concept inclusions from finite interpretations, J. Appl. Non Class. Logics 26 (2016) 1–46. URL: https://doi.org/10.1080/11663081.2016.1168230. doi:10.1080/11663081.2016.1168230.
[15] F. Kriegel, Efficient axiomatization of OWL 2 EL ontologies from data by means of formal concept analysis, in: M. J. Wooldridge, J. G. Dy, S. Natarajan (Eds.), Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2024, February 20-27, 2024, Vancouver, Canada, AAAI Press, 2024, pp. 10597–10606. URL: https://doi.org/10.1609/aaai.v38i9.28930. doi:10.1609/AAAI.V38I9.28930.
[16] R. Guimarães, A. Ozaki, C. Persia, B. Sertkaya, Mining ℰℒ⊥ bases with adaptable role depth, J. Artif. Intell. Res. 76 (2023) 883–924. URL: https://doi.org/10.1613/jair.1.13777. doi:10.1613/JAIR.1.13777.
[17] F. Hodo, S. Pranav, B. Sertkaya, Clustering knowledge graphs using concept lattices (extended abstract), in: O. Kutz, C. Lutz, A. Ozaki (Eds.), Proceedings of the 36th International Workshop on Description Logics (DL 2023) co-located with the 20th International Conference on Principles of Knowledge Representation and Reasoning and the 21st International Workshop on Non-Monotonic Reasoning (KR 2023 and NMR 2023), Rhodes, Greece, September 2-4, 2023, volume 3515 of CEUR Workshop Proceedings, CEUR-WS.org, 2023. URL: https://ceur-ws.org/Vol-3515/abstract-12.pdf.
[18] F. Martel, A. Zouaq, Taxonomy extraction using knowledge graph embeddings and hierarchical clustering, in: C. Hung, J. Hong, A. Bechini, E. Song (Eds.), SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing, Virtual Event, Republic of Korea, March 22-26, 2021, ACM, 2021, pp. 836–844. URL: https://doi.org/10.1145/3412841.3441959. doi:10.1145/3412841.3441959.
[19] L. Guo, Q. Dai, Graph clustering via variational graph embedding, Pattern Recognit. 122 (2022) 108334. URL: https://doi.org/10.1016/j.patcog.2021.108334. doi:10.1016/J.PATCOG.2021.108334.
[20] Q. Wang, Z. Mao, B. Wang, L. Guo, Knowledge graph embedding: A survey of approaches and applications, IEEE Trans. Knowl. Data Eng. 29 (2017) 2724–2743. URL: https://doi.org/10.1109/TKDE.2017.2754499. doi:10.1109/TKDE.2017.2754499.
[21] P. Koopmann, R. van Bakel, M. Cochez, Towards conceptual clustering in ℰℒ with simulation graphs — experimental data and extended version, 2025. URL: https://doi.org/10.5281/zenodo.16794054. doi:10.5281/zenodo.16794054.
[22] N. Nikitina, P. Koopmann, Small is beautiful: Computing minimal equivalent ℰℒ concepts, in: S. Singh, S. Markovitch (Eds.), Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, AAAI Press, 2017, pp. 1206–1212. URL: https://doi.org/10.1609/aaai.v31i1.10684. doi:10.1609/AAAI.V31I1.10684.
[23] R. Paige, R. E. Tarjan, Three partition refinement algorithms, SIAM Journal on Computing 16 (1987) 973–989. URL: https://doi.org/10.1137/0216062. doi:10.1137/0216062.
[24] B. Parsia, N. Matentzoglu, R. S. Gonçalves, B. Glimm, A. Steigmiller, The OWL reasoner evaluation (ORE) 2015 competition report, J. Autom. Reason. 59 (2017) 455–482. URL: https://doi.org/10.1007/s10817-017-9406-8. doi:10.1007/S10817-017-9406-8.
[25] N. Matentzoglu, B. Parsia, ORE 2015 reasoner competition corpus, 2015. URL: https://doi.org/10.5281/zenodo.18578. doi:10.5281/zenodo.18578.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] B. Konev, C. Lutz, A. Ozaki, F. Wolter, Exact learning of lightweight description logic ontologies, J. Mach. Learn. Res. 18 (2017) 201:1–201:63. URL: https://jmlr.org/papers/v18/16-256.html.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] M. R. C. Duarte, B. Konev, A. Ozaki, ExactLearner: A tool for exact learning of ℰℒ ontologies, in: M. Thielscher, F. Toni, F. Wolter (Eds.), Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth International Conference, KR 2018, Tempe, Arizona, 30 October - 2 November 2018, AAAI Press, 2018, pp. 409–414. URL: https://aaai.org/ocs/index.php/KR/KR18/paper/view/18006.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] J. Lehmann, DL-Learner: Learning concepts in description logics, J. Mach. Learn. Res. 10 (2009) 2639–2642. URL: https://dl.acm.org/doi/10.5555/1577069.1755874. doi:10.5555/1577069.1755874.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] G. Rizzo, N. Fanizzi, C. d'Amato, Class expression induction as concept space exploration: From DL-Foil to DL-Focl, Future Gener. Comput. Syst. 108 (2020) 256–272. URL: https://doi.org/10.1016/j.future.2020.02.071. doi:10.1016/J.FUTURE.2020.02.071.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] S. Heindorf, L. Blübaum, N. Düsterhus, T. Werner, V. N. Golani, C. Demir, A. N. Ngomo, EvoLearner: Learning description logics with evolutionary algorithms, in: F. Laforest, R. Troncy, E. Simperl, D. Agarwal, A. Gionis, I. Herman, L. Médini (Eds.), WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25 - 29, 2022, ACM, 2022, pp. 818–828. URL: https://doi.org/10.1145/3485447.3511925. doi:10.1145/3485447.3511925.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] A. N. Ngomo, C. Demir, N. J. Kouagou, S. Heindorf, N. Karalis, A. Bigerl, Class expression learning with multiple representations, in: P. Hitzler, M. K. Sarker, A. Eberhart (Eds.), Compendium of Neurosymbolic Artificial Intelligence, volume 369 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2023, pp. 272–286. URL: https://doi.org/10.3233/FAIA230145. doi:10.3233/FAIA230145.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] B. ten Cate, M. Funk, J. C. Jung, C. Lutz, SAT-based PAC learning of description logic concepts, in: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China, ijcai.org, 2023, pp. 3347–3355. URL: https://doi.org/10.24963/ijcai.2023/373. doi:10.24963/IJCAI.2023/373.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] V. Sazonau, U. Sattler, G. Brown, General terminology induction in OWL, in: M. Arenas, O. Corcho, E. Simperl, M. Strohmaier, M. d'Aquin, K. Srinivas, P. Groth, M. Dumontier, J. Heflin, K. Thirunarayan, S. Staab (Eds.), The Semantic Web - ISWC 2015, Springer International Publishing, Cham, 2015, pp. 533–550.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] C. d'Amato, S. Staab, N. Fanizzi, F. Esposito, DL-Link: a conceptual clustering algorithm for indexing description logics knowledge bases, Int. J. Semantic Comput. 4 (2010) 453–486. URL: https://doi.org/10.1142/S1793351X10001085. doi:10.1142/S1793351X10001085.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] R. S. Michalski, R. E. Stepp, Learning from observation: Conceptual clustering, Machine Learning: An Artificial Intelligence Approach (1983) 331–363.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] A. P. Suárez, J. F. M. Trinidad, J. A. Carrasco-Ochoa, A review of conceptual clustering algorithms, Artif. Intell. Rev. 52 (2019) 1267–1296. URL: https://doi.org/10.1007/s10462-018-9627-1. doi:10.1007/S10462-018-9627-1.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>