<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Clustering Knowledge Graphs Using Concept Lattices (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabiola Hodo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sai Pranav</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barış Sertkaya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Frankfurt University of Applied Sciences</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose an approach for identifying clusters in a knowledge graph based on commonalities of entities and for computing logical descriptions of these clusters. Our approach is based on computing formal concepts from data, which is a standard data analysis technique in Formal Concept Analysis (FCA) [1]. In FCA, attributes are unary predicates which are either satisfied or not satisfied by objects. A formal concept is a pair (, ), where  is the set of objects satisfying all of the attributes in  and  is the set of common attributes of the objects in . Ordered w.r.t. set inclusion, formal concepts form a lattice, called the concept lattice, which is a hierarchy of formal concepts. In [2, 3, 4, 5, 6, 7] FCA was employed in combination with DLs for computing the base of valid axioms of an application domain and for enriching existing DL knowledge bases. In [8] it was used to compute a data-driven schema from knowledge graphs. In the present work, we make use of similar ideas for a diferent purpose, namely for identifying clusters in knowledge graphs. Our motivation is to learn logical descriptions of entities that share common properties. As properties we use the notion of a model-based most specific concept (mmsc) introduced in [ 9, 3]. The resulting formal concepts, which are clusters of entities, are then described by ℰℒ⊥ concept descriptions. Varying the role-depth allows us to compute clusters of variable granularities from the knowledge graph. Shallow role-depths result in coarse clusters, higher role-depths result in more fine-grained clusters. Our approach is comparable to Relational Concept Analysis (RCA) [10], which is an extension of FCA for dealing with relational data. As data structure it uses a set of formal contexts and a set of relations. The distinguishing feature of our work is that we use existing FCA data structures and algorithms of the shelf, and do not need to extend them for relational data. This has the advantage that we can use existing FCA libraries just out of the box.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Knowledge Graphs</kwd>
        <kwd>clustering</kwd>
        <kwd>Concept Lattices</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1. Identifying Clusters in Knowledge Graphs
In order to depict our approach on a simple example consider the knowledge graph given in
Figure 1. It contains the 5 entities , , , , , and some facts on their class memberships and
on their relationships to each other. For identifying formal concepts in a knowledge graph,

child</p>
      <p>sibling
 sibling</p>
      <p>Female</p>
      <p>Female child
the first step is to build a formal context whose objects are entities from the knowledge graph
and whose attributes are properties of these entities. In principle, we can select class names
as attributes. A formal concept obtained this way will have the common set of class names
as its intent. More precisely, a formal concept (, ) will generalize the entities in  just by
conjuncting the classes that they belong to. As a result, the formal concepts, or the clusters we
obtain will not be detailed enough since we do not take into account the relationships of the
entities in  to other entities.</p>
      <p>
        Alternatively, as the set of attributes we can select every combination of relation and class
names until some certain role-depth. This would be the other extreme. It would result in a
highly fine-grained clustering, but would have the drawback that the number of attributes is
huge. For a set of class names NC, a set of relation names NR and for depth , the number of
attributes (or concept descriptions) we obtain this way is super-exponential in the sizes of NC
and NR. The reason is that, the set of attributes for role depth  is generated using the power
set of attributes at role depth  − 1. (For more details see [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].)
      </p>
      <p>
        Our aim is clustering the entities in a knowledge graph based on their common properties,
which requires use of attributes that generalize sets of entites, yet at the same time are as specific
as possible. To this purpose, we use the notion of model-based most specific concept description
introduced in [
        <xref ref-type="bibr" rid="ref3">9, 3</xref>
        ] as attributes of a formal context. For generalizing concept descriptions from
individuals, least common subsumers (LCS) and most specific concepts (msc) have intensively
been studied in the literature [11, 12, 13].
      </p>
      <p>
        Definition 1 ( [
        <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
        ]). Let  be a knowledge graph and ∆  be the set of entites in . We denote
the most specific concept description for the entities in  ⊆ ∆  at depth  as  . Based on this,
we define a set of attributes for  at depth  as:
      </p>
      <p>,, = N ∪ {⊥, ⊤} ∪ {∃.− 1 |  ⊆ ∆  and  ̸= ∅}
where N is the set of types entities in . More precisely, it is defined as N = { |  ∈
NC and ∃ ∈  s.t.  is an instance of }. For depth  = 0 we define ,, = N ∪ {⊥, ⊤}.
The formal context induced by  and  at depth  is then K = (, ,,, ℐℛ) where the
incidence relation ℐℛ is defined as ℐℛ = {(, ) |  ∈ ,  ∈ ,, and  is an instance of }.
Example 1. Consider the toy knowledge graph  in Figure 1 with ∆  = {, , , , }, NC =
{Female, ⊥ ⊤} and NR = {sibling, child}. The formal context K0 obtained with  = ∆  and
depth  = 0 has the attribute set ,Δ,0 = {⊥, Female, ⊤}.</p>
      <p>This formal context gives rise to three
formal concepts: (∅, {⊥}), ({, }, {Female}) and
({, , }, {⊤}). When ordered w.r.t. inclusion
between their extents (first set in the pair), these
formal concepts form the depicted hierarchy. For
depth 0, we get a rather rough clustering of the
knowledge graph. The only commonality of the
individuals  and  is that they are both females.
, , 
, 
,Δ,1 = {⊥, Female, ∃sibling.Female, ∃sibling.⊤, ∃child.Female, ∃child.⊤, ⊤}
which gives us the formal context K1. The
new formal context gives rise to 6 formal
concepts seen in the concept lattice on the
right. It is visible in the lattice, that the
entities  and  now belong to the formal
concept marked with the attributes Female and
∃sibling.⊤. (For reading the attributes
satisifed by an object, start at the node with this
object and follow the upwards lines in the lattice.)</p>
      <p>Female,
∃sibling.⊤
∃ sibling.</p>
      <p>Female

,  ⊤</p>
      <p>⊥, ∃child.Female

∃child.⊤</p>
      <p>For depth 1, the concept hierarchy is more detailed compared to the hierarchy we obtain with
depth 0. Finally we build the formal context for the same knowledge graph for role-depth  = 2,
which gives us the formal context K2 with 13 attributes. It gives rise to a new concept
lattice, which as the previous one, has 6 formal concepts. This time the concepts are even more
ifne-grained. Consider again the concept with the extent {, }. This time it has the intent
{Female, ∃sibling.⊤, ∃sibling.(∃child.⊤)}, which describes the set of females that have a sibling,
which has a child. Intuitively, this concept corresponds to the notion of an aunt in natural language.
∃sibling.(Female ⊓ ∃sibling.⊤),
∃sibling.(Female ⊓ ∃sibling.⊤ ⊓ ∃child.⊤)</p>
      <p>Female, ∃sibling.⊤,
∃sibling.(∃child.⊤)</p>
      <p>∃child.⊤
⊥
,  ⊤</p>
    </sec>
    <sec id="sec-2">
      <title>2. Experimenting with WikiData</title>
      <p>For evaluating our approach we implemented it in Python and tested it on a small dataset
extracted from WikiData. Our implementation1 uses the Python library rdflib2 for parsing
and navigating a knowledge graph written in RDF. Additionally it uses the concepts3 library
for computing the formal concepts of a given context. The tests are run on a laptop with an
Intel i7 processor with 8 cores running at 2.8Ghz and with 16GB of RAM.</p>
      <p>The data extracted from WikiData4 contains 1216 triples about the 27 member states of the
European Union. The properties occurring in the dataset are rdf:type, P1344 (participant in),
P463 (member of) and P47 (shares border with). Some of the 191 classes occurring in the data
set are Q1065 (United Nations), Q8908 (Council of Europe), Q8268 (Eurozone), Q7817 (World
Health Organization) Q8919 (European Atomic Energy Community) and Q7809 (UNESCO).</p>
      <p>We ran experiments for 3,4,5,10,15 and 20 of the 27 EU-member countries each with
roledepths 0, 1 and 2. Table 1 displays the number of attributes and number of clusters generated in
these settings and also the execution time in seconds. For the object set of size 20 and role-depth
2, the implementation ran more than 30 minutes without producing a result. Profiling of the code
revealed that the reason for this is the unoptimized implementation of the recursive function
for computing the mmsc of a given set of entities. It also revealed that in all of the experiments,
the major time consuming part was computing the attributes of the formal context, which is
based on the computation of the mmsc.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusion and Future Work</title>
      <p>We have presented an approach for identifying clusters in a knowledge graph. Unlike other
approaches for analyzing knowledge graphs, which are based on statistical and machine learning
models, our approach uses a method based on DLs and FCA. It can be used to learn logical
descriptions of classes from a knowledge graph and to build a hierarchy of these classes. An
experimental evaluation with data sets extracted from WikiData showed that our implementation
sufers from performance issues even for small data sets.</p>
      <p>As future work, we are going to work on improving both the theory and the implementation
of our approach. On the theory part, we are going to investigate how we can improve the
computation of the attributes. One idea here could be to learn approximate descriptions of the
clusters instead of exact descriptions. In the implementation part, we are going to work on
optimizations so that we can compute clusters for larger fragments of WikiData with higher
role-depths.
1https://github.com/sertkaya/knowledge-graph-concept-learner
2https://rdflib.readthedocs.io
3https://concepts.readthedocs.io
4https://www.wikidata.org/</p>
      <p>Number of
objects</p>
      <p>Depth
3
4
5
10
15
20
depth, J. Artif. Intell. Res. 76 (2023) 883–924. URL: https://doi.org/10.1613/jair.1.13777.
doi:10.1613/jair.1.13777.
[8] L. González, A. Hogan, Modelling dynamics in semantic web knowledge graphs with formal
concept analysis, in: P. Champin, F. Gandon, M. Lalmas, P. G. Ipeirotis (Eds.), Proceedings
of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France,
April 23-27, 2018, ACM, 2018, pp. 1175–1184. URL: https://doi.org/10.1145/3178876.3186016.
doi:10.1145/3178876.3186016.
[9] F. Baader, F. Distel, Exploring finite models in the description logic ELgfp, in: S. Ferré,
S. Rudolph (Eds.), Proceedings of the 7th International Conference on Formal Concept
Analysis, (ICFCA 2009), volume 5548 of Lecture Notes in Artificial Intelligence ,
SpringerVerlag, 2009, pp. 146–161.
[10] M. R. Hacene, M. Huchard, A. Napoli, P. Valtchev, Relational concept analysis: mining
concept lattices from multi-relational data, Ann. Math. Artif. Intell. 67 (2013) 81–108. URL:
https://doi.org/10.1007/s10472-012-9329-3. doi:10.1007/s10472-012-9329-3.
[11] F. Baader, R. Küsters, R. Molitor, Computing least common subsumers in description logics
with existential restrictions, in: Proceedings of the 16th International Joint Conference on
Artificial Intelligence (IJCAI’99), 1999, pp. 96–101.
[12] A. Ecke, A. Turhan, Role-depth bounded least common subsumers for EL+ and ELI, in:
Y. Kazakov, D. Lembo, F. Wolter (Eds.), Proceedings of the 2012 International Workshop on
Description Logics, DL-2012, Rome, Italy, June 7-10, 2012, volume 846 of CEUR Workshop
Proceedings, CEUR-WS.org, 2012. URL: https://ceur-ws.org/Vol-846/paper_58.pdf.
[13] J. C. Jung, C. Lutz, F. Wolter, Least general generalizations in description logic: Verification
and existence, in: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI
2020, AAAI Press, 2020, pp. 2854–2861. URL: https://ojs.aaai.org/index.php/AAAI/article/
view/5675.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ganter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wille</surname>
          </string-name>
          ,
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          , Springer-Verlag, Berlin, Germany,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          ,
          <article-title>Relational exploration: Combining Description Logics and Formal Concept Analysis for knowledge specification</article-title>
          ,
          <source>Ph.D. dissertation, Fakultät Mathematik und Naturwissenschaften</source>
          , TU Dresden, Germany,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Distel</surname>
          </string-name>
          ,
          <article-title>Learning description logic knowledge bases from data using methods from formal concept analysis</article-title>
          ,
          <source>Ph.D. dissertation</source>
          , Dresden University of Technology, Germany,
          <year>2011</year>
          . URL: https://nbn-resolving.org/urn:nbn:de:bsz:
          <fpage>14</fpage>
          -qucosa-70199.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Dau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sertkaya</surname>
          </string-name>
          ,
          <article-title>Formal concept analysis for qualitative data analysis over triple stores</article-title>
          , in: O.
          <string-name>
            <surname>D. Troyer</surname>
            ,
            <given-names>C. B.</given-names>
          </string-name>
          <string-name>
            <surname>Medeiros</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Billen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Hallot</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Simitsis</surname>
            ,
            <given-names>H. V.</given-names>
          </string-name>
          <string-name>
            <surname>Mingroot</surname>
          </string-name>
          (Eds.), Advances in Conceptual Modeling.
          <article-title>Recent Developments and New Directions - ER 2011 Workshops FP-UML, MoRE-BI, Onto-CoM, SeCoGIS</article-title>
          , Variability@ER,
          <string-name>
            <surname>WISM</surname>
          </string-name>
          , Brussels, Belgium,
          <source>October 31 - November 3</source>
          ,
          <year>2011</year>
          . Proceedings, volume
          <volume>6999</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2011</year>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Borchmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Distel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kriegel</surname>
          </string-name>
          ,
          <article-title>Axiomatisation of general concept inclusions from ifnite interpretations</article-title>
          ,
          <source>Journal of Applied Non-Classical Logics</source>
          <volume>26</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Kriegel</surname>
          </string-name>
          ,
          <article-title>Constructing and Extending Description Logic Ontologies using Methods of Formal Concept Analysis</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Technische Universität Dresden, Dresden, Germany,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guimarães</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ozaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Persia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sertkaya</surname>
          </string-name>
          ,
          <article-title>Mining ℰ ℒ⊥ bases with adaptable role</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>