=Paper= {{Paper |id=Vol-3308/paper7 |storemode=property |title=Exploring the Application of Graph-FCA to the Problem of Knowledge Graph Alignment |pdfUrl=https://ceur-ws.org/Vol-3308/Paper07.pdf |volume=Vol-3308 |authors=Sébastien Ferré |dblpUrl=https://dblp.org/rec/conf/cla/Ferre22 }} ==Exploring the Application of Graph-FCA to the Problem of Knowledge Graph Alignment== https://ceur-ws.org/Vol-3308/Paper07.pdf
Exploring the Application of Graph-FCA to the
Problem of Knowledge Graph Alignment
Sébastien Ferré1,∗
1
    Univ Rennes, CNRS, IRISA, Campus de Beaulieu, 35042 Rennes, France


                                         Abstract
                                         Knowledge Graphs (KG) have become a widespread knowledge representation. When different KGs
                                         exist for some domain, it is valuable to merge them into a richer KG. This is known as the problem of
                                         KG alignement, which encompasses related problems such as entity alignement or ontology matching.
                                         Although most recent approaches rely on supervised representation learning, Formal Concept Analysis
                                         (FCA) has also been proposed as a basis for symbolic and unsupervised approaches. We here explore
                                         the application of Graph-FCA, an extension of FCA for KGs, to different scenarios of KG alignments:
                                         (A) when the two KGs have common values, and (B) when pre-aligned pairs are known. We show
                                         that, compared to previous FCA-based approaches, Graph-FCA allows for a more natural and scalable
                                         representation of the KGs to be aligned, and makes it simpler to extract alignments from the concepts. It
                                         also features flexibility w.r.t. different alignment scenarios.

                                         Keywords
                                         Knowledge Graph, Semantic Web, Entity Alignement, Ontology Matching, Graph-FCA




1. Introduction
A knowledge graph (KG) is a modelling of some domain of interest that supports many tasks
such as question answering, reasoning over data or machine learning. They are commonly
represented in languages of the Semantic Web (RDF(S), OWL), and published on the web as
Linked Open Data (LOD) [9]. A KG is a set of entities described by their classes, properties,
and relations with other entities. Due to the open nature of the web, it is common that a same
domain of interest is modelled by multiple KGs. Those have in general overlapping, hence
redundant, information but also complementary information. It is therefore valuable to merge
the different KGs that model a same domain into a single richer KG. The difficulty is that
different KGs generally use different entity identifiers (URIs) for the same real entities, and even
different schemas or ontologies, i.e. different classes, properties and relations.
   The problem of merging two KGs consists in discovering equivalence links between entities
and/or between schemas, which is known as knowledge graph alignment. A number of other
terms are used in the literature, partially depending on the focus on entities or on the schema:
entity alignment, entity matching, ontology alignment/matching, data interlinking [2, 14]. In

Published in Pablo Cordero, Ondrej Kridlo (Eds.): The 16𝑡ℎ International Conference on Concept Lattices and Their
Applications, CLA 2022, Tallinn, Estonia, June 20–22, 2022, Proceedings, pp. 79–91.
∗
    Corresponding author.
Envelope-Open ferre@irisa.fr (S. Ferré)
Orcid 0000-0002-6302-2333 (S. Ferré)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
order to align two KGs, they must have something in common, some alignment seeds. Those
seeds can be: a common language for the string values (e.g., names, titles), a set pre-aligned
pairs, or a combination of those. Most existing work consider a single scenario, and may not
apply to other scenarios. A common scenario (called Scenario A in the following) is to only
assume a common language, and no pre-aligned pairs. This is an unsupervised setting. Another
common scenario in recent literature (called Scenario B in the following), assumes that a set of
pre-aligned pairs is available [14]. This is a supervised setting where the pre-aligned pairs serve
as a training set. Most approaches use similarity measures that combine terminological features
(i.e., strings for names, titles, ...), and structural features (i.e., relationships between entities). A
recent trend is to use KG embeddings as such features. Formal Concept Analysis (FCA) [7] and
FCA extensions, such as Pattern Structures [6] and Relational Concept Analysis [11], have been
recently proposed for tasks related to KG alignment [15, 3, 1]. Formal concepts are used there
as a symbolic form of similarity.
    In this paper, we explore the application of Graph-FCA [5], an extension of FCA specifically
designed for analyzing KGs, to the two main scenarios of KG alignment. We propose an approach
that makes three contributions to previous FCA-based approaches.
   1. A more natural and scalable representation. The formal context is the union (rather than
      the product) of the two KGs to be aligned, with minor adjustments depending on the
      alignment scenario.
   2. A uniform alignment extraction methods for both entities and schema elements (classes,
      properties and relations). All alignement pairs are extracted from concept extents (rather
      than from both extents and intents).
   3. A flexible approach w.r.t. different scenarios. They are handled through minor adjustments
      to the representation of KGs.
We show the effectiveness of our approach on a few use cases taken or adapted from previous
FCA-based work.
  Section 2 states the KG alignment problem. Section 3 presents the related work. Section 4
shortly describes Graph-FCA. Section 5 explains our approach from principles to concrete
application, and discusses the results on the two main scenarios. Section 6 concludes and draws
perspectives.


2. Problem Statement
We start by defining knowledge graphs (KG), taking care to distinguish between classes, prop-
erties and relations in order to allow for a more accurate modelling. Many work on KGs only
consider entities and relations.

Definition 1 (knowledge graph). A knowledge graph is a structure 𝐺 = ⟨𝐸, 𝐶, 𝑃, 𝑅, 𝑇 ⟩, where 𝐸
is the set of entities, 𝐶 is the set of classes, 𝑃 is the set of properties, 𝑅 is the set of relations, and 𝑇
is a set of triples. The schema of the KG is made of the classes, properties and relations. There are
three kinds of triples, for any 𝑒, 𝑒 ′ ∈ 𝐸, 𝑐 ∈ 𝐶, 𝑝 ∈ 𝑃, 𝑟 ∈ 𝑅:

    • (𝑒, type , 𝑐) states that class 𝑐 is the type of entity 𝑒;
Figure 1: Two KGs (left and right) with cyclic dependencies (from Atencia et al [3])


    • (𝑒, 𝑝, 𝑣) states that property 𝑝 has value 𝑣 on entity 𝑒, where a value can be a string, a number,
      etc.;
    • (𝑒, 𝑟, 𝑒 ′ ) states that relation 𝑟 links entity 𝑒 to entity 𝑒 ′ .

   In RDF and OWL, entities are called individuals, values are called literals, and the distinction
between properties and relations corresponds to the distinction in OWL between datatype
properties and object properties. URIs are used to identify entities, classes, properties, and
relations.
   Figure 1 shows two knowledge graphs, one on the left and another on the right. It falls into
Scenario A as it uses completely different schemas and it only shares proper names of entities
(square boxes). The objective is to find the best alignement between the two. This is the most
complex example considered by Atencia et al [3] (Fig. 8, p.23). It involves cyclic dependencies
that have to be taken into account to find the correct alignment.
   It is easy to find that o1:z1 aligns with o2:i1 because they are the unique entities in each KG
to have "Dupont" as a value. From this first alignement, one can find two additional alignments:
property o1:lastname with o2:name , and class o1:Person with o2:Inhabitant . It is more
difficult to find that o1:z3 aligns with o2:i3 . o1:z3 has lastname "Dubois" but both o2:i2
and o2:i3 have name "Dubois" . o1:z3 has a home/address in Paris but so do o2:i2 and o2:i3 .
Traversing deeper in the graph, we find that o1:z3 ’s home is owned by a Dupont who lives in
Grenoble. Something equivalent can be found for o2:i3 but not for o2:i2 . We observe that the
alignment of entities, classes, properties and relations depend on each other, and that handling
long-range dependencies may be necessary to resolve alignments.
   The example in Figure 1 can be adapted to cover Scenario B where string values are in
different incomparable languages but some pre-aligned pairs may be available. Imagine for
instance that the strings of the left-side KG are in Chinese, and that we already know that o1:z1
aligns with o2:i1 . Without that pre-aligned pair, one only has structural information, which is
not enough to discriminate between the different people and places. Knowing that o1:z1 aligns
with o2:i1 (Dupont), one can align classes o1:Person and o2:Inhabitant , and also relations
o1:home and o2:address . By propagating alignments through neighborhood, starting from
the pre-aligned pair, it is here possible to find the correct alignment for all elements: first o1:h1
with o2:a1 , then o1:city with o2:city and o1:z2 with o2:i2 , and so on.


3. Related Work
The alignment of KGs (or entities, or ontologies) is a matter of defining and combining similarity
measures [2]. There are four kinds of similarities, depending on the kind of information that is
used. Terminological similarity is based on the strings that are used to name entities and schema
elements. Structural similarity is based on the description of entities by classes, properties, and
relations. Extensional similarity is based on the extensions of classes. Semantic similarity is
based on external sources of knowledge, and on automated reasoning (e.g., subsumption checks
in description logics). Terminological and structural similarities are the most common methods,
and are often combined to get better results. In general, strutural similarity does not consider
the KG as a whole but use either tree-shaped descriptions extracted from the knowledge graph,
or neighbor-based propagation of similarity values (e.g., RiMOM-IM [13]).
   The trend in KG alignement is to use representation learning methods, in a supervised setting
where a significant number of pre-aligned pairs is supposed to be known [14]. The principle
is first to learn embeddings for the two KGs that coincide for the pre-aligned pairs. Other
alignments can then be found by nearest-neighbor search in the embedding space. Many
representation learning methods exist, such as TransE [4], ComplEx-N3 [10] or R-GCN [12].
   Different variants of FCA have been applied to tasks related to KG alignment. Classical
FCA was used for aligning (aka. matching) several biomedical ontologies, in a system called
FCA-Map [15]. The scenario is here to match classes and relations based on their names and
their relationships (e.g., class hierarchies, class disjointness, relation domains and ranges), there
are no entities per se. The intuition behind using FCA is that if a concept has only two objects
in its extent (here classes and relations), one from each ontology, then this is a strong indication
that the two entities should be aligned. This is called an anchor in FCA-Map. The global
alignment is obtained by defining a succession of five formal contexts, and computing their
concept lattices. This can be seen as an ad-hoc form of Relational Concept Analysis (RCA) [11],
with the notions of iterative concept formation, and relational attributes.
   RCA was used for the task of link key discovery [3], which consists in finding pairs of properties
and/or relations that are identifying keys for the entities of a pair of classes. In Figure 1, the
pair o1:lastname /o2:name is not a link key for the pair of classes o1:Person /o2:Inhabitant ,
but combined with the pair o1:home /o2:address , it is. Link key discovery is somewhat
stronger than KG alignment because in addition to provide alignments, it provides a rule for
predicting aligned pairs in unseen data. RCA can be compared to methods that propagate
similarity values through relations, except that formal concepts are used in place of numerical
values. Pattern Structures [6] were used for the same task with an approach that is similar
Figure 2: Graph context about dishes, cereals, and countries.


but does not cover long-range and cyclic dependencies like in RCA. In those approaches, the
discovery of an alignment between two KGs is somewhat contrived. First, although a KG can be
naturally represented as an RCA context, the RCA context must instead represent the product
of the two KGs: i.e., the objects are all pairs of entities (one from each KG), the properties
are all pairs of properties, and so on. This entails a quadratic increase in the size of the data,
which poses scalibility issues. Second, the discovered alignment is scattered in the extents and
intents of formal concepts, across several concept lattices. We show in this paper that those
two disadvantages are overcome when using Graph-FCA [5] on the task of KG alignment.


4. Graph-FCA
Graph-FCA [5] is an extension of FCA for multi-relational data, and in particular for knowledge
graphs. Graph-FCA defines a graph context as an incidence relation between tuples of objects
and attributes. A graph context is a triple 𝐾 = (𝑂, 𝐴, 𝐼 ), where 𝑂 is a set of objects, 𝐴 is a set of
attributes, and 𝐼 ⊆ 𝑂 ∗ × 𝐴 is an incidence relation between object tuples 𝑜 = (𝑜1 , … , 𝑜𝑘 ) ∈ 𝑂 ∗ , for
any arity 𝑘, and attributes 𝑎 ∈ 𝐴. A graph context represents a KG with objects as nodes, and
attributes as hyper-edge labels. An hyper-edge ((𝑜1 , … , 𝑜𝑘 ), 𝑎) can be seen as the logical atom
𝑎(𝑜1 , … , 𝑜𝑘 ); we use the latter notation for readability. Unary edges correspond to node labels
while binary edges correspond to labeled edges. Classical FCA is a special case of Graph-FCA
where 𝑘 = 1 (only unary edges) and 𝑛 = 1 (only unary concepts).
   Figure 2 shows the graphical representation of a graph context about dishes, cereals, and
countries. Each box represents an object 𝑜 along with the list of attributes 𝑎 such that there is an
unary edge 𝑎(𝑜) ∈ 𝐼. Each arrow from box 𝑜1 to box 𝑜2 labelled by attribute 𝑎 denotes a binary
edge 𝑎(𝑜1 , 𝑜2 ) ∈ 𝐼. An hyper-edge 𝑎(𝑜1 , … , 𝑜𝑘 ) with 𝑘 > 2 is represented as an ellipsis labeled
by 𝑎, and having an arrow labeled 𝑖 to each box 𝑜𝑖 (not illustrated here).
   Concept intents express what several 𝑛-tuples of objects have in common. They are graph
patterns with 𝑛 distinguished nodes. They are called Projected Graph Patterns (PGP), and they
are similar to conjunctive queries. For example, the PGP [(𝑥, 𝑦) ← 𝑝𝑎𝑟𝑒𝑛𝑡(𝑥, 𝑧), 𝑝𝑎𝑟𝑒𝑛𝑡(𝑦, 𝑧)]
Figure 3: Compact representation of graph concepts about dishes, cereals, and countries.


has three nodes 𝑥, 𝑦, 𝑧, two edges 𝑝𝑎𝑟𝑒𝑛𝑡(𝑥, 𝑧) and 𝑝𝑎𝑟𝑒𝑛𝑡(𝑦, 𝑧), and two distinguished nodes 𝑥, 𝑦.
It can be used as a definition of the ”sibling” relationship, i.e. the fact that 𝑥 and 𝑦 are siblings if
they have a common parent 𝑧. FCA is extended to graph contexts by extending the set operations
to PGPs. PGP inclusion ⊆𝑞 is based on graph homomorphisms, and PGP intersection ∩𝑞 is based
on the categorical product of graphs (see [5] for details).
   The Galois connection is defined in Graph-FCA between PGPs (𝑄, ⊆𝑞 ) and sets of object
           ∗
tuples (2𝑂 , ⊆). In the definitions of 𝑄 ′ and 𝑅′ below, the PGP [𝑜 ← 𝐼 ] represents the description
of an object tuple 𝑜 by the whole incidence relation 𝐼 seen from the relative position of 𝑜.

                  𝑄 ′ ∶= {𝑜 ∈ 𝑂 𝑛 ∣ 𝑄 ⊆𝑞 [𝑜 ← 𝐼 ]}, for 𝑄 = [(𝑥1 , … , 𝑥𝑛 ) ← 𝑃] ∈ 𝑄
                  𝑅′ ∶= ∩𝑞 {[𝑜 ← 𝐼 ] ∣ 𝑜 ∈ 𝑅},      for 𝑅 ⊆ 𝑂 𝑛 , 𝑛 ∈ ℕ

   Informally, 𝑄 ′ can be understood as the result set of 𝑄, seen as a query; and 𝑅′ as the most
specific query whose result set contains 𝑅. From there, concepts can be defined in the usual
way, and organized into lattices. A concept is a pair (𝑅, 𝑄) such that 𝑄 ′ = 𝑅 and 𝑅′ =𝑞 𝑄. The
arity of 𝑄 and 𝑅 must be the same, it determines the arity of the concept. Unary concepts are
about sets of objects, while binary concepts are about relationships between objects, and so on.
Unlike RCA, there is a concept lattice for each concept arity rather than for each object type.
   Figure 3 displays a compact representation of the graph concepts about dishes, cereals, and
countries. It only shows unary concepts (𝑅, 𝑄), with 𝑄 = [𝑥 ← 𝑃], s.t. |𝑅| ≥ 2, 𝑃 ≠ ∅, and
𝑃 is a “core pattern” (i.e., 𝑃 has no homomorphic subset). Each box 𝑥 identifies a unary
concept, with its name at the top (e.g., Q1b), and its extent at the bottom (here, Pakistan,
Thailand). The concept intent is the PGP [𝑥 ← 𝑃], where 𝑃 is the subgraph made of solid
arrows containing node 𝑥. Those arrows denote binary edges while the middle part of the
boxes denote unary edges. By reading the graph, we learn that Concept Q1b is the concept
of “Asian countries, which eat a lot of some dish whose main cereal is a rice produced in the
country itself”. Formally, its intent is denoted by [𝑏 ← 𝑐𝑜𝑢𝑛𝑡𝑟𝑦(𝑏), 𝐴𝑠𝑖𝑎(𝑏), 𝑒𝑎𝑡𝐿𝑜𝑡𝑂𝑓 (𝑏, 𝑎), 𝑑𝑖𝑠ℎ(𝑎),
ℎ𝑎𝑠𝑀𝑎𝑖𝑛𝐶𝑒𝑟𝑒𝑎𝑙(𝑎, 𝑐), 𝑐𝑒𝑟𝑒𝑎𝑙(𝑐), 𝑟𝑖𝑐𝑒(𝑐), 𝑖𝑠𝑃𝑟𝑜𝑑𝑢𝑐𝑒𝑑𝐼 𝑛(𝑐, 𝑏)] (using the small letters of concept names
as pattern variables). Concepts Q1a and Q1c have the same graph pattern as Q1b but with a
different focus, on dishes for Q1a and on cereals for Q2c. N-ary concepts are obtained by picking
several distinguished nodes. For example, (Q1b,Q1a,Q1c) is a ternary concept whose instances
are the object triples (𝑃𝑎𝑘𝑖𝑠𝑡𝑎𝑛, 𝑏𝑖𝑟𝑦𝑎𝑛𝑖, 𝑏𝑎𝑠𝑚𝑎𝑡𝑖𝑅𝑖𝑐𝑒) and (𝑇 ℎ𝑎𝑖𝑙𝑎𝑛𝑑, 𝑘ℎ𝑎𝑜𝑀𝑎𝑛𝐾 𝑎𝑖, 𝑡ℎ𝑎𝑖𝑅𝑖𝑐𝑒). It
represents the cyclic relationship existing between a country, a dish, and a cereal in Asian
countries. The Q2-concepts are generalizations of the Q1-concepts. For example, concept Q2b
covers all countries. The dashed arrow from Q2b to Q1b says that Q1b is a subconcept of Q2b.
The Q2-intents say that, in the context, all dishes use some rice as a main cereal, and that all
cereals are produced in some country but that not all countries eat a lot of some dish, and not
all dishes are eaten a lot by some country.


5. Concept-based Knowledge Graph Alignment
In this section, we explain how KG alignments can be extracted from Graph-FCA concepts, and
we discuss the results on the example and scenarios of Section 2.

5.1. Principle
When two entities from two different KGs are equivalent, e.g. o1:z1 and o2:i1 in the example,
they generally have equivalent types (e.g., o1:Person and o2:Inhabitant ), equivalent prop-
erties with equal values (e.g. o1:lastname and o2:name “Dupont”), and equivalent relations
with equivalent entities (e.g., o1:home o1:h1 and o2:address o2:a1 ). Similarly, when two
classes are equivalent, they are the type of equivalent entities. In FCA terms, the two equivalent
entities (or classes) belong to the extent of a formal concept whose intent expresses everything
they have in common. Here, the intent is not only about how the entities are described but also
about how the entities are related to other entities. Moreover, the equivalence between two
entities depend on the equivalence between the related entities, and so on recursively. This
makes Graph-FCA, like RCA, a good match for the problem of KG alignment.
   More precisely, let 𝐺1 and 𝐺2 be two KGs. If an unary concept 𝐶 = (𝑅, 𝑄) has only two
elements in its extent, 𝑅 = {𝑥1 , 𝑥2 }, where 𝑥1 belong to 𝐺1 and 𝑥2 belongs to 𝐺2 , then this is a
strong indication that 𝑥1 and 𝑥2 are equivalent because they are unique in their own KG to
match the projected graph pattern 𝑄. Such a concept is called an anchor in [15]. Concepts that
are not anchors can also contribute to the alignement, although in a less direct and less certain
way. For example, a concept with extent {𝑥1 , 𝑦1 , 𝑥2 , 𝑦2 } says that 𝐺1 -entities 𝑥1 and 𝑦1 align with
𝐺2 -entities 𝑥2 and 𝑦2 , without knowing which aligns to which. However, if we also have the
anchor {𝑥1 , 𝑥2 }, we can deduce an alignment between 𝑦1 and 𝑦2 . Another example is a concept
with extent {𝑥1 , 𝑦2 , 𝑧2 } where some ambiguity remains on the alignment of 𝑥1 but it nonetheless
reduces the set of possibilities from all 𝐺2 -entities to only two entities, 𝑦2 and 𝑧2 .

5.2. Graph-FCA Representation
The immediate representation of a KG as a Graph-FCA context is to use classes, properties and
relations as attributes (hyper-edge labels), and entities as objects (nodes). Each occurrence of a
value is represented by a distinct node labelled by the value as this avoids spurious connections
between objects. This is formalized by the following mappings from KG triples to Graph-FCA
hyper-edges:

                (𝑒, type , 𝑐) ⇝ 𝑐(𝑒)      (𝑒, 𝑝, 𝑣) ⇝ 𝑝(𝑒, "𝑣")       (𝑒, 𝑟, 𝑒 ′ ) ⇝ 𝑟(𝑒, 𝑒 ′ )

where 𝑝(𝑒, "𝑣") is a shorthand for 𝑝(𝑒, 𝑜), 𝑣(𝑜) for some fresh object 𝑜.
   This immediate representation, however, does not produce any alignment because the two
KG representations do not share any attribute. Indeed, the two KGs have different schemas,
hence different classes, properties and relations. Recall that Graph-FCA concept intents are
made of attributes and variables. As a consequence, for all concepts except the top one, the
concept extent contains only objects from a single KG, and no anchor concept can be found.
   An adequate representation should use the same set of attributes for the two KGs to be aligned,
and those attributes should reflect what they have in common. In Scenario A, the two KGs only
share the values. Actually, they also share the graph structure, and the distinction between
classes, properties and relations. Anything else must hence be represented as objects: i.e., not
only entities but also classes, properties and relations. We achieve this by introducing three
meta-level common attributes: type for expressing the relation between entities and classes,
prop for relating entities to values through properties, and rel for relating entities to entities
through relations. The new mapping from KG triples to Graph-FCA hyper-edges is hence as
follows:

           (𝑒, type , 𝑐) ⇝ type(𝑒, 𝑐)   (𝑒, 𝑝, 𝑣) ⇝ prop(𝑒, 𝑝, "𝑣")    (𝑒, 𝑟, 𝑒 ′ ) ⇝ rel(𝑒, 𝑟, 𝑒 ′ )

This is a form of reification, in which classes, properties and relations are treated as objects. It
is made possible by the fact that Graph-FCA supports hyper-edges beyond binary edges. As
defined above, the notation "𝑣" for a value implicitly introduces an attribute 𝑣, hence accounts
for the values shared between KGs.
   In Scenario B, in contrast to Scenario A, even the values are different in the two KGs. This
means that values do not bring any information so that the mapping for properties can be
simplified as (𝑒, 𝑝, 𝑣) ⇝ prop(𝑒, 𝑝). Obviously, the two mappings can be mixed if some values
are shared and others not. To compensate for the lack of shared elements, a set of pre-aligned
pairs (𝑥1 , 𝑥2 ) may be given as input, and modelled as equivalence triples (𝑥1 , sameAs , 𝑥2 ). Such
Table 1
Anchor concepts in Scenario A (common values).
        C    extent(C)                       intent(C), only adjacent edges
        𝑎    o1:lastname o2:name             𝑝𝑟𝑜𝑝(𝑒, 𝑎, "𝐷𝑢𝑏𝑜𝑖𝑠"), 𝑝𝑟𝑜𝑝(𝑓 , 𝑎, "𝐷𝑢𝑏𝑜𝑖𝑠"), 𝑝𝑟𝑜𝑝(𝑔, 𝑎, "𝐷𝑢𝑝𝑜𝑛𝑡")
        𝑏    o1:city     o2:city             𝑝𝑟𝑜𝑝(ℎ, 𝑏, "𝐺𝑟𝑒𝑛𝑜𝑏𝑙𝑒"), 𝑝𝑟𝑜𝑝(𝑖, 𝑏, "𝑃𝑎𝑟𝑖𝑠"), 𝑝𝑟𝑜𝑝(𝑗, 𝑏, "𝑃𝑎𝑟𝑖𝑠")
        𝑐    o1:owner o2:ownedBy             𝑟𝑒𝑙(ℎ, 𝑐, 𝑓 ), 𝑟𝑒𝑙(𝑖, 𝑐, 𝑔), 𝑟𝑒𝑙(𝑗, 𝑐, 𝑒)
        𝑑    o1:home     o2:address          𝑟𝑒𝑙(𝑒, 𝑑, 𝑖), 𝑟𝑒𝑙(𝑓 , 𝑑, 𝑗), 𝑟𝑒𝑙(𝑔, 𝑑, ℎ)
        𝑒    o1:z3       o2:i3               𝑡𝑦𝑝𝑒(𝑒, 𝑙), 𝑝𝑟𝑜𝑝(𝑒, 𝑎, "𝐷𝑢𝑏𝑜𝑖𝑠"), 𝑟𝑒𝑙(𝑒, 𝑑, 𝑖), 𝑟𝑒𝑙(𝑗, 𝑐, 𝑒)
        𝑓    o1:z2       o2:i2               𝑡𝑦𝑝𝑒(𝑓 , 𝑙), 𝑝𝑟𝑜𝑝(𝑓 , 𝑎, "𝐷𝑢𝑏𝑜𝑖𝑠"), 𝑟𝑒𝑙(𝑓 , 𝑑, 𝑗), 𝑟𝑒𝑙(ℎ, 𝑐, 𝑓 )
        𝑔    o1:z1       o2:i1               𝑡𝑦𝑝𝑒(𝑔, 𝑙), 𝑝𝑟𝑜𝑝(𝑔, 𝑎, "𝐷𝑢𝑝𝑜𝑛𝑡"), 𝑟𝑒𝑙(𝑔, 𝑑, ℎ), 𝑟𝑒𝑙(𝑖, 𝑐, 𝑔)
        ℎ    o1:h1       o2:a1               𝑡𝑦𝑝𝑒(ℎ, 𝑘), 𝑝𝑟𝑜𝑝(ℎ, 𝑏, "𝐺𝑟𝑒𝑛𝑜𝑏𝑙𝑒"), 𝑟𝑒𝑙(𝑔, 𝑑, ℎ), 𝑟𝑒𝑙(ℎ, 𝑐, 𝑓 )
        𝑖    o1:h3       o2:a3               𝑡𝑦𝑝𝑒(𝑖, 𝑘), 𝑝𝑟𝑜𝑝(𝑖, 𝑏, "𝑃𝑎𝑟𝑖𝑠"), 𝑟𝑒𝑙(𝑒, 𝑑, 𝑖), 𝑟𝑒𝑙(𝑖, 𝑐, 𝑔)
        𝑗    o1:h2       o2:a2               𝑡𝑦𝑝𝑒(𝑗, 𝑘), 𝑝𝑟𝑜𝑝(𝑗, 𝑏, "𝑃𝑎𝑟𝑖𝑠"), 𝑟𝑒𝑙(𝑓 , 𝑑, 𝑗), 𝑟𝑒𝑙(𝑗, 𝑐, 𝑒)
        𝑘    o1:House o2:Place               𝑡𝑦𝑝𝑒(ℎ, 𝑘), 𝑡𝑦𝑝𝑒(𝑖, 𝑘), 𝑡𝑦𝑝𝑒(𝑗, 𝑘)
        𝑙    o1:Person o2:Inhabitant         𝑡𝑦𝑝𝑒(𝑒, 𝑙), 𝑡𝑦𝑝𝑒(𝑓 , 𝑙), 𝑡𝑦𝑝𝑒(𝑔, 𝑙)


an equivalence can be represented in Graph-FCA by using a fresh attribute 𝑎𝑥 that represents
the real entity or schema element 𝑥 that 𝑥1 and 𝑥2 refer to. The mapping rule is

                      (𝑥1 , sameAs , 𝑥2 ) ⇝ 𝑎𝑥 (𝑥1 ), 𝑎𝑥 (𝑥2 ),   for some fresh attribute 𝑎𝑥 .

Attribute 𝑎𝑥 works like a nominal attribute for each KG, labelling only 𝑥1 in 𝐺1 , and only 𝑥2 in
𝐺2 .

5.3. Results and Discussion
We ran our Graph-FCA implementation1 on the two scenarios in order to find anchors, and
hence KG alignments. We used the following options to make their computation more efficient,
and the output smaller: -minsupp 2 because anchors have by definition a support equal to 2,
and -only-cores because it excludes concepts that represent under-optimal alignments (an
example is given below).

Scenario A (common values). Graph-FCA finds 24 unary concepts, grouped in only two
patterns: Q1 and Q2. All anchor concepts are in Q1, and all concepts in Q1 are anchor concepts.
Moreover, all elements of the two KGs (entities and schema) appear in those concepts. Pattern
Q1 therefore represents a complete alignement between the two KGs. Table 1 lists those
concepts with their identifier (a lowercase letter), their extent (an object from each KG), and
their neighboring intent. For instance, concept 𝑔 shows the alignment of o1:z1 with o2:i1 , and
its intent reads as follows: The two entities have type 𝑙, property 𝑎 with value “Dupont”, relation 𝑑
to entity ℎ, and relation 𝑐 from entity 𝑖. Each variable in the intent refers to a concept in the same
pattern, and hence to an aligned pair: e.g., the common type 𝑙 is o1:Person /o2:Inhabitant ,
and the property with value “Dupont” is o1:lastname /o2:name . One can observe that the
alignement is also correct, not only for entities but also for classes, properties and relations.

1
    Open source at https://bitbucket.org/sebferre/graph-fca/
Table 2
Anchor concepts in Scenario B (pre-aligned pairs).
           C extent(C)                     intent(C), only adjacent edges
           𝑎 o1:city     o2:city           𝑝𝑟𝑜𝑝(𝑔, 𝑎), 𝑝𝑟𝑜𝑝(ℎ, 𝑎), 𝑝𝑟𝑜𝑝(𝑖, 𝑎)
           𝑏 o1:lastname o2:name           𝑝𝑟𝑜𝑝(𝑒, 𝑏), 𝑝𝑟𝑜𝑝(𝑓 , 𝑏), 𝑝𝑟𝑜𝑝(𝑙, 𝑏)
                         o2:given
           𝑐 o1:owner o2:ownedBy           𝑟𝑒𝑙(𝑔, 𝑐, 𝑓 ), 𝑟𝑒𝑙(ℎ, 𝑐, 𝑙), 𝑟𝑒𝑙(𝑖, 𝑐, 𝑒)
           𝑑 o1:home     o2:address        𝑟𝑒𝑙(𝑒, 𝑑, ℎ), 𝑟𝑒𝑙(𝑓 , 𝑑, 𝑖), 𝑟𝑒𝑙(𝑙, 𝑑, 𝑔)
           𝑒 o1:z3       o2:i3             𝑡𝑦𝑝𝑒(𝑒, 𝑘), 𝑝𝑟𝑜𝑝(𝑒, 𝑏), 𝑟𝑒𝑙(𝑒, 𝑑, ℎ), 𝑟𝑒𝑙(𝑖, 𝑐, 𝑒)
           𝑓 o1:z2       o2:i2             𝑡𝑦𝑝𝑒(𝑓 , 𝑘), 𝑝𝑟𝑜𝑝(𝑓 , 𝑏), 𝑟𝑒𝑙(𝑓 , 𝑑, 𝑖), 𝑟𝑒𝑙(𝑔, 𝑐, 𝑓 )
           𝑔 o1:h1       o2:a1             𝑡𝑦𝑝𝑒(𝑔, 𝑗), 𝑝𝑟𝑜𝑝(𝑔, 𝑎), 𝑟𝑒𝑙(𝑔, 𝑐, 𝑓 ), 𝑟𝑒𝑙(𝑙, 𝑑, 𝑔)
           ℎ o1:h3       o2:a3             𝑡𝑦𝑝𝑒(ℎ, 𝑗), 𝑝𝑟𝑜𝑝(ℎ, 𝑎), 𝑟𝑒𝑙(𝑒, 𝑑, ℎ), 𝑟𝑒𝑙(ℎ, 𝑐, 𝑙)
           𝑖 o1:h2       o2:a2             𝑡𝑦𝑝𝑒(𝑖, 𝑗), 𝑝𝑟𝑜𝑝(𝑖, 𝑎), 𝑟𝑒𝑙(𝑓 , 𝑑, 𝑖), 𝑟𝑒𝑙(𝑖, 𝑐, 𝑒)
           𝑗 o1:House o2:Place             𝑡𝑦𝑝𝑒(𝑔, 𝑗), 𝑡𝑦𝑝𝑒(ℎ, 𝑗), 𝑡𝑦𝑝𝑒(𝑖, 𝑗)
           𝑘 o1:Person o2:Inhabitant       𝑡𝑦𝑝𝑒(𝑒, 𝑘), 𝑡𝑦𝑝𝑒(𝑓 , 𝑘), 𝑡𝑦𝑝𝑒(𝑙, 𝑘)
           𝑙 o1:z1       o2:i1             𝑧1𝑖1(𝑙), 𝑡𝑦𝑝𝑒(𝑙, 𝑘), 𝑝𝑟𝑜𝑝(𝑙, 𝑏), 𝑟𝑒𝑙(ℎ, 𝑐, 𝑙), 𝑟𝑒𝑙(𝑙, 𝑑, 𝑔)


  The concepts in pattern Q2 are general concepts. One concept gathers all entities (people
and places), another all classes, and similarly for properties and relations. It simply reflects
the meta-model used in our representation. Without option -only-cores , Graph-FCA outputs
additional concepts that represent approximate alignments, such as {o1:z1, o1:z3, o2:i1,
o2:i3} or {o1:lastname, o2:name, o2:given} .


Scenario B (pre-aligned pairs). Without common values nor any pre-aligned pairs, Graph-
FCA only finds the meta-concepts, like in pattern Q2 in the above scenario. Indeed, there
is no rationale for making any alignment. When adding a pre-aligned pair, like o1:z1 with
o2:i1 , Graph-FCA finds an almost complete alignment (see Table 2). The only uncertainty is
about o1:lastname that aligns with either o2:name or o2:given . Indeed, without values, the
different properties that apply to an entity cannot be distinguished. Choosing any other pair
of entities as pre-aligned pair gives the same result. However, if the pre-aligned pair is about
the schema (e.g., two equivalent classes), then only the schema elements are correctly aligned.
Entities are grouped by type (people and places) but they are not aligned individually.

Other results. We also applied our approach to the example of Abbas et al [1], which falls
in Scenario A (common values). It is at the same time simpler than the example of Atencia et
al because it has no relations (only classes and properties), and more difficult because there
is not a one-to-one mapping between the two KGs. First, there are entities in 𝐺1 that are not
present in 𝐺2 . Second, the classes are not equivalent: the 𝐺1 -class o1:Book is split in two
𝐺2 -classes, o2:Dictionary and o2:Novel ; and the 𝐺2 -class o2:FemaleScientist is equivalent
to the intersection of two 𝐺1 -classes, o1:Woman and o1:Scientist . The alignments produced
by Graph-FCA are grouped in two patterns, one for people and another for books, because there
are no relations between them in the data. All common entities and all properties are completely
and correctly aligned (one anchor concept for each pair). For classes, three interesting concepts
are found: {o1:Woman, o1:Scientist, o2:FemaleScientist} , {o1:Book, o2:Novel} , and
{o1:Book, o2:Dictionary} . They can be read as “Woman and Scientist together align with
FemaleScientist”, and “Book aligns with either Novel or Dictionary”. This is exactly as ex-
pected. Note that each entity with type o1:Book is correctly assigned to one of o2:Novel and
o2:Dictionary (two distinct concepts), based on its properties (“author” for novels, “year” for
dictionaries).

Discussion. Compared to the use of classical FCA, RCA or Pattern Structures, our approach
does not build a context as the “product” of the two KGs, i.e. with all pairs of entities as objects,
and all pairs of classes/properties/relations as attributes. Instead, the Graph-FCA context is
built as the “union” of the two KGs. This entails that the size of our context is the sum of the
size of the two KGs, rather than the product. This clearly favors the scalability of our approach.
   A key enabling feature of Graph-FCA is the support for 𝑘-ary relations as it enables to
reify the binary properties and relations as objects, along with entities, and to use them as
arguments of meta-level ternary predicates (e.g., prop). Note that this is akin to RDF triples
where properties belong to the same space as individuals. As a consequence, the alignment
of classes, properties and relations (the schema) can be interpreted in the Graph-FCA output
exactly like for entities, through concept extents. This is in contrast with other FCA-based
approaches, where a different interpretation method must be used for the schema because it
appears in concept intents. Although it can be difficult to read Graph-FCA output in general,
a KG alignment can be read from the unary concept extents, which are sets of objects, like in
classical FCA.


6. Conclusion and Perspectives
Graph-FCA is FCA for knowledge graphs, and we have shown in this paper that by adequately
representing two KGs in terms of what they have in common, we can extract aligned pairs
of entities, classes, properties and relations from Graph-FCA concepts in a uniform way. The
results in this paper are limited to a few encouraging illustrative examples in different scenarios,
and much remains to be done to apply our approach to real KGs. A first issue is scalability. A
research track is to only generate anchors, i.e. concepts that result from the intersection of
two objects, one from each KG. A second issue is to allow for approximate matching of values,
especially texts, as they often slightly differ from one KG to another.


Acknowledgments
This research is supported by ANR project SmartFCA (ANR-21-CE23-0023).


References
 [1] Abbas, N., David, J., Napoli, A.: Discovery of link keys in RDF data based on pattern
     structures: Preliminary steps. In: Int. Conf. Concept Lattices and Their Applications (2020)
 [2] Ardjani, F., Bouchiha, D., Malki, M.: Ontology-alignment techniques: Survey and analysis.
     Int. J. Modern Education & Computer Science 7(11) (2015)
 [3] Atencia, M., David, J., Euzenat, J., Napoli, A., Vizzini, J.: Link key candidate extraction
     with relational concept analysis. Discrete applied mathematics 273, 2–20 (2020)
 [4] Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embed-
     dings for modeling multi-relational data. In: Advances in neural information processing
     systems. pp. 2787–2795 (2013)
 [5] Ferré, S., Cellier, P.: Graph-FCA: An extension of formal concept analysis to knowledge
     graphs. Discrete Applied Mathematics 273, 81–102 (2019)
 [6] Ganter, B., Kuznetsov, S.: Pattern structures and their projections. In: Delugach, H.S.,
     Stumme, G. (eds.) Int. Conf. Conceptual Structures. pp. 129–142. LNCS 2120, Springer
     (2001)
 [7] Ganter, B., Wille, R.: Formal Concept Analysis — Mathematical Foundations. Springer
     (1999)
 [8] Hahn, G., Tardif, C.: Graph homomorphisms: structure and symmetry. In: Graph symmetry,
     pp. 107–166. Springer (1997)
 [9] Hitzler, P., Krotzsch„ M., Rudolph, S.: Foundations of Semantic Web Technologies. Chap-
     man & Hall/CRC (2009)
[10] Lacroix, T., Usunier, N., Obozinski, G.: Canonical tensor decomposition for knowledge
     base completion. In: Dy, J.G., Krause, A. (eds.) Int. Conf. Machine Learning. Proceedings
     of Machine Learning Research, vol. 80, pp. 2869–2878. PMLR (2018)
[11] Rouane-Hacene, M., Huchard, M., Napoli, A., Valtchev, P.: Relational concept analysis:
     mining concept lattices from multi-relational data. Annals of Mathematics and Artificial
     Intelligence 67(1), 81–108 (2013)
[12] Schlichtkrull, M., Kipf, T.N., Bloem, P., van den Berg, R., Titov, I., Welling, M.: Modeling
     relational data with graph convolutional networks. In: The Semantic Web Conf. (ESWC).
     pp. 593–607. Springer (2018)
[13] Shao, C., Hu, L.M., Li, J.Z., Wang, Z.C., Chung, T., Xia, J.B.: Rimom-im: A novel iterative
     framework for instance matching. J. computer science and technology 31(1), 185–197
     (2016)
[14] Zeng, K., Li, C., Hou, L., Li, J., Feng, L.: A comprehensive survey of entity alignment for
     knowledge graphs. AI Open 2, 1–13 (2021)
[15] Zhao, M., Zhang, S., Li, W., Chen, G.: Matching biomedical ontologies based on formal
     concept analysis. J. biomedical semantics 9(1), 1–27 (2018)