Exploring the Application of Graph-FCA to the Problem of Knowledge Graph Alignment SΓ©bastien FerrΓ©1,βˆ— 1 Univ Rennes, CNRS, IRISA, Campus de Beaulieu, 35042 Rennes, France Abstract Knowledge Graphs (KG) have become a widespread knowledge representation. When different KGs exist for some domain, it is valuable to merge them into a richer KG. This is known as the problem of KG alignement, which encompasses related problems such as entity alignement or ontology matching. Although most recent approaches rely on supervised representation learning, Formal Concept Analysis (FCA) has also been proposed as a basis for symbolic and unsupervised approaches. We here explore the application of Graph-FCA, an extension of FCA for KGs, to different scenarios of KG alignments: (A) when the two KGs have common values, and (B) when pre-aligned pairs are known. We show that, compared to previous FCA-based approaches, Graph-FCA allows for a more natural and scalable representation of the KGs to be aligned, and makes it simpler to extract alignments from the concepts. It also features flexibility w.r.t. different alignment scenarios. Keywords Knowledge Graph, Semantic Web, Entity Alignement, Ontology Matching, Graph-FCA 1. Introduction A knowledge graph (KG) is a modelling of some domain of interest that supports many tasks such as question answering, reasoning over data or machine learning. They are commonly represented in languages of the Semantic Web (RDF(S), OWL), and published on the web as Linked Open Data (LOD) [9]. A KG is a set of entities described by their classes, properties, and relations with other entities. Due to the open nature of the web, it is common that a same domain of interest is modelled by multiple KGs. Those have in general overlapping, hence redundant, information but also complementary information. It is therefore valuable to merge the different KGs that model a same domain into a single richer KG. The difficulty is that different KGs generally use different entity identifiers (URIs) for the same real entities, and even different schemas or ontologies, i.e. different classes, properties and relations. The problem of merging two KGs consists in discovering equivalence links between entities and/or between schemas, which is known as knowledge graph alignment. A number of other terms are used in the literature, partially depending on the focus on entities or on the schema: entity alignment, entity matching, ontology alignment/matching, data interlinking [2, 14]. In Published in Pablo Cordero, Ondrej Kridlo (Eds.): The 16π‘‘β„Ž International Conference on Concept Lattices and Their Applications, CLA 2022, Tallinn, Estonia, June 20–22, 2022, Proceedings, pp. 79–91. βˆ— Corresponding author. Envelope-Open ferre@irisa.fr (S. FerrΓ©) Orcid 0000-0002-6302-2333 (S. FerrΓ©) Β© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) order to align two KGs, they must have something in common, some alignment seeds. Those seeds can be: a common language for the string values (e.g., names, titles), a set pre-aligned pairs, or a combination of those. Most existing work consider a single scenario, and may not apply to other scenarios. A common scenario (called Scenario A in the following) is to only assume a common language, and no pre-aligned pairs. This is an unsupervised setting. Another common scenario in recent literature (called Scenario B in the following), assumes that a set of pre-aligned pairs is available [14]. This is a supervised setting where the pre-aligned pairs serve as a training set. Most approaches use similarity measures that combine terminological features (i.e., strings for names, titles, ...), and structural features (i.e., relationships between entities). A recent trend is to use KG embeddings as such features. Formal Concept Analysis (FCA) [7] and FCA extensions, such as Pattern Structures [6] and Relational Concept Analysis [11], have been recently proposed for tasks related to KG alignment [15, 3, 1]. Formal concepts are used there as a symbolic form of similarity. In this paper, we explore the application of Graph-FCA [5], an extension of FCA specifically designed for analyzing KGs, to the two main scenarios of KG alignment. We propose an approach that makes three contributions to previous FCA-based approaches. 1. A more natural and scalable representation. The formal context is the union (rather than the product) of the two KGs to be aligned, with minor adjustments depending on the alignment scenario. 2. A uniform alignment extraction methods for both entities and schema elements (classes, properties and relations). All alignement pairs are extracted from concept extents (rather than from both extents and intents). 3. A flexible approach w.r.t. different scenarios. They are handled through minor adjustments to the representation of KGs. We show the effectiveness of our approach on a few use cases taken or adapted from previous FCA-based work. Section 2 states the KG alignment problem. Section 3 presents the related work. Section 4 shortly describes Graph-FCA. Section 5 explains our approach from principles to concrete application, and discusses the results on the two main scenarios. Section 6 concludes and draws perspectives. 2. Problem Statement We start by defining knowledge graphs (KG), taking care to distinguish between classes, prop- erties and relations in order to allow for a more accurate modelling. Many work on KGs only consider entities and relations. Definition 1 (knowledge graph). A knowledge graph is a structure 𝐺 = ⟨𝐸, 𝐢, 𝑃, 𝑅, 𝑇 ⟩, where 𝐸 is the set of entities, 𝐢 is the set of classes, 𝑃 is the set of properties, 𝑅 is the set of relations, and 𝑇 is a set of triples. The schema of the KG is made of the classes, properties and relations. There are three kinds of triples, for any 𝑒, 𝑒 β€² ∈ 𝐸, 𝑐 ∈ 𝐢, 𝑝 ∈ 𝑃, π‘Ÿ ∈ 𝑅: β€’ (𝑒, type , 𝑐) states that class 𝑐 is the type of entity 𝑒; Figure 1: Two KGs (left and right) with cyclic dependencies (from Atencia et al [3]) β€’ (𝑒, 𝑝, 𝑣) states that property 𝑝 has value 𝑣 on entity 𝑒, where a value can be a string, a number, etc.; β€’ (𝑒, π‘Ÿ, 𝑒 β€² ) states that relation π‘Ÿ links entity 𝑒 to entity 𝑒 β€² . In RDF and OWL, entities are called individuals, values are called literals, and the distinction between properties and relations corresponds to the distinction in OWL between datatype properties and object properties. URIs are used to identify entities, classes, properties, and relations. Figure 1 shows two knowledge graphs, one on the left and another on the right. It falls into Scenario A as it uses completely different schemas and it only shares proper names of entities (square boxes). The objective is to find the best alignement between the two. This is the most complex example considered by Atencia et al [3] (Fig. 8, p.23). It involves cyclic dependencies that have to be taken into account to find the correct alignment. It is easy to find that o1:z1 aligns with o2:i1 because they are the unique entities in each KG to have "Dupont" as a value. From this first alignement, one can find two additional alignments: property o1:lastname with o2:name , and class o1:Person with o2:Inhabitant . It is more difficult to find that o1:z3 aligns with o2:i3 . o1:z3 has lastname "Dubois" but both o2:i2 and o2:i3 have name "Dubois" . o1:z3 has a home/address in Paris but so do o2:i2 and o2:i3 . Traversing deeper in the graph, we find that o1:z3 ’s home is owned by a Dupont who lives in Grenoble. Something equivalent can be found for o2:i3 but not for o2:i2 . We observe that the alignment of entities, classes, properties and relations depend on each other, and that handling long-range dependencies may be necessary to resolve alignments. The example in Figure 1 can be adapted to cover Scenario B where string values are in different incomparable languages but some pre-aligned pairs may be available. Imagine for instance that the strings of the left-side KG are in Chinese, and that we already know that o1:z1 aligns with o2:i1 . Without that pre-aligned pair, one only has structural information, which is not enough to discriminate between the different people and places. Knowing that o1:z1 aligns with o2:i1 (Dupont), one can align classes o1:Person and o2:Inhabitant , and also relations o1:home and o2:address . By propagating alignments through neighborhood, starting from the pre-aligned pair, it is here possible to find the correct alignment for all elements: first o1:h1 with o2:a1 , then o1:city with o2:city and o1:z2 with o2:i2 , and so on. 3. Related Work The alignment of KGs (or entities, or ontologies) is a matter of defining and combining similarity measures [2]. There are four kinds of similarities, depending on the kind of information that is used. Terminological similarity is based on the strings that are used to name entities and schema elements. Structural similarity is based on the description of entities by classes, properties, and relations. Extensional similarity is based on the extensions of classes. Semantic similarity is based on external sources of knowledge, and on automated reasoning (e.g., subsumption checks in description logics). Terminological and structural similarities are the most common methods, and are often combined to get better results. In general, strutural similarity does not consider the KG as a whole but use either tree-shaped descriptions extracted from the knowledge graph, or neighbor-based propagation of similarity values (e.g., RiMOM-IM [13]). The trend in KG alignement is to use representation learning methods, in a supervised setting where a significant number of pre-aligned pairs is supposed to be known [14]. The principle is first to learn embeddings for the two KGs that coincide for the pre-aligned pairs. Other alignments can then be found by nearest-neighbor search in the embedding space. Many representation learning methods exist, such as TransE [4], ComplEx-N3 [10] or R-GCN [12]. Different variants of FCA have been applied to tasks related to KG alignment. Classical FCA was used for aligning (aka. matching) several biomedical ontologies, in a system called FCA-Map [15]. The scenario is here to match classes and relations based on their names and their relationships (e.g., class hierarchies, class disjointness, relation domains and ranges), there are no entities per se. The intuition behind using FCA is that if a concept has only two objects in its extent (here classes and relations), one from each ontology, then this is a strong indication that the two entities should be aligned. This is called an anchor in FCA-Map. The global alignment is obtained by defining a succession of five formal contexts, and computing their concept lattices. This can be seen as an ad-hoc form of Relational Concept Analysis (RCA) [11], with the notions of iterative concept formation, and relational attributes. RCA was used for the task of link key discovery [3], which consists in finding pairs of properties and/or relations that are identifying keys for the entities of a pair of classes. In Figure 1, the pair o1:lastname /o2:name is not a link key for the pair of classes o1:Person /o2:Inhabitant , but combined with the pair o1:home /o2:address , it is. Link key discovery is somewhat stronger than KG alignment because in addition to provide alignments, it provides a rule for predicting aligned pairs in unseen data. RCA can be compared to methods that propagate similarity values through relations, except that formal concepts are used in place of numerical values. Pattern Structures [6] were used for the same task with an approach that is similar Figure 2: Graph context about dishes, cereals, and countries. but does not cover long-range and cyclic dependencies like in RCA. In those approaches, the discovery of an alignment between two KGs is somewhat contrived. First, although a KG can be naturally represented as an RCA context, the RCA context must instead represent the product of the two KGs: i.e., the objects are all pairs of entities (one from each KG), the properties are all pairs of properties, and so on. This entails a quadratic increase in the size of the data, which poses scalibility issues. Second, the discovered alignment is scattered in the extents and intents of formal concepts, across several concept lattices. We show in this paper that those two disadvantages are overcome when using Graph-FCA [5] on the task of KG alignment. 4. Graph-FCA Graph-FCA [5] is an extension of FCA for multi-relational data, and in particular for knowledge graphs. Graph-FCA defines a graph context as an incidence relation between tuples of objects and attributes. A graph context is a triple 𝐾 = (𝑂, 𝐴, 𝐼 ), where 𝑂 is a set of objects, 𝐴 is a set of attributes, and 𝐼 βŠ† 𝑂 βˆ— Γ— 𝐴 is an incidence relation between object tuples π‘œ = (π‘œ1 , … , π‘œπ‘˜ ) ∈ 𝑂 βˆ— , for any arity π‘˜, and attributes π‘Ž ∈ 𝐴. A graph context represents a KG with objects as nodes, and attributes as hyper-edge labels. An hyper-edge ((π‘œ1 , … , π‘œπ‘˜ ), π‘Ž) can be seen as the logical atom π‘Ž(π‘œ1 , … , π‘œπ‘˜ ); we use the latter notation for readability. Unary edges correspond to node labels while binary edges correspond to labeled edges. Classical FCA is a special case of Graph-FCA where π‘˜ = 1 (only unary edges) and 𝑛 = 1 (only unary concepts). Figure 2 shows the graphical representation of a graph context about dishes, cereals, and countries. Each box represents an object π‘œ along with the list of attributes π‘Ž such that there is an unary edge π‘Ž(π‘œ) ∈ 𝐼. Each arrow from box π‘œ1 to box π‘œ2 labelled by attribute π‘Ž denotes a binary edge π‘Ž(π‘œ1 , π‘œ2 ) ∈ 𝐼. An hyper-edge π‘Ž(π‘œ1 , … , π‘œπ‘˜ ) with π‘˜ > 2 is represented as an ellipsis labeled by π‘Ž, and having an arrow labeled 𝑖 to each box π‘œπ‘– (not illustrated here). Concept intents express what several 𝑛-tuples of objects have in common. They are graph patterns with 𝑛 distinguished nodes. They are called Projected Graph Patterns (PGP), and they are similar to conjunctive queries. For example, the PGP [(π‘₯, 𝑦) ← π‘π‘Žπ‘Ÿπ‘’π‘›π‘‘(π‘₯, 𝑧), π‘π‘Žπ‘Ÿπ‘’π‘›π‘‘(𝑦, 𝑧)] Figure 3: Compact representation of graph concepts about dishes, cereals, and countries. has three nodes π‘₯, 𝑦, 𝑧, two edges π‘π‘Žπ‘Ÿπ‘’π‘›π‘‘(π‘₯, 𝑧) and π‘π‘Žπ‘Ÿπ‘’π‘›π‘‘(𝑦, 𝑧), and two distinguished nodes π‘₯, 𝑦. It can be used as a definition of the ”sibling” relationship, i.e. the fact that π‘₯ and 𝑦 are siblings if they have a common parent 𝑧. FCA is extended to graph contexts by extending the set operations to PGPs. PGP inclusion βŠ†π‘ž is based on graph homomorphisms, and PGP intersection βˆ©π‘ž is based on the categorical product of graphs (see [5] for details). The Galois connection is defined in Graph-FCA between PGPs (𝑄, βŠ†π‘ž ) and sets of object βˆ— tuples (2𝑂 , βŠ†). In the definitions of 𝑄 β€² and 𝑅′ below, the PGP [π‘œ ← 𝐼 ] represents the description of an object tuple π‘œ by the whole incidence relation 𝐼 seen from the relative position of π‘œ. 𝑄 β€² ∢= {π‘œ ∈ 𝑂 𝑛 ∣ 𝑄 βŠ†π‘ž [π‘œ ← 𝐼 ]}, for 𝑄 = [(π‘₯1 , … , π‘₯𝑛 ) ← 𝑃] ∈ 𝑄 𝑅′ ∢= βˆ©π‘ž {[π‘œ ← 𝐼 ] ∣ π‘œ ∈ 𝑅}, for 𝑅 βŠ† 𝑂 𝑛 , 𝑛 ∈ β„• Informally, 𝑄 β€² can be understood as the result set of 𝑄, seen as a query; and 𝑅′ as the most specific query whose result set contains 𝑅. From there, concepts can be defined in the usual way, and organized into lattices. A concept is a pair (𝑅, 𝑄) such that 𝑄 β€² = 𝑅 and 𝑅′ =π‘ž 𝑄. The arity of 𝑄 and 𝑅 must be the same, it determines the arity of the concept. Unary concepts are about sets of objects, while binary concepts are about relationships between objects, and so on. Unlike RCA, there is a concept lattice for each concept arity rather than for each object type. Figure 3 displays a compact representation of the graph concepts about dishes, cereals, and countries. It only shows unary concepts (𝑅, 𝑄), with 𝑄 = [π‘₯ ← 𝑃], s.t. |𝑅| β‰₯ 2, 𝑃 β‰  βˆ…, and 𝑃 is a β€œcore pattern” (i.e., 𝑃 has no homomorphic subset). Each box π‘₯ identifies a unary concept, with its name at the top (e.g., Q1b), and its extent at the bottom (here, Pakistan, Thailand). The concept intent is the PGP [π‘₯ ← 𝑃], where 𝑃 is the subgraph made of solid arrows containing node π‘₯. Those arrows denote binary edges while the middle part of the boxes denote unary edges. By reading the graph, we learn that Concept Q1b is the concept of β€œAsian countries, which eat a lot of some dish whose main cereal is a rice produced in the country itself”. Formally, its intent is denoted by [𝑏 ← π‘π‘œπ‘’π‘›π‘‘π‘Ÿπ‘¦(𝑏), π΄π‘ π‘–π‘Ž(𝑏), π‘’π‘Žπ‘‘πΏπ‘œπ‘‘π‘‚π‘“ (𝑏, π‘Ž), π‘‘π‘–π‘ β„Ž(π‘Ž), β„Žπ‘Žπ‘ π‘€π‘Žπ‘–π‘›πΆπ‘’π‘Ÿπ‘’π‘Žπ‘™(π‘Ž, 𝑐), π‘π‘’π‘Ÿπ‘’π‘Žπ‘™(𝑐), π‘Ÿπ‘–π‘π‘’(𝑐), π‘–π‘ π‘ƒπ‘Ÿπ‘œπ‘‘π‘’π‘π‘’π‘‘πΌ 𝑛(𝑐, 𝑏)] (using the small letters of concept names as pattern variables). Concepts Q1a and Q1c have the same graph pattern as Q1b but with a different focus, on dishes for Q1a and on cereals for Q2c. N-ary concepts are obtained by picking several distinguished nodes. For example, (Q1b,Q1a,Q1c) is a ternary concept whose instances are the object triples (π‘ƒπ‘Žπ‘˜π‘–π‘ π‘‘π‘Žπ‘›, π‘π‘–π‘Ÿπ‘¦π‘Žπ‘›π‘–, π‘π‘Žπ‘ π‘šπ‘Žπ‘‘π‘–π‘…π‘–π‘π‘’) and (𝑇 β„Žπ‘Žπ‘–π‘™π‘Žπ‘›π‘‘, π‘˜β„Žπ‘Žπ‘œπ‘€π‘Žπ‘›πΎ π‘Žπ‘–, π‘‘β„Žπ‘Žπ‘–π‘…π‘–π‘π‘’). It represents the cyclic relationship existing between a country, a dish, and a cereal in Asian countries. The Q2-concepts are generalizations of the Q1-concepts. For example, concept Q2b covers all countries. The dashed arrow from Q2b to Q1b says that Q1b is a subconcept of Q2b. The Q2-intents say that, in the context, all dishes use some rice as a main cereal, and that all cereals are produced in some country but that not all countries eat a lot of some dish, and not all dishes are eaten a lot by some country. 5. Concept-based Knowledge Graph Alignment In this section, we explain how KG alignments can be extracted from Graph-FCA concepts, and we discuss the results on the example and scenarios of Section 2. 5.1. Principle When two entities from two different KGs are equivalent, e.g. o1:z1 and o2:i1 in the example, they generally have equivalent types (e.g., o1:Person and o2:Inhabitant ), equivalent prop- erties with equal values (e.g. o1:lastname and o2:name β€œDupont”), and equivalent relations with equivalent entities (e.g., o1:home o1:h1 and o2:address o2:a1 ). Similarly, when two classes are equivalent, they are the type of equivalent entities. In FCA terms, the two equivalent entities (or classes) belong to the extent of a formal concept whose intent expresses everything they have in common. Here, the intent is not only about how the entities are described but also about how the entities are related to other entities. Moreover, the equivalence between two entities depend on the equivalence between the related entities, and so on recursively. This makes Graph-FCA, like RCA, a good match for the problem of KG alignment. More precisely, let 𝐺1 and 𝐺2 be two KGs. If an unary concept 𝐢 = (𝑅, 𝑄) has only two elements in its extent, 𝑅 = {π‘₯1 , π‘₯2 }, where π‘₯1 belong to 𝐺1 and π‘₯2 belongs to 𝐺2 , then this is a strong indication that π‘₯1 and π‘₯2 are equivalent because they are unique in their own KG to match the projected graph pattern 𝑄. Such a concept is called an anchor in [15]. Concepts that are not anchors can also contribute to the alignement, although in a less direct and less certain way. For example, a concept with extent {π‘₯1 , 𝑦1 , π‘₯2 , 𝑦2 } says that 𝐺1 -entities π‘₯1 and 𝑦1 align with 𝐺2 -entities π‘₯2 and 𝑦2 , without knowing which aligns to which. However, if we also have the anchor {π‘₯1 , π‘₯2 }, we can deduce an alignment between 𝑦1 and 𝑦2 . Another example is a concept with extent {π‘₯1 , 𝑦2 , 𝑧2 } where some ambiguity remains on the alignment of π‘₯1 but it nonetheless reduces the set of possibilities from all 𝐺2 -entities to only two entities, 𝑦2 and 𝑧2 . 5.2. Graph-FCA Representation The immediate representation of a KG as a Graph-FCA context is to use classes, properties and relations as attributes (hyper-edge labels), and entities as objects (nodes). Each occurrence of a value is represented by a distinct node labelled by the value as this avoids spurious connections between objects. This is formalized by the following mappings from KG triples to Graph-FCA hyper-edges: (𝑒, type , 𝑐) ⇝ 𝑐(𝑒) (𝑒, 𝑝, 𝑣) ⇝ 𝑝(𝑒, "𝑣") (𝑒, π‘Ÿ, 𝑒 β€² ) ⇝ π‘Ÿ(𝑒, 𝑒 β€² ) where 𝑝(𝑒, "𝑣") is a shorthand for 𝑝(𝑒, π‘œ), 𝑣(π‘œ) for some fresh object π‘œ. This immediate representation, however, does not produce any alignment because the two KG representations do not share any attribute. Indeed, the two KGs have different schemas, hence different classes, properties and relations. Recall that Graph-FCA concept intents are made of attributes and variables. As a consequence, for all concepts except the top one, the concept extent contains only objects from a single KG, and no anchor concept can be found. An adequate representation should use the same set of attributes for the two KGs to be aligned, and those attributes should reflect what they have in common. In Scenario A, the two KGs only share the values. Actually, they also share the graph structure, and the distinction between classes, properties and relations. Anything else must hence be represented as objects: i.e., not only entities but also classes, properties and relations. We achieve this by introducing three meta-level common attributes: type for expressing the relation between entities and classes, prop for relating entities to values through properties, and rel for relating entities to entities through relations. The new mapping from KG triples to Graph-FCA hyper-edges is hence as follows: (𝑒, type , 𝑐) ⇝ type(𝑒, 𝑐) (𝑒, 𝑝, 𝑣) ⇝ prop(𝑒, 𝑝, "𝑣") (𝑒, π‘Ÿ, 𝑒 β€² ) ⇝ rel(𝑒, π‘Ÿ, 𝑒 β€² ) This is a form of reification, in which classes, properties and relations are treated as objects. It is made possible by the fact that Graph-FCA supports hyper-edges beyond binary edges. As defined above, the notation "𝑣" for a value implicitly introduces an attribute 𝑣, hence accounts for the values shared between KGs. In Scenario B, in contrast to Scenario A, even the values are different in the two KGs. This means that values do not bring any information so that the mapping for properties can be simplified as (𝑒, 𝑝, 𝑣) ⇝ prop(𝑒, 𝑝). Obviously, the two mappings can be mixed if some values are shared and others not. To compensate for the lack of shared elements, a set of pre-aligned pairs (π‘₯1 , π‘₯2 ) may be given as input, and modelled as equivalence triples (π‘₯1 , sameAs , π‘₯2 ). Such Table 1 Anchor concepts in Scenario A (common values). C extent(C) intent(C), only adjacent edges π‘Ž o1:lastname o2:name π‘π‘Ÿπ‘œπ‘(𝑒, π‘Ž, "π·π‘’π‘π‘œπ‘–π‘ "), π‘π‘Ÿπ‘œπ‘(𝑓 , π‘Ž, "π·π‘’π‘π‘œπ‘–π‘ "), π‘π‘Ÿπ‘œπ‘(𝑔, π‘Ž, "π·π‘’π‘π‘œπ‘›π‘‘") 𝑏 o1:city o2:city π‘π‘Ÿπ‘œπ‘(β„Ž, 𝑏, "πΊπ‘Ÿπ‘’π‘›π‘œπ‘π‘™π‘’"), π‘π‘Ÿπ‘œπ‘(𝑖, 𝑏, "π‘ƒπ‘Žπ‘Ÿπ‘–π‘ "), π‘π‘Ÿπ‘œπ‘(𝑗, 𝑏, "π‘ƒπ‘Žπ‘Ÿπ‘–π‘ ") 𝑐 o1:owner o2:ownedBy π‘Ÿπ‘’π‘™(β„Ž, 𝑐, 𝑓 ), π‘Ÿπ‘’π‘™(𝑖, 𝑐, 𝑔), π‘Ÿπ‘’π‘™(𝑗, 𝑐, 𝑒) 𝑑 o1:home o2:address π‘Ÿπ‘’π‘™(𝑒, 𝑑, 𝑖), π‘Ÿπ‘’π‘™(𝑓 , 𝑑, 𝑗), π‘Ÿπ‘’π‘™(𝑔, 𝑑, β„Ž) 𝑒 o1:z3 o2:i3 𝑑𝑦𝑝𝑒(𝑒, 𝑙), π‘π‘Ÿπ‘œπ‘(𝑒, π‘Ž, "π·π‘’π‘π‘œπ‘–π‘ "), π‘Ÿπ‘’π‘™(𝑒, 𝑑, 𝑖), π‘Ÿπ‘’π‘™(𝑗, 𝑐, 𝑒) 𝑓 o1:z2 o2:i2 𝑑𝑦𝑝𝑒(𝑓 , 𝑙), π‘π‘Ÿπ‘œπ‘(𝑓 , π‘Ž, "π·π‘’π‘π‘œπ‘–π‘ "), π‘Ÿπ‘’π‘™(𝑓 , 𝑑, 𝑗), π‘Ÿπ‘’π‘™(β„Ž, 𝑐, 𝑓 ) 𝑔 o1:z1 o2:i1 𝑑𝑦𝑝𝑒(𝑔, 𝑙), π‘π‘Ÿπ‘œπ‘(𝑔, π‘Ž, "π·π‘’π‘π‘œπ‘›π‘‘"), π‘Ÿπ‘’π‘™(𝑔, 𝑑, β„Ž), π‘Ÿπ‘’π‘™(𝑖, 𝑐, 𝑔) β„Ž o1:h1 o2:a1 𝑑𝑦𝑝𝑒(β„Ž, π‘˜), π‘π‘Ÿπ‘œπ‘(β„Ž, 𝑏, "πΊπ‘Ÿπ‘’π‘›π‘œπ‘π‘™π‘’"), π‘Ÿπ‘’π‘™(𝑔, 𝑑, β„Ž), π‘Ÿπ‘’π‘™(β„Ž, 𝑐, 𝑓 ) 𝑖 o1:h3 o2:a3 𝑑𝑦𝑝𝑒(𝑖, π‘˜), π‘π‘Ÿπ‘œπ‘(𝑖, 𝑏, "π‘ƒπ‘Žπ‘Ÿπ‘–π‘ "), π‘Ÿπ‘’π‘™(𝑒, 𝑑, 𝑖), π‘Ÿπ‘’π‘™(𝑖, 𝑐, 𝑔) 𝑗 o1:h2 o2:a2 𝑑𝑦𝑝𝑒(𝑗, π‘˜), π‘π‘Ÿπ‘œπ‘(𝑗, 𝑏, "π‘ƒπ‘Žπ‘Ÿπ‘–π‘ "), π‘Ÿπ‘’π‘™(𝑓 , 𝑑, 𝑗), π‘Ÿπ‘’π‘™(𝑗, 𝑐, 𝑒) π‘˜ o1:House o2:Place 𝑑𝑦𝑝𝑒(β„Ž, π‘˜), 𝑑𝑦𝑝𝑒(𝑖, π‘˜), 𝑑𝑦𝑝𝑒(𝑗, π‘˜) 𝑙 o1:Person o2:Inhabitant 𝑑𝑦𝑝𝑒(𝑒, 𝑙), 𝑑𝑦𝑝𝑒(𝑓 , 𝑙), 𝑑𝑦𝑝𝑒(𝑔, 𝑙) an equivalence can be represented in Graph-FCA by using a fresh attribute π‘Žπ‘₯ that represents the real entity or schema element π‘₯ that π‘₯1 and π‘₯2 refer to. The mapping rule is (π‘₯1 , sameAs , π‘₯2 ) ⇝ π‘Žπ‘₯ (π‘₯1 ), π‘Žπ‘₯ (π‘₯2 ), for some fresh attribute π‘Žπ‘₯ . Attribute π‘Žπ‘₯ works like a nominal attribute for each KG, labelling only π‘₯1 in 𝐺1 , and only π‘₯2 in 𝐺2 . 5.3. Results and Discussion We ran our Graph-FCA implementation1 on the two scenarios in order to find anchors, and hence KG alignments. We used the following options to make their computation more efficient, and the output smaller: -minsupp 2 because anchors have by definition a support equal to 2, and -only-cores because it excludes concepts that represent under-optimal alignments (an example is given below). Scenario A (common values). Graph-FCA finds 24 unary concepts, grouped in only two patterns: Q1 and Q2. All anchor concepts are in Q1, and all concepts in Q1 are anchor concepts. Moreover, all elements of the two KGs (entities and schema) appear in those concepts. Pattern Q1 therefore represents a complete alignement between the two KGs. Table 1 lists those concepts with their identifier (a lowercase letter), their extent (an object from each KG), and their neighboring intent. For instance, concept 𝑔 shows the alignment of o1:z1 with o2:i1 , and its intent reads as follows: The two entities have type 𝑙, property π‘Ž with value β€œDupont”, relation 𝑑 to entity β„Ž, and relation 𝑐 from entity 𝑖. Each variable in the intent refers to a concept in the same pattern, and hence to an aligned pair: e.g., the common type 𝑙 is o1:Person /o2:Inhabitant , and the property with value β€œDupont” is o1:lastname /o2:name . One can observe that the alignement is also correct, not only for entities but also for classes, properties and relations. 1 Open source at https://bitbucket.org/sebferre/graph-fca/ Table 2 Anchor concepts in Scenario B (pre-aligned pairs). C extent(C) intent(C), only adjacent edges π‘Ž o1:city o2:city π‘π‘Ÿπ‘œπ‘(𝑔, π‘Ž), π‘π‘Ÿπ‘œπ‘(β„Ž, π‘Ž), π‘π‘Ÿπ‘œπ‘(𝑖, π‘Ž) 𝑏 o1:lastname o2:name π‘π‘Ÿπ‘œπ‘(𝑒, 𝑏), π‘π‘Ÿπ‘œπ‘(𝑓 , 𝑏), π‘π‘Ÿπ‘œπ‘(𝑙, 𝑏) o2:given 𝑐 o1:owner o2:ownedBy π‘Ÿπ‘’π‘™(𝑔, 𝑐, 𝑓 ), π‘Ÿπ‘’π‘™(β„Ž, 𝑐, 𝑙), π‘Ÿπ‘’π‘™(𝑖, 𝑐, 𝑒) 𝑑 o1:home o2:address π‘Ÿπ‘’π‘™(𝑒, 𝑑, β„Ž), π‘Ÿπ‘’π‘™(𝑓 , 𝑑, 𝑖), π‘Ÿπ‘’π‘™(𝑙, 𝑑, 𝑔) 𝑒 o1:z3 o2:i3 𝑑𝑦𝑝𝑒(𝑒, π‘˜), π‘π‘Ÿπ‘œπ‘(𝑒, 𝑏), π‘Ÿπ‘’π‘™(𝑒, 𝑑, β„Ž), π‘Ÿπ‘’π‘™(𝑖, 𝑐, 𝑒) 𝑓 o1:z2 o2:i2 𝑑𝑦𝑝𝑒(𝑓 , π‘˜), π‘π‘Ÿπ‘œπ‘(𝑓 , 𝑏), π‘Ÿπ‘’π‘™(𝑓 , 𝑑, 𝑖), π‘Ÿπ‘’π‘™(𝑔, 𝑐, 𝑓 ) 𝑔 o1:h1 o2:a1 𝑑𝑦𝑝𝑒(𝑔, 𝑗), π‘π‘Ÿπ‘œπ‘(𝑔, π‘Ž), π‘Ÿπ‘’π‘™(𝑔, 𝑐, 𝑓 ), π‘Ÿπ‘’π‘™(𝑙, 𝑑, 𝑔) β„Ž o1:h3 o2:a3 𝑑𝑦𝑝𝑒(β„Ž, 𝑗), π‘π‘Ÿπ‘œπ‘(β„Ž, π‘Ž), π‘Ÿπ‘’π‘™(𝑒, 𝑑, β„Ž), π‘Ÿπ‘’π‘™(β„Ž, 𝑐, 𝑙) 𝑖 o1:h2 o2:a2 𝑑𝑦𝑝𝑒(𝑖, 𝑗), π‘π‘Ÿπ‘œπ‘(𝑖, π‘Ž), π‘Ÿπ‘’π‘™(𝑓 , 𝑑, 𝑖), π‘Ÿπ‘’π‘™(𝑖, 𝑐, 𝑒) 𝑗 o1:House o2:Place 𝑑𝑦𝑝𝑒(𝑔, 𝑗), 𝑑𝑦𝑝𝑒(β„Ž, 𝑗), 𝑑𝑦𝑝𝑒(𝑖, 𝑗) π‘˜ o1:Person o2:Inhabitant 𝑑𝑦𝑝𝑒(𝑒, π‘˜), 𝑑𝑦𝑝𝑒(𝑓 , π‘˜), 𝑑𝑦𝑝𝑒(𝑙, π‘˜) 𝑙 o1:z1 o2:i1 𝑧1𝑖1(𝑙), 𝑑𝑦𝑝𝑒(𝑙, π‘˜), π‘π‘Ÿπ‘œπ‘(𝑙, 𝑏), π‘Ÿπ‘’π‘™(β„Ž, 𝑐, 𝑙), π‘Ÿπ‘’π‘™(𝑙, 𝑑, 𝑔) The concepts in pattern Q2 are general concepts. One concept gathers all entities (people and places), another all classes, and similarly for properties and relations. It simply reflects the meta-model used in our representation. Without option -only-cores , Graph-FCA outputs additional concepts that represent approximate alignments, such as {o1:z1, o1:z3, o2:i1, o2:i3} or {o1:lastname, o2:name, o2:given} . Scenario B (pre-aligned pairs). Without common values nor any pre-aligned pairs, Graph- FCA only finds the meta-concepts, like in pattern Q2 in the above scenario. Indeed, there is no rationale for making any alignment. When adding a pre-aligned pair, like o1:z1 with o2:i1 , Graph-FCA finds an almost complete alignment (see Table 2). The only uncertainty is about o1:lastname that aligns with either o2:name or o2:given . Indeed, without values, the different properties that apply to an entity cannot be distinguished. Choosing any other pair of entities as pre-aligned pair gives the same result. However, if the pre-aligned pair is about the schema (e.g., two equivalent classes), then only the schema elements are correctly aligned. Entities are grouped by type (people and places) but they are not aligned individually. Other results. We also applied our approach to the example of Abbas et al [1], which falls in Scenario A (common values). It is at the same time simpler than the example of Atencia et al because it has no relations (only classes and properties), and more difficult because there is not a one-to-one mapping between the two KGs. First, there are entities in 𝐺1 that are not present in 𝐺2 . Second, the classes are not equivalent: the 𝐺1 -class o1:Book is split in two 𝐺2 -classes, o2:Dictionary and o2:Novel ; and the 𝐺2 -class o2:FemaleScientist is equivalent to the intersection of two 𝐺1 -classes, o1:Woman and o1:Scientist . The alignments produced by Graph-FCA are grouped in two patterns, one for people and another for books, because there are no relations between them in the data. All common entities and all properties are completely and correctly aligned (one anchor concept for each pair). For classes, three interesting concepts are found: {o1:Woman, o1:Scientist, o2:FemaleScientist} , {o1:Book, o2:Novel} , and {o1:Book, o2:Dictionary} . They can be read as β€œWoman and Scientist together align with FemaleScientist”, and β€œBook aligns with either Novel or Dictionary”. This is exactly as ex- pected. Note that each entity with type o1:Book is correctly assigned to one of o2:Novel and o2:Dictionary (two distinct concepts), based on its properties (β€œauthor” for novels, β€œyear” for dictionaries). Discussion. Compared to the use of classical FCA, RCA or Pattern Structures, our approach does not build a context as the β€œproduct” of the two KGs, i.e. with all pairs of entities as objects, and all pairs of classes/properties/relations as attributes. Instead, the Graph-FCA context is built as the β€œunion” of the two KGs. This entails that the size of our context is the sum of the size of the two KGs, rather than the product. This clearly favors the scalability of our approach. A key enabling feature of Graph-FCA is the support for π‘˜-ary relations as it enables to reify the binary properties and relations as objects, along with entities, and to use them as arguments of meta-level ternary predicates (e.g., prop). Note that this is akin to RDF triples where properties belong to the same space as individuals. As a consequence, the alignment of classes, properties and relations (the schema) can be interpreted in the Graph-FCA output exactly like for entities, through concept extents. This is in contrast with other FCA-based approaches, where a different interpretation method must be used for the schema because it appears in concept intents. Although it can be difficult to read Graph-FCA output in general, a KG alignment can be read from the unary concept extents, which are sets of objects, like in classical FCA. 6. Conclusion and Perspectives Graph-FCA is FCA for knowledge graphs, and we have shown in this paper that by adequately representing two KGs in terms of what they have in common, we can extract aligned pairs of entities, classes, properties and relations from Graph-FCA concepts in a uniform way. The results in this paper are limited to a few encouraging illustrative examples in different scenarios, and much remains to be done to apply our approach to real KGs. A first issue is scalability. A research track is to only generate anchors, i.e. concepts that result from the intersection of two objects, one from each KG. A second issue is to allow for approximate matching of values, especially texts, as they often slightly differ from one KG to another. Acknowledgments This research is supported by ANR project SmartFCA (ANR-21-CE23-0023). References [1] Abbas, N., David, J., Napoli, A.: Discovery of link keys in RDF data based on pattern structures: Preliminary steps. In: Int. Conf. Concept Lattices and Their Applications (2020) [2] Ardjani, F., Bouchiha, D., Malki, M.: Ontology-alignment techniques: Survey and analysis. Int. J. Modern Education & Computer Science 7(11) (2015) [3] Atencia, M., David, J., Euzenat, J., Napoli, A., Vizzini, J.: Link key candidate extraction with relational concept analysis. Discrete applied mathematics 273, 2–20 (2020) [4] Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embed- dings for modeling multi-relational data. In: Advances in neural information processing systems. pp. 2787–2795 (2013) [5] FerrΓ©, S., Cellier, P.: Graph-FCA: An extension of formal concept analysis to knowledge graphs. Discrete Applied Mathematics 273, 81–102 (2019) [6] Ganter, B., Kuznetsov, S.: Pattern structures and their projections. In: Delugach, H.S., Stumme, G. (eds.) Int. Conf. Conceptual Structures. pp. 129–142. LNCS 2120, Springer (2001) [7] Ganter, B., Wille, R.: Formal Concept Analysis β€” Mathematical Foundations. Springer (1999) [8] Hahn, G., Tardif, C.: Graph homomorphisms: structure and symmetry. In: Graph symmetry, pp. 107–166. Springer (1997) [9] Hitzler, P., Krotzschβ€ž M., Rudolph, S.: Foundations of Semantic Web Technologies. Chap- man & Hall/CRC (2009) [10] Lacroix, T., Usunier, N., Obozinski, G.: Canonical tensor decomposition for knowledge base completion. In: Dy, J.G., Krause, A. (eds.) Int. Conf. Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 2869–2878. PMLR (2018) [11] Rouane-Hacene, M., Huchard, M., Napoli, A., Valtchev, P.: Relational concept analysis: mining concept lattices from multi-relational data. Annals of Mathematics and Artificial Intelligence 67(1), 81–108 (2013) [12] Schlichtkrull, M., Kipf, T.N., Bloem, P., van den Berg, R., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: The Semantic Web Conf. (ESWC). pp. 593–607. Springer (2018) [13] Shao, C., Hu, L.M., Li, J.Z., Wang, Z.C., Chung, T., Xia, J.B.: Rimom-im: A novel iterative framework for instance matching. J. computer science and technology 31(1), 185–197 (2016) [14] Zeng, K., Li, C., Hou, L., Li, J., Feng, L.: A comprehensive survey of entity alignment for knowledge graphs. AI Open 2, 1–13 (2021) [15] Zhao, M., Zhang, S., Li, W., Chen, G.: Matching biomedical ontologies based on formal concept analysis. J. biomedical semantics 9(1), 1–27 (2018)