=Paper= {{Paper |id=Vol-3308/paper19 |storemode=property |title=CONNOR: Exploring Similarities in Graphs with Concepts of Neighbors |pdfUrl=https://ceur-ws.org/Vol-3308/Paper19.pdf |volume=Vol-3308 |authors=Hugo Ayats,Peggy Cellier,Sébastien Ferré |dblpUrl=https://dblp.org/rec/conf/cla/AyatsCF22 }} ==CONNOR: Exploring Similarities in Graphs with Concepts of Neighbors== https://ceur-ws.org/Vol-3308/Paper19.pdf
CONNOR: Exploring Similarities in Graphs with
Concepts of Neighbors
Hugo Ayats1,∗ , Peggy Cellier1 and Sébastien Ferré1
1
    Univ Rennes, INSA, CNRS, IRISA. Campus de Beaulieu, 35042 Rennes, France


                                         Abstract
                                         Since its first formalization, the Formal Concept Analysis (FCA) field has shown diverse extensions of
                                         the FCA paradigm. A recent example is Graph-FCA, an extension of FCA to graphs. In the context of
                                         Graph-FCA, a notion of concept of neighbors has been introduced to support a form of nearest neighbor
                                         search over the nodes of a graph. Concepts of neighbors have been used for diverse tasks, such as
                                         knowledge graph completion and relation classification in texts. In this paper, we present CONNOR, a
                                         Java library for the computation of concepts of neighbors on RDF graphs.




1. Introduction
Introduced in the early 1980’s, Formal Concept Analysis (FCA) enables to analyze a collection
of objects and properties as a hierarchy of concepts. Several extensions of FCA have been
proposed to handle more complex objects and properties, such as multi-relational data: e.g.,
Relational Concept Analysis (RCA) [1] or Graph-FCA [2]. In the context of Graph-FCA, the
notion of concepts of neighbors [3] has been introduced as a lazy learning approach to explore
similarities between tuples of nodes in a graph.
   Inspired by the work of Kuznetsov on Pattern Structures for classification [4], concepts of
neighbors are only those concepts that result from the intersection of a given node (or tuple
of nodes) with other (tuples of) nodes, in order to support a decision process about the given
node. Compared to computing all concepts, this decreases the worst case number of concepts
from exponential to linear. This can be seen as a form of conceptual k Nearest Neighbors (kNN)
algorithm on graph nodes, and like for the kNN approach, possible applications are numerous.
So far, concepts of neighbors have been applied to knowledge graph completion [3], query
relaxation [5], and relation classification in texts [6].
   This paper introduce CONNOR, a Java library for the computation of concepts of neighbors
in the context of the semantic web. Based on the widely-used Apache Jena library, this library
introduces data structures for representing Graph-FCA notions and implements an efficient
algorithm for computing concepts of neighbors on RDF graphs.



Published in Pablo Cordero, Ondrej Kridlo (Eds.): The 16𝑡ℎ International Conference on Concept Lattices and Their
Applications, CLA 2022, Tallinn, Estonia, June 20–22, 2022, Proceedings, pp. 219–224.
∗
    Corresponding author.
Envelope-Open hugo.ayats@irisa.fr (H. Ayats); peggy.cellier@irisa.fr (P. Cellier); sebastien.ferre@irisa.fr (S. Ferré)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
Figure 1: (a) Context graph representing the British royal family
(b) PGP representing the sibling relationship between 𝑥 and 𝑦


2. Theoretical Background
This section introduces definitions related to the concepts of neighbors. Details can be found in
the reference journal paper [7].

Definition 1. A graph context is a triple 𝐾 = (𝑂, 𝐴, 𝐼 ) where 𝑂 is a set of objects, 𝐴 a set of
attributes and 𝐼 ⊆ 𝑂 ∗ × 𝐴 is an incidence relation between tuples of objects and attributes.

   Figure 1 (a) shows the graph context representing a fragment of the family tree of the British
royal family as a graph context. The boxes are the objects (e.g. Charlotte or Harry), the types
inside boxes are unary attributes (e.g. man, woman) and the arrow labels are the binary attributes
(e.g. parent).

Definition 2. A graph pattern 𝑃 ⊆ 𝒱 ∗ × 𝐴 is a set of labeled directed edges and labeled nodes
with variables as nodes and attributes as labels.
   A projected graph pattern (PGP) is a couple 𝑄 = (𝑥, 𝑃) where 𝑃 is a graph pattern and 𝑥 ∈ 𝒱 ∗
a projection tuple. The arity of a PGP is the length of 𝑥. A PGP of arity 𝑘 is also called a 𝑘-PGP.

   Figure 1 (b) represents a 2-PGP defining the ”sibling” binary relation (two persons sharing
a male parent and a female parent). Projection variables are in bold. In practice, PGPs can be
seen as queries on the graph context. In this paper we call set of answers of a 𝑘-PGP the set of
the 𝑘-tuples of objects that are the answers of the PGP considered as a query.
   A PGP inclusion relation (noted ⊆𝑄 ) can be defined to express the fact that a 𝑘-PGP 𝑄1
is more specific than another 𝑘-PGP 𝑄2 (i.e. by renaming the variables of 𝑄2 we can obtain a
PGP having the same projection tuple than 𝑄1 and having its graph pattern included in the
graph pattern of 𝑄1 ). In this case, we denote 𝑄2 ⊆𝑄 𝑄1 . If 𝑄1 ⊆𝑄 𝑄2 and 𝑄2 ⊆𝑄 𝑄1 , they are said
equivalent (𝑄1 ≡𝑄 𝑄2 ). The most specific query for a given answer set is the query (under
equivalency relation) that is more specific to every other queries having this set as answer set.
Figure 2: Concepts of Neighbors of Charlotte


Definition 3. A graph concept of arity 𝑘 (also called 𝑘-concept) is a pair (𝑅, 𝑄), 𝑅 being a set of
𝑘-tuples of objects (called extension) and 𝑄 being a 𝑘-PGP (called intension), such that:
    • 𝑅 is the set of answers of 𝑄;
    • 𝑄 is the most specific 𝑘-PGP having 𝑅 as set of answers (under the equivalence relation).
   For example, let us consider (𝑅, 𝑄) where 𝑄 is the PGP presented in Figure 1 (b) and 𝑅 is the set
of couples: {(𝐻 𝑎𝑟𝑟𝑦, 𝑊 𝑖𝑙𝑙𝑖𝑎𝑚), … , (𝐻 𝑎𝑟𝑟𝑦, 𝐻 𝑎𝑟𝑟𝑦), (𝐺𝑒𝑜𝑟𝑔𝑒, 𝐶ℎ𝑎𝑟𝑙𝑜𝑡𝑡𝑒), … , (𝐶ℎ𝑎𝑟𝑙𝑜𝑡𝑡𝑒, 𝐶ℎ𝑎𝑟𝑙𝑜𝑡𝑡𝑒)}.
This concept represents the notion of sibling: the intension is a PGP describing the couple of
persons having a same mother and a same father, and the extension all the couples of entities
respecting this pattern.
Definition 4. The conceptual distance between two 𝑘-tuples of objects 𝑜1 and 𝑜2 is the 𝑘-concept
𝛿(𝑜1 , 𝑜2 ) = (𝑅, 𝑄) where the intension 𝑄 is the most specific 𝑘-PGP such that the extension 𝑅 contains
𝑜1 and 𝑜2 .
   It can be proven that this conceptual distance has properties similar to a numerical distance,
such as positivity, symmetry and triangle inequality, by using the concept extension inclusion
as a partial order and a notion of conceptual supremum as addition. The extensional dis-
tance 𝑑(𝑜1 , 𝑜2 ) = |𝛿(𝑜1 , 𝑜2 ).𝑒𝑥𝑡| can be used as a (degraded) numerical distance to evaluate the
dissimilarity between 𝑜1 and 𝑜2 as the number of 𝑘-tuples between them.
Definition 5. The concepts of neighbours of a 𝑘-tuple 𝑜 is the set of 𝑘-concepts C-N (𝑜) =
{𝛿(𝑜′ , 𝑜)|𝑜′ ∈ 𝑂 𝑘 }.
   The proper extension of a concept 𝛿.𝑝𝑟𝑜𝑝𝐸𝑥𝑡 for 𝛿 ∈ C-N (𝑜) is the set of 𝑘-tuples of its
extension that are not in the extension of a more specific concept. The set of proper extensions
forms a partition of 𝑂 𝑘 , and therefore the set of extensions forms a hierarchical clustering of
𝑂 𝑘.
   Figure 2 presents the five concepts of neighbors of Charlotte in the graph context presented
in Figure 1 (a). On the right, the extensions are presented as a Venn diagram, and on the left the
intensions are expressed in plain English for simplicity. The first concept has for intension the
whole graph centered on Charlotte and has only Chalotte in its extension. Then there are two
larger concepts, one describing the children of Kate and William (Charlotte and George), and
another one describing the women. Then there is an even larger concept describing the people
having a father, a mother and a sibling (Harry, William, Charlotte, and George). Finally, there is
the top concept, having an empty intension and 𝑂 as extension.
/ / ( 1 ) C r e a t i o n o f t h e model
ConnorModel model = new ConnorModel ( r d f M o d e l ) ;
/ / ( 2 ) Creation of the t a r g e t
L i s t < S t r i n g > t a r g e t = new A r r a y L i s t < > ( { ” h t t p : / / example . o r g /
       royal / Charlotte ” } ) ;
/ / ( 3 ) Creation of the i n i t t a b l e
Table table = Table . ofArity ( 1 ) ;
r d f M o d e l . l i s t S u b j e c t s ( ) . f o r E a c h R e m a i n i n g ( s −>
                 t a b l e . addInitRow ( IntStream . of ( 1 )
                         . mapToObj ( i −> s . asNode ( ) )
                         . collect ( Collectors . toList () ) ) ;
);
/ / ( 4 ) Creation of the p a r t i t i o n
C o n n o r P a r t i t i o n p a r t i t i o n = new C o n n o r P a r t i t i o n ( model , t a r g e t ,
       table , 0) ;
/ / ( 5 ) C r e a t i o n o f t h e t h r e a d and c o m p u t a t i o n o f t h e C−N
A t o m i c B o o l e a n c u t = new A t o m i c B o o l e a n ( f a l s e ) ;
ConnorThread t h r e a d = new ConnorThread ( p a r t i t i o n , c u t ) ;
thread . s t a r t ( ) ;
t h r e a d . j o i n ( 2 ∗ 1 0 0 0 ) ; / / 2 0 0 0 ms = 2 s
cut . s e t ( true ) ;
/ / ( 6 ) P r i n t of the r e s u l t s
System . o u t . p r i n t ( p a r t i t i o n . t o J s o n ( ) ) ;

Table 1
Example of usage of CONNOR


3. CONNOR: Computation of CONcepts of NeighbORs with a
   Java Library
In this section we present CONNOR, a Java library for the computation of the Concepts of
Neighbors. This library is a free and open-source software1 , based on Apache Jena2 , a well-
known Java library for the Semantic Web. As the Semantic Web is based on the RDF standard,
and as RDF graphs only represent labelled directed multigraphs, the library only handles
graph contexts with unary relations (node labels) and binary relations (labelled directed edges).
This library implements data structures for manipulating concepts of neighbors, as well as an
algorithm for their computation. Table 1 presents an example code for the computation of the
concepts of neighbors of Charlotte in the graph context presented in Figure 1 (a).
   The CONNOR library is structured into four main classes, which are detailed below.


1
    Accessible here: https://gitlab.inria.fr/hayats/CONNOR
2
    https://jena.apache.org/
ConnorModel    This class encapsulates an RDF model that represents a graph context. There
are two ways to create a model using this class. The first way consists in creating an empty
model and adding triples one by one. The second way consists in creating a model from a
pre-existing RDF model. Section (1) of Table 1 shows an example of creation of ConnorModel
from a RDF graph.

ConceptOfNeighbors       As its name tells, this class represents a concept of neighbors, and
hence a graph concept too. As expected, an object of this class is characterized by its intension
(decomposed into two attributes: the list of the projection variables and the graph pattern),
its extension and its proper extension. Those elements can be accessed through the methods
getProjectionVars() , getIntensionBody() , getExtension() and getProperExtension() .
The creation of objects of this class is handled by the ConnorPartition class, and should not
be done by library users.

ConnorPartition        This class is central to the computation of concepts of neighbors. It takes its
name from the fact that, as presented above, the proper extensions of the concepts of neighbors
form a partition of the set of objects. This class contains all the information needed for the
computations of concepts of neighbors, such as of course the graph context (represented by
a ConnorModel ), the concepts of neighbors once the computation is done (represented by a
collection of ConceptOfNeighbors objects), but also the tuple of objects (called target) for which
we want to compute the concepts of neighbors. A ConnorPartition object can be translated
into JSON for serialization and further processing. An exemple of serialization in JSON is given
by Section (6) of Table 1.
   A key aspect of this class is that it implements an anytime algorithm for the computation
of concepts of neighbors. Presented in [5], this algorithm starts with the top concept and, by
successively trying to add elements to the intension, refines it in more specific concepts that
still include the target. This way, the algorithm can be interrupted at each refinement step.
   To use this class, the base constructor of this object takes as argument a ConnorModel object,
the target (represented by a list of URIs), the partition domain and a maxDepth parameter.
An example of usage is given by Section (4) of Table 1. The partitioning domain is called
initializationTable , which is a Table object which represents a set of tuples of entities.
The role of this argument is to stipulate which set of tuples should appear in the extensions
of the concepts. An example of construction of such a table is given in Section (5) of Table 1.
Concerning the maxDepth parameter, its role is to set, if desired, a limit to the depth of the
intension from the elements of the target. If set to zero, no limit is applied.
   The class method to call in order to launch the computation of concepts of neighbors is called
fullPartitioning(cut) . Taking a mutable Boolean named cut as a parameter, it refines the
partition until convergence or until cut is switched to true .

ConnorThread This class encapsulates the computation of concepts of neighbors using a
ConnorPartition in a Java thread, so that the main thread just needs to launch this thread and
switch cut to true when desired. An example of usage is given by Section (5) of Table 1.
4. Conclusion
This paper introduces CONNOR, a Java library for the computation of concepts of neighbors
on RDF graphs. After having introduced the theoretical notions related to Graph-FCA (an
extension of FCA to graphs) and to the concepts of neighbors, this paper presents the main
classes of CONNOR, and details their working through an example code for the computation of
concepts of neighbors.


Acknowledgments
This research is supported by ANR project SmartFCA (ANR-21-CE23-0023).


References
[1] M. Rouane-Hacene, M. Huchard, A. Napoli, P. Valtchev, Relational concept analysis: mining
    concept lattices from multi-relational data, Annals of Mathematics and Artificial Intelligence
    67 (2013) 81–108. doi:10.1007/s10472- 012- 9329- 3 .
[2] S. Ferré, A Proposal for Extending Formal Concept Analysis to Knowledge Graphs, in:
    International Conference on Formal Concept Analysis, volume 9113, Springer International
    Publishing, Cham, 2015, pp. 271–286. doi:10.1007/978- 3- 319- 19545- 2_17 .
[3] S. Ferré, Application of Concepts of Neighbours to Knowledge Graph Completion, Data
    Science Pre-press (2020) 1–28.
[4] S. O. Kuznetsov, Fitting Pattern Structures to Knowledge Discovery in Big Data, in:
    Int. Conf. on Formal Concept Analysis, volume 7880, 2013, pp. 254–266. doi:10.1007/
    978- 3- 642- 38317- 5_17 .
[5] S. Ferré, Answers Partitioning and Lazy Joins for Efficient Query Relaxation and Application
    to Similarity Search, in: The Semantic Web, Cham, 2018, pp. 209–224. doi:10.1007/
    978- 3- 319- 93417- 4_14 .
[6] H. Ayats, P. Cellier, S. Ferré, A Two-Step Approach for Explainable Relation Extrac-
    tion, in: Int. Symposium on Intelligence Data Analysis, 2022, pp. 14–25. doi:10.1007/
    978- 3- 031- 01333- 1_2 .
[7] S. Ferré, M. Huchard, M. Kaytoue, S. O. Kuznetsov, A. Napoli, Formal Concept Analysis: From
    Knowledge Discovery to Knowledge Processing, A Guided Tour of Artificial Intelligence
    Research: Volume II: AI Algorithms (2020) 411–445. doi:10.1007/978- 3- 030- 06167- 8_13 .