1. Introduction

CONNOR: Exploring Similarities in Graphs with Concepts of Neighbors

Hugo Ayats

hugo.ayats@irisa.fr 0

Peggy Cellier

peggy.cellier@irisa.fr 0

Sébastien Ferré

0 0 Univ Rennes, INSA, CNRS, IRISA. Campus de Beaulieu , 35042 Rennes , France

2022

Since its first formalization, the Formal Concept Analysis (FCA) field has shown diverse extensions of the FCA paradigm. A recent example is Graph-FCA, an extension of FCA to graphs. In the context of Graph-FCA, a notion of concept of neighbors has been introduced to support a form of nearest neighbor search over the nodes of a graph. Concepts of neighbors have been used for diverse tasks, such as knowledge graph completion and relation classification in texts. In this paper, we present CONNOR, a Java library for the computation of concepts of neighbors on RDF graphs.

1. Introduction 2. Theoretical Background

This section introduces definitions related to the concepts of neighbors. Details can be found in the reference journal paper [7].

Definition 1. A graph context is a triple = (, , ) where is a set of objects, a set of attributes and ⊆ ∗ × is an incidence relation between tuples of objects and attributes.

Definition 2. A graph pattern ⊆ ∗ × is a set of labeled directed edges and labeled nodes with variables as nodes and attributes as labels.

A projected graph pattern (PGP) is a couple = ( , ) where is a graph pattern and ∈ ∗ a projection tuple. The arity of a PGP is the length of . A PGP of arity is also called a -PGP.

Figure 1 (b) represents a 2-PGP defining the ”sibling” binary relation (two persons sharing a male parent and a female parent). Projection variables are in bold. In practice, PGPs can be seen as queries on the graph context. In this paper we call set of answers of a -PGP the set of the -tuples of objects that are the answers of the PGP considered as a query.

A PGP inclusion relation (noted ⊆ ) can be defined to express the fact that a -PGP 1 is more specific than another -PGP 2 (i.e. by renaming the variables of 2 we can obtain a PGP having the same projection tuple than 1 and having its graph pattern included in the graph pattern of 1). In this case, we denote 2 ⊆ 1. If 1 ⊆ 2 and 2 ⊆ 1, they are said equivalent ( 1 ≡ 2). The most specific query for a given answer set is the query (under equivalency relation) that is more specific to every other queries having this set as answer set. Definition 3. A graph concept of arity (also called -concept) is a pair (, ) , being a set of -tuples of objects (called extension) and being a -PGP (called intension), such that: • is the set of answers of ; • is the most specific -PGP having as set of answers (under the equivalence relation).

For example, let us consider (, ) where is the PGP presented in Figure 1 (b) and is the set of couples: {( , ), … , ( , ), ( , ℎ ), … , (ℎ , ℎ )} . This concept represents the notion of sibling: the intension is a PGP describing the couple of persons having a same mother and a same father, and the extension all the couples of entities respecting this pattern.

Definition 4. The conceptual distance between two -tuples of objects 1 and 2 is the -concept ( 1, 2) = (, ) where the intension is the most specific -PGP such that the extension contains 1 and 2.

It can be proven that this conceptual distance has properties similar to a numerical distance, such as positivity, symmetry and triangle inequality, by using the concept extension inclusion as a partial order and a notion of conceptual supremum as addition. The extensional distance ( 1, 2) = |( 1, 2).| can be used as a (degraded) numerical distance to evaluate the dissimilarity between 1 and 2 as the number of -tuples between them.

The concepts of neighbours of a -tuple is the set of -concepts C-N () = Definition 5. {( ′, )| ′ ∈ }.

The proper extension of a concept . for ∈ C-N () is the set of -tuples of its extension that are not in the extension of a more specific concept. The set of proper extensions forms a partition of , and therefore the set of extensions forms a hierarchical clustering of .

Figure 2 presents the five concepts of neighbors of Charlotte in the graph context presented in Figure 1 (a). On the right, the extensions are presented as a Venn diagram, and on the left the intensions are expressed in plain English for simplicity. The first concept has for intension the whole graph centered on Charlotte and has only Chalotte in its extension. Then there are two larger concepts, one describing the children of Kate and William (Charlotte and George), and another one describing the women. Then there is an even larger concept describing the people having a father, a mother and a sibling (Harry, William, Charlotte, and George). Finally, there is the top concept, having an empty intension and as extension. / / ( 1 ) C r e a t i o n o f t h e model ConnorModel model = new ConnorModel ( rdfModel ) ; / / ( 2 ) C r e a t i o n o f t h e t a r g e t L i s t < S t r i n g > t a r g e t = new A r r a y L i s t < > ( { ” h t t p : / / example . org / r o y a l / C h a r l o t t e ” } ) ; / / ( 3 ) C r e a t i o n o f t h e i n i t t a b l e T a b l e t a b l e = T a b l e . o f A r i t y ( 1 ) ; rdfModel . l i s t S u b j e c t s ( ) . f o r E a c h R e m a i n i n g ( s −> t a b l e . addInitRow ( I n t S t r e a m . o f ( 1 ) . mapToObj ( i −> s . asNode ( ) ) . c o l l e c t ( C o l l e c t o r s . t o L i s t ( ) ) ) ; ) ; / / ( 4 ) C r e a t i o n o f t h e p a r t i t i o n C o n n o r P a r t i t i o n p a r t i t i o n = new C o n n o r P a r t i t i o n ( model , t a r g e t , t a b l e , 0 ) ; / / ( 5 ) C r e a t i o n o f t h e t h r e a d and c o m p u t a t i o n o f t h e C−N AtomicBoolean c u t = new AtomicBoolean ( f a l s e ) ; ConnorThread t h r e a d = new ConnorThread ( p a r t i t i o n , c u t ) ; t h r e a d . s t a r t ( ) ; t h r e a d . j o i n ( 2 ∗ 1 0 0 0 ) ; / / 2 0 0 0 ms = 2 s c u t . s e t ( true ) ; / / ( 6 ) P r i n t o f t h e r e s u l t s System . out . p r i n t ( p a r t i t i o n . t o J s o n ( ) ) ;

3. CONNOR: Computation of CONcepts of NeighbORs with a Java Library

In this section we present CONNOR, a Java library for the computation of the Concepts of Neighbors. This library is a free and open-source software 1, based on Apache Jena2, a wellknown Java library for the Semantic Web. As the Semantic Web is based on the RDF standard, and as RDF graphs only represent labelled directed multigraphs, the library only handles graph contexts with unary relations (node labels) and binary relations (labelled directed edges). This library implements data structures for manipulating concepts of neighbors, as well as an algorithm for their computation. Table 1 presents an example code for the computation of the concepts of neighbors of Charlotte in the graph context presented in Figure 1 (a).

The CONNOR library is structured into four main classes, which are detailed below. 1Accessible here: https://gitlab.inria.fr/hayats/CONNOR 2https://jena.apache.org/ ConnorModel This class encapsulates an RDF model that represents a graph context. There are two ways to create a model using this class. The first way consists in creating an empty model and adding triples one by one. The second way consists in creating a model from a pre-existing RDF model. Section ( 1 ) of Table 1 shows an example of creation of ConnorModel from a RDF graph.

ConceptOfNeighbors As its name tells, this class represents a concept of neighbors, and hence a graph concept too. As expected, an object of this class is characterized by its intension (decomposed into two attributes: the list of the projection variables and the graph pattern), its extension and its proper extension. Those elements can be accessed through the methods getProjectionVars(), getIntensionBody(), getExtension() and getProperExtension(). The creation of objects of this class is handled by the ConnorPartition class, and should not be done by library users.

ConnorPartition This class is central to the computation of concepts of neighbors. It takes its name from the fact that, as presented above, the proper extensions of the concepts of neighbors form a partition of the set of objects. This class contains all the information needed for the computations of concepts of neighbors, such as of course the graph context (represented by a ConnorModel), the concepts of neighbors once the computation is done (represented by a collection of ConceptOfNeighbors objects), but also the tuple of objects (called target) for which we want to compute the concepts of neighbors. A ConnorPartition object can be translated into JSON for serialization and further processing. An exemple of serialization in JSON is given by Section ( 6 ) of Table 1.

A key aspect of this class is that it implements an anytime algorithm for the computation of concepts of neighbors. Presented in [5], this algorithm starts with the top concept and, by successively trying to add elements to the intension, refines it in more specific concepts that still include the target. This way, the algorithm can be interrupted at each refinement step.

To use this class, the base constructor of this object takes as argument a ConnorModel object, the target (represented by a list of URIs), the partition domain and a maxDepth parameter. An example of usage is given by Section ( 4 ) of Table 1. The partitioning domain is called initializationTable, which is a Table object which represents a set of tuples of entities. The role of this argument is to stipulate which set of tuples should appear in the extensions of the concepts. An example of construction of such a table is given in Section ( 5 ) of Table 1. Concerning the maxDepth parameter, its role is to set, if desired, a limit to the depth of the intension from the elements of the target. If set to zero, no limit is applied.

The class method to call in order to launch the computation of concepts of neighbors is called fullPartitioning(cut). Taking a mutable Boolean named cut as a parameter, it refines the partition until convergence or until cut is switched to true.

ConnorThread This class encapsulates the computation of concepts of neighbors using a ConnorPartition in a Java thread, so that the main thread just needs to launch this thread and switch cut to true when desired. An example of usage is given by Section ( 5 ) of Table 1.

4. Conclusion

This paper introduces CONNOR, a Java library for the computation of concepts of neighbors on RDF graphs. After having introduced the theoretical notions related to Graph-FCA (an extension of FCA to graphs) and to the concepts of neighbors, this paper presents the main classes of CONNOR, and details their working through an example code for the computation of concepts of neighbors.

Acknowledgments

This research is supported by ANR project SmartFCA (ANR-21-CE23-0023).

[1]

Rouane-Hacene ,

Huchard ,

Napoli ,

Valtchev , Relational concept analysis: mining concept lattices from multi-relational data , Annals of Mathematics and Artificial Intelligence 67 ( 2013 ) 81 - 108 . doi: 10 .1007/s10472-012-9329-3.

[2]

Ferré , A Proposal for Extending Formal Concept Analysis to Knowledge Graphs , in: International Conference on Formal Concept Analysis , volume 9113 , Springer International Publishing, Cham, 2015 , pp. 271 - 286 . doi: 10 .1007/978-3- 319 -19545-2_ 17 .

[3]

Ferré , Application of Concepts of Neighbours to Knowledge Graph Completion , Data Science Pre-press ( 2020 ) 1 - 28 .

[4]

S. O.

Kuznetsov , Fitting Pattern Structures to Knowledge Discovery in Big Data , in: Int. Conf. on Formal Concept Analysis , volume 7880 , 2013 , pp. 254 - 266 . doi: 10 .1007/ 978-3- 642 -38317-5_ 17 .

[5]

Ferré , Answers Partitioning and Lazy Joins for Eficient Query Relaxation and Application to Similarity Search, in: The Semantic Web , Cham, 2018 , pp. 209 - 224 . doi: 10 .1007/ 978-3- 319 -93417-4_ 14 .

[6]

Ayats ,

Cellier ,

Ferré , A Two-Step Approach for Explainable Relation Extraction , in: Int. Symposium on Intelligence Data Analysis , 2022 , pp. 14 - 25 . doi: 10 .1007/ 978-3- 031 -01333- 1 _ 2 .

[7]

Ferré ,

Huchard ,

Kaytoue ,

S. O.

Kuznetsov ,

Napoli , Formal Concept Analysis: From Knowledge Discovery to Knowledge Processing, A Guided Tour of Artificial Intelligence Research: Volume II: AI Algorithms ( 2020 ) 411 - 445 . doi: 10 .1007/978-3- 030 -06167-8_ 13 .