CONNOR: Exploring Similarities in Graphs with Concepts of Neighbors Hugo Ayats1,βˆ— , Peggy Cellier1 and SΓ©bastien FerrΓ©1 1 Univ Rennes, INSA, CNRS, IRISA. Campus de Beaulieu, 35042 Rennes, France Abstract Since its first formalization, the Formal Concept Analysis (FCA) field has shown diverse extensions of the FCA paradigm. A recent example is Graph-FCA, an extension of FCA to graphs. In the context of Graph-FCA, a notion of concept of neighbors has been introduced to support a form of nearest neighbor search over the nodes of a graph. Concepts of neighbors have been used for diverse tasks, such as knowledge graph completion and relation classification in texts. In this paper, we present CONNOR, a Java library for the computation of concepts of neighbors on RDF graphs. 1. Introduction Introduced in the early 1980’s, Formal Concept Analysis (FCA) enables to analyze a collection of objects and properties as a hierarchy of concepts. Several extensions of FCA have been proposed to handle more complex objects and properties, such as multi-relational data: e.g., Relational Concept Analysis (RCA) [1] or Graph-FCA [2]. In the context of Graph-FCA, the notion of concepts of neighbors [3] has been introduced as a lazy learning approach to explore similarities between tuples of nodes in a graph. Inspired by the work of Kuznetsov on Pattern Structures for classification [4], concepts of neighbors are only those concepts that result from the intersection of a given node (or tuple of nodes) with other (tuples of) nodes, in order to support a decision process about the given node. Compared to computing all concepts, this decreases the worst case number of concepts from exponential to linear. This can be seen as a form of conceptual k Nearest Neighbors (kNN) algorithm on graph nodes, and like for the kNN approach, possible applications are numerous. So far, concepts of neighbors have been applied to knowledge graph completion [3], query relaxation [5], and relation classification in texts [6]. This paper introduce CONNOR, a Java library for the computation of concepts of neighbors in the context of the semantic web. Based on the widely-used Apache Jena library, this library introduces data structures for representing Graph-FCA notions and implements an efficient algorithm for computing concepts of neighbors on RDF graphs. Published in Pablo Cordero, Ondrej Kridlo (Eds.): The 16π‘‘β„Ž International Conference on Concept Lattices and Their Applications, CLA 2022, Tallinn, Estonia, June 20–22, 2022, Proceedings, pp. 219–224. βˆ— Corresponding author. Envelope-Open hugo.ayats@irisa.fr (H. Ayats); peggy.cellier@irisa.fr (P. Cellier); sebastien.ferre@irisa.fr (S. FerrΓ©) Β© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) Figure 1: (a) Context graph representing the British royal family (b) PGP representing the sibling relationship between π‘₯ and 𝑦 2. Theoretical Background This section introduces definitions related to the concepts of neighbors. Details can be found in the reference journal paper [7]. Definition 1. A graph context is a triple 𝐾 = (𝑂, 𝐴, 𝐼 ) where 𝑂 is a set of objects, 𝐴 a set of attributes and 𝐼 βŠ† 𝑂 βˆ— Γ— 𝐴 is an incidence relation between tuples of objects and attributes. Figure 1 (a) shows the graph context representing a fragment of the family tree of the British royal family as a graph context. The boxes are the objects (e.g. Charlotte or Harry), the types inside boxes are unary attributes (e.g. man, woman) and the arrow labels are the binary attributes (e.g. parent). Definition 2. A graph pattern 𝑃 βŠ† 𝒱 βˆ— Γ— 𝐴 is a set of labeled directed edges and labeled nodes with variables as nodes and attributes as labels. A projected graph pattern (PGP) is a couple 𝑄 = (π‘₯, 𝑃) where 𝑃 is a graph pattern and π‘₯ ∈ 𝒱 βˆ— a projection tuple. The arity of a PGP is the length of π‘₯. A PGP of arity π‘˜ is also called a π‘˜-PGP. Figure 1 (b) represents a 2-PGP defining the ”sibling” binary relation (two persons sharing a male parent and a female parent). Projection variables are in bold. In practice, PGPs can be seen as queries on the graph context. In this paper we call set of answers of a π‘˜-PGP the set of the π‘˜-tuples of objects that are the answers of the PGP considered as a query. A PGP inclusion relation (noted βŠ†π‘„ ) can be defined to express the fact that a π‘˜-PGP 𝑄1 is more specific than another π‘˜-PGP 𝑄2 (i.e. by renaming the variables of 𝑄2 we can obtain a PGP having the same projection tuple than 𝑄1 and having its graph pattern included in the graph pattern of 𝑄1 ). In this case, we denote 𝑄2 βŠ†π‘„ 𝑄1 . If 𝑄1 βŠ†π‘„ 𝑄2 and 𝑄2 βŠ†π‘„ 𝑄1 , they are said equivalent (𝑄1 ≑𝑄 𝑄2 ). The most specific query for a given answer set is the query (under equivalency relation) that is more specific to every other queries having this set as answer set. Figure 2: Concepts of Neighbors of Charlotte Definition 3. A graph concept of arity π‘˜ (also called π‘˜-concept) is a pair (𝑅, 𝑄), 𝑅 being a set of π‘˜-tuples of objects (called extension) and 𝑄 being a π‘˜-PGP (called intension), such that: β€’ 𝑅 is the set of answers of 𝑄; β€’ 𝑄 is the most specific π‘˜-PGP having 𝑅 as set of answers (under the equivalence relation). For example, let us consider (𝑅, 𝑄) where 𝑄 is the PGP presented in Figure 1 (b) and 𝑅 is the set of couples: {(𝐻 π‘Žπ‘Ÿπ‘Ÿπ‘¦, π‘Š π‘–π‘™π‘™π‘–π‘Žπ‘š), … , (𝐻 π‘Žπ‘Ÿπ‘Ÿπ‘¦, 𝐻 π‘Žπ‘Ÿπ‘Ÿπ‘¦), (πΊπ‘’π‘œπ‘Ÿπ‘”π‘’, πΆβ„Žπ‘Žπ‘Ÿπ‘™π‘œπ‘‘π‘‘π‘’), … , (πΆβ„Žπ‘Žπ‘Ÿπ‘™π‘œπ‘‘π‘‘π‘’, πΆβ„Žπ‘Žπ‘Ÿπ‘™π‘œπ‘‘π‘‘π‘’)}. This concept represents the notion of sibling: the intension is a PGP describing the couple of persons having a same mother and a same father, and the extension all the couples of entities respecting this pattern. Definition 4. The conceptual distance between two π‘˜-tuples of objects π‘œ1 and π‘œ2 is the π‘˜-concept 𝛿(π‘œ1 , π‘œ2 ) = (𝑅, 𝑄) where the intension 𝑄 is the most specific π‘˜-PGP such that the extension 𝑅 contains π‘œ1 and π‘œ2 . It can be proven that this conceptual distance has properties similar to a numerical distance, such as positivity, symmetry and triangle inequality, by using the concept extension inclusion as a partial order and a notion of conceptual supremum as addition. The extensional dis- tance 𝑑(π‘œ1 , π‘œ2 ) = |𝛿(π‘œ1 , π‘œ2 ).𝑒π‘₯𝑑| can be used as a (degraded) numerical distance to evaluate the dissimilarity between π‘œ1 and π‘œ2 as the number of π‘˜-tuples between them. Definition 5. The concepts of neighbours of a π‘˜-tuple π‘œ is the set of π‘˜-concepts C-N (π‘œ) = {𝛿(π‘œβ€² , π‘œ)|π‘œβ€² ∈ 𝑂 π‘˜ }. The proper extension of a concept 𝛿.π‘π‘Ÿπ‘œπ‘πΈπ‘₯𝑑 for 𝛿 ∈ C-N (π‘œ) is the set of π‘˜-tuples of its extension that are not in the extension of a more specific concept. The set of proper extensions forms a partition of 𝑂 π‘˜ , and therefore the set of extensions forms a hierarchical clustering of 𝑂 π‘˜. Figure 2 presents the five concepts of neighbors of Charlotte in the graph context presented in Figure 1 (a). On the right, the extensions are presented as a Venn diagram, and on the left the intensions are expressed in plain English for simplicity. The first concept has for intension the whole graph centered on Charlotte and has only Chalotte in its extension. Then there are two larger concepts, one describing the children of Kate and William (Charlotte and George), and another one describing the women. Then there is an even larger concept describing the people having a father, a mother and a sibling (Harry, William, Charlotte, and George). Finally, there is the top concept, having an empty intension and 𝑂 as extension. / / ( 1 ) C r e a t i o n o f t h e model ConnorModel model = new ConnorModel ( r d f M o d e l ) ; / / ( 2 ) Creation of the t a r g e t L i s t < S t r i n g > t a r g e t = new A r r a y L i s t < > ( { ” h t t p : / / example . o r g / royal / Charlotte ” } ) ; / / ( 3 ) Creation of the i n i t t a b l e Table table = Table . ofArity ( 1 ) ; r d f M o d e l . l i s t S u b j e c t s ( ) . f o r E a c h R e m a i n i n g ( s βˆ’> t a b l e . addInitRow ( IntStream . of ( 1 ) . mapToObj ( i βˆ’> s . asNode ( ) ) . collect ( Collectors . toList () ) ) ; ); / / ( 4 ) Creation of the p a r t i t i o n C o n n o r P a r t i t i o n p a r t i t i o n = new C o n n o r P a r t i t i o n ( model , t a r g e t , table , 0) ; / / ( 5 ) C r e a t i o n o f t h e t h r e a d and c o m p u t a t i o n o f t h e Cβˆ’N A t o m i c B o o l e a n c u t = new A t o m i c B o o l e a n ( f a l s e ) ; ConnorThread t h r e a d = new ConnorThread ( p a r t i t i o n , c u t ) ; thread . s t a r t ( ) ; t h r e a d . j o i n ( 2 βˆ— 1 0 0 0 ) ; / / 2 0 0 0 ms = 2 s cut . s e t ( true ) ; / / ( 6 ) P r i n t of the r e s u l t s System . o u t . p r i n t ( p a r t i t i o n . t o J s o n ( ) ) ; Table 1 Example of usage of CONNOR 3. CONNOR: Computation of CONcepts of NeighbORs with a Java Library In this section we present CONNOR, a Java library for the computation of the Concepts of Neighbors. This library is a free and open-source software1 , based on Apache Jena2 , a well- known Java library for the Semantic Web. As the Semantic Web is based on the RDF standard, and as RDF graphs only represent labelled directed multigraphs, the library only handles graph contexts with unary relations (node labels) and binary relations (labelled directed edges). This library implements data structures for manipulating concepts of neighbors, as well as an algorithm for their computation. Table 1 presents an example code for the computation of the concepts of neighbors of Charlotte in the graph context presented in Figure 1 (a). The CONNOR library is structured into four main classes, which are detailed below. 1 Accessible here: https://gitlab.inria.fr/hayats/CONNOR 2 https://jena.apache.org/ ConnorModel This class encapsulates an RDF model that represents a graph context. There are two ways to create a model using this class. The first way consists in creating an empty model and adding triples one by one. The second way consists in creating a model from a pre-existing RDF model. Section (1) of Table 1 shows an example of creation of ConnorModel from a RDF graph. ConceptOfNeighbors As its name tells, this class represents a concept of neighbors, and hence a graph concept too. As expected, an object of this class is characterized by its intension (decomposed into two attributes: the list of the projection variables and the graph pattern), its extension and its proper extension. Those elements can be accessed through the methods getProjectionVars() , getIntensionBody() , getExtension() and getProperExtension() . The creation of objects of this class is handled by the ConnorPartition class, and should not be done by library users. ConnorPartition This class is central to the computation of concepts of neighbors. It takes its name from the fact that, as presented above, the proper extensions of the concepts of neighbors form a partition of the set of objects. This class contains all the information needed for the computations of concepts of neighbors, such as of course the graph context (represented by a ConnorModel ), the concepts of neighbors once the computation is done (represented by a collection of ConceptOfNeighbors objects), but also the tuple of objects (called target) for which we want to compute the concepts of neighbors. A ConnorPartition object can be translated into JSON for serialization and further processing. An exemple of serialization in JSON is given by Section (6) of Table 1. A key aspect of this class is that it implements an anytime algorithm for the computation of concepts of neighbors. Presented in [5], this algorithm starts with the top concept and, by successively trying to add elements to the intension, refines it in more specific concepts that still include the target. This way, the algorithm can be interrupted at each refinement step. To use this class, the base constructor of this object takes as argument a ConnorModel object, the target (represented by a list of URIs), the partition domain and a maxDepth parameter. An example of usage is given by Section (4) of Table 1. The partitioning domain is called initializationTable , which is a Table object which represents a set of tuples of entities. The role of this argument is to stipulate which set of tuples should appear in the extensions of the concepts. An example of construction of such a table is given in Section (5) of Table 1. Concerning the maxDepth parameter, its role is to set, if desired, a limit to the depth of the intension from the elements of the target. If set to zero, no limit is applied. The class method to call in order to launch the computation of concepts of neighbors is called fullPartitioning(cut) . Taking a mutable Boolean named cut as a parameter, it refines the partition until convergence or until cut is switched to true . ConnorThread This class encapsulates the computation of concepts of neighbors using a ConnorPartition in a Java thread, so that the main thread just needs to launch this thread and switch cut to true when desired. An example of usage is given by Section (5) of Table 1. 4. Conclusion This paper introduces CONNOR, a Java library for the computation of concepts of neighbors on RDF graphs. After having introduced the theoretical notions related to Graph-FCA (an extension of FCA to graphs) and to the concepts of neighbors, this paper presents the main classes of CONNOR, and details their working through an example code for the computation of concepts of neighbors. Acknowledgments This research is supported by ANR project SmartFCA (ANR-21-CE23-0023). References [1] M. Rouane-Hacene, M. Huchard, A. Napoli, P. Valtchev, Relational concept analysis: mining concept lattices from multi-relational data, Annals of Mathematics and Artificial Intelligence 67 (2013) 81–108. doi:10.1007/s10472- 012- 9329- 3 . [2] S. FerrΓ©, A Proposal for Extending Formal Concept Analysis to Knowledge Graphs, in: International Conference on Formal Concept Analysis, volume 9113, Springer International Publishing, Cham, 2015, pp. 271–286. doi:10.1007/978- 3- 319- 19545- 2_17 . [3] S. FerrΓ©, Application of Concepts of Neighbours to Knowledge Graph Completion, Data Science Pre-press (2020) 1–28. [4] S. O. Kuznetsov, Fitting Pattern Structures to Knowledge Discovery in Big Data, in: Int. Conf. on Formal Concept Analysis, volume 7880, 2013, pp. 254–266. doi:10.1007/ 978- 3- 642- 38317- 5_17 . [5] S. FerrΓ©, Answers Partitioning and Lazy Joins for Efficient Query Relaxation and Application to Similarity Search, in: The Semantic Web, Cham, 2018, pp. 209–224. doi:10.1007/ 978- 3- 319- 93417- 4_14 . [6] H. Ayats, P. Cellier, S. FerrΓ©, A Two-Step Approach for Explainable Relation Extrac- tion, in: Int. Symposium on Intelligence Data Analysis, 2022, pp. 14–25. doi:10.1007/ 978- 3- 031- 01333- 1_2 . [7] S. FerrΓ©, M. Huchard, M. Kaytoue, S. O. Kuznetsov, A. Napoli, Formal Concept Analysis: From Knowledge Discovery to Knowledge Processing, A Guided Tour of Artificial Intelligence Research: Volume II: AI Algorithms (2020) 411–445. doi:10.1007/978- 3- 030- 06167- 8_13 .