=Paper=
{{Paper
|id=Vol-2925/paper5
|storemode=property
|title=Tackling scalability issues in mining path patterns from knowledge graphs: a preliminary study
|pdfUrl=https://ceur-ws.org/Vol-2925/paper5.pdf
|volume=Vol-2925
|authors=Pierre Monnin,Emmanuel Bresso,Miguel Couceiro,Malika Smaïl-Tabbone,Amedeo Napoli,Adrien Coulet
|dblpUrl=https://dblp.org/rec/conf/algos/MonninBCSNC20
}}
==Tackling scalability issues in mining path patterns from knowledge graphs: a preliminary study
==
TACKLING SCALABILITY ISSUES IN MINING PATH PATTERNS FROM KNOWLEDGE GRAPHS : A PRELIMINARY STUDY ⇤ Pierre Monnin Emmanuel Bresso Université de Lorraine, CNRS, Inria, LORIA Université de Lorraine, CNRS, Inria, LORIA F-54000 Nancy, France F-54000 Nancy, France pierre.monnin@loria.fr emmanuel.bresso@loria.fr Miguel Couceiro Malika Smaïl-Tabbone Université de Lorraine, CNRS, Inria, LORIA Université de Lorraine, CNRS, Inria, LORIA F-54000 Nancy, France F-54000 Nancy, France miguel.couceiro@loria.fr malika.smail@loria.fr Amedeo Napoli Adrien Coulet Université de Lorraine, CNRS, Inria, LORIA Université de Lorraine, CNRS, Inria, LORIA F-54000 Nancy, France F-54000 Nancy, France amedeo.napoli@loria.fr adrien.coulet@loria.fr A BSTRACT Features mined from knowledge graphs are widely used within multiple knowledge discovery tasks such as classification or fact-checking. Here, we consider a given set of vertices, called seed vertices, and focus on mining their associated neighboring vertices, paths, and, more generally, path patterns that involve classes of ontologies linked with knowledge graphs. Due to the combinatorial nature and the increasing size of real-world knowledge graphs, the task of mining these patterns immediately entails scalability issues. In this paper, we address these issues by proposing a pattern mining approach that relies on a set of constraints (e.g., support or degree thresholds) and the monotonicity property. As our motivation comes from the mining of real-world knowledge graphs, we illustrate our approach with PGxLOD, a biomedical knowledge graph. Keywords Path · Path Pattern · Ontology · Knowledge Graph · Scalability 1 Introduction Knowledge graphs [1] have a central role in knowledge discovery tasks. For example, Linked Open Data [2] have been used in all steps of the knowledge discovery process [3]. In particular, features mined from knowledge graphs have been used in multiple applications such as knowledge base completion [4], explanations [5], or fact-checking [6]. Here, we focus on knowledge graphs expressed using Semantic Web standards [7]. In this context, vertices are either individuals that represent entities of a world (e.g., places, drugs, etc.), literals (e.g., integers, dates, etc.), or classes of individuals (e.g., Person, Drug, etc.). Arcs are defined by triples hsubject, predicate, objecti in the Resource Description Format language. Such a triple states that the subject is linked to the object by a relationship qualified by the predicate (e.g., has-side-effect, has-name, etc.). Classes and predicates are defined in ontologies, i.e., formal representations of a domain [8], and organized into two hierarchies ordered by the subsumption relation. In Semantic Web standards, individuals, classes, and predicates are identified by a Uniform Resource Identifier (URI). We view such a knowledge graph as a directed labeled multigraph K = (⌃V , ⌃A , V, A, s, t, `V , `A ), where ⇤ Supported by the PractiKPharma project, founded by the French National Research Agency (ANR) under Grant ANR15-CE23- 0028, and by the Snowball Inria Associate Team. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). • V is the set of vertices. • A is the set of arcs connecting vertices through predicates2 . • ⌃V is the set of vertex labels, here, their URI3 . • ⌃A is the set of arc labels, here, URIs of predicates of K. • s : A ! V (respectively t : A ! V ) associates an arc to its source (respectively target) vertex. • `V : V ! ⌃V (respectively `A : A ! ⌃A ) maps a vertex (respectively an arc) to its label. Hence, a triple hs, p, oi is represented by two vertices vs , vo 2 V and an arc ahs,p,oi 2 A. The source and target vertices of ahs,p,oi are respectively vs and vo , i.e., s(ahs,p,oi ) = vs and t(ahs,p,oi ) = vo . The labels of vs , vo , and ahs,p,oi are respectively s, o, and p, i.e., `V (vs ) = s, `V (vo ) = o, and `A (ahs,p,oi ) = p. In this work, we consider the task of mining features from K that are associated with a set of vertices of interest, which we call seed vertices. The set of seed vertices can be defined in intension (i.e., all vertices that instantiate a specified ontology class) or in extension (i.e., by specifying the list of their URIs). For example, in the biomedical domain, an expert may be interested in mining features associated with vertices that represent drugs causing a specific side effect. We propose to mine from K the three following kinds of features: neighboring vertices, paths, and path patterns. Neighboring vertices are vertices that can be reached in K from at least one seed vertex. A neighbor is associated with all seed vertices from which it is reachable. Its support counts such seed vertices. For example, in the knowledge graph depicted in Figure 1, the neighbor v6 is reachable from the seed vertices nC 1 and n2 , and thus its support is 2. C p Paths are sequences of pairs ! e that represent an arc labeled by the predicate p incident to an individual e. A path is associated with all seed vertices that root it in K. The support of a path counts such seed vertices. For example, the p1 p2 C p1 p2 support of ! v2 ! v3 is 1 since only nC 1 root it, i.e., n1 ! v2 ! v3 exists in K. More generally, paths may share several characteristics. For instance, intermediate vertices in paths may instantiate the same ontology classes. We propose to capture these characteristics by considering path patterns in addition to paths. p Path patterns are sequences of pairs ! E, where p is a predicate and E is either an individual or a class. Such a pair indicates that an arc labeled by p is incident to (i) E if E is an individual or (ii) an individual that instantiates E if E is a class. A path pattern is associated with all seed vertices that root a path captured by the path pattern. Its support counts such seed vertices. In the example graph depicted in Figure 1, v2 instantiates T1 , and v3 instantiates T2 . Thus, p1 p2 p1 p2 p1 p2 p1 p2 ! v2 ! v3 is captured by ! T1 ! v3 , ! v2 ! T2 , and ! T1 ! T2 . Since T2 is a subclass of T3 , it is also p1 p2 p1 p2 p1 p2 captured by the pattern ! T1 ! T3 . Note that ! T1 ! T3 also captures ! v4 ! v5 , which is rooted by nC 2. p1 p2 Consequently, the support of ! T1 ! T3 is 2. This illustrates the fact that path patterns may capture additional common characteristics of seed vertices, and thus interestingly complete paths. Mining these patterns constitutes a challenging task due to the combinatorial nature and the size of real-world knowledge graphs, which naturally entail p1 p2 scalability issues. For example, ! v2 ! v3 can be generalized by up to 11 path patterns. This mining task and its inherent scalability issues constitute the main concerns of the present work. To the best of our knowledge, works available in the literature do not address such issues in knowledge graphs with the adopted granular modeling of path patterns. However, inspired by existing graph mining works [9], we propose an Apriori-based approach that alleviates these scalability issues by relying on (i) a set of constraints (e.g., support or degree thresholds), (ii) the hierarchy of ontology classes, (iii) an incremental expansion of paths and patterns, and (iv) the monotonic character of the support of paths and patterns. We provide a reusable implementation on GitHub4 . The remainder of this paper is organized as follows. In Section 2, we outline related works that motivated our proposed approach. In Section 3, we present in details our approach to mine paths and path patterns, and discuss how it tackles scalability issues. We illustrate our framework in Section 4 on PGxLOD, a real-world biomedical knowledge graph [10]. We comment on our results as well as indicate directions of future work in Sections 5 and 6. 2 Related work Path patterns have been widely studied in different settings, for example graph rewriting [11] and query answering [12]. Here, we recall some works that tackle the problem of mining path patterns from knowledge graphs in different application contexts. 2 Here, we discard literals from V and arcs that are incident to literals from A. 3 Hence, |⌃V | = |V |. 4 https://github.com/pmonnin/kgpm p1 p2 nC 1 v2 v3 p6 type v8 ... p4 type T2 p3 type ... T5 subClassOf subClassOf type ... ... subClassOf v1 T1 > T3 v6 subClassOf type ... T6 subClassOf subClassOf type ... p4 type T4 p3 v9 p6 type p5 v7 nC 2 v4 v5 p1 p2 Figure 1: Example of a canonical graph KC . nC 1 and n2 are canonical seed vertices, all vi are canonical individuals, C and all Ti are canonical ontology classes. Prefixes of URIs were omitted for readability purposes. The definition of “canonical” is given in Subsection 3.1. One major work dealing with feature mining from RDF graphs focuses on graph kernels that count common substructures (i.e., walks, subtrees) [13, 14]. To avoid an explosion in the number of features, the authors remove patterns with a low or high frequency [14]. Graph features can be used in various tasks such as knowledge base completion. For example, AMIE [15] mines Horn clauses, i.e., conjunction of triples, to predict another triple. Similarly, in Context Path Model [16], the authors model paths as sequences of predicates between the source and target entities, i.e., p1 p2 pk 1 pk s ! ! ··· ! ! t. Here, a triple is predicted based on the paths existing between the involved entities. Shi and Weninger [6] also model paths as sequences of predicates but they use them from a fact checking perspective. They check p p1 p2 pk 1 pk whether a triple s ! t is true by predicting it from a set of learned discriminative paths os ! ! · · · ! ! ot , where os and ot are respectively the set of classes instantiated by s and t. Our framework differs from the previous two as we aim at mining features by exploring the graph from the given seed vertices and we model path patterns using the intermediate entities and the ontology classes that they instantiate. Our motivation also comes from explainable approaches that rely on the descriptive power of features mined from knowledge graphs. For example, Explain-a-LOD [5] enriches statistical data sets with features from DBpedia. When correlations can be established between statistics and DBpedia features, these features can be used as explanations for the original statistics. For example, the quality of living in cities has been correlated with whether these cities are European capitals. Explain-a-LOD leverages the different outputs of FeGeLOD [17], some of which corresponding to our p p approach. For instance, the so called relations ! e are paths, whereas the so called qualified relations ! t, where e is replaced by a class t instantiated by e, are path patterns. Alternatively, Vandewiele et al. [18] propose to learn a decision tree to classify entities based on paths of a knowledge graph. The authors suggest that the predictions of their model are explainable as they are obtained by a “white-box” model (i.e., the decision tree) that combines interpretable features (i.e., paths from a knowledge graph). Interestingly, this system considers paths with their intermediate predicates and entities, p1 pk i.e., root ! e1 · · · ! ek . They allow a generalization of both predicates and entities by the use of a wildcard (*). In our context, this would correspond to generalizing entities by the top level ontology class >. However, unlike their * * framework, we do not generalize predicates. In their study, they focus on paths of the form root ! * · · · ! e, which somewhat corresponds to extracting neighbors and their distance from seed vertices. Finally, path patterns are somewhat similar to generalized association rules [19] and the concept of raising [20]. Indeed, both works replace entities in rules by ontology classes to increase the support while preserving a high confidence. Inspired by works that prune redundant generalized rules [21], our approach mines paths and path patterns that are non-redundant and comply with some given constraints. 4. Optional and 1. Canonicalizing K domain-dependent filtering Knowledge graph K 2. Mining interesting Binary matrix neighbors and types M 2 B|N |⇥|F | Set of seed vertices N 3. Mining interesting paths and path patterns Figure 2: Main steps to mine a set F of features (i.e., neighbors, paths, and path patterns) associated with a set N of seed vertices from a knowledge graph K. Step 4 is optional and depends on the application domain. Table 1: Parameters that configure the mining of interesting neighbors, paths, and path patterns in a knowledge graph K. Each parameter is associated with a domain and is used in specific steps (see Figure 2 for step numbers). Parameter m is specific to the considered application. Here, we illustrate the role of m with the biomedical domain. Parameter Domain Steps Description k N + 2, 3 Maximum length of paths and path patterns t N 3 Maximum level for generalization in class hierarchies d N 2, 3 Maximum degree (u = true) or out degree (u = false) to allow expansion lmin N 2, 3 Minimum support for features lmax N 2, 3 Maximum support for features u B 2, 3 Whether only out arcs (u = false) or all arcs (u = true) are traversed bpredicates List of URIs 2, 3 Blacklist of predicates not to traverse bexp-types List of URIs 2, 3 Blacklist of classes whose instances are not to reach bgen-types List of URIs 2, 3 Blacklist of classes not to use in generalization m {none, p, g, 4 Optional and domain-dependent filtering strategy m, pg, pgm} Illustrated here with the biomedical domain 3 Towards a scalable approach to mine interesting paths and path patterns In this paper, we consider a knowledge graph K and a set of seed vertices N = {n1 , n2 , . . . , np } ✓ V . The task is to mine neighbors, paths, and path patterns from K that are associated with these seed vertices. For example, given a set of drugs that cause or not a side effect, we aim to mine features that can later be used to classify these drugs. In the following subsections, we propose algorithms to build a binary matrix M of size |N | ⇥ |F| from the knowledge graph K and the set of seed vertices N . The set F consists of interesting neighbors, paths, and path patterns mined from K, i.e., neighbors, paths, and path patterns that satisfy the constraints defined in terms of the parameters summarized in Table 1. These parameters will be detailed in the following subsections. M associates a seed vertex n 2 N with its features f 2 F, i.e., if Mn,f = true, then n has feature f . We outline our approach in Figure 2 where steps 1, 2, and 3 are mandatory, while step 4 is optional and depends on the application domain. 3.1 Canonicalizing K The first step of our approach consists in canonicalizing the knowledge graph K, i.e., unifying vertices that represent the same real-world entity. We use the canonicalization word by analogy with the canonicalization of knowledge bases, which consists in unifying equivalent individuals into one [4]. Indeed, in knowledge bases under the Open Information Extraction paradigm, facts and entities can be represented by synonymous terms, which leads to co-existing and equivalent individuals. For example, in such knowledge bases, two individuals Obama and Barack Obama can co-exist. Similarly, in K, vertices can be connected through arcs labeled by the owl:sameAs predicate, indicating that these vertices are actually representing the same real-world entity. Such a situation typically arises when K comprises several data sets. For example, a drug can be represented by two vertices linked by an owl:sameAs arc, resulting from the information extraction of two independent drug-related databases. Therefore, their merging allows an easy access to the full extent of the knowledge in K about the drug they represent. Such a canonicalization process corresponds to edge contraction in graph theory (i.e., taking graph quotient). In our framework, it reduces to contracting arcs whose label is the owl:sameAs predicate. To perform this canonicalization, we must respect the semantics associated with the owl:sameAs predicate, and thus take into account its symmetry and transitivity. Indeed, an owl:sameAs arc between two vertices either is explicitly stated in K or follows from existing arcs and these two properties. Let us consider a vertex v in K. The canonicalization step merges v with all its identical vertices based on owl:sameAs arcs. To compute this set of vertices, it suffices to compute the connected component of v in the undirected spanning subgraph formed by the owl:sameAs arcs of K. Indeed, undirected edges comply with the symmetry of owl:sameAs and connected components comply with the transitivity of owl:sameAs. As a result, this step takes K as input and outputs its canonical graph KC = ⌃C V , ⌃A , V , A , s , t , ` V , ` A . C C C C C C C Similarly to K, K is a directed labeled multigraph where C • V C is the set of canonical vertices. • AC is the set of canonical arcs connecting canonical vertices through predicates. • ⌃C V is the set of canonical vertex labels. • ⌃C A is the set of canonical arc labels. • sC : AC ! V C (respectively tC : AC ! V C ) associates a canonical arc to its canonical source (respectively target) vertex. • `C V : V C V (respectively `A : A ! ⌃A ) maps a canonical vertex (respectively a canonical arc) to its ! ⌃C C C C label. Each canonical vertex in KC represents a vertex from K and all its identical vertices. It is possible for a canonical vertex in KC to only represent one vertex v from K if v has no identical vertices. This corresponds to creating a surjective mapping : V ! V C associating a vertex from K to its equivalent canonical vertex in KC . Canonical arcs in KC are constructed by using to map the source and target vertices of arcs in K to canonical vertices. Similarly, the set of seed vertices N is mapped to the set of canonical seed vertices, denoted by N C . Remark 1. Note that storing URIs has a high memory footprint. Thus, in KC , we use indices in N instead of URIs to label vertices and arcs, i.e., ⌃C V ✓ N and ⌃A ✓ N. This leads to a reduced memory consumption in subsequent algorithms. C Each canonical vertex has one unique label, differing from labels of other canonical vertices, i.e., |⌃C V | = |V |. This C “relabeling” is inspired by the work of de Vries and de Rooij [13] that use a structure named pathM ap to represent a path by an integer. We developed our own structure for this relabeling, which we named CacheManager. 3.2 Mining interesting neighbors and types 3.2.1 Mining interesting neighbors Here, we select all vertices that are neighbors of at least one seed vertex in N C by performing a breadth-first search constrained by parameters k, d, u, bpredicates , and bexp-types . Neighbors are selected by traversing at most k arcs from the seed vertices in N C . If u = false, then only outgoing arcs are traversed; otherwise, all arcs are traversed regardless of their orientation. However, not all neighboring vertices are of interest. For example, we want to avoid provenance metadata vertices. Indeed, they may not constitute discriminative features as they are specific to the vertex they describe. As we aim to use ontology classes to generate path patterns, we also need to keep the graph exploration over the individuals of K and avoid traversing rdf:type arcs. To this aim, we do not traverse arcs that are labeled by a predicate whose URI or prefix of URI is blacklisted in bpredicates . For example, we blacklist in bpredicates the prefix of the provenance ontology PROV-O5 and the URI of the rdf:type predicate6 . Additionally, we provide a blacklist bexp-types of URIs or prefixes of classes whose instances must not be reached. Hence, we do not reach individuals that instantiate directly or indirectly a blacklisted class, by following rdf:type and rdfs:subClassOf arcs. For example, in a use case of classifying drugs that cause or not a side effect, one may want to avoid neighbors that represent the side effect. That is why, the ontology class representing the side effect is blacklisted in bexp-types . 5 http://www.w3.org/ns/prov# 6 http://www.w3.org/1999/02/22-rdf-syntax-ns#type When mining neighboring vertices, we may encounter vertices with a high degree, hereafter named hubs. If the graph exploration considered their numerous neighbors, then the size of the selected neighborhood would increase exponentially, thus causing a scalability issue. Additionally, hub neighbors may not constitute specific and discriminative features. Indeed, if a hub can be reached from some seed vertices, i.e., appears in their neighborhood, the neighbors of the hub will be reached by the same seed vertices. That is why, in our approach, we propose to stop the graph exploration at vertices whose degree is strictly greater than parameter d7 . Remark 2. If u = false, then the degree of a vertex only counts outgoing arcs, otherwise all arcs are counted. The degree does not count arcs whose predicate is blacklisted in bpredicates . The degree counts arcs incident to a vertex that instantiates a blacklisted class in bexp-types . As a result, with k, d, u, bpredicates , and bexp-types fixed, we obtain a set of neighboring vertices, denoted by N (N C ) ✓ V C . Each neighboring vertex v 2 N (N C ) may only appear in the neighborhood of some seed vertices from N C w.r.t. the parameters. Thus, v is associated with these seed vertices, which we indicate by defining the support set of v. Definition 1 (Support set of a neighbor). For a given choice of these parameters, the support set of a neighboring vertex v 2 N (N C ) is denoted by S UPPORT S ET(v) ✓ N C and defined as the set of seed vertices from N C having v as neighbor. The support of a neighbor is defined as the cardinal of its support set. Note that some vertices in N (N C ) are not very discriminative: when they are associated with very few vertices from N C or nearly all of them. This motivates the use of parameters lmin and lmax that define the minimum and maximum support for a neighbor to appear in the set F of features. Hence, a neighbor v 2 N (N C ) constitutes a feature in the output matrix M if and only if lmin |S UPPORT S ET(v)| lmax . We denote the set of interesting neighbors to appear in F by Nl (N C ) = v | v 2 N (N C ) and lmin |S UPPORT S ET(v)| lmax . In M, for nC 2 N C and v 2 Nl (N C ), we have MnC ,v = true if and only if nC 2 S UPPORT S ET(v). Example 1. From KC in Figure 1 and with k = 3, d = 4, lmin = 2, lmax = 3, u = false, bpredicates = {type, subClassOf}, and bexp-types = ;, we obtain: N (N C ) = {v1 , v2 , v3 , v4 , v5 , v6 , v8 , v9 } ; S UPPORT S ET(v1 ) = S UPPORT S ET(v6 ) = nC C 1 , n2 ; S UPPORT S ET(v2 ) = S UPPORT S ET(v3 ) = S UPPORT S ET(v8 ) = nC 1 ; S UPPORT S ET(v4 ) = S UPPORT S ET(v5 ) = S UPPORT S ET(v9 ) = nC 2 ; Nl (N C ) = {v1 , v6 } . The graph exploration stops at v1 . Indeed, it is considered a hub as its degree is greater than d. Thus, vertices on the left of Figure 1 are not explored. Because u = false, the graph exploration cannot reach v7 . If bexp-types = {T3 }, then v3 and v5 cannot be traversed, resulting in N (N C ) = {v1 , v2 , v4 , v8 , v9 }. 3.2.2 Mining interesting types Observe that we can use N (N C ) to compute interesting types over the considered neighborhood. These interesting types will alleviate a scalability issue arising when building path patterns in Subsection 3.3. Interesting types must be computed over N (N C ) and not Nl (N C ) as vertices whose support is below lmin can instantiate interesting types. As an intuitive example, in Figure 1, T3 is associated with both nC1 (because of v3 ) and n2 (because of v5 ). For lmin = 2, C v3 and v5 will not be selected as features, however T3 can be used in path patterns. Parameters t and bgen-types constrain the ontology classes considered in the construction of path patterns, and thus they are integrated in the computation of interesting types. Parameter t specifies the maximum level of considered classes in ontology hierarchies. This level is computed by starting at vertices to generalize and following rdf:type and rdfs:subClassOf arcs. t = 0 only allows to generalize vertices with >, which is considered to be instantiated by all vertices. For example, v3 can be generalized by T2 and > if t = 1, and by T2 , T3 , and > if t = 2. Additionally, types used for generalization must not be blacklisted in bgen-types . This blacklist consists of URIs or prefixes of ontology classes not to be used during the construction of path patterns. For example, we refrain from considering general classes such as pgxo:Drug8 . 7 From a similar assessment, de Vries and de Rooij [14] tackle the hub issue by removing edges based on frequency of pairs (source, predicate) and (predicate, target). 8 http://pgxo.loria.fr/Drug To compute interesting types, we must first compute their support set. This motivates the following predicate: ⇢ true if v instantiates T under parameters t and bgen-types inst(v, T, t, bgen-types ) = false otherwise We can then define the support set of an ontology class T as follows: Definition 2 (Support set of an ontology class). The support set of an ontology class T is the union of support sets of vertices v 2 N (N C ) that can be generalized by T under parameters t and bgen-types . Formally: [ S UPPORT S ET(T ) = S UPPORT S ET(v) v2N (N C ) inst(v,T,t,bgen-types ) Finally, the set of interesting types used to build path patterns is defined as T lmin = {T | lmin |S UPPORT S ET(T )|}. Example 2. With the same parameters as in Example 1, we obtain S UPPORT S ET(T2 ) = nC 1 ; S UPPORT S ET(T1 ) = nC C 1 , n2 ; T lmin = {T1 , T3 , T5 , T6 , >} . 3.3 Mining interesting paths and path patterns This step focuses on mining interesting paths and path patterns rooted by seed vertices nC 2 N C . We use the term path feature (PF) to indicate a path or a path pattern. p Definition 3 (Path feature). A path feature is a sequence of atomic elements that are pairs ! E where p is a predicate and E is either (i) an individual (for paths), or (ii) an individual or an ontology class (for path patterns). The length of a path feature counts the number of its atomic elements. p1 p2 p3 Example 3. In Figure 1, the path ! v2 ! v3 ! v6 can be rooted by nC 1 and is of length 3. The path pattern p1 p2 ! T1 ! T3 can be rooted by nC 1 and n C 2 and is of length 2. Interesting path features are built by a breadth-first expansion starting at vertices in N C . As previously, k defines the maximum number of arcs traversed. Hence, path features are of length 1 to k. Observe that a scalability issue arises when mining interesting path features. Indeed, there may be several paths between two vertices and each vertex in a p1 p2 p3 path can be generalized by several ontology classes. For instance, for t = 2, ! v2 ! v3 ! v6 can be generalized by up to 23 path patterns. We propose a mining procedure that alleviates the scalability issues associated with the mining of path patterns. This mining procedure relies on the monotonicity of the support set of path features, which is defined as follows: Definition 4 (Support set of a path feature). The support set of a path consists of all vertices from N C that root it in KC . Formally, pa pb n pa pb o S UPPORT S ET( ! va . . . ! vb ) = nC 2 N C | nC ! va . . . ! vb exists in KC . The support set of a path pattern consists of all vertices from N C that root a path in KC that is captured by the path pattern. Formally, pa pb n pa pb S UPPORT S ET( ! Ea . . . ! Eb ) = nC 2 N C | nC ! va . . . ! vb exists in KC and o 8vi , vi = Ei or inst(vi , Ei , t, bgen-types ) . The support of a path feature is defined as the cardinal of its support set. p1 p2 p3 Example 4. From Figure 1, we have S UPPORT S ET( ! v2 ! v3 ! v6 ) = nC 1 . Our approach of mining interesting path features is guided by the dependency structure as illustrated in Figure 3. At first, this structure is empty and it is then augmented at each iteration of Algorithm 1, whose operations are described and illustrated below. Remark 3. As introduced in Remark 1, there are some hacks to help mitigating some of the scalability drawbacks. Storing and manipulating a path feature as a list of elements has a high memory footprint. Thus, such a list is only stored once in our CacheManager structure and the returned index (from N) is used in mining algorithms. h=1 h=2 h=3 p6 p1 2p p1 p 2 3 p ! v8 ! v2 ! v3 ! v2 ! v3 ! v6 nC 1 nC 1 nC 1 p6 p6 p6 ! T5 ! T6 !> C p1 2 p p1 2 p p1 p2 p1 p 2 3 p C C T3 v3 nC 1 , n2 nC 1 , n2 nC 1 , n2 ! v2 ! ! v2 ! > ! T1 ! ! v2 ! v3 ! > nC 1 nC 1 nC 1 nC 1 p6 ! v9 nC 2 p1 2 p p1 2 p p1 p 2 3 p p1 2p 3 p ! T1 ! T3 ! T1 ! > ! T1 ! T3 ! v6 ! T1 ! T3 ! > C C C C nC 1 , n2 nC 1 , n2 nC 1 , n2 nC 1 , n2 p4 p1 ! v1 ! v2 C nC 1 , n2 nC 1 p1 2 p p1 2 p p1 p2 p1 p 2 3 p ! v4 ! T3 ! v4 ! > ! T1 ! v5 ! v4 ! v5 ! > p4 p1 p1 !> ! T1 !> nC 2 nC 2 nC 2 nC 2 C C C nC 1 , n2 nC 1 , n2 nC 1 , n2 p1 p1 2p p1 p 2 3 p ! v4 ! v4 ! v5 ! v4 ! v5 ! v6 nC 2 nC 2 nC 2 Figure 3: Dependency structure used when mining interesting path features from the canonical graph in Figure 1. Parameters are k = 3, t = 2, d = 4, u = false, bpredicates = {type, subClassOf}, bexp-types = ;, bgen-types = ;, lmin = 2, and lmax = 3. Path features are displayed with their support set. Solid arrows represent expansion, dashed arrows represent generalization, rectangles represent paths, and rounded rectangles represent path patterns. Hatched rectangles represent discarded path features because of support limits or more specific path features with identical support set. Green rectangles represent path features ultimately added to F. Blank rectangles represent path features that respect specificity and support constraints but are not in F because of other features (in green). For readability purposes, path features are displayed as the list of their elements instead of indices from N actually used to save memory. Algorithm 1 Mining interesting paths and path patterns Input: The canonical knowledge graph KC , the set of canonical seed vertices N C , the set of interesting types T lmin Parameters: k, t, d, lmin , lmax , u, bpredicates , bexp-types , bgen-types Output: F and M completed with interesting paths and path patterns 1: h 1 2: repeat 3: Expand paths in Ph 4: Generalize expanded paths into path patterns 5: Keep most specific path features 6: Select (i) generated path features to add to F (complete M accordingly), and (ii) paths to add to Ph+1 7: h h+1 8: until h > k or Ph = ; 3.3.1 Expand paths in Ph pe Each path P 2 Ph 9 is expanded with pairs ! ve that are chosen in the neighborhood of the last individual of P . This choice is constrained by parameters k, d, u, bpredicates , and bexp-types as in Subsection 3.210 . In the first iteration, the neighborhood of seed vertices in N C is used. If no path in Ph can be expanded, i.e., the neighborhood of their last vertex does not contain reachable vertices under the constraints, then Algorithm 1 ends. p4 p1 p1 Example 5. From the graph in Figure 1, the first expansion generates the following paths: ! v1 , ! v2 , ! v4 , p6 p6 p4 ! v8 , and ! v9 . In the second expansion, ! v1 is not expanded as v1 is a hub under d = 4. Since their respective p6 p6 p1 neighborhood does not contain reachable vertices, ! v8 and ! v9 are also not expanded. The expansion of ! v2 p1 p2 p1 p1 p2 generates ! v2 ! v3 whereas the expansion of ! v4 generates ! v4 ! v5 . 3.3.2 Generalize expanded paths into path patterns pe Let Pe be the expansion of P 2 Ph as previously described, i.e., Pe = P ! ve . We generalize Pe by: pe • Generating patterns P ! T with types T 2 T lmin for which the predicate inst(ve , T, t, bgen-types ) is verified. • Retrieving from the dependency structure the path patterns that generalize P 11 and expanding them with pe pe ! ve , and ! T for all T 2 T lmin for which the predicate inst(ve , T, t, bgen-types ) is verified. Intuitively, this generalization operation allows to expand path patterns. p1 p1 p1 p1 Example 6. In the first iteration, ! v2 is generalized by ! T1 and ! >. As we will see, ! > is not kept in the p1 dependency structure at the end of the first iteration (see Subsections 3.3.3 and 3.3.4). In the second iteration, ! v2 p1 p2 p1 p1 expands into ! v2 ! v3 . In the dependency structure, we retrieve ! T1 as the path pattern generalizing ! v2 , p1 p2 p1 p2 p1 p2 which is expanded into ! T1 ! v3 , ! T1 ! T3 , and ! T1 ! >. We only generalize paths with types T 2 T lmin to avoid the generation an important number of uninteresting path patterns that would then be discarded, thus reducing the memory footprint. Indeed, by definition, if T 62 T lmin , then |S UPPORT S ET(T )| < lmin . Additionally, given a path feature P , |S UPPORT S ET(P )| minE2P |S UPPORT S ET(E)|, where E can be a class or an individual involved in P . Thus, if T 62 T lmin is used in a path pattern P , we would have |S UPPORT S ET(P )| < lmin , therefore generating an uninteresting path pattern that would be discarded later. 3.3.3 Keep most specific path features Inspired by works that prune redundant generalized rules during their generation [21], we keep only the “most specific” path patterns among those that have the same support set. 9 Ph is explained in Subsection 3.3.4. 10 Additionally, to avoid loops, P can only be expanded at iteration h with individuals ve such that there exists at least one seed vertex in S UPPORT S ET(P ) whose shortest distance to ve is h. 11 Not all generated path patterns remain in the dependency structure at the end of an iteration, see Subsections 3.3.3 and 3.3.4. Definition 5 (More specific path pattern). A path pattern P1 is more specific than another path pattern P2 if every p1 atomic element of P1 is more specific than the atomic element of P2 at the same position. An atomic element ! E1 is p2 more specific than another atomic element ! E2 if and only if: (i) p1 = p2 , i.e., both atomic elements involve the same predicate12 , and (ii) E1 is more specific than E2 13 . When path patterns have the same support set, keeping the most specific ones remove redundant generalizations, thus reducing their number and the computational burden. Additionally, we ensure a high descriptive power because the most specific paths are the most descriptive. Intuitively, a path pattern involving a class T is less descriptive than another pattern involving a subclass or an instance of the class. However, keeping the most specific path patterns is computationally expensive, which led us to propose the following computational procedure. We notice that the support set of a path pattern is the union of the support sets of the paths it generalizes. Therefore, we discard path patterns that generalize only one path. Indeed, such path patterns have the same support set as their original path and are more general, by definition. p1 p2 p3 p1 p2 p3 Example 7. For h = 3, we discard the following path patterns: ! v2 ! v3 ! > and ! v4 ! v5 ! >. However, there may exist path patterns that generalize several more specific path features while having the same support set. Such path patterns should also be discarded. p1 p1 Example 8. For h = 1, ! > shares the same support set as the more specific path pattern ! T1 and thus should be discarded. To efficiently discard path patterns, we avoid computing their whole hierarchy. Instead, we focus on retaining only the most specific ones in the prefix tree depicted in Figure 4. This prefix tree is incrementally augmented and stores the most specific path patterns for a specific iteration and support set. In this tree, individuals/classes and predicates involved in path patterns are indexed separately. Thus, its depth is twice the length of path patterns of the current iteration. The prefix tree enables an efficient storage and selection of the most specific path patterns. Indeed, let P be a path pattern to be compared with those already stored. A breadth-first traversal is performed to detect more specific patterns than P : we only traverse identical or more specific elements according to Definition 5. At any depth, if such elements cannot be found, then P is one of the most specific patterns and the traversal stops. On the contrary, if the traversal reaches a leaf containing more specific elements, then there is a pattern more specific than P with the same support set. Consequently, P is discarded and removed from the dependency structure. If P is to be stored, another breadth-first traversal is performed by considering identical or more general elements than the ones in P . When the traversal reaches a leaf, it means that more general patterns than P are currently stored and have the same support set. These are removed from the prefix tree before storing P . They are also removed from the dependency structure. Remark 4. As the prefix tree relies on associative arrays and sets, the computational cost of traversal, insertion, and removal is reduced. Additionally, resetting the tree at each expansion and each support set reduces the number of patterns to traverse and thus the global computational cost. 3.3.4 Select generated path features to add to F, and paths to add to Ph+1 In this subsection, we determine (i) which paths and path patterns should be added as features in F, and consequently, (ii) which paths to add to Ph+1 , i.e., to expand during the next iteration. A path feature P can be added in F if: (C1) lmin S UPPORT S ET(P ) lmax , (C2) Prefixes of P with the same support set do not already exist in F, (C3) If P is a path pattern, it does not generalize a path with the same support set. When P is in F, the output binary matrix M verifies MnC ,P = true for all nC 2 S UPPORT S ET(P ). 12 It is noteworthy that we do not consider the hierarchy of predicates in this work. 13 A class is more specific than all its super-classes and an individual is more specific than all classes it instantiates. h=1 h=2 Support set nC C 1 , n2 Support set nC C 1 , n2 {>} p1 p1 T1 p2 {T3 } p6 {T5 , T6 } Figure 4: Prefix tree used when computing most specific path patterns during the first and second iterations considering the support set nC 1 , n2 . This structure is reset at each expansion and for each support set. For h = 1, when C p1 p1 considering ! T1 , the structure will evolve: ! > will be removed and replaced by the considered path. For h = 3, p1 p2 p1 p2 when considered, ! T1 ! > will be discarded as more general than ! T1 ! T3 . Remark 5. Note that (C2) allows to focus on shorter paths in F. However, (C2) is not applied if P ends with an individual and there exist a prefix of P in F that ends with a class. P is considered more descriptive than its prefix, because of the individual in the last position. Hence, we add P in F and remove its prefix. For example, in Figure 3, p1 p2 p1 p2 p3 ! T1 ! T3 is replaced by ! T1 ! T3 ! v6 in F for h = 3. Remark 6. (C3) is motivated as the path is more specific and thus more descriptive than P and should be added instead of P . We also select the paths and path patterns to expand during the next iteration. To reduce their number, we rely on the lmin constraint and the monotonicity of the support set. It is clear that, for a path feature P , we have |S UPPORT S ET(P )| minE2P |S UPPORT S ET(E)|, where E can be a class or an individual involved in P . Thus, when expanding a path feature, its support set remains identical (for paths and path patterns) or decreases (for path patterns). Consequently, we add to Ph+1 paths whose expansion may generate path features complying with the lmin constraint, i.e., paths with a support greater than lmin or paths that are generalized by a pattern with a support greater than lmin . This monotonicity property also lets us remove from the dependency structure patterns whose support is lower than lmin . Indeed, such patterns cannot be used during the generalization step of the next iteration since they will inevitably generate a pattern whose support is smaller than lmin . As a result, the monotonicity property does entail a reduction in the number of paths and path patterns considered in the next iteration, thus reducing the computational cost. p1 p1 Example 9. For example, in the first iteration, the path ! v2 is added to P2 as it is generalized by ! T1 whose p1 p2 support is greater than lmin = 2. At the end of the second iteration, we remove ! v2 ! T3 because its support is lower than lmin = 2, and thus its expansion cannot generate an interesting pattern. 3.4 Optional and domain-dependent filtering After the previous steps, we obtain a feature set F containing interesting neighbors, paths, and path patterns. These features have been mined without taking into account domain constraints known to experts. We propose to apply domain-dependent filtering on F with parameter m. Such filters reduce the size of F and integrate interestingness constraints based on expert knowledge. Example 10. To classify drugs causing or not a side effect, experts may want to focus on features containing a biological pathway, a gene or a GO class, or a MeSH class. Therefore, we propose three atomic filters, only keeping neighbors, paths, and path patterns containing at least a pathway (m = p), a gene or a GO class (m = g), or a MeSH class (m = m). Such atomic filters can be combined to form disjunctive filters. For example, the m = pg filter keeps features from F containing at least a pathway or a gene or a GO class. When a filter is applied to a neighbor, this neighbor must be, e.g., a pathway for the p filter. When a filter is applied to a path or a path pattern, it means that one of its individuals / ontology classes must be, e.g., a pathway for the p filter. This domain-dependent filtering is similar to approaches that generalize association rules and prune those that involve some specified ontology classes [21, 22]. 4 Experimental setup To illustrate our approach, we will address the following task: from a knowledge graph, mine a set of features to classify drugs depending on whether they cause a specific side effect. We explore PGxLOD14 [10], a knowledge graph that aggregates several sets of Linked Open Data (LOD) describing drugs, phenotypes, and genetic factors: PharmGKB, ClinVar, DrugBank, SIDER, DisGeNET, and CTD. This aggre- gation may lead to features combining units from several LOD sets. Indeed, LOD sets may contain different and incomplete knowledge. Their combined use then enables leveraging a greater amount of knowledge, where some LOD sets complete information provided by others. This asks for a canonical knowledge graph as described in Subsection 3.1. For instance, it is possible to complete the knowledge related to a drug described in PharmGKB if it is linked with an owl:sameAs arc to the same drug described in DrugBank. This constitutes the key interest in combining LOD sets in knowledge discovery and data mining tasks, as discussed by Ristoski and Paulheim [23]. We will use the following data sets that comprise positive ( ) and negative ( ) drug examples: Data set 1 (Drug Induced Liver Injury (DILI) [24]). It is formed by 1,036 drugs in 4 classes: “most DILI concern” (192 drugs), “ambiguous DILI concern” (254 drugs), “less DILI concern” (278 drugs), and “no DILI concern” (312 drugs). We mapped these drugs from their PubChem identifiers to identifiers from PharmGKB, otherwise DrugBank, otherwise KEGG, resulting in the set of seed vertices N DILI = N DILI [ N DILI such that: • |N DILI | = 146 drugs (118 from PharmGKB, 17 from DrugBank, and 11 from KEGG). The positive drug examples are from the “most DILI concern” class. • |N DILI | = 224 drugs (206 from PharmGKB, 9 from DrugBank, and 9 from KEGG). The negative drug examples are from the “no DILI concern” class. Data set 2 (Severe Cutaneous Adverse Reactions (SCAR)15 ). It is formed by 874 drugs in 5 classes: “very probable” (18 drugs), “probable” (19 drugs), “possible” (94 drugs), “unlikely” (697 drugs), and “very unlikely” (46 drugs). We mapped these drugs from their PubChem identifiers to identifiers from PharmGKB, otherwise DrugBank, otherwise KEGG, resulting in the set of seed vertices N SCAR = N SCAR [ N SCAR such that: • |N SCAR | = 102 drugs (100 from PharmGKB and 2 from DrugBank). The positive drug examples are from the “very probable”, “probable”, and “possible” classes. • |N SCAR | = 290 drugs (286 from PharmGKB and 4 from DrugBank). The negative drug examples are from the “unlikely” and “very unlikely” classes. We implemented our approach in Python16 . We used a server with 700 GB of RAM and the following parameter values k 2 {1, 2, 3, 4}, t 2 {1, 2, 3}, d = 500, u = false, lmin = 5, lmax = +1, and m 2 {p, g, m, pg, pgm}. It should be noted that k = 4 was only tested with t = 1 because of memory issues caused by the high number of generated features. Statistics about the features are detailed for k = 3, t = 3 and k = 4, t = 1 in Table 2 and discussed in the next section. We obtained the features associated with the DILI data set under k = 3, t = 3 in approximately 1 hour. However, computing the features with k = 4, t = 1 on the same data set required 4 days and 380 GB of RAM. 5 Results and discussion The first two lines of Table 2 show the number of neighbors and types reachable before applying support limits. Enforcing these limits constitutes a first reduction of these numbers, thus reducing the memory and computational footprints. Here, numbers are approximately divided by 2. The mining of path features always relies on lmin , and thus it is not possible to count the number of all possible paths and path patterns. However, we show the number of path features generated during the mining, which already illustrates the combinatorial explosion. Enforcing support constraints and removing redundant generalizations allow to reduce their number in F (here, approximately by 20). Finally, the domain-dependent filtering defined by m also radically scales down the number of features ultimately output. However, this filtering only happens as post-processing and does not alleviate the scalability issues arising during the mining of patterns. We observe a drastic increase in the number of neighbors and path features alongside k, which highlights the scalability issues of mining large knowledge graphs. Considering additional levels in ontology hierarchies by increasing t also 14 https://pgxlod.loria.fr 15 http://www.regiscar.org/ 16 https://github.com/pmonnin/kgpm Table 2: Number of features mined for the two considered data sets for parameters d = 500, u = false before and after applying lmin = 5, lmax = +1, and m = pgm. Path features are always computed considering at least lmin to avoid combinatorial explosion. We display the number of path features generated (gen.) during the exploration and the number of path features ultimately in F after keeping the most specific and applying limit constraints. Types are not considered as features but are displayed to illustrate the possible combinatorial explosion in path patterns. Statistics about the full neighborhood are given as comparison. DILI SCAR k = 3, t = 3 k = 4, t = 1 k = 3, t = 3 k = 4, t = 1 Neighbors 175,652 628,681 179,694 639,050 Before lmin and lmax Types 13,580 18,940 13,677 18,526 Neighbors 71,560 281,657 90,690 312,029 Types 7,372 12,421 8,800 13,177 After lmin and lmax Path features in F 790,605 10,169,975 1,146,585 10,729,002 Path features gen. 20,145,635 251,791,519 29,011,996 255,672,772 Neighbors 4,069 31,804 4,214 33,361 After m Path features in F 102,674 2,291,846 147,936 2,400,642 Neighbors 2,419,957 2,419,920 Full neighborhood Types 51,477 51,472 (d = 500) Reached at k, t k = 23, t = 21 k = 23, t = 21 Neighbors 5,488,531 5,488,510 Full neighborhood Types 53,486 53,484 (d = +1) Reached at k, t k = 19, t = 21 k = 20, t = 21 multiplies the number of path features. To illustrate, with 700 GB of RAM, we could not set t to values greater than 1 for k = 4. Consequently, there is still room for improvements in terms of memory consumption, e.g., through an efficient storing of paths and path patterns. These future improvements could enable to consider the full neighborhood of seed vertices, which is reached for greater values of k and t and involves a far greater amount of vertices. For example, for the DILI data set, the full neighborhood is reached for k = 23 and t = 21 (for d = 500), or k = 19 and t = 21 (when the degree constraint is disabled). The full amount of reachable neighbors is 4 times (d = 500) or 9 times greater (d = +1) than with k = 4. The values of parameters depend on the objectives and domain knowledge of the analyst guiding the mining process, especially for blacklists and support thresholds. However, metrics about the knowledge graph may provide guidance. Indeed, statistics about node degrees can help to find a trade-off between exploration and combinatorial explosion with parameter d. Similarly, the depth of class hierarchies influences the value of t. For example, general classes may not be of interest to the analyst, thus reducing t. The parameter k can be set by considering the graph diameter. As it is common in mining processes, iterations may be required to find the best configuration. Regarding m, it is for now hard-coded and only suitable to some biomedical applications. Inspired by ontologies that allow to interactively define mining workflows [3], we could adapt this parameter to other applications by proposing such an interactive definition. When manually reviewing the output features, we noticed multiple path features across the aggregated LOD sets. This is made possible by aggregating and canonicalizing multiple LOD sets in the knowledge graph. This result particularly illustrates one of the fundamental aspects of Linked Open Data: the combination of different data sets enables to go beyond their original purposes and coverage. However, it is clear that combining LOD sets leads to bigger knowledge graphs, exacerbating the scalability issues. Regarding our approach, we only canonicalize vertices, i.e., individuals and ontology classes. Nevertheless, predicates used on arcs can also be identified as identical, leading to a canonicalization of arcs. In this context, we could benefit from matching approaches, such as PARIS [25]. By identifying identical classes, predicates, and individuals, these matching approaches could further improve the canonicalization and, therefore, increase the number of common features between seed vertices from different data sets. Similarly, we could consider literals and arcs incident to literals that were purposely discarded here. However, the canonicalization of literals raises several challenging issues due to their heterogeneity in terms of syntactic variations, unit measures, and the precision of numerical values. Other reasoning mechanisms and semantics associated with Semantic Web standards could be taken into account. For example, predicates can be defined as transitive, and thus the canonical knowledge graph could also result from their transitive closure. Regarding the modeling of path patterns, we could generalize paths with both the hierarchy of classes and the hierarchy of predicates. In addition to keeping the most specific patterns, we could use other metrics to further reduce their redundancy (e.g., approaches relying on hierarchies [26, 27] and extents of ontological classes [27]). This could also reduce the number of generated patterns, therefore improving the scalability of the mining approach. Neighbors could be enriched with the distance between them and seed vertices, which would correspond to the generalized paths of KGPTree [18]. We could also use other approaches than binary features (e.g., counting [13, 14], relative counting [28]). More importantly, it remains to test our mined features within a complete classification task to measure the influence of k, t, and the three kinds of features (neighbors, paths, and path patterns). 6 Conclusion In this preliminary study, we addressed the scalability issues associated with the task of mining neighbors, paths, and path patterns from a knowledge graph and a set of seed vertices. We proposed a method for tackling these issues, which we illustrated by mining a real-world knowledge graph. Our results highlight the importance of considering the scalability of approaches when mining features from ever-growing knowledge graphs. Our work alleviates part of the computational cost (time and memory) of mining paths and path patterns but also reveals the need for a further reduction. Such future research works could enable the modeling of more complex path patterns, for example, considering the hierarchy of predicates. References [1] Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d’Amato, Gerard de Melo, Claudio Gutierrez, José Emilio Labra Gayo, Sabrina Kirrane, Sebastian Neumaier, Axel Polleres, Roberto Navigli, Axel-Cyrille Ngonga Ngomo, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan F. Sequeda, Steffen Staab, and Antoine Zimmermann. Knowledge graphs. CoRR, abs/2003.02320, 2020. [2] Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked data - the story so far. Int. J. Semantic Web Inf. Syst., 5(3):1–22, 2009. [3] Petar Ristoski and Heiko Paulheim. Semantic web in data mining and knowledge discovery: A comprehensive survey. J. Web Semant., 36:1–22, 2016. [4] Luis Galárraga, Geremy Heitz, Kevin Murphy, and Fabian M. Suchanek. Canonicalizing open knowledge bases. In Jianzhong Li, Xiaoyang Sean Wang, Minos N. Garofalakis, Ian Soboroff, Torsten Suel, and Min Wang, editors, Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, November 3-7, 2014, pages 1679–1688. ACM, 2014. [5] Heiko Paulheim. Generating possible interpretations for statistics from linked open data. In Elena Simperl, Philipp Cimiano, Axel Polleres, Óscar Corcho, and Valentina Presutti, editors, The Semantic Web: Research and Applications - 9th Extended Semantic Web Conference, ESWC 2012, Heraklion, Crete, Greece, May 27-31, 2012. Proceedings, volume 7295 of Lecture Notes in Computer Science, pages 560–574. Springer, 2012. [6] Baoxu Shi and Tim Weninger. Discriminative predicate path mining for fact checking in knowledge graphs. Knowl.-Based Syst., 104:123–133, 2016. [7] Tim Berners-Lee, James Hendler, Ora Lassila, et al. The Semantic Web. Scientific American, 284(5):28–37, 2001. [8] Thomas R Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199– 220, 1993. [9] Charu C. Aggarwal and Haixun Wang, editors. Managing and Mining Graph Data, volume 40 of Advances in Database Systems. Springer, 2010. [10] Pierre Monnin, Joël Legrand, Graziella Husson, Patrice Ringot, Andon Tchechmedjiev, Clément Jonquet, Amedeo Napoli, and Adrien Coulet. PGxO and PGxLOD: a reconciliation of pharmacogenomic knowledge of various provenances, enabling further comparison. BMC Bioinformatics, 20-S(4):139:1–139:16, 2019. [11] Guillaume Bonfante, Bruno Guillaume, and Guy Perrier. Application of Graph Rewriting to Natural Language Processing. Wiley Online Library, 2018. [12] Pablo Barceló, Leonid Libkin, and Juan L. Reutter. Querying graph patterns. In Maurizio Lenzerini and Thomas Schwentick, editors, Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2011, June 12-16, 2011, Athens, Greece, pages 199–210. ACM, 2011. [13] Gerben Klaas Dirk de Vries and Steven de Rooij. A fast and simple graph kernel for RDF. In Claudia d’Amato, Petr Berka, Vojtech Svátek, and Krzysztof Wecel, editors, Proceedings of the International Workshop on Data Mining on Linked Data, with Linked Data Mining Challenge collocated with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2013), Prague, Czech Republic, September 23, 2013., volume 1082 of CEUR Workshop Proceedings. CEUR-WS.org, 2013. [14] Gerben Klaas Dirk de Vries and Steven de Rooij. Substructure counting graph kernels for machine learning from RDF data. J. Web Semant., 35:71–84, 2015. [15] Luis Antonio Galárraga, Christina Teflioudi, Katja Hose, and Fabian M. Suchanek. AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In Daniel Schwabe, Virgílio A. F. Almeida, Hartmut Glaser, Ricardo A. Baeza-Yates, and Sue B. Moon, editors, 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May 13-17, 2013, pages 413–422. International World Wide Web Conferences Steering Committee / ACM, 2013. [16] Josua Stadelmaier and Sebastian Padó. Modeling paths for explainable knowledge base completion. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 147–157, 2019. [17] Heiko Paulheim and Johannes Fürnkranz. Unsupervised generation of data mining features from linked open data. In Dumitru Dan Burdescu, Rajendra Akerkar, and Costin Badica, editors, 2nd International Conference on Web Intelligence, Mining and Semantics, WIMS ’12, Craiova, Romania, June 6-8, 2012, pages 31:1–31:12. ACM, 2012. [18] Gilles Vandewiele, Bram Steenwinckel, Femke Ongenae, and Filip De Turck. Inducing a decision tree with discriminative paths to classify entities in a knowledge graph. In Zhe He, Jiang Bian, Cui Tao, and Rui Zhang, editors, Proceedings of the 4th International Workshop on Semantics-Powered Data Mining and Analytics co- located with the 18th International Semantic Web Conference (ISWC 2019), Aukland, New Zealand, October 27, 2019., volume 2427 of CEUR Workshop Proceedings. CEUR-WS.org, 2019. [19] Ramakrishnan Srikant and Rakesh Agrawal. Mining generalized association rules. In Umeshwar Dayal, Peter M. D. Gray, and Shojiro Nishio, editors, VLDB’95, Proceedings of 21th International Conference on Very Large Data Bases, September 11-15, 1995, Zurich, Switzerland., pages 407–419. Morgan Kaufmann, 1995. [20] Xuan Zhou and James Geller. Raising, to enhance rule mining in web marketing with the use of an ontology. In Data Mining with Ontologies: Implementations, Findings, and Frameworks, pages 18–36. IGI Global, 2008. [21] Marcos Aurélio Domingues and Solange Oliveira Rezende. Using taxonomies to facilitate the analysis of the association rules. CoRR, abs/1112.1734, 2011. [22] Claudia Marinica and Fabrice Guillet. Knowledge-based interactive postmining of association rules using ontologies. IEEE Trans. Knowl. Data Eng., 22(6):784–797, 2010. [23] Petar Ristoski and Heiko Paulheim. Rdf2vec: RDF graph embeddings for data mining. In Paul T. Groth, Elena Simperl, Alasdair J. G. Gray, Marta Sabou, Markus Krötzsch, Freddy Lécué, Fabian Flöck, and Yolanda Gil, editors, The Semantic Web - ISWC 2016 - 15th International Semantic Web Conference, Kobe, Japan, October 17-21, 2016, Proceedings, Part I, volume 9981 of Lecture Notes in Computer Science, pages 498–514, 2016. [24] Minjun Chen, Ayako Suzuki, Shraddha Thakkar, Ke Yu, Chuchu Hu, and Weida Tong. DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans. Drug Discov Today, 21(4):648–653, 2016. [25] Fabian M. Suchanek, Serge Abiteboul, and Pierre Senellart. PARIS: probabilistic alignment of relations, instances, and schema. PVLDB, 5(3):157–168, 2011. [26] Petar Ristoski and Heiko Paulheim. Feature selection in hierarchical feature spaces. In Saso Dzeroski, Pance Panov, Dragi Kocev, and Ljupco Todorovski, editors, Discovery Science - 17th International Conference, DS 2014, Bled, Slovenia, October 8-10, 2014. Proceedings, volume 8777 of Lecture Notes in Computer Science, pages 288–300. Springer, 2014. [27] Claudia d’Amato, Steffen Staab, and Nicola Fanizzi. On the influence of description logics ontologies on concep- tual similarity. In Aldo Gangemi and Jérôme Euzenat, editors, Knowledge Engineering: Practice and Patterns, 16th International Conference, EKAW 2008, Acitrezza, Italy, September 29 - October 2, 2008. Proceedings, volume 5268 of Lecture Notes in Computer Science, pages 48–63. Springer, 2008. [28] Petar Ristoski and Heiko Paulheim. A comparison of propositionalization strategies for creating features from linked open data. In Ilaria Tiddi, Mathieu d’Aquin, and Nicolas Jay, editors, Proceedings of the 1st Workshop on Linked Data for Knowledge Discovery co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2014), Nancy, France, September 19th, 2014., volume 1232 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.