
Towards SHACL Learning from Knowledge Graphs

Pouya Ghiasnezhad Omran

P.G.Omran@anu.edu.au

Kerry Taylor

kerry.taylor@anu.edu.au

Sergio Rodriguez Mendez

Sergio.RodriguezMendez@anu.edu.au

Armin Haller

armin.haller@anu.edu.au

Australian National University

Knowledge Graphs (KGs) are typically large data-first knowledge bases with weak inference rules and weakly-constraining data schemas. The Shapes Constraint Language (SHACL) is a W3C recommendation for the expression of shapes as constraints on graph data. SHACL shapes serve to validate a KG and can give informative insight into the structure of data. Here, we introduce Inverse Open Path (IOP) rules, a logical formalism which acts as a building block for a restricted fragment of SHACL that can be used for schema-driven structural knowledge graph validation and completion. We define quality measures for IOP rules and propose a novel method to learn them, SHACLearner. SHACLearner adapts a state-of-the-art embedding-based open path rule learner (oprl) by modifying the efficient matrix-based evaluation module. We demonstrate SHACLearner's performance on real-world massive KGs, YAGO2s (4M facts), DBpedia 3.8 (11M facts), and Wikidata (8M facts), finding that it can efficiently learn hundreds of high-quality rules.

SHACL Learning, Open Path Rule, Rule Learning, Knowledge Graph

While public knowledge graphs became popular with the development of DBpedia [1] and YAGO [11] more than a decade ago, these KGs are massive, diverse, and incomplete. Regardless of the method used to build a KG (e.g. collaboratively vs individually, manually vs automatically), it will be incomplete due to the evolving nature of human knowledge, cultural bias [3], and resource constraints.

The power of KGs arises from their data-first approach, enabling contributors to extend a KG in a relatively arbitrary manner. On the other hand, SHACL [7] was formally recommended by the W3C in 2017 to express constraints on a KG described as shapes. For example, SHACL can be used to express that a person needs to have a name, birth date, and place of birth, and that these attributes have particular types: a string; a date; a location.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

We present a system, SHACLearner, that mines a restricted form of shapes from KG data. We propose a predicate calculus formalism in which rules have one body atom and a chain of conjunctive atoms in the head with a specific variable binding pattern. Since these rules are an inverse version of open path rules [6] (a fragment of existential rules), we call them inverse open path (IOP) rules. To learn IOP rules we adapt an embedding-based open path rule learner, oprl [6]. We define quality measures to express the validity of IOP rules. Each IOP rule is a SHACL shape, in the sense that it can be syntactically rewritten to SHACL.
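To illustrate what such a syntactic rewriting might look like, here is a minimal sketch that renders a rule of the pattern type(x) → prop(x, z) ∧ type'(z) as SHACL in Turtle syntax. The translation shown is an assumption for illustration only: the paper does not give its rewriting, and the helper `iop_to_shacl`, the `ex:` prefix and the `http://example.org/` namespace are hypothetical. The sketch uses a qualified value shape so that at least one value of the property must belong to the required class.

```python
def iop_to_shacl(target_class, prop, value_class, prefix="ex"):
    """Render type(x) -> prop(x, z) ^ type'(z) as a SHACL node shape (Turtle text).

    Hypothetical helper: one plausible rendering, not the paper's own translation.
    """
    lines = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        f"@prefix {prefix}: <http://example.org/> .",
        "",
        f"{prefix}:{target_class}Shape a sh:NodeShape ;",
        f"    sh:targetClass {prefix}:{target_class} ;",
        "    sh:property [",
        f"        sh:path {prefix}:{prop} ;",
        # At least one value of the path must conform to the nested shape.
        f"        sh:qualifiedValueShape [ sh:class {prefix}:{value_class} ] ;",
        "        sh:qualifiedMinCount 1 ;",
        "    ] .",
    ]
    return "\n".join(lines)


# Example: every human has a father who is also a human.
shape_ttl = iop_to_shacl("human", "father", "human")
print(shape_ttl)
```

Longer head chains would need SHACL sequence paths, and inverse predicates would need `sh:inversePath`; both are left out of this sketch.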

While SHACL is customarily used to describe constraints over KG data, it can just as well be seen to describe structural patterns observed in KG data. When these patterns are frequently demonstrated in an incomplete KG, they suggest patterns that should occur. A SHACL constraint can be matched to a pattern fragment in the KG to suggest how the fragment should be completed to satisfy the constraint in full. For example, suppose we have a human entity, bronte, in the KG but we know very little else about her. Now assume we have a SHACL shape that says humans have fathers who are also of type human. Then we can infer that our KG is missing bronte's father and also that he should be typed as human. We can use this information directly in a form-based data entry tool [12] to simultaneously prompt for missing data and to constrain its type or its relationships to other entities in the KG. It could also be used in an automated knowledge graph completion scenario.
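The bronte example can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the KG is a toy set of triples, types are kept in a separate set, and the helper `missing_for_shape` and all entity names other than bronte are hypothetical and not part of SHACLearner.

```python
# Toy KG (illustrative data): binary facts as (predicate, subject, object),
# type facts as (type, entity).
facts = {
    ("father", "anne", "patrick"),
}
types = {
    ("human", "bronte"),
    ("human", "anne"),
    ("human", "patrick"),
}


def missing_for_shape(target_type, prop, value_type, facts, types):
    """List (entity, property, required type) prompts for unmet shapes.

    Encodes the shape target_type(x) -> prop(x, z) ^ value_type(z).
    """
    prompts = []
    targets = sorted(e for (t, e) in types if t == target_type)
    for x in targets:
        values = [o for (p, s, o) in facts if p == prop and s == x]
        # The shape holds for x if at least one value has the required type.
        if not any((value_type, z) in types for z in values):
            prompts.append((x, prop, value_type))
    return prompts


prompts = missing_for_shape("human", "father", "human", facts, types)
for entity, prop, value_type in prompts:
    print(f"{entity}: missing {prop} of type {value_type}")
```

Each prompt names the entity, the missing property, and the type the new value should have, which is the information a data entry form or a completion procedure would need.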

The fragment of SHACL we learn expresses structural data schemas with fewer restrictions than other formalisms, including Closed Rules (CR) [9] and Graph Functional Dependencies (GFD) [4]. CR is used to predict a new fact and GFD is used to identify inconsistency in a graph. Our proposed shapes express facts associated in a connected path with some specific entities.

Shape learning

Preliminaries. We build on open path (OP) rules, first introduced for active knowledge graph completion [6], of the form:

$$P_t(x, z_0) \leftarrow P_1(z_0, z_1) \wedge P_2(z_1, z_2) \wedge \dots \wedge P_n(z_{n-1}, y) \tag{1}$$

Here, each $P_i$ is a predicate in the KG and each of $\{x, z_i, y\}$ is an entity variable. Unlike CR, OP rules do not necessarily form a loop, but every instantiation of a CR is also an instantiation of an OP rule. To assess the quality of open path rules, open path standard confidence (OPSC) and open path head coverage (OPHC) are derived in [6] from the closed path forms.

IOP Rules. Here we focus on the fragment of SHACL in which node shapes constrain a target predicate (e.g. the unary predicate human), with property shapes expressing constraints over facts related to the target predicate. We particularly focus on property shapes which act to constrain an argument of the target predicate. For example, a shape expresses that each entity $x$ which satisfies $\mathit{human}(x)$ should satisfy the following paths: (1) $\mathit{citizenOf}(x, z_1) \wedge \mathit{country}(z_1)$, (2) $\mathit{father}(x, z_2) \wedge \mathit{human}(z_2)$, and (3) $\mathit{nativeLanguage}(x, z_3) \wedge \mathit{language}(z_3)$. We observe that the converse of OP rules, inverse open path (IOP) rules, correspond to a fragment of SHACL shapes. For example, the above constraints can be expressed as IOP rules:

$$\mathit{human}(x) \rightarrow \mathit{citizenOf}(x, z_1) \wedge \mathit{country}(z_1, z_1)$$

$$\mathit{human}(x) \rightarrow \mathit{father}(x, z_2) \wedge \mathit{human}(z_2, z_2)$$

$$\mathit{human}(x) \rightarrow \mathit{nativeLanguage}(x, z_3) \wedge \mathit{language}(z_3, z_3)$$

The general form of an IOP rule is given by

$$P_t'(x, z_0) \rightarrow P_1'(z_0, z_1) \wedge P_2'(z_1, z_2) \wedge \dots \wedge P_n'(z_{n-1}, y) \tag{2}$$

where each $P_i'$ is either a predicate in the KG or its inverse with the subject and object bindings swapped. These are not Horn rules. In an IOP rule the body of the rule is $P_t$ and its head is the sequence of predicates $P_1 \wedge P_2 \wedge \dots \wedge P_n$. Hence we instantiate the atomic body to predict an instance of the head.

To assess the quality of IOP rules we follow the quality measures for OP rules [6]. Let $r$ be an IOP rule of the form (2). Then a pair of entities $(e, e')$ satisfies the head of $r$, denoted $\mathit{head}_r(e, e')$, if there exist entities $e_1, \dots, e_{n-1}$ in the KG such that $P_1(e, e_1), P_2(e_1, e_2), \dots, P_n(e_{n-1}, e')$ are facts in the KG. A pair of entities $(e'', e)$ satisfies the body of $r$, denoted $P_t(e'', e)$, if $P_t(e'', e)$ is a fact in the KG. The inverse open path support, inverse open path standard confidence, and inverse open path head coverage of $r$ are given respectively by

$$\mathrm{IOPsupp}(r) = |\{ e : \exists e', e'' \text{ s.t. } \mathit{head}_r(e, e') \text{ and } P_t(e'', e) \}|$$

$$\mathrm{IOPSC}(r) = \frac{\mathrm{IOPsupp}(r)}{|\{ e : \exists e'' \text{ s.t. } P_t(e'', e) \}|}, \qquad \mathrm{IOPHC}(r) = \frac{\mathrm{IOPsupp}(r)}{|\{ e : \exists e' \text{ s.t. } \mathit{head}_r(e, e') \}|}$$
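The three measures can be computed directly from their set-based definitions. The following is a small sketch on made-up data, assuming the KG is a dictionary from predicate names to sets of (subject, object) pairs and that unary predicates are stored as self-loops, in the style of the $\mathit{human}(z_2, z_2)$ atoms above. The function names and the toy facts are illustrative only.

```python
# Toy KG (illustrative data): predicate -> set of (subject, object) pairs.
# Assumption: unary predicates are stored as self-loops, e.g. human(anne, anne).
kg = {
    "human": {("anne", "anne"), ("bronte", "bronte"),
              ("maria", "maria"), ("patrick", "patrick")},
    "father": {("anne", "patrick"), ("rex", "max")},
}


def head_starts(kg, path):
    """Entities e for which some e' satisfies head_r(e, e') along `path`."""
    valid = None
    # Walk the chain backwards, keeping subjects whose object can continue it.
    for pred in reversed(path):
        if valid is None:
            valid = {s for (s, _) in kg[pred]}
        else:
            valid = {s for (s, o) in kg[pred] if o in valid}
    return valid if valid is not None else set()


def iop_measures(kg, target, path):
    """Return (IOPsupp, IOPSC, IOPHC) for the rule target -> path."""
    body = {o for (_, o) in kg[target]}   # e with P_t(e'', e) for some e''
    head = head_starts(kg, path)          # e with head_r(e, e') for some e'
    supp = len(body & head)
    sc = supp / len(body) if body else 0.0
    hc = supp / len(head) if head else 0.0
    return supp, sc, hc


print(iop_measures(kg, "human", ["father"]))           # (1, 0.25, 0.5)
print(iop_measures(kg, "human", ["father", "human"]))  # (1, 0.25, 1.0)
```

In the first rule only one of the four humans has a recorded father (IOPSC 0.25), and one of the two entities with a father is human (IOPHC 0.5). Adding the type atom to the head removes the non-human entity from the denominator, so IOPHC rises to 1.0.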

Notably, the support for an IOP rule is the same as the support for its corresponding straight OP form, IOPSC is the same as the corresponding OPHC, and IOPHC is the same as the corresponding OPSC. This close relationship between OP and IOP rules helps us to mine both in the one process.

IOP Learning through Representation Learning. We start with the open path rule learner oprl, proposed in [6], and adapt its embedding-based OP rule learning to learn annotated IOP rules. SHACLearner uses a sampling method which prunes the entities and predicates that are less relevant to the target predicate to obtain a sampled KG. The sample is fed to embedding learners such as RESCAL [8]. Then SHACLearner uses the computed embedding representations of predicates and entities in heuristic functions that inform the generation of IOP rules bounded by a user-defined maximum length. Eventually, potential IOP rules are evaluated, annotated, and filtered to produce annotated IOP rules.
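The evaluation step can be phrased with matrix and vector operations. The sketch below is one possible formulation in plain Python lists, not necessarily how SHACLearner implements it: each predicate is a 0/1 adjacency matrix over a fixed entity order, and the set of entities that can start the head path is obtained by repeated Boolean matrix-vector products. All function names and the toy data are assumptions made for illustration.

```python
def adjacency(entities, pairs):
    """0/1 adjacency matrix of one predicate over a fixed entity order."""
    index = {e: i for i, e in enumerate(entities)}
    matrix = [[0] * len(entities) for _ in entities]
    for s, o in pairs:
        matrix[index[s]][index[o]] = 1
    return matrix


def bool_matvec(matrix, vec):
    """Entry i is 1 iff row i has an edge into an entity flagged in vec."""
    return [int(any(a and b for a, b in zip(row, vec))) for row in matrix]


def head_indicator(path_matrices, n):
    """Flags entities from which the whole path P1..Pn can be followed."""
    vec = [1] * n
    for matrix in reversed(path_matrices):
        vec = bool_matvec(matrix, vec)
    return vec


def body_indicator(target_matrix):
    """Flags entities that occur as the object of the target predicate."""
    return [int(any(col)) for col in zip(*target_matrix)]


def iop_quality(target_matrix, path_matrices):
    """Return (IOPsupp, IOPSC, IOPHC) from indicator vectors."""
    n = len(target_matrix)
    body = body_indicator(target_matrix)
    head = head_indicator(path_matrices, n)
    supp = sum(b * h for b, h in zip(body, head))
    sc = supp / sum(body) if sum(body) else 0.0
    hc = supp / sum(head) if sum(head) else 0.0
    return supp, sc, hc


# Illustrative data; unary predicates are stored as self-loops (assumption).
entities = ["anne", "bronte", "maria", "patrick", "rex", "max"]
human = adjacency(entities, [(e, e) for e in entities[:4]])
father = adjacency(entities, [("anne", "patrick"), ("rex", "max")])
quality = iop_quality(human, [father, human])
print(quality)  # (1, 0.25, 1.0)
```

Working with indicator vectors rather than full matrix products keeps each step at one matrix-vector multiplication per head atom, which is why this style of evaluation suits sparse matrix libraries on large KGs.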

Related Work

There are some exploratory attempts to address learning SHACL shapes from KGs [5, 10, 2]. They are procedural methods without logical foundations and are not shown to scale to real-world KGs. They work with a small amount of data, and the representation formalism they use for their output is difficult to compare with the well-defined IOP rules which we use in this paper. [2] carries out the task in a semi-automatic manner: it provides a sample of data to an off-the-shelf graph structure learner and provides the output in an interactive interface for a human user to create SHACL shapes.

Experiments

We have implemented SHACLearner⁶ and conducted experiments to assess it. Our experiments are designed to demonstrate its effectiveness in capturing shapes with varying confidence from various real-world massive KGs. Since our proposed system is the first method to learn shapes from massive KGs automatically, we have no benchmark with which to compare. However, the performance of our system shows that it can handle the task satisfactorily and can be applied in practice.

We follow the established approach for evaluating KG rule-learning methods, that is, measuring the quantity and quality of distinct rules learnt. Rule quality is measured by inverse open path standard confidence (IOPSC) and inverse open path head coverage (IOPHC). We randomly selected 50 target predicates (using random.org) from Wikidata and DBpedia. We used all predicates of YAGO2s as target predicates. A 10 hour limit was set for learning each target predicate. Table 1 shows the average numbers of quality IOP rules (#Rules, IOPSC ≥ 0.1 and IOPHC ≥ 0.01 as in [6]), numbers of high quality rules (#ARules, IOPSC ≥ 0.8) mined for the selected target predicates, and the running times (in hours, averaged over the targets).

⁶ Detailed experimental results can be found at https://www.dropbox.com/sh/2s2hwah7z8m2495/AACDf0sem9qsxQ83aGbgWMRMa?dl=0

Conclusion

In this work, we propose a method to learn a fragment of SHACL shapes from KGs as a way to describe KG patterns and also to validate KGs and support new data entry. Our shapes describe conjunctive paths of constraints over properties of target predicates. To learn such rules we adapt an embedding-based open path rule learner (oprl) by introducing the following novel components: (1) we propose IOP rules, which allow us to mine rules with free variables that have one atom as the body and a chain of atoms as the head, while keeping the complexity of the learning phase manageable; and (2) we propose an efficient method to evaluate IOP rules by exactly computing the quality measures of each rule using fast matrix and vector operations.

1. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: DBpedia: A nucleus for a web of open data. In: ISWC. vol. 4825, pp. 722–735. Springer (2007)

2. Boneva, I., Dusart, J., Alvarez, D.F., Labra Gayo, J.E.: Shape designer for ShEx and SHACL constraints. In: ISWC Posters. vol. 2456, pp. 269–272 (2019)

3. Callahan, E.S., Herring, S.C.: Cultural bias in Wikipedia content on famous persons. Journal of the American Society for Information Science and Technology 62(10), 1899–1915 (2011)

4. Fan, W., Hu, C., Liu, X., Lu, P.: Discovering graph functional dependencies. In: SIGMOD. pp. 427–439 (2018)

5. Fernandez-Alvarez, D., García-González, H., Frey, J., Hellmann, S., Gayo, J.E.L.: Inference of latent shape expressions associated to DBpedia ontology. In: ISWC Posters. vol. 2180 (2018)

6. Ghiasnezhad Omran, P., Taylor, K., Rodriguez Mendez, S., Haller, A.: Active Knowledge Graph Completion. Tech. rep., ANU Research Publication (2020)

7. Knublauch, H., Kontokostas, D.: Shapes Constraint Language (SHACL) (2017), https://www.w3.org/TR/shacl/

8. Nickel, M., Rosasco, L., Poggio, T.: Holographic Embeddings of Knowledge Graphs. In: AAAI. pp. 1955–1961 (2016)

9. Omran, P.G., Wang, K., Wang, Z.: Scalable Rule Learning via Learning Representation. In: IJCAI. pp. 2149–2155 (2018)

10. Spahiu, B., Maurino, A., Palmonari, M.: Towards improving the quality of knowledge graphs with data-driven ontology patterns and SHACL. In: Workshop on Ontology Design and Patterns. vol. 2195, pp. 52–66 (2018)

11. Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of semantic knowledge. In: Williamson, C.L., Zurko, M.E., Patel-Schneider, P.F., Shenoy, P.J. (eds.) WWW. pp. 697–706. ACM (2007)

12. Wright, J., Rodríguez Méndez, S., Haller, A., Taylor, K., Omran, P.: Schimatos: a SHACL-based web-form generator for knowledge graph editing. In: ISWC (2020), To appear.