=Paper=
{{Paper
|id=Vol-2721/paper523
|storemode=property
|title=Towards SHACL Learning from Knowledge Graphs
|pdfUrl=https://ceur-ws.org/Vol-2721/paper523.pdf
|volume=Vol-2721
|authors=Pouya Ghiasnezhad Omran,Kerry Taylor,Sergio José Rodríguez Méndez,Armin Haller
|dblpUrl=https://dblp.org/rec/conf/semweb/OmranTMH20a
}}
==Towards SHACL Learning from Knowledge Graphs==
Towards SHACL Learning from Knowledge Graphs
Pouya Ghiasnezhad Omran, Kerry Taylor, Sergio Rodriguez Mendez, and Armin Haller
Australian National University
P.G.Omran@anu.edu.au, kerry.taylor@anu.edu.au, Sergio.RodriguezMendez@anu.edu.au, armin.haller@anu.edu.au
Abstract. Knowledge Graphs (KGs) are typically large data-first knowledge bases with weak inference rules and weakly constraining data schemas. The SHACL Shapes Constraint Language is a W3C Recommendation for the expression of shapes as constraints on graph data. SHACL shapes serve to validate a KG and can give informative insight into the structure of data. Here, we introduce Inverse Open Path (IOP) rules, a logical formalism which acts as a building block for a restricted fragment of SHACL that can be used for schema-driven structural knowledge graph validation and completion. We define quality measures for IOP rules and propose a novel method to learn them, SHACLearner. SHACLearner adapts a state-of-the-art embedding-based open path rule learner (OPRL) by modifying its efficient matrix-based evaluation module. We demonstrate SHACLearner performance on real-world massive KGs, YAGO2s (4M facts), DBpedia 3.8 (11M facts), and Wikidata (8M facts), finding that it can efficiently learn hundreds of high-quality rules.
Keywords: SHACL Learning · Open Path Rule · Knowledge Graph · Rule Learning.
1 Introduction
Public knowledge graphs became popular with the development of DBpedia [1] and Yago [11] more than a decade ago; these KGs are massive, diverse, and incomplete. Regardless of the method used to build a KG (e.g. collaboratively vs individually, manually vs automatically), KGs are incomplete due to the evolving nature of human knowledge, cultural bias [3], and resource constraints. The power of KGs arises from their data-first approach, enabling contributors to extend a KG in a relatively arbitrary manner. On the other hand, SHACL [7]
was formally recommended by the W3C in 2017 to express constraints on a KG
described as shapes. For example, SHACL can be used to express that a person
needs to have a name, birth date, and place of birth, and that these attributes
have particular types: a string; a date; a location.
We present SHACLearner, a system that mines a restricted form of shapes from KG data. We propose a predicate calculus formalism in which rules have one body atom and a chain of conjunctive atoms in the head, with a specific variable binding pattern. Since these rules are an inverse version of open path rules [6] (a fragment of existential rules), we call them inverse open path (IOP) rules. To learn IOP rules we adapt an embedding-based open path rule learner, OPRL [6]. We define quality measures to express the validity of IOP rules. Each IOP rule is a SHACL shape, in the sense that it can be syntactically rewritten to SHACL.
While SHACL is customarily used to describe constraints over KG data, it can just as well be seen to describe structural patterns observed in KG data. When these patterns occur frequently in an incomplete KG, they suggest patterns that should occur. A SHACL constraint can be matched to a pattern fragment in the KG to suggest how the fragment should be completed
to satisfy the constraint in full. For example, suppose we have a human entity,
bronte, in the KG but we know very little else about her. Now assume we have a
SHACL shape that says humans have fathers who are also of type human. Then
we can infer that our KG is missing bronte’s father and also that he should
be typed as human. We can use this information directly in a form-based data
entry tool [12] to simultaneously prompt for missing data and to constrain its
type or its relationships to other entities in the KG. It could also be used in an
automated knowledge graph completion scenario.
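For illustration, this completion scenario can be pictured with a short Python sketch; the toy facts, the extra entities (patrick, hugh), and the encoding of type facts as reflexive pairs are assumptions made for the example only, not part of SHACLearner.

```python
# Toy sketch of shape-driven completion (not SHACLearner's code): given facts as
# (predicate, subject, object) triples and the shape
#   human(x) -> father(x, z) /\ human(z, z),
# report target entities that are missing the required path.

kg = {
    ("human", "bronte", "bronte"),    # type facts encoded as reflexive pairs
    ("human", "patrick", "patrick"),
    ("human", "hugh", "hugh"),
    ("father", "patrick", "hugh"),    # bronte's father fact is missing
}

def satisfies_head(entity, path):
    """Is there a chain path[0](entity, z1), path[1](z1, z2), ... in the KG?"""
    frontier = {entity}
    for pred in path:
        frontier = {o for (p, s, o) in kg if p == pred and s in frontier}
    return bool(frontier)

for x in sorted(s for (p, s, _) in kg if p == "human"):
    if not satisfies_head(x, ["father", "human"]):
        print(f"{x}: missing a 'father' value typed as 'human'")
```

A form-based editor such as Schimatos [12] could then prompt the user for exactly these missing values.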
The fragment of SHACL we learn expresses structural data schemas that are less restrictive than other formalisms, including Closed Rules (CR) [9] and Graph Functional Dependencies (GFD) [4]. CRs are used to predict new facts and GFDs are used to identify inconsistencies in a graph. Our proposed shapes express facts associated with specific entities along a connected path.
2 Shape learning
Preliminaries. We build on open path (OP) rules, first introduced for active knowledge graph completion [6], of the form:
P_t(x, z_0) ← P_1(z_0, z_1) ∧ P_2(z_1, z_2) ∧ ... ∧ P_n(z_{n-1}, y)    (1)
Here, P_i is a predicate in the KG and each of x, z_i, y is an entity variable. Unlike CRs, OP rules do not necessarily form a loop, but every instantiation of a CR is also an instantiation of an OP rule. To assess the quality of open path rules, open path standard confidence (OPSC) and open path head coverage (OPHC) are derived in [6] from the closed path forms.
IOP Rules. Here we focus on the fragment of SHACL in which node shapes constrain a target predicate (e.g. the unary predicate human), with property shapes expressing constraints over facts related to the target predicate. We particularly focus on property shapes which act to constrain an argument of the target predicate. For example, a shape expresses that each entity x which satisfies human(x) should satisfy the following paths: (1) citizenOf(x, z_1) ∧ country(z_1), (2) father(x, z_2) ∧ human(z_2), and (3) nativeLanguage(x, z_3) ∧ language(z_3). We observe that the converse of OP rules, inverse open path (IOP) rules, corresponds to a fragment of SHACL shapes. For example, the above constraints can be expressed as IOP rules:
human(x) → citizenOf(x, z_1) ∧ country(z_1, z_1)
human(x) → father(x, z_2) ∧ human(z_2, z_2)
human(x) → nativeLanguage(x, z_3) ∧ language(z_3, z_3)
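The syntactic rewriting of IOP rules to SHACL can be illustrated with a small sketch. The following Python function rewrites the first rule above as a node shape with one property shape; the ex: namespace, the use of sh:targetClass, and the rendering of the terminal type predicate as sh:class are assumptions made for the illustration, not SHACLearner's output format.

```python
# Illustrative sketch (not SHACLearner's code): rewrite an IOP rule of the form
#   target(x) -> P(x, z) /\ C(z, z)
# as a SHACL node shape that requires at least one P value of class C.
# The ex: namespace and IRIs below are hypothetical.

def iop_rule_to_shacl(target_cls: str, path_pred: str, value_cls: str) -> str:
    return f"""\
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .

ex:{target_cls.capitalize()}Shape
    a sh:NodeShape ;
    sh:targetClass ex:{target_cls} ;
    sh:property [
        sh:path ex:{path_pred} ;
        sh:class ex:{value_cls} ;
        sh:minCount 1 ;
    ] .
"""

# human(x) -> citizenOf(x, z1) /\ country(z1, z1)
print(iop_rule_to_shacl("human", "citizenOf", "country"))
```

Longer chains of head predicates could, for instance, be expressed with SHACL sequence paths in sh:path.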
The general form of an IOP rule is given by
P'_t(x, z_0) → P'_1(z_0, z_1) ∧ P'_2(z_1, z_2) ∧ ... ∧ P'_n(z_{n-1}, y).    (2)
where each P'_i is either a predicate in the KG or its inverse with the subject and object bindings swapped. These are not Horn rules. In an IOP rule the body of the rule is P_t and its head is the sequence of predicates P_1 ∧ P_2 ∧ ... ∧ P_n. Hence we instantiate the atomic body to predict an instance of the head.
To assess the quality of IOP rules we follow the quality measures for OP rules [6]. Let r be an IOP rule of the form (2). Then a pair of entities (e, e') satisfies the head of r, denoted head_r(e, e'), if there exist entities e_1, ..., e_{n-1} in the KG such that P_1(e, e_1), P_2(e_1, e_2), ..., P_n(e_{n-1}, e') are facts in the KG. A pair of entities (e'', e) satisfies the body of r, denoted P_t(e'', e), if P_t(e'', e) is a fact in the KG. The inverse open path support, inverse open path standard confidence, and inverse open path head coverage of r are given respectively by
IOPsupp(r) = |{e : ∃e', e'' s.t. head_r(e, e') and P_t(e'', e)}|
IOPSC(r) = IOPsupp(r) / |{e : ∃e'' s.t. P_t(e'', e)}|
IOPHC(r) = IOPsupp(r) / |{e : ∃e' s.t. head_r(e, e')}|
Notably, the support for an IOP rule is the same as the support for its
corresponding straight OP form, IOPSC is the same as the corresponding OPHC,
and IOPHC is the same as the corresponding OPSC. This close relationship
between OP and IOP rules helps us to mine both in the one process.
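To make these measures concrete, the following small Python sketch computes IOPsupp, IOPSC, and IOPHC naively over a toy fact set; the facts and entity names are invented for the example, and the computation is deliberately simple (SHACLearner itself evaluates rules with matrix operations, as sketched below).

```python
# Toy computation of the IOP measures (not SHACLearner's code).  Facts are
# (predicate, subject, object) triples; the rule r is
#   human(x) -> father(x, z1) /\ human(z1, z1)

kg = {
    ("human", "ann", "ann"), ("human", "bob", "bob"), ("human", "carl", "carl"),
    ("father", "ann", "bob"), ("father", "bob", "carl"), ("father", "dora", "eve"),
}
body_pred, head_path = "human", ["father", "human"]

def head_holds(entity):
    """head_r(entity, e') for some e': follow the chain of head predicates."""
    frontier = {entity}
    for pred in head_path:
        frontier = {o for (p, s, o) in kg if p == pred and s in frontier}
    return bool(frontier)

body_entities = {o for (p, s, o) in kg if p == body_pred}   # e with some P_t(e'', e)
head_entities = {s for (p, s, o) in kg if p == head_path[0] and head_holds(s)}

supp = len(body_entities & head_entities)   # IOPsupp(r) = 2 (ann and bob)
iopsc = supp / len(body_entities)           # IOPSC(r)  = 2/3
iophc = supp / len(head_entities)           # IOPHC(r)  = 2/2
print(supp, round(iopsc, 2), round(iophc, 2))
```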
IOP Learning through Representation Learning: We start with the open path rule learner OPRL, proposed in [6], and adapt its embedding-based OP rule learning to learn annotated IOP rules. SHACLearner uses a sampling method which prunes the entities and predicates that are less relevant to the target predicate, to obtain a sampled KG. The sample is fed to an embedding learner such as RESCAL [8]. Then SHACLearner uses the computed embedding representations of predicates and entities in heuristic functions that inform the generation of potential IOP rules bounded by a user-defined maximum length. Eventually, the potential IOP rules are evaluated, annotated, and filtered to produce annotated IOP rules.
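The conclusion notes that candidate rules are evaluated exactly using fast matrix and vector operations. The sketch below illustrates the general idea under assumed data structures (each predicate stored as a boolean adjacency matrix over a shared entity index, filled with random toy facts); it is not SHACLearner's actual evaluation module.

```python
import numpy as np

# Hedged sketch (assumed layout, not SHACLearner's module): evaluate the head of
# an IOP rule P_t(x, z0) -> P_1(z0, z1) /\ ... /\ P_n(z_{n-1}, y) by chaining
# boolean adjacency-matrix products, then intersect with the body's entities.

n_entities = 1000
rng = np.random.default_rng(0)
P = {name: rng.random((n_entities, n_entities)) < 0.01    # random toy predicates
     for name in ("human", "father", "citizenOf")}

def head_vector(path):
    """Boolean vector over entities e: does head_r(e, e') hold for some e'?"""
    reach = P[path[0]].copy()
    for pred in path[1:]:
        reach = (reach.astype(np.int64) @ P[pred].astype(np.int64)) > 0
    return reach.any(axis=1)

body = P["human"].any(axis=0)              # e occurs as the object of P_t(e'', e)
head = head_vector(["father", "human"])    # e starts a father-then-human chain

supp = int((body & head).sum())
iopsc = supp / max(int(body.sum()), 1)     # IOPSC
iophc = supp / max(int(head.sum()), 1)     # IOPHC
print(supp, round(iopsc, 3), round(iophc, 3))
```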
3 Related Work
There have been some exploratory attempts to address learning SHACL shapes from KGs [5, 10, 2]. They are procedural methods without logical foundations and have not been shown to scale to real-world KGs. They work with a small amount of data, and the representation formalism they use for their output is difficult to compare with the well-defined IOP rules that we use in this paper. [2] carries out the task in a semi-automatic manner: it provides a sample of data to an off-the-shelf graph structure learner and presents the output in an interactive interface for a human user to create SHACL shapes.
4 Experiments
We have implemented SHACLearner and conducted experiments to assess it. Our experiments are designed to demonstrate the effectiveness of capturing shapes with varying confidence from various real-world massive KGs. Since our proposed system is the first method to learn shapes from massive KGs automatically, we have no benchmark with which to compare. However, the performance of our system shows that it can handle the task satisfactorily and can be applied in practice.
We follow the established approach for evaluating KG rule-learning methods, that is, measuring the quantity and quality of distinct rules learnt. Rule quality is measured by inverse open path standard confidence (IOPSC) and inverse open path head coverage (IOPHC). We randomly selected 50 target predicates (using random.org) from Wikidata and DBpedia, and used all predicates of YAGO2s as target predicates. A 10-hour limit was set for learning each target predicate. Table 1 shows the average numbers of quality IOP rules (#Rules, IOPSC ≥ 0.1 and IOPHC ≥ 0.01 as in [6]), the numbers of high-quality rules (#ARules, IOPSC ≥ 0.8) mined for the selected target predicates, and the running times (in hours, averaged over the targets).
Table 1: Performance of SHACLearner on benchmark KGs
Benchmark     # Facts   # Entities   # Predicates   #Rules   #ARules   Time (hours)
YAGO2s        4.12M     1.65M        37             103      5         0.9
DBpedia 3.8   11M       2.2M         650            863      58        2
Wikidata      8.4M      4M           430            341      18        1.7
Detailed experimental results can be found at https://www.dropbox.com/sh/2s2hwah7z8m2495/AACDf0sem9qsxQ83aGbgWMRMa?dl=0
5 Conclusion
In this work, we propose a method to learn a fragment of SHACL shapes from KGs as a way to describe KG patterns and also to validate KGs and support new data entry. Our shapes describe conjunctive paths of constraints over properties of target predicates. To learn such rules we adapt an embedding-based open path rule learner (OPRL) by introducing the following novel components: (1) we propose IOP rules, which allow us to mine rules with free variables, with one atom as the body and a chain of atoms as the head, while keeping the complexity of the learning phase manageable; and (2) we propose an efficient method to evaluate IOP rules by exactly computing the quality measures of each rule using fast matrix and vector operations.
References
1. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: DBpedia:
A nucleus for a web of open data. In: ISWC. vol. 4825, pp. 722–735. Springer (2007)
2. Boneva, I., Dusart, J., Álvarez, D.F., Labra Gayo, J.E.: Shape designer for ShEx
and SHACL constraints. In: ISWC Posters. vol. 2456, pp. 269–272 (2019)
3. Callahan, E.S., Herring, S.C.: Cultural bias in wikipedia content on famous persons.
Journal of the American Society for Information Science and Technology 62(10),
1899–1915 (2011)
4. Fan, W., Hu, C., Liu, X., Lu, P.: Discovering graph functional dependencies. In:
SIGMOD. pp. 427–439 (2018)
5. Fernández-Álvarez, D., García-González, H., Frey, J., Hellmann, S., Gayo, J.E.L.:
Inference of latent shape expressions associated to DBpedia ontology. In: ISWC
Posters. vol. 2180 (2018)
6. Ghiasnezhad Omran, P., Taylor, K., Rodriguez Mendez, S., Haller, A.: Active
Knowledge Graph Completion. Tech. rep., ANU Research Publication (2020)
7. Knublauch, H., Kontokostas, D.: Shapes Constraint Language (SHACL). W3C Recommendation (2017), https://www.w3.org/TR/shacl/
8. Nickel, M., Rosasco, L., Poggio, T.: Holographic Embeddings of Knowledge Graphs.
In: AAAI. pp. 1955–1961 (2016)
9. Omran, P.G., Wang, K., Wang, Z.: Scalable Rule Learning via Learning Represen-
tation. In: IJCAI. pp. 2149–2155 (2018)
10. Spahiu, B., Maurino, A., Palmonari, M.: Towards improving the quality
of knowledge graphs with data-driven ontology patterns and SHACL. In:
Workshop on Ontology Design and Patterns. vol. 2195, pp. 52–66 (2018)
11. Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of semantic knowledge. In:
Williamson, C.L., Zurko, M.E., Patel-Schneider, P.F., Shenoy, P.J. (eds.) WWW.
pp. 697–706. ACM (2007)
12. Wright, J., Rodríguez Méndez, S., Haller, A., Taylor, K., Omran, P.: Schimatos: a
SHACL-based web-form generator for knowledge graph editing. In: ISWC (2020),
To appear.