<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards SHACL Learning from Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pouya Ghiasnezhad Omran</string-name>
          <email>P.G.Omran@anu.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kerry Taylor</string-name>
          <email>kerry.taylor@anu.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergio Rodriguez Mendez</string-name>
          <email>Sergio.RodriguezMendez@anu.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Armin Haller</string-name>
          <email>armin.haller@anu.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Australian National University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Knowledge Graphs (KGs) are typically large data- rst knowledge bases with weak inference rules and weakly-constraining data schemes. The SHACL Shapes Constraint Language is a W3C recommendation for the expression of shapes as constraints on graph data. SHACL shapes serve to validate a KG and can give informative insight into the structure of data. Here, we introduce Inverse Open Path (IOP) rules, a logical formalism which acts as a building block for a restricted fragment of SHACL that can be used for schema-driven structural knowledge graph validation and completion. We de ne quality measures for IOP rules and propose a novel method to learn them, SHACLearner. SHACLearner adapts a state-of-the-art embedding-based open path rule learner (oprl) by modifying the e cient matrix-based evaluation module. We demonstrate SHACLearner performance on real-world massive KGs, YAGO2s (4M facts), DBpedia 3.8 (11M facts), and Wikidata (8M facts), nding that it can e ciently learn hundreds of high-quality rules.</p>
      </abstract>
      <kwd-group>
        <kwd>SHACL Learning Open Path Rule Rule Learning Knowledge Graph</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        While public knowledge graphs became popular with the development of
DBpedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Yago [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] more than a decade ago, these KGs are massive, diverse, and
incomplete. Regardless of the method that is used to build a KG (e.g.
collaboratively vs individually, manually vs automatically), they are incomplete due the
evolving nature of human knowledge, cultural bias [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and resource constraints.
      </p>
      <p>
        The power of KGs arises from their data- rst approach, enabling contributors
to extend a KG in a relatively arbitrary manner. On the other hand, SHACL[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
Copyright c 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
was formally recommended by the W3C in 2017 to express constraints on a KG
described as shapes. For example, SHACL can be used to express that a person
needs to have a name, birth date, and place of birth, and that these attributes
have particular types: a string; a date; a location.
      </p>
      <p>
        We present a system SHACLearner that mines a restricted form of shapes
from KG data. We propose a predicate calculus formalism in which rules have
one body atom and a chain of conjunctive atoms in the head with a speci c
variable binding pattern. Since these rules are an inverse version of open path
rules [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] (a fragment of existential rules), we call them inverse open path (IOP)
rules. To learn IOP rules we adapt an embedding-based open path rule learner,
oprl [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We de ne quality measures to express the validity of IOP rules. Each
IOP rule is a SHACL shape, in the sense that it can be syntactically rewritten
to SHACL.
      </p>
      <p>
        While SHACL is customarily used to describe constraints over KG data, it
can just as well be seen to describe structural patterns observed in KG data.
When these patterns are frequently demonstrated in an incomplete KG, they
suggest patterns that should occur. A SHACL constraint can be matched to
pattern fragment in the KG to suggest how the fragment should be completed
to satisfy the constraint in full. For example, suppose we have a human entity,
bronte, in the KG but we know very little else about her. Now assume we have a
SHACL shape that says humans have fathers who are also of type human. Then
we can infer that our KG is missing bronte's father and also that he should
be typed as human. We can use this information directly in a form-based data
entry tool [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to simultaneously prompt for missing data and to constrain its
type or its relationships to other entities in the KG. It could also be used in an
automated knowledge graph completion scenario.
      </p>
      <p>
        The fragment of SHACL we learn expresses structural data schemas with
least restriction in comparison with other formalisms including Closed Rules
(CR) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Graph Functional Dependencies (GFD) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. CR is used to predict
a new fact and GFD is used to identify inconsistency in a graph. Our proposed
shapes express facts associated in a connected path with some speci c entities.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Shape learning</title>
      <p>
        Preliminaries We build on open path (OP) rules, rst introduced for active
knowledge graph completion [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], of the form:
      </p>
      <p>Pt(x; z0)</p>
      <p>
        P1(z0; z1) ^ P2(z1; z2) ^ ::: ^ Pn(zn 1; y)
(1)
Here, Pi is a predicate in the KG and each of fx; zi; yg are entity variables.
Unlike CR, OP rules do not necessarily form a loop, but every instantiation
of a CR is also an instantiation of an OP rule. To assess the quality of open
path rules, open path standard con dence (OPSC) and open path head coverage
(OPHC) are derived in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] from the closed path forms.
IOP Rules Here we focus on the fragment of SHACL in which node shapes
constrain a target predicate (e.g. the unary predicate human, with property shapes
expressing constraints over facts related to the target predicate. We
particularly focus on property shapes which act to constrain an argument of the
target predicate. For example, a shape expresses that each entity x which satis es
human(x) should satisfy the following paths: (1) citizenOf (x; z1) ^ country(z1),
(2) f ather(x; z2) ^ human(z2), and (3) nativeLanguage(x; z3) ^ language(z3).
We observe that the converse of OP rules, inverse open rules (IOP), correspond
to a fragment of SHACL shapes. For example, the above constraints can be
expressed as IOP rules:
human(x) ! citizenOf (x; z1) ^ country(z1; z1)
      </p>
      <p>human(x) ! f ather(x; z2) ^ human(z2; z2)
human(x) ! nativeLanguage(x; z3) ^ language(z3; z3)
The general form of an IOP rule is given by,</p>
      <p>Pt0(x; z0) ! P10(z0; z1) ^ P20(z1; z2) ^ ::: ^ Pn0 (zn 1; y):
(2)
where each Pi0 is either a predicate in the KG or its inverse with the subject and
object bindings swapped. These are not Horn rules. In an IOP rule the body of
the rule is Pt and its head is the sequence of predicates, P1 ^ P2 ^ ::: ^ Pn. Hence
we instantiate the atomic body to predict an instance of the head.</p>
      <p>
        To assess the quality of IOP rules we follow the quality measures for OP
rules [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Let r be an IOP rule of the form (2). Then a pair of entities (e; e0)
satis es the head of r, denoted headr(e; e0), if there exist entities e1; :::; en 1 in
the KG such that P1(e; e1); P2(e1; e2); :::; Pn(en 1; e0) are facts in the KG. A pair
of entities (e00; e) satis es the body of r, denoted Pt(e00; e), if Pt(e00; e) is a fact in
the KG. The inverse open path support, inverse open path standard con dence,
and inverse open path head coverage of r are given respectively by
      </p>
      <p>IOP supp(r) =j fe : 9e0; e00 s.t. headr(e; e0) and Pt(e00; e)g j
IOP SC(r) =</p>
      <p>IOP supp(r)
j fe : 9e00 s.t. Pt(e00; e)g j
; IOP HC(r) =</p>
      <p>IOP supp(r)
j fe : 9e0 s.t. headr(e; e0)g j</p>
      <p>
        Notably, the support for an IOP rule is the same as the support for its
corresponding straight OP form, IOPSC is the same as the corresponding OPHC,
and IOPHC is the same as the corresponding OPSC. This close relationship
between OP and IOP rules helps us to mine both in the one process.
IOP Learning through Representation Learning: We start with the open path
rule learner oprl, proposed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and adapt its embedding-based OP rule
learning to learn annotated IOP rules. SHACLearner uses a sampling method which
prunes the entities and predicates that are less relevant to the target predicate to
obtain a sampled KG. The sample is fed to embedding learners such RESCAL [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Then SHACLearner uses the computed embedding representations of
predicates and entities in heuristic functions that inform the generation of IOP rules
bounded by a user-de ned maximum length. Eventually, potential IOP rules are
evaluated, annotated, and ltered to produce annotated IOP rules.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        There are some exploratory attempts to address learning SHACL shapes from
KGs [
        <xref ref-type="bibr" rid="ref10 ref2 ref5">5, 10, 2</xref>
        ]. They are procedural methods without logical foundations and are
not shown to be scalable to handle real-world KGs. They work with a small
amount of data and their representation formalism they use for their output is
di cult to compare with the well-de ned IOP rules which we use in this paper.
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] carries out the task in a semi-automatic manner: it provides a sample of
data to an of-the-shelf graph structure learner and provides the output in an
interactive interface for a human user to create SHACL shapes.
4
      </p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>We have implemented SHACLearner6 and conducted experiments to assess
it. Our experiments are designed to prove the e ectiveness of capturing shapes
with varying con dence from various real-world massive KGs. Since our proposed
system is the rst method to learn shapes from massive KGs automatically, we
have no benchmark with which to compare. However, the performance of our
system shows that it can handle the task satisfactorily and can be applied in
practice.</p>
      <p>
        We follow the established approach for evaluating KG rule-learning
methods, that is, measuring the quantity and quality of distinct rules learnt. Rule
quality is measured by open path standard con dence (IOPSC) and open path
head coverage (IOPHC). We randomly selected 50 target predicates (using
random.org) from Wikidata and DBPedia. We used all predicates of YAGO2s as
target predicates. A 10 hour limit was set for learning each target predicate.
Table 1 shows the average numbers of quality IOP rules (#Rules, IOPSC 0:1 and
IOPHC 0:01 as in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), numbers of high quality rules (#ARules, IOPSC 0:8)
mined for the selected target predicates, and the running times (in hours,
averaged over the targets).
6 Detailed experimental results can be found at
https://www.dropbox.com/sh/2s2hwah7z8m2495/AACDf0sem9qsxQ83aGbgWMRMa?dl=0
5
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this work, we propose a method to learn a fragment of SHACL shapes from
KGs as a way to describe KG patterns and also to validate KGs and support new
data entry. Our shapes describe conjunctive paths of constraints over properties
of target predicates. To learn such rules we adapt an embedding-based Open
Path rule learner (oprl) by introducing the following novel components: (1) we
propose IOP rules which allows us to mine rules with free variables with one
atom as body and a chain of atoms as the head, while keeping the complexity
of the learning phase manageable; and (2) we propose an e cient method to
evaluate IOP rules by exactly computing the qualities of each rule using fast
matrix and vector operations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z.G.</given-names>
          </string-name>
          :
          <article-title>Dbpedia: A nucleus for a web of open data</article-title>
          .
          <source>In: ISWC</source>
          . vol.
          <volume>4825</volume>
          , pp.
          <volume>722</volume>
          {
          <fpage>735</fpage>
          . Springer (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Boneva</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dusart</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez</surname>
            ,
            <given-names>D.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labra</surname>
            <given-names>Gayo</given-names>
          </string-name>
          ,
          <string-name>
            <surname>J.E.</surname>
          </string-name>
          :
          <article-title>Shape designer for ShEx and SHACL constraints</article-title>
          .
          <source>In: ISWC Posters</source>
          . vol.
          <volume>2456</volume>
          , pp.
          <volume>269</volume>
          {
          <issue>272</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Callahan</surname>
            ,
            <given-names>E.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herring</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          :
          <article-title>Cultural bias in wikipedia content on famous persons</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>62</volume>
          (
          <issue>10</issue>
          ),
          <year>1899</year>
          {
          <year>1915</year>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Discovering graph functional dependencies</article-title>
          .
          <source>In: SIGMOD</source>
          . pp.
          <volume>427</volume>
          {
          <issue>439</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fernandez-Alvarez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garc</surname>
            a-Gonzalez,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frey</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gayo</surname>
            ,
            <given-names>J.E.L.</given-names>
          </string-name>
          :
          <article-title>Inference of latent shape expressions associated to DBpedia ontology</article-title>
          .
          <source>In: ISWC Posters</source>
          . vol.
          <volume>2180</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Ghiasnezhad</given-names>
            <surname>Omran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Taylor</surname>
          </string-name>
          , K.,
          <string-name>
            <surname>Rodriguez Mendez</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haller</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Active Knowledge Graph Completion</article-title>
          .
          <source>Tech. rep., ANU Research Publication</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Knublauch</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontokostas</surname>
            ,
            <given-names>D.: Shapes</given-names>
          </string-name>
          <string-name>
            <surname>Constraint Language (SHACL)</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nickel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosasco</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poggio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Holographic Embeddings of Knowledge Graphs</article-title>
          . In: AAAI. pp.
          <year>1955</year>
          {
          <year>1961</year>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Omran</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Scalable Rule Learning via Learning Representation</article-title>
          . In: IJCAI. pp.
          <volume>2149</volume>
          {
          <issue>2155</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Spahiu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maurino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmonari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Towards improving the quality of knowledge graphs with data-driven ontology patterns and SHACL</article-title>
          .
          <source>In: Workshop on Ontology Design and Patterns</source>
          . vol.
          <volume>2195</volume>
          , pp.
          <volume>52</volume>
          {
          <issue>66</issue>
          (
          <year>2018</year>
          ), https://www.w3.org/TR/shacl/
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasneci</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
          </string-name>
          , G.:
          <article-title>Yago: a core of semantic knowledge</article-title>
          . In: Williamson,
          <string-name>
            <given-names>C.L.</given-names>
            ,
            <surname>Zurko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.E.</given-names>
            ,
            <surname>Patel-Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.F.</given-names>
            ,
            <surname>Shenoy</surname>
          </string-name>
          , P.J. (eds.) WWW. pp.
          <volume>697</volume>
          {
          <fpage>706</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Wright</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Rodr guez Mendez,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Haller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Taylor</surname>
          </string-name>
          , K.,
          <string-name>
            <surname>Omran</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Schimatos: a SHACL-based web-form generator for knowledge graph editing</article-title>
          .
          <source>In: ISWC</source>
          (
          <year>2020</year>
          ), To appear.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>