<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Active Knowledge Graph Completion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pouya Ghiasnezhad Omran</string-name>
          <email>P.G.Omran@anu.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kerry Taylor</string-name>
          <email>kerry.taylor@anu.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergio Rodriguez Mendez</string-name>
          <email>Sergio.RodriguezMendez@anu.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Armin Haller</string-name>
          <email>armin.haller@anu.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Australian National University</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>Knowledge graphs (KGs) proliferating on the Web are known to be incomplete. Many methods have been proposed for automatic completion, sometimes by rule learning, which scales well. All existing methods learn closed rules. Here we introduce open path (OP) rules and present a novel algorithm, OPRL, for learning them. While closed rules are used to complete a KG by answering given queries, OP rules identify the incompleteness of a KG by inducing such queries to ask. We use adaptations of Freebase, YAGO2, and a synthetic but complete Poker KG to evaluate OPRL. We find that OPRL mines hundreds of accurate rules from massive KGs with up to 1M facts. The learnt OP rules induce queries with precision up to 98% and recall of 62% on a complete KG, demonstrating the first solution for active knowledge graph completion.</p>
      </abstract>
      <kwd-group>
<kwd>Knowledge Graph Completion</kwd>
        <kwd>Rule Learning</kwd>
        <kwd>Knowledge Graph</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Knowledge Graphs (KGs) are a convenient technology for modelling and storing
massive quantities of weakly-structured data. However, their intended scope is
usually poorly defined, and they fail to record relevant entities, as well as relevant
relationships for the entities they do record. Techniques have been developed for
knowledge graph completion and rule learning to curate KGs automatically [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
In these approaches, models, often expressed as logical rules or vector
embeddings, are learnt from a given KG. The models are then used for curation tasks,
including link prediction, which predicts missing facts for extant entities.
      </p>
      <p>
        Rule learning methods for KGs [
        <xref ref-type="bibr" rid="ref7">7</xref>
] consider closed path (CP) rules, which are
used to predict a ground fact that fully instantiates the triple at the head of the
rule. Closed rules enable inference of specific facts that, if true, are missing from
the KG. They draw attention to a potential missing fact only if the fully specified
fact can be inferred by the rule. Thus KG completion is driven by the
assumption that the KG knows what it does not know. In this paper we consider,
for the first time, the problem of rule-based knowledge graph completion in the
situation that the KG does not know what it does not know. This problem requires
the KG, as it does for people, to look outside its own environment.
(Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).)
      </p>
      <p>
We propose learning open path (OP) rules, from which we infer less
constrained, open-ended queries to complete a knowledge graph. The proposed OP
rule formalism is a fragment of existential rules [
        <xref ref-type="bibr" rid="ref1">1</xref>
] that is expressive enough
to infer the queries, yet suitable for our embedding-based, scalable rule
mining system. Our OP rules provide evidence that information is missing even
when there is no evidence for a specific missing fact. The queries inferred from
OP rules identify that a new fact is needed when the answer is not known, but
also not obvious. Such queries could be posed to an active user engaged in a
curation task, or to a Web question-answering engine, where the answer might be
found outside the KG. In particular, an answer to the query may well introduce
previously unknown entities to the KG, and thus address a previously unstudied
direction in knowledge graph completion: missing entities.
      </p>
      <p>
As a beneficial side-effect, our work addresses a long-standing gap in
traditional link prediction systems (e.g. [
        <xref ref-type="bibr" rid="ref2">2</xref>
]) that use a KG to propose new links,
but need to be seeded with queries about potential missing links. Such a query
takes the form of a triple with one free variable. Conventionally, for evaluation
purposes, these queries are generated from test facts that are held back from the
KG, in the hope that a high-performing predictor will rediscover the held-back
facts. However, once a link predictor is deployed over a working KG, test facts
cannot be held back. Why would we want to remove a fact from the KG so that
we can construct a query from it, only to rediscover the same fact? And
how should we choose which facts to remove, to ensure that we are generating
the queries that are most important to ask of our link predictor? Since
generating queries this conventional way is problematic, whence does the query arise?
We propose that the queries we infer from our OP rules can be widely used to
generate the queries that link predictors need to repair KGs.
      </p>
<p>The contributions of this paper are as follows. We present a novel method for
learning open path rules from a KG. These are Horn rules with a different form
to the usual closed path rules that are used for knowledge graph completion
tasks. We propose an algorithm to learn these rules based on the embedding
representations of the predicates and entities. As such, we introduce a first solution
to the problem of active knowledge graph completion (AKGC), where we aim,
instead of suggesting missing facts, to ask the best questions to complete a KG.</p>
    </sec>
    <sec id="sec-2">
      <title>Learning Rules with Free Variables</title>
<p>Unlike earlier work in rule mining for KG completion, for our active knowledge
graph completion task we mine open path (OP) rules of the following form:</p>
      <p>Pt(x, z0) ← P1(z0, z1) ∧ P2(z1, z2) ∧ … ∧ Pn(zn−1, y)   (1)
Here, Pi is a predicate in the KG and each of {x, zi, y} is a variable (x and y are
free while the zi are bound). Unlike CP rules, OP rules do not necessarily form
a looped path of variable connections, but every instantiation of a CP rule is also
an instantiation of an OP rule. From an instantiation of the body of an OP rule,
we cannot infer a fact, but only a query. For example, the OP rule
citizenOf(x, t) ← livesIn(x, z) states that if an entity x lives in some z, then that
entity is a citizen of somewhere (t). By instantiating the body of this rule with
livesIn(bronte, canberra), we could infer the query citizenOf(bronte, ?).</p>
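      <p>To make the query-induction step concrete, here is a minimal sketch; the toy KG, the entity names, and the helper function are our illustrative assumptions, not the authors' implementation:

```python
# Toy sketch: inducing open-ended queries from an OP rule body over a small
# KG of (subject, predicate, object) triples. All facts are illustrative.
kg = {
    ("bronte", "livesIn", "canberra"),
    ("amir", "livesIn", "sydney"),
    ("amir", "citizenOf", "australia"),
}

def induce_queries(kg, head_pred, body_preds):
    """Return queries head_pred(e, ?) for every entity e that starts a full
    instantiation of the body chain but has no known head fact."""
    starts = set()
    for e in {s for (s, _, _) in kg}:
        frontier = {e}
        for pred in body_preds:  # walk the body chain predicate by predicate
            frontier = {o for (s, p, o) in kg if p == pred and s in frontier}
        if frontier:             # some full instantiation of the body exists
            starts.add(e)
    known = {s for (s, p, _) in kg if p == head_pred}
    return sorted(f"{head_pred}({e}, ?)" for e in starts if e not in known)

print(induce_queries(kg, "citizenOf", ["livesIn"]))  # ['citizenOf(bronte, ?)']
```

The entity amir is skipped because its head fact is already in the KG; only the genuinely open question about bronte is induced.</p>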
<p>To assess the quality of our mined open path rules, we introduce open path
standard confidence (OPSC) and open path head coverage (OPHC), derived from
their closed path forms.</p>
<p>Let r be an OP rule of the form (1). Then a pair of entities (e, e′) satisfies
the body of r, denoted bodyr(e, e′), if there exist entities e1, …, en−1 in the KG
such that P1(e, e1), P2(e1, e2), …, Pn(en−1, e′) are facts in the KG. A pair (e′, e)
satisfies the head of r if Pt(e′, e) is a fact in the KG. The open
path support, open path standard confidence, and open path head coverage of r
are given respectively by</p>
<p>OP supp(r) = |{e : ∃e′, e″ s.t. bodyr(e, e′) and Pt(e″, e)}|,
OP SC(r) = OP supp(r) / |{e : ∃e′ s.t. bodyr(e, e′)}|,
OP HC(r) = OP supp(r) / |{e : ∃e′ s.t. Pt(e′, e)}|</p>
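      <p>A direct reading of these definitions can be sketched as follows; the toy KG and the helper are illustrative assumptions, not the system's code:

```python
# Toy sketch of the OP quality measures, read off the definitions above.
kg = {
    ("a", "P1", "b"), ("b", "P2", "c"),   # a starts a full P1-P2 body chain
    ("d", "P1", "e"),                     # d's chain stops: no P2 fact from e
    ("x", "Pt", "a"), ("y", "Pt", "g"),   # head facts landing on a and g
}

def body_starts(kg, body_preds):
    """Entities e such that some e' satisfies body_r(e, e')."""
    starts = set()
    for e in {s for (s, _, _) in kg}:
        frontier = {e}
        for pred in body_preds:
            frontier = {o for (s, p, o) in kg if p == pred and s in frontier}
        if frontier:
            starts.add(e)
    return starts

def op_measures(kg, head_pred, body_preds):
    starts = body_starts(kg, body_preds)               # {e : some e'. body_r(e, e')}
    heads = {o for (s, p, o) in kg if p == head_pred}  # {e : some e''. Pt(e'', e)}
    supp = len(starts.intersection(heads))             # OP supp(r)
    opsc = supp / len(starts) if starts else 0.0       # OP SC(r)
    ophc = supp / len(heads) if heads else 0.0         # OP HC(r)
    return supp, opsc, ophc

print(op_measures(kg, "Pt", ["P1", "P2"]))  # (1, 1.0, 0.5)
```

Here only a completes the body chain, and only one of the two head entities (a, not g) is reachable, so OPSC is 1.0 while OPHC is 0.5.</p>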
      <p>
OP Rule Learning: Our objective is to mine a KG for high-quality OP
rules about a specific target predicate Pt. While we adhere to the architecture of
RLvLR [
        <xref ref-type="bibr" rid="ref7">7</xref>
] that learns CP rules, we propose the following novelties for mining
OP rules: (i) a novel fitness function that can estimate the quality of an OP rule
based on the embedding representations of its predicates (based on the learnt
embeddings from RESCAL [
        <xref ref-type="bibr" rid="ref6">6</xref>
]); and (ii) a novel vector computation that allows
the system to evaluate the OP rules against a massive KG to compute the quality
measures OPSC and OPHC.
      </p>
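      <p>The exact fitness function is not given here; as a rough illustration of the general idea of scoring candidate rule bodies from predicate embeddings alone, before any expensive evaluation against the KG, one could do something like the following. The random embeddings, the subject/object decomposition, and the scoring formula are all our assumptions, not RLvLR's or RESCAL's:

```python
import math
import random

random.seed(0)
DIM = 8

def rand_vec():
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]

# Hypothetical subject/object argument embeddings for each predicate.
pred_emb = {p: {"sub": rand_vec(), "obj": rand_vec()}
            for p in ("Pt", "P1", "P2", "P3")}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def fitness(target, body_chain):
    """Score an OP-rule body: the target's object argument should resemble the
    chain's first subject argument (the shared variable z0), and adjacent body
    predicates should share their object/subject arguments (the shared zi)."""
    score = cosine(pred_emb[target]["obj"], pred_emb[body_chain[0]]["sub"])
    for p, q in zip(body_chain, body_chain[1:]):
        score += cosine(pred_emb[p]["obj"], pred_emb[q]["sub"])
    return score / len(body_chain)  # average over the chain's junctions

# Rank candidate bodies; only top candidates would then be evaluated
# exactly (OPSC/OPHC) against the KG.
candidates = [["P1"], ["P1", "P2"], ["P2", "P3"]]
ranked = sorted(candidates, key=lambda b: fitness("Pt", b), reverse=True)
```

The point of such a filter is that it touches only the (small) embedding tables, never the (massive) fact set, so millions of candidate bodies can be pruned cheaply.</p>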
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
<p>We have conducted two sets of experiments to evaluate OPRL, demonstrating:
(i) OPRL can mine quality OP rules from a range of KGs, and it can mine
massive KGs in reasonable time. (ii) Queries generated from OPRL's rules are
relevant, with good recall and precision on multiple KGs; they far outperform a
distribution-based baseline. The four benchmark datasets are given in Table 1.
All experiments were conducted on an Intel Xeon CPU E5-4620 v2 @ 2.60GHz
with 66GB RAM, running CentOS 7.</p>
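      <p>Because the Poker KG is complete by construction, induced queries can be scored directly against it. A minimal sketch of such scoring follows; the hasPair predicate and all data are hypothetical, and this is our illustration rather than the paper's evaluation code:

```python
# Score induced queries Pt(e, ?) against a complete reference KG: a query is
# a true positive if the complete KG holds a head fact for e that the
# (incomplete) working KG is missing. All data here is illustrative.

def eval_queries(asked, incomplete_kg, complete_kg, head_pred):
    known = {s for (s, p, _) in incomplete_kg if p == head_pred}
    truth = {s for (s, p, _) in complete_kg if p == head_pred}
    should_ask = truth.difference(known)      # genuinely missing head facts
    tp = len(asked.intersection(should_ask))  # queries worth asking
    precision = tp / len(asked) if asked else 0.0
    recall = tp / len(should_ask) if should_ask else 0.0
    return precision, recall

complete = {("h1", "hasPair", "kings"), ("h2", "hasPair", "queens"),
            ("h3", "hasPair", "jacks")}
incomplete = {("h1", "hasPair", "kings")}     # h2's and h3's facts removed
print(eval_queries({"h2"}, incomplete, complete, "hasPair"))  # (1.0, 0.5)
```

Every asked query is answerable in the complete KG (precision 1.0), but only one of the two missing head facts was asked about (recall 0.5).</p>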
      <p>
OP Rule Learning: First, we assess how well OPRL finds high-quality rules.
We are not aware of other OP rule learners with which to compare, but we do
compare the performance of fitness functions. The quality of rules is reported
based on their OPSC/OPHC scores calculated against the full benchmark KGs,
not the samples. (The datasets used in the experiments and detailed results can
be found at www.dropbox.com/sh/y1f7zut09dheius/AADofv9c18Rzm-CFc64dw2yVa?dl=0.)
Later we will use the mined rules for generating queries, so we
need some holdout facts for query evaluation. For FB15KSE, test and training
sets are available [
        <xref ref-type="bibr" rid="ref7">7</xref>
]. For Poker and YAGO2 core we can find no previously
prepared data, so we randomly partition the facts, 90% for training and 10% for testing.
[Table 1: statistics of the benchmark KGs, FB15KSE, Poker, and YAGO2 core (numbers of facts,
entities, and target predicates). Table 2: OPRL's rule-learning performance (numbers of rules
learnt and run times in hours).]</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we proposed a method for learning rules with free variables from
Knowledge Graphs (KGs). Such rules can be used to generate queries soliciting
missing facts. Notably, the queries could be fed to link predictors, so obtaining
a fully automated framework for KG completion. Our experiments show that
OPRL can learn rules for KGs of varying sizes and degrees of incompleteness.
We show the usefulness of the mined rules by applying them to three different
KGs to infer relevant queries.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bellomarini</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sallinger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
<article-title>The Vadalog system: Datalog-based reasoning for knowledge graphs</article-title>
          .
          <source>In: VLDB</source>
. vol.
          <volume>11</volume>
          , pp.
          <fpage>975</fpage>
          –
          <lpage>987</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usunier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yakhnenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Duran</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          .
          <source>In: NIPS</source>
. pp.
          <fpage>2787</fpage>
          –
          <lpage>2795</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dua</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
<surname>Graff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>UCI machine learning repository</article-title>
          (
          <year>2017</year>
          ), archive.ics.uci.edu/ml, retrieved Nov 2019
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Galarraga</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
<surname>Teflioudi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hose</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          :
          <article-title>Fast rule mining in ontological knowledge bases with AMIE+</article-title>
          .
          <source>The VLDB Journal</source>
pp.
          <fpage>707</fpage>
          –
          <lpage>730</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Nickel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
<surname>Gabrilovich</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A Review of Relational Machine Learning for Knowledge Graphs</article-title>
          . In: Proceedings of the IEEE. vol.
          <volume>104</volume>
, pp.
          <fpage>11</fpage>
          –
          <lpage>33</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Nickel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosasco</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poggio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Holographic Embeddings of Knowledge Graphs</article-title>
. In: AAAI. pp.
          <fpage>1955</fpage>
          –
          <lpage>1961</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Omran</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Scalable Rule Learning via Learning Representation</article-title>
. In: IJCAI. pp.
          <fpage>2149</fpage>
          –
          <lpage>2155</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>