<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning Rules With Attributes and Relations in Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pouya Ghiasnezhad Omran</string-name>
          <email>p.g.omran@anu.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhe Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kewen Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Grifith University</institution>
          ,
          <addr-line>Brisbane, QLD</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Australian National University</institution>
          ,
          <addr-line>Canberra, ACT</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Rules capture high-level patterns in knowledge graphs and sometimes can provide virtual schemata for the data. Recent research has witnessed an increasing interest in learning rules automatically and applying the learned rules in knowledge graph inferences. While quite a number of rule learning systems have been developed, the formats of learned rules are still restricted, mostly resemble paths in the knowledge graphs. Such methods focus on the graph structure of the knowledge graph, assuming all the vertices and all the edges in the graph have an equal role (while only named or labelled diferently). In this paper, we propose a method HARL (Hub Augmented Rule Learning) for learning rules containing attributes by treating certain binary predicates as unary predicates. A major component of HARL is an algorithm for identifying attributes for a given knowledge graph. HARL has been implemented through a scalable rule learner RLvLR. Our experimental results also show that HARL is scalable and outperforms RLvLR in most cases on major benchmark datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Rule Learning</kwd>
        <kwd>Knowledge Graph</kwd>
        <kwd>Knowledge Graph Completion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge graphs (KGs) have proven to be useful for building interlinks between information
sources by connecting entities of diverse types like people, universities, movies, animals, etc.
This is particularly useful for information synthesis over multimodal media data, such as the
fusion of medical documents consisting of X-ray images and medical reports (texts and tables).
In artificial intelligence, a KG is usually a set of triples like (, bornInCity, ),
meaning that Barty was born in the city of Brisbane. Tools are available for retrieving such
triples from various sources [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Thus, solutions for retrieving information can be obtained
by technologies for KGs, such as eficient algorithms for link prediction. The objective of link
prediction is to determine whether a pair of entities are connected via a relation. For instance,
the link prediction task answers a query, (, bornInCity, ?) that enquiries for a city name
where Barty was born in. Existing methods for link prediction are mostly based on embedding
based representation and graph neural networks (GNNs) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which adopt a black-box approach
and are known to sufer from issues of explainability and flexibility. As a result, some researchers
advocated providing explainable solutions to KG link prediction by learning rules. In particular,
rules that contain constants are useful for both link prediction and reasoning. However, existing
scalable rule learners based on embedding, e.g. [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], can only learn the class of closed path
rules (CP-rules) but are unable to learn rules containing constants. This assumption is vital since
we face millions of entities in large KGs, which causes an explosion in the search space.
      </p>
      <p>On the other hand, in the KGs, specific entities in specific predicates acts as
attributes. For example, in FB15k, the entity USA in the second argument of the predicate
/film/film/country(, ) acts as a class that distinguish all films ( ) that are produced in the
USA and from those who are not. Such binary attribute reflects a reality of dominance of the
USA film industry globally. We refer to such entity-predicates as hubs and model them through
attribute facts. We propose a method to identify a set of hubs automatically. Consider an
example from YAGO2 KG, we have P : isCitizenOf_E : Netherland() as an attribute or hub. We can
learn a number of patterns that cannot be expressed without such attributes, for instance, the
rule P : isCitizenOf_E : Netherlands() ∧ isCitizenOf(, ) → livesIn(, ) would have only a
much lower confidence degree if without the attribute P : isCitizenOf_E : Netherlands() (three
times less confidence degree than the rule including attribute). Thus, we will consider the more
specific rules such as, P : isCitizenOf_E : Netherlands() ∧ isCitizenOf(, ) → livesIn(, ),
instead of the more general rule, isCitizenOf(, ) → livesIn(, ).</p>
      <p>
        In this paper, we propose a method HARL (Hub Augmented Rule Learning), for learning rules
containing attributes. The idea is to regard such a rule as a closed path rule so that existing
scalable rule learners can be employed. This is achieved by treating a binary predicate as a
unary predicate, for example, an atom /film/film/country(,  ) can be considered as an
attribute /film/film/country_USA(). Thus, a major component of HARL is an algorithm for
identifying attributes for a given KG. HARL can keep the search space in a manageable size
while essential entities for learning rules are not missed. HARL has been implemented through
a scalable rule learner RLvLR [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Our experimental results also show that HARL outperforms
RLvLR in most cases on major benchmark datasets.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>A knowledge graph consists of a set of entities ℰ as its vertices and its edges are directed and
labelled with a set of predicates . An entity  is an object such as a place, a person, etc., and a
fact (or link) is a triple (, , ′), which means that the entity  is related to another entity ′ via
the binary predicate . Following the convention in knowledge representation, we denote such
a fact as (, ′).</p>
      <p>
        The format of rules we aim to learn extends closed-path rules (or CP rules), which is a language
bias that is widely adopted in the rule learning literature for knowledge graphs; for instance,
CP rules are the underlying formalism of Path Ranking Algorithms [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], RuleEmbedding [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
RDF2Rules [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and ScaLeKB [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We extends CP rules with atoms about entity attributes.
      </p>
      <p>An attribute closed-path rule (or an ACP rule or simply a rule)  is of the form
1(0, 1) ∧ 2(1, 2) ∧ · · · ∧
(− 1, ) ∧ 1(1 ) ∧ · · · ∧
( ) → (0, ). (1)
Here each  (0 ≤  ≤ ) is a variable and 0 ≤  ≤  for each 1 ≤  ≤ ,
each (, ) is called an atom, and  and  are called respectively, the subject and
object argument for , and each () is an atom with  being a unary predicate.
Intuitively, the rule  reads that if 1(0, 1), 2(1, 2), . . . , (− 1, ), 1(1 ), . . . , ( )
hold, then (0, ) holds too. The atom (0, ) is the head of  and the set of atoms
{1(0, 1), 2(1, 2), . . . , (− 1, ), 1(1 ), . . . .( )} is the body of . The rule
obtained from  by removing unary predicates in the body, that is, 1(0, 1) ∧ 2(1, 2) ∧ · · · ∧
(− 1, ) → (0, ) is usually referred to as a closed-path rule or CP rule as the sequence
of predicates in the rule body forms a path from the subject argument to the object argument of
the head predicate.</p>
      <p>
        To assess the quality of mined rules, we recall measures that are used in some major
approaches to rule learning [
        <xref ref-type="bibr" rid="ref3 ref8">3, 8</xref>
        ]. Let  be a rule of the form (1). A pair of entities (, ′) satisfies
the body of , denoted ()(, ′), if there is a way of substituting variables in () with
entities in the KG such that (i) all atoms in () (after substitution) are facts in the KG,
and (ii) 0 and  are substituted with  and ′ respectively. And (, ′) satisfies the head of ,
denoted (, ′), if (, ′) is a fact in the KG. Then the support degree of  is defined as
() = #(, ′) : ()(, ′) ∧ (, ′)
That is, () is defined as the number of entity pairs that satisfy both the head and the body
of .
      </p>
      <p>The degrees of standard confidence (SC) and head coverage (HC) are defined as two forms of
normalisation for ():
() =</p>
      <p>()
#(, ′) : ()(, ′)
() =</p>
      <p>()
#(, ′) : (, ′)
So, () is the normalisation of () through the number of entity pairs that satisfy the
body, while () is the normalisation of () through the number of entity pairs that
satisfy the head.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Learning Rules with Attributes</title>
      <p>In this section, we present our approach to learn rules with attributes through a simple and
efective method. We first describe how to automatically identify facts in the KG that express
entity attributes , which we refer to as attribute facts, and then explain how to transform such
attribute facts to a form that can be easily processed by existing rule learning methods.</p>
      <sec id="sec-3-1">
        <title>3.1. Identifying Attribute Facts</title>
        <p>
          As argued in [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ], the facts in existing KGs can be separated into (at least) two
categories: those representing entity relations and those for entity attributes (hubs). For example,
hasFather(Joe Biden, Joseph R. Biden) expresses the relationship between two entities Joe Biden
and Joseph R. Biden, whereas gender(Joe Biden, Male) is naturally considered as an attribute of
the entity Joe Biden. Sometimes it is not easy to draw a clear line between the two types of
facts, but in general, entity relations often involve large numbers of entities, like Joe Biden and
Joseph R. Biden, but the links between an attribute and entities are relatively sparse; in another
word, entity attributes often involve a small number of entities to represent attribute values,
such as the gender Male, which are linked to a huge number of other entities.
        </p>
        <p>Existing rule learning approaches often do not distinguish entity attributes from entity
relations as they can only learn the class of CP rules like</p>
        <p>hasFriend(, ) ∧ gender(, ) → gender(, ).</p>
        <p>However, in many cases it makes more sense to consider ‘gender’ as an attribute when rules of
the following more general form are learned.</p>
        <p>hasParent(, ) ∧ gender(, Male) → hasFather(, ).</p>
        <p>To identify attribute facts (, ′), we need to identify ′ that can be used for attribute values,
such as Male, and  that is suitable for representing attributes, such as gender. That is, we need
to identify the attribute values ′ and attribute predicates .</p>
        <p>For attribute values, we observe that they are often linked to a huge number of entities and
thus form kind of “hubs” in the network. Based on the above discussion, such hubs can be
identified by their connection densities.</p>
        <p>We define the density of a hub, (, ), as the (in and out) degrees of an entity  w.r.t. a
predicate  as follows.</p>
        <p>(, ) =</p>
        <p>#′ : (′, )
#(′, ′′) : (′, ′′)
(, ) =</p>
        <p>#′ : (, ′)
#(′, ′′) : (′, ′′)
(2)
In the definition of density, #′ : (, ′) or (′, ) is the total number of occurrences of  (as
a subject or object) in a fact of the KG, and it is normalized by the total number of facts about 
in the KG.</p>
        <p>Yet a hub  with high density is not necessarily suitable for an attribute value. Consider an
extreme case where all the other entities ′ are connected with  through a predicate , then  is
not an attribute value that can distinguish ′ from other entities. From information theory, such
an attribute does not provide any information gain as it does not express any distinguishing
feature of associated entities. Thus, we need to select the hubs that both have high density and
provide distinguishing features. We compute the entropy of hubs as follows.
(, ) = − ( * () + (1 − ) * (1 − )), where  ∈ {, },  = (, ) (3)
Next, we want to identify those predicates that are suitable for attributes. This is based on
the observation that such a predicate often associates an entity with its attribute values, and
the number of possible values (like Male) is often far smaller than the number of entities that
can be associated with the attribute (like Joe Biden). So we define the imbalance degree (imb) of
a predicate  as follows.</p>
        <p>() =</p>
        <p># : (, ′)
(#′ : (, ′)) * |ℰ |
() =</p>
        <p># : (′, )
(#′ : (′, )) * |ℰ |
(4)</p>
        <p>Intuitively, the degree () being large means that  has more potential subjects than
objects, which suggests the objects of  could potentially be attribute values (whereas the
subjects are more likely to be entities). In other words, the vertices with  being an in-coming
edge are more likely to be attribute values than those with  being an out-going edge.</p>
        <p>Finally, we combine the two measures to determine the likeliness of a fact with a hub  and a
predicate  being an attribute fact.</p>
        <p>(, ) = (, ) * (), where  ∈ {, }.
(5)
The larger (, ) (or (, )) is the more likely a fact (, ′) (resp., (′, )) is an
attribute fact, with ′ being an entity and  representing its attribute value.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. KGs with Attributes</title>
        <p>
          Unlike existing embedding methods that embed attributes separately from the other relations [
          <xref ref-type="bibr" rid="ref10 ref3">3,
11, 10</xref>
          ], we propose a method to transform the attribute facts, so that existing rule learners like
RLvLR can be easily adapted to learn rules with attributes from such facts. We illustrate the
data flow of our data transformation in Figure 1, where we obtain an enriched KG through the
identified attribute facts.
        </p>
        <p>In general, attributes can be seen as unary predicates, whereas other relations are
binary ones. For an attribute fact gender(Joe Biden, Male), it can be seen as a unary fact
gender_Male(Joe Biden). Yet, such unary fact cannot be directly processed by existing rule
learners as they only accept binary facts (i.e., triples) as inputs. Hence, we represent the
unary fact gender_Male(Joe Biden) as a fact that is essentially a self-loop in the graph, i.e.,
gender_Male(Joe Biden, Joe Biden).</p>
        <p>Formally, each attribute fact in the original KG of the form (, ′) (or (′, )) with ′ being
the attribute value can be transformed into a fact  : _ : ′(, ) (resp.,  : ′_ : (, )),
where  : _ : ′ (resp.,  : ′_ : ) is a new predicate. In Figure 2, we illustrate how
attribute facts are transformed into self-loop in the KG.</p>
        <p>The intuition of such a transformation is to represent attributes as self-loops in the KG, so that
path-based rule learning method can be easily adapted to handle such a KG while the attribute
atoms (as self-loops) can be conveniently identified in the learned rules. For example, a path
hasParent(Joe Biden, Joseph R. Biden), gender(Joseph R. Biden, Male) is transformed into a
path hasParent(Joe Biden, Joseph R. Biden), P : gender_E : Male(Joseph R. Biden, Joseph R. Biden),
which together with many other similar paths may induce a rule</p>
        <p>hasParent(, ) ∧ P : gender_E : Male(, ) → hasFather(, ).</p>
        <p>The rule can be rewritten as</p>
        <p>hasParent(, ) ∧ P : gender_E : Male() → hasFather(, ).</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Learning Rules with Attributes</title>
        <p>In Algorithm 1, we explain the complete process of identifying attribute facts, expanding the
KG with transformed facts, and learning rules with attributes. Intuitively, we first select fact
patterns (, · ) or (· , ) that can be considered attribute facts. We represent such a pattern as
a triple (, , ) with  ∈ {, }, which should not be confused with a triple in the KG. We
select such triples where the total number of facts based on such patterns that can be added to
the KG is bounded by a expansion degree . In what follows, we call such a triple a hub.</p>
        <p>In line 1, we compute the frequency of all potential hubs. For each candidate hub (, , ),
its frequency is the unnormalized density which is the numerator of Equation (2). To compute
the frequencies, we just need to parse all facts in the KG once. We select top  candidate hubs,
where  =    , to form the .</p>
        <p>In lines 2 to 3, we compute the  measure for all potential hubs in  as we explained
in Equation (5). Then, we sort the list of candidate hubs by their  values.</p>
        <p>In lines 5 to 9, we consider the potential hubs one by one from the  with the highest
 degree and create the corresponding new attribute predicate and facts. We stop this process
when we hit the maximum expansion degree. The expansion degree is computed by the number
of new facts divided by the number of facts in the original KG. The maximum expansion degree
is given as input parameter,  . In line 10 we feed the new set of facts to RLvLR and
obtain a set of attribute rules. In line 11 we return a set of rules with attributes.
Algorithm 1 Learn Rules with Attributes
Input: a KG as a set of facts ℱ
Parameter: a max expansion degree   and an integer    
Output: a set of rules with attributes ℛ</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Context Aware Inference</title>
      <p>
        There is a body of works for applying the learned rules to the known facts for inferring new
facts in KGs. Most of these works focus on how the rules can be used in a restricted iterative
way [12]. There is another stream of works that consider applying the uncertain rules once in
the knowledge graph community [
        <xref ref-type="bibr" rid="ref4 ref8">4, 13, 8</xref>
        ].
      </p>
      <p>In the latter stream of works, each rule is augmented with a degree of confidence and the
inference module takes to the account these degrees to infer a new fact (which is also augmented
with a degree of confidence). To control the quality and number of new inferred facts, these
systems put a maximum threshold on the number of new facts and/or put a minimum threshold
on the confidence of new facts.</p>
      <p>The problem with this method is they do not consider the demand of KG regarding each new
fact; they merely focus on the process which results in the new facts. For example, consider the
predicate isParentOf. For each human person, usually, we have at most two entities as parents.
In the case that we want to infer new facts regarding this predicate for a specific person, Ann,
we consider all rules that result in the following partially grounded fact (isParentOf(, )).
On the other hand, we need to consider how many instantiations of this fact are needed. If
we have already two entities as parents of Ann, we should reject any more facts even if those
facts are inferred via quite confident rules. For another entity, Pit, that we have no facts that
state the parent of him, we should accept the new inferred fact even if the confidences of the
corresponding rules are not high. Thus, in the context-aware inference method, we try to
consider the confidence of the new fact production process and the demand of the new fact in
the KG.</p>
      <p>In the context-aware inference method, we will infer the new facts by considering the already
related existing facts in the KG. We could obtain an average number of facts regarding each
predicate and entity in a specific predicate argument. Later, when we want to infer new facts, if
there are enough facts already in the KG, inferring new facts are likely to be overpredicting.
To avoid overpredicting, we consider just the predictions which securely fill the gap (between
the current number of facts and the average number). We give higher priority to the facts with
higher SC if there are alternatives. So we sort all predictions regarding their SCs, and for each
one (1(1, 2)) we consider the gap between the average number of facts regarding specific
entity and predicate and the concerning new fact. We obtain the Average Number of Facts
(ANF) for each entity, predicate, and argument. Although this averaging method ignores the
variations of the number of facts for a specific entity in the specific predicate, we propose it
inferring a new fact in an extra cautious fashion. We do not claim the rejected new facts are not
valid but based on this mechanism; we do not have strong support that the new fact is needed.
 1 =
 1 =
#(, ′) : 1(, ′)
# : ∃ 1(, )
#(, ′) : 1(, ′)
#′ : ∃ 1(, ′)
((#′ : 1(1, ′)) &lt;  1) ∧ ((# : 1(, 2)) &lt;  1 )</p>
      <p>We accept a new fact if the condition (8) is satisfied. This condition demands the new fact
populate the KG whence both subject and object have less than average relations regarding the
predicate.</p>
      <p>By this extra cautious method, we can avoid overpopulating KG by predicting too many facts
w. r. t. any specific pair of subject-predicate or object-predicate with the cost of excluding many
facts that might be valid. For example, in the case of nationality, the number of entities with
the same nationality varies from a hundred thousand to a billion based on real-world data. By
considering the average, we might exclude many new valid facts for a more populated country.
Hence, the excluded facts by this mechanism are not wrong, but if we have a high priority to
infer fewer facts, we exclude them.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>Based on the methods presented in previous sections, we have implemented Hub Augmented
Rule Learning (HARL) and conducted three sets of experiments to evaluate the new system.</p>
      <p>The first set of experiments aim to evaluate the efect of new hub predicates on rule learning.
In the second set of experiments, we use the learned rules to infer new facts via the original
inference module of RLvLR and the new context aware inference module. The results show the
usefulness of hub-augmented rules to infer more quality facts. The third set of experiments aim
to evaluate the improvements of RLvLR on link prediction task regarding using hubs as new
predicates.</p>
      <p>In Table 1, we show the statistics of three widely used Knowledge graphs. We randomly
selected 50 target predicates for target predicates, and for YAGO2 core, we used all 32 predicates
as target predicates. All results shown in the following experiments are average results for the
corresponding target predicates.
(6)
(7)
(8)</p>
      <sec id="sec-5-1">
        <title>5.1. Eliminating Redundant Rules: Concise Rule Set</title>
        <p>
          Rule learners can be evaluated by the number of distinct quality rules that they can learn.
AMIE+ [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] eliminates duplicate rules. In this process, two rules are considered duplicated
only if they have the same head predicate, the same number of atoms (body length), and
the same head coverage (or support). While such elimination merely considers rules with
same syntax as the same rule (e.g. 3(, ) ∧ 2(, ) → 1(, ). and 2(, ) ∧ 3(, ) →
1(, ).), it does not consider the semantic similarity like, livesIn(, ) → isCitizenOf(, ).
and livesIn(, ) ∧ Person() → isCitizenOf(, ). Intuitively the last second atom of second
rule is obvious and does not add any information to the rule while it makes a new (syntactically)
rule. These kind of semantically redundancy can discredit the evaluation of rule learning
systems based on the number of quality rules. To overcome this challenge we introduce a novel
method to obtain a set of rule with minimum redundancy.
        </p>
        <p>We deployed a greedy search to find a set of rules with minimum redundancy w.r.t. fact
inferences. Each rule  is associated with a set of facts that can be inferred from the rule, denoted
 (). For a set of rules ℛ,  (ℛ) = ⋃︀∈ℛ  (). To obtain a concise set of rules, we
start with an empty set ℛ and add rules into ℛ one by one from the set of all learned rules,
following the descending order of the confidences of the rules (i.e., their SCs). Before adding a
rule  to ℛ, we first check if the facts inferred by  can already be inferred by those in ℛ. We
define the novelty degree for each rule  as follows:
noveltyDegree() = | ()| − |  () ∩  (ℛ))|
| (ℛ))|
We add the rule  to ℛ only if the novelty degree is higher than a threshold.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Learning Rules with Attributes</title>
        <p>
          Table 2 shows the average numbers of rules (#) with the minimum quality SC≥ 0.1 and
HC≥ 0.01 as these values are used in RLvLR [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and AMIE+ [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. # indicates the average
number of rules with the following minimum qualities, SC≥ 0.8 and HC≥ 0.01. # indicates
the average size of the concise rule sets (obtained from all # learned rules). # indicates
the average size of the quality concise rule sets (obtained by from # high quality rules).
# −  indicates the average number of concise rules that contains at least one hub predicate
(obtained from the concise rule set of size #).
        </p>
        <p>Compared to RLvLR, HARL showed significantly better performance in terms of both the
rules with ordinary filtering and concise filtering. The superiority of HARL is more obvious in
learning high quality rules, that is, among all the rules learned, HARL has a consistently higher
number of high quality rules.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Inferring New Facts via Learnt Rules</title>
        <p>
          To estimate the predictive power of the corpus of learned rules, we calculated the number of
facts can be predicated by applying learned rules on the given facts via RLvLR inference module
and context-aware inference module. To aggregate the diferent rules with same target predicate
we use Noisy-OR similar to RLvLR [13] and AMIE+ [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Table 3 shows the numbers of RLvLR
inference predicted facts (#UNF), those predictions via context-aware inference facts (#NF), and
those context-aware inferred facts with CD≥ 0.8 (#QNF).
        </p>
        <p>The running of two systems have the following characteristics regarding time and other
features 4. #Hub indicates the number of hubs that we consider before the number of new
facts hits the max expansion degree (  in algorithm 1), which we assigned 0.1 for max
expansion degree in these experiments. EXP is the actual expansion degree. Min is the 
value of the last hub that we consider.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Link Prediction via Learnt Rules</title>
        <p>
          We additionally evaluated the quality of rules through link prediction. We consider two queries,
(, ?) and (?, ′), for each fact, (, ′), in the testing data in each benchmark. For FB15k and
FB15KSE the separation of test and train data has been done, while for YAGO2core, there is
no previously prepared data, so we randomly separate 90% for train and 10% for the test. We
applied learnt rules on the training data to predict the missing entities, and the testing data
verified the correctness of such predictions. The predictions were measured using Mean Rank
(MR) in which the less value indicates better performance and Mean Reciprocal Rank (MRR) in
which the higher value indicates better performance. We also deployed Hits@1 and Hits@10.
These four quality measurements are commonly used in the Knowledge Graph Completion
literature [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. In particular, we first ranked each inferred fact based on the number of rules
inferring it. MR is the average number of ranks. MRR is the average number of the reciprocal
ranks of correctly predicated entities. Hits@10 and Hits@10 are the proportion of correctly
predicted entities that are ranked among the top ten and one, respectively. Table 5 summaries
the results, where H10 and H1 are the Hits@10 and Hits@1 scores.
        </p>
        <p>HARL outperforms RLvLR in FB15K in all quality measures significantly while in other two
benchmark it performs in par. Since the FB15kSE and YAGO2core are more dificult tasks
regarding link prediction, it might be the case that proposing the hub could not help the link
prediction.</p>
        <p>To see how the hubs can efect the mined rules consider the following two rules from
YAGO2core.</p>
        <p>: 0.03,  : 0.07, wasBornIn(, ) ∧ isLocatedIn(, ) → isCitizenOf(, ).
 : 0.61,  : 0.01, P :&lt; graduatedFrom &gt; _E :&lt; University_College_Dublin &gt;()
∧wasBornIn(, ) ∧ isLocatedIn(, ) → isCitizenOf(, ).</p>
        <p>The standard confidence of the second rule improved by about 20 times. Since we add a
condition at the body of the rule, we make it more specified, so it fires less number of facts, but
those shots are more accurate and improve the SC significantly. On the other hand, since the
number of resulting facts in the second rule decreased, the head coverage also reduced, making
the rule less general. While, the first rule is filtered due to its low SC, the specified version of it
has higher SC and qualified to be in the output set of rules.</p>
        <p>From FB15k KG, the following hubs are created as they have top .
1. P :&lt; /people/person/gender &gt; _E :&lt; male &gt;()
2. P :&lt; /people/person/spouse_s./people/marriage/type_of_union &gt; _E :&lt; matrimony &gt;()</p>
        <p>From this example, we can observe the identified hubs in FB15k are intuitive and shows
which binary attributes can separate the related entities into two sets. In the real case scenario,
regarding the film industry, we have some dominant categories that can distinguish the entities,
like the movies produced in the USA in contrast with those that were produced in any country
other than the US. With our proposed modelling, we can reflect on this disparity between
diferent values of one attribute.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Related Works</title>
      <p>
        Several rule learning systems, including AMIE+ [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], RuDiK [14] and anyBURL [15], allow a set
of entities to be considered in the partially grounded rules. Gu et al [16] introduce a language
bias for their rule learning method, in which the head atom and last atom of the rule’s body are
partially grounded but no entities are allowed in other atoms. However, they treat all entities
equally in the partially grounded rules that are leaned. These method do not have a mechanism
for selecting a set of entity-predicate pairs for attribute values. Consequently, the search space
of possible rules grows vastly and these systems struggle to scale to the real-world massive KGs.
      </p>
      <p>The other related stream of works is attribute knowledge graph modelling. In general, our
work is diferent from this line of works for at least two reasons: (1) they assume an input KG
contain attributes while we create attributes out of the given KG; (2) the nature of attributes
they consider is only numeric or categorical, while ours is a particular case of categorical ones.</p>
      <p>
        The attribute KGs difer in terms of the nature of attributes. [ 11] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] can model KGs
that contain numeric feature vectors as attributes of nodes (entities). In [11], authors consider a
text related to an entity like the title of a paper as a feature of paper node. They use methods in
Natural Language Processing to construct a feature vector for representing the title feature in a
numeric representation rather a string of characters.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], authors distinguish the relations from attributes and consider only categorical attributes.
The attribute KG that they consider, FB24K, is extracted from Freebase. The number of values
of attributes range from 1 to 877 categorical attributes and the number of attributes is 314.
While what an attribute in their approach is similar to the hubs in our method, our attributes
are binary. Furthermore, we do not remove the original facts, and an entity can act as both an
attribute value and an entity. For example, the USA, in regards to predicate nationality, acts as
an attribute value. At the same time, it acts as an entity in the predicate neighbour. However,
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] does not consider the attribute values as entities, so ignore the dual role of a thing as an
entity and attribute/value.
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this paper, we proposed an approach to learn attributed rules over KGs. In contrast to
existing attribute KG modelling approaches that assume attributes as input, we can mine the
attributed rules from KG without an explicit set of attributes. We achieved this by proposing
a binary attribute that can easily be embedded into the closed path rules and an identifying
mechanism that automatically mines such attributes from KG. We enhance the original KG
with new predicates (hubs) and corresponding facts. Then we feed the improved KG to a
state-of-the-art rule learner, RLvLR, to learn attributed rules. We also proposed a novel inferring
mechanism that considers the demands of a new inferred fact and the quality of the rules that
infer it. Finally, we evaluated our system HARL on widely used benchmarks in rule learning
and link prediction. Our experimental results demonstrate that HARL improved RLvLR in terms
of both the number of mined quality rules, inferred facts, and accuracy in link prediction.
[11] Z. Hu, Y. Dong, K. Wang, K. W. Chang, Y. Sun, GPT-GNN: Generative Pre-Training of</p>
      <p>Graph Neural Networks, in: SIGKDD, ACM, 2020, pp. 1857–1867.
[12] M. Svato, S. Schockaert, J. Davis, STRiKE : Rule-Driven Relational Learning Using Stratiefid
k-Entailment, in: ECAI, 2020.
[13] P. G. Omran, K. Wang, Z. Wang, An Embedding-based Approach to Rule Learning in</p>
      <p>Knowledge Graphs, TKDE 33 (2021) 1348—-1359.
[14] S. Ortona, V. V. Meduri, P. Papotti, Robust discovery of positive and negative rules in
knowledge bases, in: International Conference on Data Engineering (ICDE), IEEE, 2018,
pp. 1180–1191.
[15] C. Meilicke, M. W. Chekol, D. Rufinelli, H. Stuckenschmidt, Anytime bottom-up rule
learning for knowledge graph completion, in: IJCAI, 2019.
[16] Y. Gu, Y. Guan, P. Missier, Towards Learning Instantiated Logical Rules from Knowledge
Graphs, arXiv preprint (2020).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Pellissier-Tanon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          , YAGO 4:
          <string-name>
            <given-names>A</given-names>
            <surname>Reason-able Knowledge</surname>
          </string-name>
          <string-name>
            <surname>Base</surname>
          </string-name>
          , in: ESWC,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , J. Cheng,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <article-title>Knowledge graph completion: A review</article-title>
          ,
          <source>IEEE Access 8</source>
          (
          <year>2020</year>
          )
          <fpage>192435</fpage>
          -
          <lpage>192456</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          , S. Goldberg,
          <article-title>ScaLeKB: scalable learning and inference over large knowledge bases</article-title>
          ,
          <source>The International Journal on Very Large Data Bases</source>
          <volume>25</volume>
          (
          <year>2016</year>
          )
          <fpage>893</fpage>
          -
          <lpage>918</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Omran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Scalable rule learning via learning representation</article-title>
          ,
          <source>in: IJCAI</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gardner</surname>
          </string-name>
          , T. Mitchell,
          <article-title>Eficient and Expressive Knowledge Base Completion Using Subgraph Feature Extraction</article-title>
          ,
          <source>in: Conference on Empirical Methods on Natural Language Processing</source>
          , September,
          <year>2015</year>
          , pp.
          <fpage>1488</fpage>
          -
          <lpage>1498</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          , W.-t. Yih,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <article-title>Embedding entities and relations for learning and inference in knowledge bases</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2015</year>
          , p.
          <fpage>12</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>RDF2Rules: Learning Rules from RDF Knowledge Bases by Mining Frequent Predicate Cycles</article-title>
          ., arXiv preprint (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Galárraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Teflioudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <article-title>Fast rule mining in ontological knowledge bases with AMIE+</article-title>
          ,
          <source>The International Journal on Very Large Data Bases</source>
          <volume>24</volume>
          (
          <year>2015</year>
          )
          <fpage>707</fpage>
          -
          <lpage>730</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Knowledge representation learning with entities, attributes and relations</article-title>
          , in: IJCAI,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cai</surname>
          </string-name>
          , X. Cheng, S. Xie,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Learning High-order Structural and Attribute information by Knowledge Graph Attention Networks for Enhancing Knowledge Graph Embedding, arXiv preprint (</article-title>
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>