                        The Effect of Rule Injection
                        in a Leakage-Free Dataset

 Mirza Mohtashim Alam1 , Mojtaba Nayyeri1 , Chengjin Xu1 , Md Rashad Al
Hasan Rony2 , Hamed Shariat Yazdi1 , Afshin Sadeghi1,2 , and Jens Lehmann1,2
                1
                 SDA Research Group, University of Bonn, Bonn, Germany
                        2
                          Fraunhofer IAIS, Dresden, Germany
                               s6mialam@uni-bonn.de
                  {Nayyeri, Xu, shariat, sadeghi}@cs.uni-bonn.de
            {md.rashad.al.hasan.rony, jens.lehmann}@iais.fraunhofer.de



         Abstract. Knowledge graph embedding (KGE) has become a promi-
         nent topic for many AI-based tasks such as recommendation systems,
         natural language processing, and link prediction. Inclusion of additional
         knowledge such as ontology, logical rules and text improves the learning
         process of KGE models. One of the main characteristics of knowledge
         graphs (KGs) is the existence of relational patterns (e.g., symmetric and
         inverse relations) which usually remain unseen by the embedding mod-
         els. Inclusion of logical rules provides embedding models with additional
         information about the patterns already present in the KGs. The injection
         of logical rules has not yet been studied in depth for KGE models. In this
         paper, we propose an approach for rule-based learning on top of two
         embedding models, namely RotatE and TransE. We first study the effect
         of rule injection on the performance of the selected models. Second, we
         explore how the removal of leakage from popular KGs such as FB15k and
         WN18 affects the results. By leakage we refer to patterns from the
         training set that reappear in the test set (e.g., if the test set contains
         (h, r, t) and the training set contains (t, r, h), this is considered a
         symmetric leakage, where h, r, and t refer to the head, relation, and
         tail, respectively). Empirical results suggest
         that incorporation of logical rules in the training process improves the
         performance of KGE models.

         Keywords: Relational Patterns · Knowledge Graph Embedding · Logic.


1      Introduction

Nowadays, knowledge graphs (KGs) are one of the leading technologies in knowl-
edge representation and representation learning. The main characteristic of this
technology is its representation of facts in a triple set KG = {(h, r, t)|h, t ∈ E, r ∈
R} in which h, t, and r refer to the head, tail, and relation, respectively, where E

    Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0
    International (CC BY 4.0).

is a set of entities (nodes), and R is the set of relations (links). Relational learn-
ing is the act of applying learning models on KGs. Among the already existing
learning approaches, knowledge graph embedding (KGE) models have shown
influential results in link prediction. An embedding model provides vector rep-
resentations of entities and relations in a KG where a triple (h, r, t) is mapped
to its corresponding d dimensional vector as h, r, t ∈ Rd .
    In recent years, a series of embedding models with different learning capabilities
has been proposed. Some of the popular models are TransE [1],
RESCAL [9], RotatE [11], and ComplEx [12]. However, not all the models are
capable of capturing potential properties of KGs. One of the main characteristics
of KGs, which originates from the relational representation of knowledge, is the
existence of relational patterns. More precisely, the nodes and relations of a KG
are very likely to form multiple types of patterns such as symmetric, inverse,
transitive, and reflexive relationships, which can be encoded by many of the
KGE models. For example, a family of translation-based KGE models (namely
TransE, TransR, TransH) is designed for encoding such patterns [6].
    Empowerment of KGE models can be done internally by adding comple-
mentary knowledge to the underlying KGs. In the case of encoding relational
patterns, the injection of logical rules is seen as an approach when the model
cannot capture the patterns directly from the KG itself.
    In order to further study the influence of logical rules on the learning
process of KGE models, we first developed a pipeline in which we are able to
inject rules into popular embedding models. Research suggests that rule injection
can aid the learning of KGE models. After injecting logical rules, we ran
experiments to get an intuition of how rule injection affects the performance of
KGEs. Secondly, we removed known patterns (leakage) from the test set, which
we call the refined test set. In this step, we additionally evaluated the
performance of the underlying models. The results confirm our hypothesis that
rule injection can be beneficial to KGE models such as TransE and RotatE.
Removing known patterns (leakage) from the test set causes the results of the
same KGE models to become worse. Unfortunately, rule injection is unable to help
when the leakage has been removed from the test set. Within the scope of this
paper, we investigate four types of rules, namely implication, symmetric, inverse,
and equivalence (Table 1).
2    Related Work
Since patterns are implicitly hidden throughout KGs, in order to empower KGEs
with the complementary knowledge of logical rules, one first needs to extract
them. Here we present three related rule extraction and injection approaches
in the context of KGEs. There are multiple rule extraction frameworks such
as AMIE [3] and AMIE+ [2]. They are mainly designed for rule extraction.
The extracted rules are comprehensive to some extent; however, for use in
KGE models, some post-processing steps are needed. KALE [4] is another rule
extraction and injection framework which focuses on two types of rules only:
implication and composition. To model these rules explicitly, they first ground
the rules. Grounding a rule means instantiating that rule with concrete en-
                      The Effect of Rule Injection in a Leakage Free Datasets      3

tities. For instance, grounding a rule (h, isCapitalOf, t) ⇒ (h, locatedIn, t)
would produce a set of groundings such as (Helsinki, isCapitalOf, Finland) ⇒
(Helsinki, locatedIn, Finland). RUGE [5] uses an iterative approach for query-
ing the KG and retrieving soft rules to label the unlabeled triples. RUGE
aims at modeling logical rules directly. Soft rules are rules that usually hold,
even though they are sometimes violated. An example of a soft rule is
(h, bornIn, t) ⇒ (h, hasNationality, t). RUGE needs labeled triples, which is
additionally costly if the underlying KG does not provide them. IterE [13] is a recent
paper which focuses on learning embedding and rules in an iterative fashion.
Our approach differs from theirs in several ways (e.g., we have our own
grounding generation technique, and our rule loss acts as a regularizer for the
model loss). There are other tools such as WARMR [7] and ALEPH3 which
are not suitable for KGEs due to their low scalability.

3     Rule Extraction
As mentioned before, KGs include relational patterns. However, they are often
not explicitly modelled via schema axioms or rules. If we want to use the logical
rules as complementary knowledge in KGEs, we need to first extract patterns
from KGs and make them available for such models. To obtain a set of rules
extracted from KGs, a series of steps has been followed in this work, using the
AMIE [3] tool at its core. In this section, we discuss the required steps
for rule extraction from KGs that will be used later for injection into KGEs.

3.1    Input: Knowledge Graph
The KGs that are considered as input to our rule extraction step are raw
data in the form of RDF (Resource Description Framework)4. This representation
is also in triple format (s, p, o) (Subject, Predicate, Object). The subject and
object are considered entities, and the predicate is considered the relation
between the two entities. We use two standard datasets as input KGs
namely FB15k [1] and WN18 [1] for our experimental benchmark. The selection
of these datasets is due to the fact that both of these KGs contain leakage in the
test set. For example, if the inverse of a training triple exists in the test set,
it becomes much easier for any model to infer that test triple. We call such
inverse patterns leakage in the test set. Generally, a test set containing leakage
yields better results than a leakage-free one. We removed such patterns to make
the test triples leakage free. Hence, we wanted to investigate whether the models
generalize to unseen data that does not share any known pattern with the training
set. In this paper, we deal with four types of rules, namely implication, inverse,
symmetric, and equivalence. Therefore, during this research, when considering the
number of rules, we only refer to these four types. The number of rules in FB15k
and WN18 is 414 and 16, respectively. After generating the groundings, we found
about 71% and 93% leakage in the test sets of FB15k and WN18, respectively.
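    To make this concrete, the following is a minimal sketch of how such leakage could be detected and removed; the helper names and the simple inverse-relation lookup are our own illustration under these assumptions, not the exact implementation used in this work.

```python
# Minimal sketch: removing symmetric/inverse leakage from a test set.
# `inverse_pairs` maps a relation to a known inverse relation; building this
# map (e.g., from the mined AMIE rules) is assumed and not shown here.
def remove_leakage(train_triples, test_triples, inverse_pairs):
    """Drop test triples whose symmetric or inverse counterpart is in training."""
    train_set = set(train_triples)  # triples stored as (h, r, t) tuples
    refined = []
    for h, r, t in test_triples:
        symmetric_leak = (t, r, h) in train_set
        inverse_leak = r in inverse_pairs and (t, inverse_pairs[r], h) in train_set
        if not (symmetric_leak or inverse_leak):
            refined.append((h, r, t))
    return refined
```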
3
    http://www.cs.ox.ac.uk/activities/machlearn/Aleph/aleph_toc.html
4
    https://www.w3.org/RDF/

3.2   Output: Extracted Rules

The core of the rule extraction step is based on AMIE [3], a rule mining tool
that extracts logical rules from underlying KGs without parameter tuning or
expert input. After a systematic analysis, we selected AMIE because, in
comparison to other rule extraction tools, it employs a scalable algorithm for
rule mining [10]. For other tools (e.g., WARMR [7]), we needed to provide
specific information related to the predicates, whereas AMIE is more efficient in
this regard. In some other tools (e.g., ALEPH), a user is required to input
background knowledge and positive examples for the target predicates, whereas
for AMIE we just need to provide the KG as input [3]. Hence, AMIE is
particularly useful for the extraction of logical rules from large structured data
or KGs. It also assigns a confidence score to each of the rules, which indicates
the plausibility of the extracted rule. For a given KG as input, AMIE generates
statistical measures such as standard confidence, PCA (Partial Completeness
Assumption) confidence, and head coverage for each of the mined rules. In our
study, we used its latest version, AMIE+ [2]. In order to have a final set of
rules, a set of steps needs to be performed (Fig. 1).
    Rule generation is a significant part of this pipeline. Not all of the extracted
rules provide valuable information for the training process. Therefore, rules with
a low confidence level on the dataset are discarded. Based on comparisons of
the results and the evidence in the KGs, a threshold value of 0.8 is set to
discard unwanted rules.
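    As an illustration, this filtering step could look like the following sketch; the assumed input is a list of (rule, confidence) pairs parsed from the AMIE+ output, which is a simplification of the actual output format.

```python
# Sketch: keep only mined rules whose confidence passes the 0.8 threshold.
# `mined_rules` is assumed to be a list of (rule, confidence) pairs parsed from
# the AMIE+ output; the real output contains further measures (standard
# confidence, PCA confidence, head coverage).
CONFIDENCE_THRESHOLD = 0.8

def filter_rules(mined_rules, threshold=CONFIDENCE_THRESHOLD):
    return [(rule, conf) for rule, conf in mined_rules if conf >= threshold]
```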




                    Fig. 1. The pipeline of grounding generation.



    Grounding generation is required for the rule injection in the selected
models. Groundings are the set of triples from the dataset that follow the
patterns of the rules extracted by AMIE. After mining the rules from the
underlying KG using AMIE+ and filtering out unwanted rules, we generated
groundings from those filtered rules, which are required for computing the rule
loss in the training phase. To do so, each of the extracted rules is decomposed
into its components. After matching the corresponding rule patterns in the form
premise ⇒ conclusion for each of the rule types (i.e., implication, inverse,
symmetric, and equivalence), the relations from the premise and conclusion of
each rule pattern, together with the matched rule type, are collected into a final
set of rules (named the rule bag). Secondly, after the rule bag is created, each of
its items is matched against the training triples of the respective KG. An example
of this grounding generation from the original triples is depicted in Fig. 1. While
extracting rule patterns from the KG with the AMIE tool, we enforce a confidence
threshold of 0.8 to filter out irrelevant rule patterns.
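    The following sketch illustrates how groundings might be derived from the rule bag and the training triples; the data structures (a rule bag of (premise relation, conclusion relation, rule type) entries and an index of training triples per relation) are our own simplification of the procedure described above.

```python
# Sketch: generate groundings by matching rule-bag entries against training triples.
# A grounding here is a (premise_triple, conclusion_triple) pair instantiating a rule.
from collections import defaultdict

def generate_groundings(train_triples, rule_bag):
    by_relation = defaultdict(list)
    for h, r, t in train_triples:
        by_relation[r].append((h, t))

    groundings = []
    for premise_rel, conclusion_rel, rule_type in rule_bag:
        for h, t in by_relation[premise_rel]:
            if rule_type in ("implication", "equivalence"):
                conclusion = (h, conclusion_rel, t)   # same head/tail order
            else:                                     # "inverse" or "symmetric"
                conclusion = (t, conclusion_rel, h)   # head and tail swapped
            groundings.append(((h, premise_rel, t), conclusion))
    return groundings
```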
4     Rule Injection

Rule injection is performed by adding groundings to the training phase. The
grounding loss is incorporated alongside the KGE training by adding it to the
total loss. In this paper, two KGE models have been adapted in such a way
that they learn the grounding loss and the triple score loss simultaneously as a
total loss. Based on the total loss of the models (TransE and RotatE),
back-propagation is performed and the parameters are updated accordingly. In
this section, each of the models that we use for rule injection is described in brief.
4.1   Selected KGE Models

TransE [1] calculates the plausibility of triples by measuring the distance be-
tween the addition of the head and relation h + r and their respective tail t
i.e., d(h + r, t). The Margin Ranking Loss (MRL) is used to distinguish positive
(h, r, t) and negative triples (h′, r, t′) by setting a margin between them.
    RotatE [11] rotates the head towards the tail via the relation in complex
embedding space and computes the score as d_r(h, t) = ‖h ◦ r − t‖. Every
embedding vector is complex and contains real and imaginary parts. RotatE uses
the following loss:

    L = −log σ(γ − d_r(h, t)) − Σ_{i=1}^{n} (1/k) log σ(d_r(h′_i, t′_i) − γ),

where γ is the margin. The total loss is obtained by summing up the model loss
and the rule loss, as discussed previously. In both TransE and RotatE, negative
triples are sampled from a probability distribution which acts as a weight in the
self-adversarial negative sampling proposed in RotatE [11].
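    As a rough PyTorch-style illustration (not the authors' exact code), the two distance functions and the loss above could be sketched as follows; splitting the entity embeddings into real and imaginary halves and the uniform 1/k weighting of the negatives follow the formulation above, while the self-adversarial weighting is omitted for brevity.

```python
# Sketch: TransE and RotatE distances plus the RotatE-style loss used above.
import torch
import torch.nn.functional as F

def transe_distance(h, r, t, p=1):
    # d(h + r, t): a smaller distance means a more plausible triple
    return torch.norm(h + r - t, p=p, dim=-1)

def rotate_distance(h, r_phase, t):
    # Entity embeddings are split into real and imaginary halves; the relation
    # is a rotation in the complex plane given by its phase angles.
    re_h, im_h = torch.chunk(h, 2, dim=-1)
    re_t, im_t = torch.chunk(t, 2, dim=-1)
    re_r, im_r = torch.cos(r_phase), torch.sin(r_phase)
    re_rot = re_h * re_r - im_h * im_r        # real part of h ∘ r
    im_rot = re_h * im_r + im_h * re_r        # imaginary part of h ∘ r
    return torch.norm(torch.cat([re_rot - re_t, im_rot - im_t], dim=-1), dim=-1)

def rotate_loss(pos_dist, neg_dist, gamma=24.0):
    # L = -log σ(γ - d_r(h, t)) - Σ_i (1/k) log σ(d_r(h'_i, t'_i) - γ)
    pos_term = -F.logsigmoid(gamma - pos_dist)            # shape: (batch,)
    neg_term = -F.logsigmoid(neg_dist - gamma).mean(-1)   # (batch, k) -> (batch,)
    return (pos_term + neg_term).mean()
```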
4.2   Injection of Rules in Loss Function of KGE Model

In our context, we incorporate into the training phase the groundings obtained
from the extracted rules. In this paper, we define this as rule injection. After
obtaining the grounding loss in the training phase, it is added to the main loss.
The principal goal of rule injection into KGE models is to

              Rule        Definition                   Formulation based on score function
              Implication (h, r1, t) ⇒ (h, r2, t)      f^{r1}_{h,t} ≤ f^{r2}_{h,t}
              Symmetric   (h, r, t) ⇔ (t, r, h)        f^{r}_{h,t} = f^{r}_{t,h} + ξ_{h,t}
              Inverse     (h, r1, t) ⇒ (t, r2, h)      f^{r1}_{h,t} ≤ f^{r2}_{t,h}
              Equivalence (h, r1, t) ⇔ (h, r2, t)      f^{r1}_{h,t} = f^{r2}_{h,t} + ξ_{h,t}

                               Table 1. Representation of rules


improve the models' capability to capture rule patterns from knowledge graphs.
The definitions and formulations of the used rules are given in Table 1 [8]. We
denote the model's score function f_{h,t} (in Table 1), discussed in Section 4.1, as L. Hence,

it is further possible to obtain the scores for the premise and the conclusion of
each rule, as represented by the rule definitions.
     We call these scores output_1 and output_2. Based on the formulation,
output_2 is always larger than output_1 (for TransE and RotatE). Thus, it can
act as a clear baseline for the grounding loss. Our grounding loss is given by
GL = φ(L(output_2) − L(output_1) + ξ), where φ refers to the output function
of the injected model and ξ is an additional constant value. Here, L can be seen
as an abstraction of the KGE model's (e.g., RotatE) score function.
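    A minimal sketch of this grounding loss and of the resulting total loss is given below; the choice of softplus for the output function φ and the way the premise and conclusion triples are scored are assumptions made for this illustration.

```python
# Sketch: grounding loss GL = φ(output_2 - output_1 + ξ), added to the model loss.
import torch.nn.functional as F

def grounding_loss(score_fn, premise, conclusion, xi=0.05):
    # score_fn is the KGE model's score (distance) function, e.g. rotate_distance;
    # premise and conclusion are (h, r, t) embedding batches from the groundings.
    output_1 = score_fn(*premise)       # scores of the premise triples
    output_2 = score_fn(*conclusion)    # scores of the conclusion triples
    return F.softplus(output_2 - output_1 + xi).mean()   # φ assumed to be softplus

def total_loss(model_loss, rule_loss, rule_multiplier=1.0):
    # The rule (grounding) loss is added to the KGE loss before back-propagation.
    return model_loss + rule_multiplier * rule_loss
```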
5    Evaluation
Hyperparameter settings. A series of hyperparameters is used to achieve
the results of Table 2. For FB15k, the learning rate α is 0.1 and the rule loss
multiplier ξ_mp is 0.05. The embedding dimension D is 200, the batch size β is
2750, the rule loss multipliers are {implication = 0, inverse = 0.1, symmetric =
0.01, equivalence = 0.05}, the number of negative samples N is 10, and γ is 24.
For WN18, everything is the same except the rule loss multipliers and the margin
γ. Since for WN18 we only have inverse and symmetric groundings, the multipliers
are {inverse = 7, symmetric = 0.5}. γ is kept at 10.
Results. In this section, we present the results of our evaluations. The selected
comparison criteria follow the best practices for evaluating embedding models.
We consider Mean Rank (MR) and Hits@10 as the main criteria to evaluate the
models [1]. We report the results in the raw and filtered settings [1].
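    For reference, a simplified sketch of computing these two metrics from a list of ranks is given below; obtaining the rank of each test triple against all candidate entities (and removing known triples in the filtered setting) is assumed to happen beforehand.

```python
# Sketch: Mean Rank (MR) and Hits@10 from the ranks of the correct entities.
# `ranks` is assumed to be a list of integer ranks, one per test triple,
# already computed in either the raw or the filtered setting.
def mean_rank(ranks):
    return sum(ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    return sum(1 for rank in ranks if rank <= k) / len(ranks)
```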
    For both of these criteria, we look into the details of the behaviour of the
KGE models with and without the injection of rules. These are evaluated on two
versions of the test data: the standard and the leakage-free (refined) test set, as
discussed in Section 3. In Table 2, we present the results of our evaluation on
FB15k and WN18. The results show that rule injection has a positive influence
on the results. For both RotatE and TransE, rule injection yields better results.
It is clear that on the FB15k dataset RotatE shows a significant performance
increase in both the raw and filtered settings, e.g., the filtered Hits@10 increased
from 0.85 to 0.88.
                                               FB15K
Models                    Standard                                    Leakage Free
         Without injection        With injection      Without injection          With injection
          Raw       Filtered     Raw       Filtered    Raw       Filtered       Raw        Filtered
       MR Hits@10 MR Hits@10 MR Hits@10 MR Hits@10 MR Hits@10 MR Hits@10 MR Hits@10 MR Hits@10
RotatE 190 0.5247 38    0.85 150 0.5705 35 0.8791 276 0.4355 113 0.6453 271 0.4354 108 0.6410
TransE 204 0.4605 68 0.6268 156 0.5199 51 0.6998 270 0.4304 123 0.5677 271 0.4251 125 0.5624
                                                WN18
Models                    Standard                                    Leakage Free
         Without injection        With injection      Without injection          With injection
          Raw       Filtered     Raw       Filtered    Raw       Filtered       Raw        Filtered
       MR Hits@10 MR Hits@10 MR Hits@10 MR Hits@10 MR Hits@10 MR Hits@10 MR Hits@10 MR Hits@10
RotatE 295 0.8171 281 0.9571 212 0.9262 209 0.9565 4077 0.3406 4058 0.3884 3034 0.3464 3015 0.3783
TransE 219 0.8102 207 0.9502 192 0.8978 190 0.9281 2991 0.2667 2972 0.3072 2429 0.2130 2412 0.2333
             Table 2. Effect of rule injection on FB15k and WN18 test set


    For TransE, the filtered Hits@10 also increased from 0.63 to 0.70 on the
FB15k dataset when we inject rules. For the WN18 dataset, the table shows that
the results do not improve upon injecting rules in the filtered setting, but there is
a significant improvement in the raw setting, e.g., the Hits@10 improved from
0.81 to 0.90. However, if we refine the test set by removing the leakage, the
results in every case drop by a significant margin. Even the injection of rules does
not help in this regard; as seen in Table 2, the results remain poor even after
injection.

6   Conclusion
In this work, we showcased the use of logical rules in the extraction and injection
of knowledge from knowledge graphs into embedding models. One of the main
contributions of our work is that we removed the leakage from the test set of
FB15k; as a result, many inverse patterns remain in the training set of the
underlying KG, but no leakage remains in the test set. We mainly focused on two
models, namely a) TransE (a baseline model) and b) RotatE (a current
state-of-the-art model designed for encoding such patterns). This study is a
prototype demonstrating the role of rules in improving the learning process and
performance of KGE models. As future work, we will extend the pipeline into a
comprehensive framework covering most of the other embedding models, more
datasets (FB15k-237, WN18RR), and real KGs. We also plan to automate the
rule extraction and injection, which is currently done externally using AMIE with
post-processing steps for grounding the rules. We also plan to train RUGE on the
KGs that we have created and compare the results.


References

 1. A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. Translat-
    ing embeddings for modeling multi-relational data. In Advances in neural infor-
    mation processing systems, pages 2787–2795, 2013.
 2. L. Galárraga, C. Teflioudi, K. Hose, and F. M. Suchanek. Fast rule mining in
    ontological knowledge bases with AMIE+. The VLDB Journal – The International
    Journal on Very Large Data Bases, 24(6):707–730, 2015.
 3. L. A. Galárraga, C. Teflioudi, K. Hose, and F. Suchanek. Amie: association rule
    mining under incomplete evidence in ontological knowledge bases. In WWW, pages
    413–422, 2013.
 4. S. Guo, Q. Wang, L. Wang, B. Wang, and L. Guo. Jointly embedding knowledge
    graphs and logical rules. In Proceedings of the 2016 Conference on Empirical
    Methods in Natural Language Processing, pages 192–202, 2016.
 5. S. Guo, Q. Wang, L. Wang, B. Wang, and L. Guo. Knowledge graph embedding
    with iterative guidance from soft rules. In Thirty-Second AAAI Conference on
    Artificial Intelligence, 2018.
 6. S. M. Kazemi and D. Poole. Simple embedding for link prediction in knowledge
    graphs. In Advances in neural information processing systems, pages 4284–4295,
    2018.
 7. R. D. King, A. Srinivasan, and L. Dehaspe. Warmr: a data mining tool for chemical
    data. Journal of Computer-Aided Molecular Design, 15(2):173–181, 2001.
 8. M. Nayyeri, C. Xu, J. Lehmann, and H. S. Yazdi. Logicenn: A neural based knowl-
    edge graphs embedding model with logical rules. arXiv preprint arXiv:1908.07141,
    2019.

 9. M. Nickel, V. Tresp, and H.-P. Kriegel. A three-way model for collective learning
    on multi-relational data. In ICML, volume 11, pages 809–816, 2011.
10. P. G. Omran, K. Wang, and Z. Wang. Scalable rule learning via learning repre-
    sentation. In IJCAI, pages 2149–2155, 2018.
11. Z. Sun, Z.-H. Deng, J.-Y. Nie, and J. Tang. Rotate: Knowledge graph embedding
    by relational rotation in complex space. arXiv preprint arXiv:1902.10197, 2019.
12. T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard. Complex embed-
    dings for simple link prediction. International Conference on Machine Learning
    (ICML), 2016.
13. W. Zhang, B. Paudel, L. Wang, J. Chen, H. Zhu, W. Zhang, A. Bernstein, and
    H. Chen. Iteratively learning embeddings and rules for knowledge graph reasoning.
    In The World Wide Web Conference, pages 2366–2377, 2019.