<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Is the Proof Length a Good Indicator of Hardness for Reason-able Embeddings?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jedrzej Potoniec</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computing Science, Poznan University of Technology</institution>
          ,
          <addr-line>ul. Piotrowo 2, 60-965 Poznan</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Reason-able embeddings are recently proposed embeddings for knowledge bases (KBs) in the description logic ℒ capable of casting multiple KBs into a single latent space using a transferable neural reasoner. While they exhibit remarkable performance on real-world KBs, it is so far unknown what are their exact limits. In this paper, we systematically investigate their performance using a set of synthetic KBs in the description logic ℰℒ, a subset of ℒ. We use the proof length as a measure of reasoning complexity and present a random KB generator taking the proof length into account. We train the reason-able embeddings with and without transfer learning and investigate whether the complexity of the training set and the test set is related to the reasoning performance of the embeddings and their neural reasoner.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;description logics</kwd>
        <kwd>reason-able embeddings</kwd>
        <kwd>transfer learning</kwd>
        <kwd>neural-symbolic reasoning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The past 10 years brought numerous solutions for embedding concepts from knowledge bases
(KBs) into high-dimensional latent spaces, from a relatively simple TransE to complex
architectures such as recurrent transformers [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1, 2, 3, 4, 5</xref>
        ]. However, with a few notable exceptions
such as RDF2Vec [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], OWL2Vec* [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and TransOWL [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] the focus was on large KBs with
shallow semantics. On the other side, there were approaches to replace deductive approaches
traditionally used to perform inference from logical KBs, with neural reasoners based on the
recent advancements in deep learning, e.g., [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref9">9, 10, 11, 12</xref>
        ].
      </p>
      <p>
        Recently proposed reason-able embeddings are a combination of these two trends, ofering
learnable embeddings for KBs with complex semantics and a transferable, neural reasoner [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
In experimental evaluation with a set of real-world KBs, they exhibited remarkable performance,
yet there remains a suspicion that due to the complexity of the description logic (DL) ℒ
they target, they cannot address all the aspects of ℒ. In this paper, we aim to tackle this
problem, by constructing a set of artificial KBs with measurable complexity and assessing the
performance of reason-able embeddings as a function of the complexity of their input. Our goal
is to answer the following research questions:
RQ1 Does the complexity of the training set influence the reasoning performance on the test
set derived from the same KBs?
RQ2 Does the complexity of the training set of the embeddings inuflence the reasoning
performance on previously unseen KBs?
RQ3 What is the interplay between the complexity of the training set for the reasoner, the
complexity of the training set for the embeddings, and the reasoning performance on
previously unseen KBs?
      </p>
      <p>To facilitate reproducibility we publish the experimental code in Python in a GitHub repository
at https://github.com/jpotoniec/reasonable-embeddings/tree/main/src/elpp.</p>
      <p>
        The contributions of the paper are as follows:
• A way to measure reasoning complexity by means of the proof length, described in
subsection 2.2. It exploits a preexisting reasoning algorithm by Baader et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
• A generator of synthetic, dificult knowledge bases, described in subsection 2.4.
• A method to split training and test sets according to the reasoning complexity, described
in subsection 2.5.
      </p>
      <p>• Experimental evaluation and analysis presented in section 3.</p>
      <p>Neither the description logic ℰ ℒ (presented in subsection 2.1) nor the reason-able embeddings
(described in subsection 2.3) are contributions of this work, and they are presented here only to
make the paper self-contained.</p>
      <p>The rest of the paper is organized as follows: subsection 2.1 introduces the DL ℰ ℒ,
subsection 2.2 explains how we measure the complexity, the reason-able embeddings are introduced in
subsection 2.3, we describe the KB generator in subsection 2.4, and the details of the employed
dataset in subsection 2.5. In subsections of section 3, we report the answers to the posed research
questions. We discuss the results and conclude in section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Materials and methods</title>
      <p>
        2.1. The description logic ℰ ℒ
Description logics (DLs) are a family of logics developed for their desirable computational
properties, to be used in computer systems requiring formal semantics and reasoning capabilities.
In this paper, we concentrate on the DL ℰ ℒ, one of the simplest of practical use [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Concepts in ℰ ℒ are constructed inductively from a set of constructors, a set of atomic concepts
 , and a set of atomic roles . Their semantics is defined in terms of an interpretation
ℐ = (Δℐ , · ℐ ), where Δℐ is a non-empty domain, and · ℐ is the interpretation function, mapping
every atomic concept  to ℐ ⊆ Δℐ , and every atomic role  to ℐ ⊆ Δℐ × Δℐ . Since ℰ ℒ has
formal semantics, it is possible to extend · ℐ to arbitrary concepts, summarized in Table 1.</p>
      <p>A ℰ ℒ KB consists of a set of subsumption axioms  ⊑ , and we say that ℐ is a model of the
KB if for every such axiom ℐ ⊆ ℐ . In the DLs, one of the core problems is the subsumption
problem, i.e., given two concepts , , and a KB , deciding whether a subsumption  ⊑ 
holds in every model of the KB, denoted  |=  ⊑ . One of the interesting properties of
ℰ ℒ is that there exists a polynomial algorithm for solving the subsumption problem.
Syntax and semantics of ℰℒ. ,  denote arbitrary concepts and  an arbitrary role.
{ ∈ Δℐ : ∃ ∈ Δℐ (, ) ∈ ℐ ∧  ∈ ℐ}
to a KB in normal form in polynomial time.</p>
      <sec id="sec-2-1">
        <title>2.2. Measuring the proof length</title>
        <p>In this paper, we call a subsumption  ⊑  an atomic subsumption if ,  ∈  ∪ {⊥, ⊤}.
If  |=  ⊑ , we say the subsumption follows from the KB.</p>
        <p>
          Throughout this work, we are interested in KBs in the normal form, where the axioms
must be of one of the following forms:  ⊑ ,  ⊓  ⊑ ,  ⊑ ∃., ∃. ⊑ , where
,  ∈  ∪ {⊤} and  ∈  ∪ {⊤, ⊥}. It was shown that any ℰ ℒ KB can be transformed
Since we consider only KBs in the normal form, to solve the subsumption problem, we can apply
classification
a rule-based algorithm presented in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] for ℰ ℒ++, a superset of ℰ ℒ. The algorithm performs
of a KB, i.e., it computes all atomic subsumptions following from the KB. We refer
to this set of atomic subsumption as logical consequences of the KB1.
        </p>
        <p>We extend the algorithm to keep track, for every atomic subsumption, of the shortest proof,
i.e, a set of rules and their parameters that must be executed in order to generate this particular
conclusion. The algorithm starts by creating a set () = {(, ∅), (⊤, ∅)} for every  ∈
 ∪ {⊤, ⊥}, and a set () = ∅ for every  ∈ . Then, rules presented in Table 2 are
executed to update the sets () and () until a fixpoint is not reached, i.e., there are changes
to () or to (). After the algorithm terminates, () represents the subsumption hierarchy:
if (,  ) ∈ (), it means that  |=  ⊑  and at least | | rule executions are necessary
to prove it. We call | | the proof length of  ⊑  and use it as a complexity measure of atomic
subsumptions throughout the paper. It must be noted that this measure is defined only for
subsumptions that follow from a KB and that it measures the length of the shortest proof, not
any proof. However, since it is supposed to measure reasoning complexity, we believe the
shortest proof is the right choice, as it denotes the minimal number of reasoning steps one
must make in order to prove a given subsumption, and all the rules seem to be of comparable
complexity.</p>
        <p>As an example, consider the following KB:  = {

⊑ , 
⊑ , 
⊓ 
⊑ , 
⊑
∃.,
∃.
⊑ ,</p>
        <p>⊑ ⊥}. Unconventionally, we use Greek letters to clearly distinguish concept
names present in the KB from the variable names used in the rules. We will consider a few steps
of the algorithm, and we chose one possible order of execution for ease of explanation. First,
{(, ∅), (⊤, ∅)}, ( ) = {(, ∅), (⊤, ∅)}, () = ∅. Next, the rules are executed:
sets (· ) and (· ) are initialized: ( ) = {(, ∅), (⊤, ∅)}, ( ) = {(, ∅), (⊤, ∅)}, ( ) =
1The term usually denotes a much broader set. However, we concentrate only on atomic subsumptions in the paper
and thus can hack the term without introducing ambiguity.</p>
        <p>CR1 If (′, ′ ) ∈ (), ′ ⊑  ∈ , (, · ) ̸∈ () or (, ) ∈ () and |  | &lt; ||, then
() ←</p>
        <p>()∖{(, )} ∪ {(,   )}, where   = ′ ∪ {(CR1, , ′, )}
If ′ is known to be a superclass of , it is asserted in the KB that ′ ⊑ , and either  is not
known to be a superclass of  or it is known, but the proof is longer than can be achieved by
using this rule, then add/replace the existing proof with the new proof   .</p>
        <p>CR2 If (1, 1 ), (2, 2 ) ∈ (), 1 ⊓ 2 ⊑  ∈ , (, · ) ̸∈ () or (, ) ∈ ()
()∖{(, )} ∪ {(,   )}, where   = 1 ∪ 2 ∪
′ ∪ {(CR4, , , , ′, )}
() and |  | &lt; | |, then () ←</p>
        <p>()∖{(,  )} ∪ {(,   )}, where   =  ∪
from the KB. Each rule is first presented using formal symbols, and then a corresponding textual
description is given. (, · ) ̸∈  is an abbreviation for ∀ (, ) ̸∈ .</p>
        <p>{(, ∅), (⊤, ∅), (, {(CR1, , , 
2. CR1 for 
=
′</p>
        <p>=  , 
isfied:
(, ∅
)
∈
( ), 
) )
} }
⊑</p>
        <p>proof can thus be computed as  
{(, ∅), (⊤, ∅), (, {(CR1, , , 
3. CR2 for 
= ,  1
= ,  2</p>
        <p>= , 
{(CR1, , , 
( ), 
⊓ 
) .
}</p>
        <p>The conditions are satisfied:
⊑

∈
, (, · )</p>
        <p≯∈
If  is known to be a subclass of ∃., ′ is known to be a superclass of , it is asserted in the
KB that ∃.′ ⊑ , and either  is not known to be a superclass of  or it is known, but the
proof is longer than can be achieved by using this rule, then add/replace the existing proof with
the new proof   .</p>
        <p>CR5 If ((, ), ) ∈ (), (⊥, ) ∈ (), (⊥, · ) ̸∈ () or (⊥,  ) ∈ () and |  | &lt; | |,
then () ←
()∖{(⊥,  )} ∪ {</p>
        <p>(⊥,   )}, where   =  ∪  ∪ {(CR5, , , )}
If  is known to be a subclass of ∃.,  is known to be an unsatisfiable concept, and either 
is not known to be an unsatisfiable concept or it is known, but the proof is longer than can be
achieved by using this rule, then add/replace the existing proof with the new proof   .
1. CR1 for 
= ′ =  , 
=  and ′
=
. The conditions are satisfied:
(, ∅</p>
        <p>) ∈ ( ), 
computed as   =
⊑ 
∅ ∪ {(CR1, , , 
∈  and (, ·</p>
        <p>) ̸∈ ( ). The new proof can thus be
)} and included in the set ( ): ( ) =
{(CR2, , 1, 2, )}
and |  | &lt; ||, then () ←
{(3, , ′, , )}
|  | &lt; ||, then () ←
If it is known that 1 and 2 are superclasses of , it is asserted in the KB that 1 ⊓ 2 ⊑ ,
and either  is not known to be a superclass of  or it is known, but the proof is longer than can
be achieved by using this rule, then add/replace the existing proof with the new proof   .
CR3 If (′, ′ ) ∈ (), ′ ⊑ ∃. ∈ , ((, ), · ) ̸∈ () or ((, ), ) ∈ () and
()∖{((, ), )} ∪ {((, ),   )}, where   = ′ ∪
by using this rule, then add/replace the existing proof with the new proof   .</p>
        <p>If ′ is known to be a superclass of , it is asserted in the KB that ′ ⊑ ∃., and either  is
not known to be a subclass of ∃. or it is known, but the proof is longer than can be achieved
CR4 If ((, ), ) ∈ (), (′, ′ ) ∈ (), ∃.′ ⊑  ∈ , (, · ) ̸∈ () or (,  ) ∈
)}), (, {(CR1, , , 
) )
} }
=  and ′
∅
.</p>
        <p>The conditions are
sat∈
=</p>
        <p>and (, · )
∅ ∪ {(CR1, , , 
̸∈
( ).</p>
        <p>The new
)
} and thus: ( ) =
∅
=
= ,  1
=
{(CR1, , ,</p>
        <p>)}, 2
(,  1 )
∈</p>
        <p>( )), (,  2 )
().</p>
        <p>The new proof can thus
=
∈
be computed as   = 1 ∪ 2 ∪ {(CR2, , , ,  )}, and we arrive at:
( ) = {(, ∅), (⊤, ∅), (, {(CR1, , ,  )}), (, {(CR1, , ,  )}), (, {(CR1, , ,  ),
(CR1, , ,  ), (CR2, , , ,  )})}
4. CR3 for  = ′ = ,  = ,  ′ = ∅. The conditions are satisfied: (, ∅) ∈ ( ),
 ⊑ ∃. ∈ , ((,  ), · ) ̸∈ (). The new proof can thus be computed as   =
{(3, , , ,  )} and () can be updated: () = {((,  ), {(3, , , ,  )})}.
5. CR4 for  = ,  = ,  ′ = ,  = ,   = {(3, , , ,  )}, ′ =
∅,  = {(CR1, , ,  ), (CR1, , ,  ), (CR2, , , ,  )}. The conditions are satisfied:
((,  ), ) ∈ (), (, ∅) ∈ ( ), ∃. ⊑  ∈ , and (,   ) ∈ ( ). This
application of a rule is diferent from previous ones: there already is a proof (  ), and it must
be determined whether it can be replaced. The new proof is thus computed as   =
{(3, , , ,  )}∪∅∪{(4, , , , ,  )}, and we conclude it is shorter than the
preexisting proof, since | | = 3 &gt; |  | = 2. We thus proceed with updating ( ), which
now becomes: ( ) = {(, ∅), (⊤, ∅), (, {(CR1, , ,  )}), (, {(CR1, , ,  )}),
(, {(3, , , ,  ), (4, , , , ,  )})}
6. CR1 for  = ′ = ,  = ⊥, ′ = ∅. The conditions are satisfied: (, ∅) ∈
( ),  ⊑ ⊥ ∈  and (⊥, · ) ̸∈ ( ). The new proof can thus be
computed as   = ∅ ∪ {(CR1, , , ⊥)} and included in the set ( ): ( ) =
{(, ∅), (⊤, ∅), (⊥, {(CR1, , , ⊥)})}.
7. CR5 for  = ,  = ,   = {(3, , , ,  )},  = {(CR1, , , ⊥)}. The
conditions are satisfied: ((,  ), ) ∈ () and (⊥, ) ∈ ( ). The new proof can
thus be computed as   = {(3, , , ,  )} ∪ {(CR1, , , ⊥)} ∪ {(5, , ,  )}
and ( ) can be extended with (⊥,   ).</p>
        <p>The presented example covers all the proposed rules and the crucial part, namely, replacing
longer proofs with shorter ones. While it does not present the entirety of the execution for the
given KB, nor the termination condition by reaching a fixpoint, we can still read proof lengths
(albeit - since the algorithm did not terminate - not necessarily the shortest), as seen in item 5.</p>
        <p>
          A similar idea is present in the research on DLs under the term justifications , which
concentrates on computing a minimal subset of KB for a subsumption to follow from it [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. The
proofs we compute are more complex, e.g., because the rule CR5 does not reference any KB
axiom, yet is included in proofs. The purpose is also diferent: justifications are meant as means
of explanations for humans, whereas here proofs are an internal artifact to assess complexity.
        </p>
        <p>
          Exploiting the properties of rule-based reasoning for lightweight DLs was also used in
the context of finding unexplainable triples in RDF graphs [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] or in the work on emulating
deductive reasoning with recurrent neural networks [17].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Reason-able embeddings</title>
        <p>
          Reason-able embeddings are a recently proposed framework to compute embedding vectors
for concepts in the ℒ DL [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. They have several interesting properties that diferentiate
them from pre-existing embeddings. First, they are tailored to an expressive DL, instead of
to a simple semantics of a knowledge graph. Second, the framework consists of two major
components: the embeddings themselves which must be trained for each KB separately, and a
neural reasoner, which is shared between multiple KBs, and once trained can be reused in a
transfer learning manner for reasoning on KBs not available during its training. Third, they are
capable of computing an embedding of a complex expression given the embeddings of its parts.
        </p>
        <p>The neural reasoner consists of two parts: a reasoner head, which is a binary classifier that,
given two concept embeddings from the same KB, answers whether one of the concepts is a
subclass of the other; and a set of neural operators, which correspond to the logical operators
of ℒ and can, e.g., compute the embedding of an intersection of two concepts given the
embeddings of those concepts.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], a two-step training procedure was proposed: first, a neural reasoner and embeddings
are trained jointly on a set of KBs. Multiple KBs are employed during this step to ensure the
reasoner is not tailored to any particular KB but rather is capable of shaping the latent space of
embeddings such that it can accommodate multiple KBs. In the second step, the reasoner is
frozen and the embeddings for the KB(s) of interest are trained by backpropagation through the
frozen reasoner.
        </p>
        <p>Both training and testing a neural reasoner require subsumptions along with a binary label
denoting whether a subsumption follows from the corresponding KB or not. We call such a
pair an example. Since the label is contextual w.r.t. a KB, it is necessary to keep track of which
example is related to which KB. However, this information is not a part of an example and is
not presented to the neural reasoner during training. The subsumptions themselves may be
synthetic, as long as the label associated with them is correct.</p>
        <p>A two-step procedure requires two-level separation into training and test set: first, there is a
training set of KBs used to jointly train the reasoner and the embeddings, and there is a test set
of KBs used to assess the quality of the trained, frozen reasoner. However, embeddings for the
KBs in the test must somehow be trained. Thus, for each KB, be it from the training set or from
the test set, a training set and a test set of examples must be constructed. The training set is
used to train embeddings for the KB, while the test set is used to evaluate them.</p>
        <p>
          There are numerous hyperparameters that could be tuned in the framework. In this paper,
we follow the setup from [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]: the latent space of embeddings has a dimension of 10, there are
16 neurons in the hidden layer of the reasoner head, and the training takes 15 epochs.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.4. Synthetic knowledge bases generator</title>
        <p>To generate a random KB, we start by defining an approach to generate a random axiom in
the context of a KB . First, we select with uniform probability one of the following forms
for the axiom:  ⊑ ,  ⊓  ⊑ ,  ⊑ ∃., ∃. ⊑ . Then, we populate the form with
concrete values by selecting uniformly at random: , ,  from  ,  from , ,  from
 ∪ {⊥, ⊤}.</p>
        <p>To construct dificult KBs, i.e., KBs such that some of the atomic subsumptions following
from them have long proofs, we have devised the following algorithm. First, an empty KB is
initialized with  concepts and  roles. Then, we repeat at most  times: an axiom not yet
present in the KB is randomly generated and added to the KB, then the logical consequences
with the proof length of at least  are counted, and if there are at least , we terminate
and return the KB. If we exceed  without reaching , the process is restarted from an
empty KB.</p>
        <p>
          In the paper we fixed  = 100,  = 4,  = 300,  = 4 and  = 200. The choice
of  and  was guided by the original work on reason-able embeddings [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], the others were
chosen arbitrarily, as they seemed to represent a good trade-of between the generation time
and the complexity of the final KBs. We use two sets of synthetic KBs: a training set of 40 KBs
 1, . . . ,  40, and a test set of 20 KBs  1, . . . ,  20. We report detailed statistics on them in
Table 3.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>2.5. Training and test set split</title>
        <p>For each KB, we generate several training – test splits using the following procedure. Let ≤ 
be the set of all atomic subsumptions following from a KB  such that their proofs are no
longer than . Conversely, let &gt; be the set of all atomic subsumptions following from the KB
with the proof length greater than . Let ˜ be the set of all atomic subsumptions  ⊑  that
uses the vocabulary of the KB, do not follow from the KB nor the atomic subsumption  ⊑ 
follow from the KB:</p>
        <p>˜ = { ⊑  : ,  ∈  ∪ {⊥, ⊤} ∧  ̸|=  ⊑  ∧  ̸|=  ⊑ }
This final requirement on ˜ is introduced to ensure that knowledge from ˜ cannot be trivially
exploited for information about &gt; . Since both  and ˜ consist of atomic subsumptions, they
cannot be separated syntactically and some semantic information from the labels must be
exploited by the neural reasoner.</p>
        <p>The -th training set for  consists of the whole set ≤  , labeled with the positive label,
and a random subset of ˜ counting ⃒⃒ ≤  ⃒⃒ to achieve a balanced training set and labeled with the
negative label. The -th test set for  is equal to &gt; labeled with the positive label. There are
no negative examples in the test set, as we have no complexity measure for subsumptions that do
not follow from a KB, and thus the answers of a neural reasoner would ofer no insight into the
RQs. We thus use recall, i.e., the number of examples labeled as positive by the neural reasoner
to the number of positive examples in the test set, as the measure to assess the performance.</p>
        <p>Since we have two levels of training-test splits, there is a tremendous possibility for confusion.
We thus introduce the following symbols and names:  ≤  (resp.  ≤  ) denotes the -th
training set for the KB   (resp.  ), and  &gt; (resp.  &gt; ) denotes the -th test set for the
=1  ≤  , and the -th embeddings
KB   (resp.  ). The -th joint training set is  ≤  = ⋃︀40
training set is  ≤  = ⋃︀2=0 1  ≤  . We use the name joint to underline that both embeddings
and the neural reasoner are trained jointly using these training sets; conversely, the embeddings
training sets are used to train embeddings while the neural reasoner is already trained and frozen.
The -th internal test set is  &gt; = ⋃︀4=0 1  &gt; , and the -th external test set  &gt; = ⋃︀2=0 1  &gt; .
Here, internal underlines that the neural reasoner was trained with these KBs, whereas external
denotes KBs that did not participate in the training of the neural reasoner.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental results</title>
      <sec id="sec-3-1">
        <title>3.1. The influence of complexity of the joint training set on the performance on the internal test set</title>
        <p>To answer RQ1, we employed the joint training sets  ≤  and their respective internal test
sets  &gt; for  ∈ {2, 3, . . . , 20}. For each  ≤  , we trained a neural reasoner using the
training procedure described earlier, and tested it using  &gt; , arriving at the average recall
0.701 ± 0.044 [0.587, 0.764]. We then computed the Pearson correlation coeficient  between
 and the recall, arriving at  = 0.931 and the -value &lt; 10− 5 (0 :  = 0, using the exact
probability distribution as computed by SciPy2).</p>
        <p>To ensure this is not due to using diferent test sets for each neural reasoner (since  &gt; ⊇
 &gt;+1), we also tested all the neural reasoners on the common test set  &gt;20, arriving at
 = 0.920 and the -value &lt; 10− 5.</p>
        <p>We have decided not to use more complicated approaches for measuring the influence since
the Pearson correlation coeficients are high enough (and the p-values low enough) to give us
reasonable confidence that the answer to RQ1 is positive and the complexity of the training set
is an indicator of the performance on the test set even when transfer learning is not used.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. The influence of complexity of the embeddings training set on the performance on the external test set</title>
        <p>To answer RQ2, we used the neural reasoner trained with  ≤ 20, i.e., presumably the best
one. We froze it and used it to train embeddings using the embeddings test sets  ≤ 
for  ∈ {2, 3, . . . , 20}. Then, we tested the neural reasoner and the newly trained
embeddings on the respective external test sets  &gt;. Averaging the recall, we arrived at
0.610 ± 0.094 [0.387, 0.699]. The Pearson correlation coeficient between  and recall is
 = 0.913, with the -value &lt; 10− 5 (the same test as for RQ1). Similarly to the previous
experiment, we also tested all the embeddings on the common subset  &gt;20, arriving at  = 0.982
with the -value &lt; 10− 5.
2https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html
embeddings training set is a good indicator of the reasoning performance on the test set when
transfer learning is employed.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. The interplay between the complexity of the joint training set, the complexity of the embeddings training set, and the performance on the external test set</title>
        <p>After seeing RQ1 and RQ2, a natural question about the interplay of the complexity  of the
joint training set and the complexity  of the embeddings training set arises. To answer RQ3,
we have computed recall for all combinations of  ∈ {2, 3, . . . , 20} and  ∈ {2, 3, . . . , 20}, by
ifrst training the neural reasoner using  ≤  , then training embeddings on  ≤ , and finally
testing them using  &gt;. We report the results in Figure 1 and observe that they are somewhat
counterintuitive. It turns out that for fixed , the higher  the worse the recall. Conversely, for
ifxed , the higher  the better the recall.</p>
        <p>We hypothesize that, while using only atomic subsumptions, the reasoning capabilities of
the neural reasoner are not necessary and the most important part is the interplay of diferent
dimensions in the latent space of embeddings. Thus, on the one hand, higher values of  cause
the embeddings to be more attuned to the properties of the latent space, whereas the lower
values of  cause the reasoner to shape the latent space less strictly.</p>
        <p>We note in passing that, following [18], we have conducted the Friedman test assuming
every KB is a separate dataset, and every pair (, ) constitutes a separate classifier. The test
indicated there are diferences between classifiers (the
-value &lt; 10− 5). However, deciding
which pair-wise diferences are significant would require making
and thus it would be extremely hard to achieve reliable results.
︀( 192)︀ = 64, 980 comparisons
2</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and conclusions</title>
      <p>In the paper, we have presented an approach to generate non-trivial KBs in the ℰ ℒ DL and to
measure the complexity of their logical conclusions. We then used these results to evaluate the
influence of the complexity on the performance of reason-able embeddings, a recently proposed
framework ofering a transferable neural reasoner for deciding the subsumption problem in the
ℒ DL.</p>
      <p>We have shown that within the assumed framework the complexity of the training set is a
good performance indicator in scenarios both employing and not employing transfer learning.
Moreover, the experimental results indicate that the complexity of the embeddings training set
is much more important than the complexity of the joint training set, and the latter can even
influence the performance negatively, possibly because it constrains the latent space too tightly.</p>
      <p>The study has numerous limitations: we have concentrated on a very simple DL ℰ ℒ, we
considered only atomic subsumptions in the training and test sets, and measured the complexity
of only subsumptions following from the KBs. The first assumption limits the generalizability
of this study, but since ℰ ℒ is a proper subset of ℒ, one can suspect that the results on ℒ
KBs will be no better than on ℰ ℒ KBs. Concentrating on atomic subsumptions only efectively
means we have ignored neural operators, one of the most interesting parts of the reason-able
embeddings. The third assumption means we were only able to assess the incompleteness of the
neural reasoner, i.e., its tendency to reject subsumptions that in fact follow from the KB. Due to
these assumptions, this is only the first step to properly evaluating the scope of usefulness and
applicability of the reason-able embeddings.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments References</title>
      <p>This research was partially supported by 0311/SBAD/0726 and by TAILOR, a project funded by
EU Horizon 2020 research and innovation programme under GA No 952215.
[17] M. Ebrahimi, A. Eberhart, F. Bianchi, P. Hitzler, Towards bridging the neuro-symbolic gap:
deep deductive reasoners, Appl. Intell. 51 (2021) 6326–6348. URL: https://doi.org/10.1007/
s10489-020-02165-6. doi:10.1007/s10489-020-02165-6.
[18] J. Demsar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn.</p>
      <p>Res. 7 (2006) 1–30. URL: http://jmlr.org/papers/v7/demsar06a.html.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Choudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Luthra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>A survey of knowledge graph embedding and their applications</article-title>
          ,
          <source>CoRR abs/2107</source>
          .07842 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2107.07842. arXiv:
          <volume>2107</volume>
          .
          <fpage>07842</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Knowledge graph embedding: A survey of approaches and applications</article-title>
          ,
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>29</volume>
          (
          <year>2017</year>
          )
          <fpage>2724</fpage>
          -
          <lpage>2743</lpage>
          . URL: https://doi.org/ 10.1109/TKDE.
          <year>2017</year>
          .
          <volume>2754499</volume>
          . doi:
          <volume>10</volume>
          .1109/TKDE.
          <year>2017</year>
          .
          <volume>2754499</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. N.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>A survey on knowledge graph embedding: Approaches, applications</article-title>
          and benchmarks,
          <source>Electronics</source>
          <volume>9</volume>
          (
          <year>2020</year>
          )
          <article-title>750</article-title>
          . URL: https://doi.org/10. 3390/electronics9050750. doi:
          <volume>10</volume>
          .3390/electronics9050750.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. W.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. C.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey of graph embedding: Problems, techniques, and applications</article-title>
          ,
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>30</volume>
          (
          <year>2018</year>
          )
          <fpage>1616</fpage>
          -
          <lpage>1637</lpage>
          . URL: https://doi.org/10.1109/TKDE.
          <year>2018</year>
          .
          <volume>2807452</volume>
          . doi:
          <volume>10</volume>
          .1109/TKDE.
          <year>2018</year>
          .
          <volume>2807452</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A survey on knowledge graph embeddings for link prediction</article-title>
          ,
          <source>Symmetry</source>
          <volume>13</volume>
          (
          <year>2021</year>
          ). URL: https://www.mdpi.com/2073-8994/13/3/485. doi:
          <volume>10</volume>
          .3390/ sym13030485.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. D.</given-names>
            <surname>Noia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Leone</surname>
          </string-name>
          , H. Paulheim,
          <article-title>Rdf2vec: RDF graph embeddings and their applications</article-title>
          ,
          <source>Semantic Web</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <fpage>721</fpage>
          -
          <lpage>752</lpage>
          . URL: https://doi.org/10.3233/ SW-180317. doi:
          <volume>10</volume>
          .3233/SW-180317.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. M.</given-names>
            <surname>Holter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Antonyrajah</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Horrocks</surname>
          </string-name>
          ,
          <article-title>Owl2vec*: embedding of OWL ontologies, Mach</article-title>
          . Learn.
          <volume>110</volume>
          (
          <year>2021</year>
          )
          <fpage>1813</fpage>
          -
          <lpage>1845</lpage>
          . URL: https://doi.org/ 10.1007/s10994-021-05997-6. doi:
          <volume>10</volume>
          .1007/s10994-021-05997-6.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>C. d'Amato</surname>
            ,
            <given-names>N. F.</given-names>
          </string-name>
          <string-name>
            <surname>Quatraro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Fanizzi</surname>
          </string-name>
          ,
          <article-title>Injecting background knowledge into embedding models for predictive tasks on knowledge graphs</article-title>
          , in: R.
          <string-name>
            <surname>Verborgh</surname>
          </string-name>
          , et al. (Eds.),
          <source>ESWC</source>
          <year>2021</year>
          , volume
          <volume>12731</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2021</year>
          , pp.
          <fpage>441</fpage>
          -
          <lpage>457</lpage>
          . URL: https: //doi.org/10.1007/978-3-
          <fpage>030</fpage>
          -77385-4_
          <fpage>26</fpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -77385-4\_
          <fpage>26</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Makni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <article-title>Deep learning for noise-tolerant RDFS reasoning</article-title>
          ,
          <source>Semantic Web</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <fpage>823</fpage>
          -
          <lpage>862</lpage>
          . URL: https://doi.org/10.3233/SW-190363. doi:
          <volume>10</volume>
          .3233/SW-190363.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hohenecker</surname>
          </string-name>
          , T. Lukasiewicz,
          <article-title>Ontology reasoning with deep neural networks</article-title>
          ,
          <source>J. Artif. Intell. Res</source>
          .
          <volume>68</volume>
          (
          <year>2020</year>
          )
          <fpage>503</fpage>
          -
          <lpage>540</lpage>
          . URL: https://doi.org/10.1613/jair.1.11661. doi:
          <volume>10</volume>
          .1613/ jair.1.11661.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>On the capabilities of logic tensor networks for deductive reasoning</article-title>
          , in: A.
          <string-name>
            <surname>Martin</surname>
          </string-name>
          , et al. (Eds.),
          <source>AAAI-MAKE</source>
          <year>2019</year>
          , volume
          <volume>2350</volume>
          <source>of CEUR Workshop Proc</source>
          .,
          <year>2019</year>
          . URL: http://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2350</volume>
          /paper22.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Sarker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Eberhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Doran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>Neuro-symbolic deductive reasoning for cross-knowledge graph entailment</article-title>
          , in: A.
          <string-name>
            <surname>Martin</surname>
          </string-name>
          , et al. (Eds.),
          <source>AAAI-MAKE</source>
          <year>2021</year>
          , volume
          <volume>2846</volume>
          <source>of CEUR Workshop Proc</source>
          .,
          <year>2021</year>
          . URL: http: //ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2846</volume>
          /paper8.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>D. M. Adamski</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Potoniec</surname>
          </string-name>
          ,
          <article-title>Reason-able embeddings: Learning concept embeddings with a transferable neural reasoner</article-title>
          , Semantic
          <string-name>
            <surname>Web</surname>
          </string-name>
          (
          <year>2023</year>
          ). URL: https://www.semantic
          <article-title>-web-journal.net/content/ reason-able-embeddings-learning-concept-embeddings-transferable-neural-reasoner, under review</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brandt</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Lutz, Pushing the EL envelope</article-title>
          , in: L. P.
          <string-name>
            <surname>Kaelbling</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Safiotti (Eds.),
          <source>IJCAI-05</source>
          , Professional Book Center,
          <year>2005</year>
          , pp.
          <fpage>364</fpage>
          -
          <lpage>369</lpage>
          . URL: http://ijcai.org/Proceedings/ 05/Papers/0372.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Horridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          ,
          <article-title>From justifications towards proofs for ontology engineering</article-title>
          , in: F.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          <string-name>
            <surname>Sattler</surname>
          </string-name>
          , M. Truszczynski (Eds.),
          <source>Principles of Knowledge Representation and Reasoning: Proceedings of the Twelfth International Conference, KR 2010</source>
          , Toronto, Ontario, Canada, May 9-
          <issue>13</issue>
          ,
          <year>2010</year>
          , AAAI Press,
          <year>2010</year>
          . URL: http://aaai.org/ocs/index.php/ KR/KR2010/paper/view/1263.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Potoniec</surname>
          </string-name>
          ,
          <article-title>Finding unexplainable triples in an RDF graph</article-title>
          , in: A.
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A. L.</given-names>
          </string-name>
          <string-name>
            <surname>Gentile</surname>
            ,
            <given-names>A. G.</given-names>
          </string-name>
          <string-name>
            <surname>Nuzzolese</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Rudolph</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Maleshkova</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>J. Z.</given-names>
          </string-name>
          <string-name>
            <surname>Pan</surname>
          </string-name>
          , M. Alam (Eds.), The Semantic Web:
          <article-title>ESWC 2018 Satellite Events - ESWC 2018 Satellite Events</article-title>
          , Heraklion, Crete, Greece, June 3-7,
          <year>2018</year>
          , Revised Selected Papers, volume
          <volume>11155</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2018</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>7</lpage>
          . URL: https://doi.org/10.1007/978-3-
          <fpage>319</fpage>
          -98192-
          <issue>5</issue>
          _1. doi:
          <volume>10</volume>
          . 1007/978-3-
          <fpage>319</fpage>
          -98192-5\_1.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>