=Paper= {{Paper |id=Vol-2600/paper5 |storemode=property |title=Completion Reasoning Emulation for the Description Logic EL+ |pdfUrl=https://ceur-ws.org/Vol-2600/paper5.pdf |volume=Vol-2600 |authors=Aaron Eberhart,Monireh Ebrahimi,Lu Zhou,Cogan Shimizu,Pascal Hitzler |dblpUrl=https://dblp.org/rec/conf/aaaiss/EberhartEZSH20 }} ==Completion Reasoning Emulation for the Description Logic EL+== https://ceur-ws.org/Vol-2600/paper5.pdf
               Completion Reasoning Emulation for the Description Logic EL+

           Aaron Eberhart, Monireh Ebrahimi, Lu Zhou, Cogan Shimizu, and Pascal Hitzler
                                                       Data Semantics Lab
                                                  Kansas State University, USA
                                  {aaroneberhart,monireh,luzhou,coganmshimizu,hitzler}@ksu.edu



                            Abstract                                 is also considerable research into path-finding in embed-
                                                                     ded triple data using recurrent neural networks (RNNs) or
  We present a new approach to integrating deep learning with
  knowledge-based systems that we believe shows promise.
                                                                     LSTMs.(Xiong, Hoang, and Wang 2017; Das et al. 2016;
  Our approach seeks to emulate reasoning structure, which can       2017) Memory networks also successfully operate over
  be inspected part-way through, rather than simply learning         triple embeddings and are capable of transferring training
  reasoner answers, which is typical in many of the black-box        to work over triple embeddings from completely different
  systems currently in use. We demonstrate that this idea is fea-    vocabularies.(Ebrahimi et al. 2018) There is even an attempt
  sible by training a long short-term memory (LSTM) artificial       to embed EL++ with TransE using a novel concept of n-
  neural network to learn EL+ reasoning patterns with two dif-       balls, though it currently does not consider the RBox as well
  ferent data sets. We also show that this trained system is re-     as certain EL++ axioms that do not translate into a triple
  sistant to noise by corrupting a percentage of the test data and   format.(Kulmanov et al. 2019; Bordes et al. 2013)
  comparing the reasoner’s and LSTM’s predictions on corrupt
  data with correct answers.
                                                                     Discussion
                                                                     Many of the systems just mentioned are very successful, yet
                        Introduction                                 have a tendency to adopt familiar evaluation strategies from
Machine learning and neural network techniques are often             deep learning. Sure, it’s great to be able to get a 99% F1-
contrasted with logic-based systems as opposite in many re-          score. When we do not know the underlying structure of the
gards. The challenge of integrating inductive learning strate-       data that we are using, this is often the best we can hope for.
gies with deductive logic has no easy or obvious solution.           However, precision and recall do not make much sense in a
Though there are doubtless many reasons for this, one major          knowledge base evaluation. For an integrated system, some
roadblock is that solutions and evaluations which work well          new goals and evaluation strategies seem warranted that are
for logic, or for deep learning and machine learning, do not         neither completely deductive nor entirely empirical.
work well in the opposite, and likely do not further the goal           In a deductive reasoning system the semantics of the data
of integration. In this paper we present a new approach that         is known explicitly. Why then would we want to blindly al-
embraces the liminality of this specific integration task. Our       low such a system to teach itself what is important when
approach is by its very nature ill-suited to either machine          we already know what and how it should learn? Possibly we
learning or logic alone. But it tries to avoid the pitfalls of       don’t know the best ways to guide a complex network to the
unintentionally favoring one paradigm over the other, aim-           correct conclusions, but surely more, not less, transparency
ing instead to grasp at something new in the space between.          is needed for integrating logic and deep learning. Trans-
                                                                     parency often becomes difficult when we use extremely deep
Related Work                                                         or complex networks that cannot be reduced to components.
A popular approach for training neural networks to recog-            It also makes things difficult when we pre-process our data
nize logic is to use embedding techniques borrowed from              to help the system train by doing embeddings or other dis-
the field of natural language processing (NLP). For instance,        tance adjustments. When we embed all the similar things
Makni and Hendler designed a system that layers Resource             close together and all the dissimilar things far apart the sys-
Description Framework (RDF) graphs into adjacency matri-             tem can appear to succeed, but we can’t tell if it was the em-
ces and embedding them.(Makni and Hendler 2018) There                bedding or the system itself or one of a dozen other things
                                                                     that might have caused this.
Copyright c 2020 held by the author(s). In A. Martin, K. Hinkel-
                                                                        In response to these and other concerns we have devel-
mann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen
(Eds.), Proceedings of the AAAI 2020 Spring Symposium on Com-        oped some research questions that we hope to address in this
bining Machine Learning and Knowledge Engineering in Practice        paper. Though we certainly will not be able to definitively
(AAAI-MAKE 2020). Stanford University, Palo Alto, California,        answer all of them, we do hope that by starting this inves-
USA, March 23-25, 2020. Use permitted under Creative Commons         tigation we can demonstrate future potential for this type of
License Attribution 4.0 International (CC BY 4.0).                   inquiry.
Research Questions                                                what is happening, as well as indicate the future potential of
1. Can a neural network emulate reasoner behavior?                this method.

2. Can we still achieve success without embeddings?
                                                                                            Approach
3. Can learning happen with arbitrary numerical names?
                                                                  We present a method for learning the structure of EL+ rea-
4. Will intermediate answers help our evaluation?                 soning patterns using an LSTM that tests the questions posed
5. Is F1-score sufficient to evaluate an integrated logic and     above. In our system we avoid, as much as possible, em-
   neural network system? If not, what would be?                  beddings or any distance adjustments that might help the
                                                                  system learn answers based on embedding similarity with-
Question one is the primary focus of this project, and the one    out first learning the reasoning structure. In fact, a primary
to which all the others are subordinate. It is the intention of   feature of our network is its lack of complexity. We take
this paper and the prototype system we have implemented to        a very simple logic, reason over knowledge bases in that
demonstrate that this is a feasible goal. Whatever disagree-      logic, and then extract supports from the reasoning steps,
ments may arise concerning our later investigations, we will      mapping the reasoner supports back to sets of the original
argue that this is a necessary and plausible alternative course   knowledge base axioms. This allows us to encode the in-
of research.                                                      put data in terms of only knowledge base statements. It also
   The second question we have already mentioned in the           provides an intermediate answer that might improve results
last section. It is our intention to avoid any pre-processing     when provided to the system. This logic data is fed into three
that will allow the system to give correct answers without        different LSTM architectures with identical input and output
learning the reasoning patterns. In this way we can better        dimensionalities. One architecture, which we call ”Deep”,
verify whether we have truly solved the reasoning problem         does not train with support data but has a hidden layer the
and not some other easier task.                                   same size as the supports we have defined. Another architec-
   The third question is related to the second, but imposes       ture, called ”Piecewise”, trains two separate half-systems,
a more broad restriction on what we will be allowed to do.        one with knowledge bases and one with supports from the
Not only do we want to avoid embedding the data in a way          reasoner. The last system, called ”Flat”, simply learns to
that will help the system to learn the answers. Also we must      map inputs directly to outputs for each reasoning step. We
take care when we go from logic to numerical data, and back       will discuss in detail each part of the system in the follow-
again, not to interfere with the latent semantics in the logic.   ing sections, then provide some results.
By avoiding distance comparisons in the naming convention
for the input we ensure that the names are arbitrary in both
                                                                  EL+
systems. The only distances we intend to measure are be-
tween predictions and answers.                                    EL+ is a lightweight and highly tractable description logic.
   The fourth question is very important for our conception       A typical reasoning task in EL+ is a sequential process
of how future systems should work. In a logic-based system        with a fixed endpoint, making it a perfect candidate for se-
there is transparency at any level of reasoning. We cannot,       quence learning. Unlike RDF, which reasons over instance
of course, expect this in most neural networks. With a net-       data in triple format, EL+ reasoning occurs on the predicate
work that aims to emulate reasoner behavior, however, we          level. Thus we will have to train the system to actually learn
can impose a degree of intermediate structure. This inter-        the reasoning patterns and logical structure of EL+ directly
mediate structure allows us to inspect a network part-way         from encoded knowledge bases.
through and perform a sort of ”debugging” since we know
exactly what it should have learned at that point. This is a      Syntax The signature Σ for EL+ is defined as:
crucial break from current thinking that advocates more and
deeper opaque hidden layers in networks that improve accu-          Σ = hNI , NC , NR i with NI , NC , NR pairwise disjoint.
racy but detract from explainability. Inspection of intermedi-
ate answers could indicate whether a proposed architecture        NI is a set of individual names.
is actually learning what we intend, which should aid devel-      NC is a set of Concept names that includes >.
opment of more correct systems.                                   NR is a set of Role names.
   The fifth and final question is a difficult but necessary
question, likely beyond the scope of this paper. For now, it is     Expressions in EL+ use the following grammar:
primarily a reminder that the method we choose intention-
ally does not try to find best answers for individual state-
                                                                      R    ::=   NR
ments. It tries to emulate reasoner behavior by matching the
shape of the reasoning patterns. That is not to say we don’t          C    ::=   NC     |    CuC      |   ∃R.C
care about right answers, or that they don’t matter. Those
values are reported for our system. And if reasoning struc-       Semantics An interpretation I where I = (∆I , ·I ) maps
ture is matched well enough then correct answers should fol-      NI , NC , NR to elements, sets, and relations in ∆I with
low. But we believe that the reasoning distance results we        function ·I . For an interpretation I and C(i), R(i) ∈ Σ, the
report are far more compelling and tell a better story about      function ·I is defined in Table 1.
                                                      Table 1: EL+ Semantics

                Description               Expression                                       Semantics
                 Individual                       a                                         a ∈ ∆I
                    Top                           >                                             ∆I
                  Bottom                          ⊥                                               ∅
                  Concept                         C                                        C I ⊆ ∆I
                    Role                          R                                     R I ⊆ ∆I × ∆ I
                Conjunction                  C uD                                          C I ∩ DI
           Existential Restriction            ∃R.C                 { a | there is b ∈ ∆ such that (a, b) ∈ RI and b ∈ C I }
                                                                                       I

           Concept Subsumption               CvD                                           C I ⊆ DI
            Role Subsumption                 RvS                                           RI ⊆ S I
                Role Chain             R1 ◦ · · · ◦ Rn v R                            R1 ◦ · · · ◦ RnI ⊆ RI
                                                                                        I

                                             with ◦ signifying standard binary composition


              Table 2: EL+ Completion Rules                                 seed0 = C1 v C2                 C2 v ∃R1.C3
                                                                            C2 v C3                         C1 v ∃R3.C4
(1) A v C        CvD                  |= A v D                              C3 u C4 v C5                    C2 v ∃R1.C3
                                                                            C1 v ∃R1.C1                     ∃R1.C2 v C4
(2) A v C1      A v C2 C1 u C2 v D |= A v D                                 C1 v ∃R2.C3                     R1 v R2
(3) A v C      C v ∃R.D               |= A v ∃R.D                           C2 v ∃R2.C3                     R1 ◦ R3 v R4
(4) A v ∃R.B     BvC       ∃R.C v D |= A v D
                                                                            Which entails
(5) A v ∃S.D     SvR                  |= A v ∃R.D
(6) A v ∃R1 .C C v ∃R2 .D R1 ◦ R2 v R |= A v ∃R.D
                                                                            Step 1:                         Step 2:
                                                                            C1 v C3                         seed1 = C1 v C5
                                                                            C1 v C4
Reasoning It is a well established result that any EL+                      C1 v ∃R1.C3
knowledge base has a least fixed point that can be deter-                   C1 v ∃R2.C1
mined by repeatedly applying a finite set of axioms that pro-               C1 v ∃R4.C4
duce all entailments of a desired type.(Besold et al. 2017;
Kazakov, Krötzsch, and Simančı́k 2012) In other words,
we can say that reasoning in EL+ often amounts to a se-                              Figure 1: First Iteration of Sequence
quence of applications of a set of pattern-matching rules.
One such set of rules, the set we have used in our exper-
iment, is given in Table 2. The reasoning reaches comple-                forced-lower-bound reasoning sequence with a connected
tion when there are no new conclusions to be made. Because               randomized knowledge base. This allows us to rapidly gen-
people are usually interested most in concept inclusions and             erate many normal semi-random EL+ knowledge bases of
restrictions, those are the types of statements we choose to             arbitrary reasoning difficulty. For this experiment we choose
include in our reasoning.                                                a moderate difficulty setting so that it can compare with non-
   After the reasoning has finished we are able to recursively           synthetic data. An example of one iteration of the two-part
define supports for each conclusion the reasoner reaches.                sequence is provided in Figure 1. To ensure that the ran-
The first step, of course, only has supports from the knowl-             domized statements do not shortcut this pattern, the random
edge base. After this step supports are determined by effec-             statements are generated in a nearly disjoint space and con-
tively running the reasoner in reverse, and replacing each               nected only to the initial seed term. This ensures that at least
statement that is not in the original knowledge base with a              one element of the random space will also produce random
superset that is, as you can see by the colored substitutions            entailments for the duration of the sequence, possibly longer.
in Table 3. When the reasoner proved the last statement it did           Our procedure also guarantees that each completion rule will
not consider all of the supports, since it had already proved            be used at least once every iteration of the sequence so that
them. It used the new facts it had learned in the last iteration         all reasoning patterns can potentially be learned by the sys-
but we have drawn their supports back out so that we can                 tem.
define a fixed set of inputs from the knowledge base.
                                                                         SNOMED
Synthetic Data                                                           We have also imported data from the SNOMED 2012 ontol-
To provide sufficient training input to our system we provide            ogy to ensure that our method is applicable to non-synthetic
a synthetic generation procedure that combines a structured              data. SNOMED is a widely-used, publicly available, ontol-
                                                     Table 3: Support Generation

                       New Fact        Rule    Support
          Step 1        C1 v C3         (1)    C1 v C2,C2 v C3
                        C1 v C4         (4)    C1 v C2,C1 v ∃R1.C1,∃R1.C2 v C4
                    C1 v ∃R1.C3         (3)    C1 v C2,C2 v ∃R1.C3
                    C1 v ∃R2.C1         (5)    C1 v ∃R1.C1,R1 v R2
                    C1 v ∃R4.C4         (6)    C1 v ∃R1.C1,R1 ◦ R3 v R4,C1 v ∃R3.C4
          Step 2        C1 v C5         (2)    C3 u C4 v C5,C1 v C2,C2 v C3,C1 v C2,C1 v ∃R1.C1,∃R1.C2 v C4


ogy of medical terms and relationships.(De Silva et al. 2011)
SNOMED 2012 has 39392 logical axioms, some of which
are complex, but this can be normalized in constant time to
a logically equivalent set of 124,428 axioms. It is expressed
in EL++ , and newer versions utilize this complexity more
liberally. But the 2012 version contains exactly one state-
ment that is not in EL+ after normalization. We feel confi-
dent that omitting > v ∃topPartOf.Self does not constitute
a substantial loss in the ontology.
   Because SNOMED is so large, we sample it to obtain
connected subsets that also produce a minimum number of
reasoning steps. We require that the samples be connected
because any normal knowledge base is connected, and it im-
proves the chances that the statements will have entailments.
Often, however, random connected samples do not entail
anything, and this is also unusual in a normal ontology, so
we require a minimum amount of reasoning activity in the
samples as well. The reasoning task for SNOMED is more
unbalanced than for the synthetic data. It is equally trivial for
the reasoner to solve either knowledge base type. However,
we observe that random connected sampling tends to favor                           Figure 2: Piecewise Architecture
application of rules 3, 5, and 6 much more heavily than oth-
ers so the system will have a more difficult time learning the
overall reasoning patterns. This imbalance is likely an ar-
tifact from SNOMED because it seems to recur in different
sample sizes with different settings, though we acknowledge
that it could be correlated somehow with the sample gener-
ation procedure.

LSTM
LSTMs are type of recurrent neural network that work well
with data that has long-term dependencies.(Hochreiter and
Schmidhuber 1997) We choose to use an LSTM to learn
the reasoning patterns of EL+ because LSTMs are designed
to learn sequence patterns. The network is kept as simple
as possible to achieve the outputs we require and maximize
transparency. For the piecewise system, depicted in Figure
2, we use two separate flat single LSTMs that have the num-
ber of neurons matching first the support shape, and later
the output shape. The deep system shown in Figure 3 is
constructed to match this shape so that the two will be com-
parable, it simply stacks the LSTMs together so that it can
train without supports. We also train a flat system, shown in                        Figure 3: Deep Architecture
Figure 4, that has the same input and output shape but lacks
the internal supports. We have done this three ways because
we can compare the behavior of the systems with support
                                                                                   Table 4: Translation Rules


                                                                          KB statement                  Vectorization
                                                                            CX v CY           →       [ 0.0, Xc , Yc , 0.0 ]
                                                                         CX u CY v CZ         →       [ Xc , Yc , Zc , 0.0 ]
                                                                          CX v ∃RY.CZ         →       [0.0, Xc , −Y   Z
                                                                                                                  r , c]

                                                                         ∃RX.CY v CZ          →       [ −X   Y   Z
                                                                                                         r , c , c , 0.0]

                                                                            RX v RY           →      [0.0, −X  −Y
                                                                                                            r , r , 0.0]
                 Figure 4: Flat Architecture                             RX ◦ RY v RZ         →      [ −X  −Y −Z
                                                                                                        r , r , r , 0.0]

                                                                            c = Number of Possible Concept Names
against the flat system as a baseline, and then see whether the               r = Number of Possible Role Names
piecewise training is helping. All designs treat the reasoning
steps as the sequence to learn, so the number of LSTM cells
is defined by the maximum reasoning sequence length. We           to scaled floating point number. It can be reversed, with a
train our LSTMs using regression on mean squared error to         slight loss of precision due to integer conversion, quite di-
minimize the distances between predicted answers and cor-         rectly. The semantics of each encoded predicate name with
rect answers.                                                     regards to its respective knowledge base is structural in re-
                                                                  lation to its position in the 4-tuple, just like it would be in
Overall System                                                    an EL+ axiom. And its semantics is not at all related to the
The individual parts of our system are not themselves novel.      other non-equal predicate names within the same 4-tuple.
Our result, however, lies rather in the unexpected success        Knowledge Base Encoding In order to input knowledge
of this unconventional general approach we have applied.          bases into the LSTM we first apply the encoding defined
One of the crucial factors in making this work is the way         in the previous section to each of its statements. Then we
in which we have translated the data from its logical format      concatenate each of these encodings end-to-end to obtain a
into something a neural network can understand. In the fol-       much longer vector. For instance,
lowing sections we will discuss how our system transforms
data from a knowledge base format to an LSTM format and                                C1 v ∃R1.C2, R1 v R2
back again.                                                       might translate to
Statement Encoding Every data set we use has a fixed                       [0.0, 0.5, −0.5, 1.0, 0.0, −0.5, −1.0, 0.0].
number of names that it can contain for both roles and con-
cepts, and every concept or role has an arbitrary number          This knowledge base vector is then copied the same number
name identifier. For the synthetic data these numbers are         of times as there are reasoning steps and stuck together once
randomly generated. For SNOMED data, all of the labels            again along a new dimension. For an experiment this is done
are stripped and the concept and role numbers are shuffled        hundreds or thousands of times, and these fill up the tensor
around and substituted with random integers to remove any         that is given to the LSTM to learn. Because some of the
possibility of the names injecting a learning bias. These la-     randomness can cause ragged data, any tensor dimensions
bels are remembered for later retrieval for output but the rea-   that are not the same length as the maximum are padded
soner and LSTM do not see them. It can seem occasionally          with zeros.
like there is a bias in the naming if output for SNOMED
data is inspected because all the labels seem related. This                                  Results
is merely a result of our sampling technique which requires       Viewed in terms of the questions posed at the beginning of
connected statements, so it is not a concern.                     this paper we feel that our results are solid, and they vali-
   Knowledge bases have a maximum number of role and              date a reexamination of how we should approach integrating
concept names that are used to scale all of the integer values    logic and neural networks. We have attempted to create a
for names into a range of [−1, 1]. To enforce disjointness of     system that emulates a reasoning task rather than reasoning
concepts and roles, we map all concepts to (0, 1], all roles to   output.
[−1, 0). Each of the six possible normalized axiom forms             If we examine the example output from the synthetic data
is encoded into a 4-tuple based on the logical encodings de-      inputs in Table 5, it is clear that it is getting very close to
fined in Table 4.                                                 many correct answers. When it misses, it still appears to
   We would like to emphasize the distinction between our         be learning the shape, and this makes us optimistic about
method, which we prefer to call an encoding, and a tradi-         its future potential. The SNOMED predictions are much
tional embedding. This encoding adds no additional infor-         more dense and do not fit well into a table, but we have in-
mation to the names beyond the transformation from integer        cluded a few good examples with the original data labels
             Table 5: Example Synthetic Output

                  Correct Answer      Predicted Answer
         Step 0   C9 v C11            C8 v C9
                  C2 v C10            C1 v C9
                  C9 v C12            C8 v C9
                  C7 v C6             C8 v ∃R4.C9
                  C9 v ∃R4.C11        C1 v ∃R5.C9
                  C2 v ∃R4.C9         C8 v ∃R4.C9                                       Figure 5: Synthetic Training
                  C9 v ∃R5.C9         C9 v ∃R5.C9
                  C2 v ∃R5.C11
                  C9 v ∃R7.C12
                  C2 v ∃R6.C12
         Step 1   C9 v C13            C8 v C12
                  C2 v C11            C2 v C10
                  C2 v C12            C1 v C11
                  C2 v ∃R4.C11        C1 v ∃R3.C12
                  C2 v ∃R5.C9         C1 v ∃R4.C8
                  C2 v ∃R7.C12
         Step 2   C2 v C13            C1 v C12


                                                                                       Figure 6: SNOMED Training


                                                                       translated into English sentences. Because there are so many
                                                                       near-misses we include edit distance evaluations that better
                                                                       capture the degree to which each predicted statement misses
                                                                       what it should have been.
                                                                          It is interesting to note that by comparing Table 7 with
             Table 6: Example SNOMED Outputs                           Table 8 we can see that on the much harder SNOMED data
                                                                       the deep system appears to have a better result because of
Correct Answer    C6 v ∃R4.C1                                          the higher F1 score, but the average edit distance, which is
Meaning           if something is a zone of superficial fascia, then   our preferred alternative measure for evaluation, is not obvi-
                  there is a subcutaneous tissue that it is PartOf     ously correlated with the F1-score. This confirms that our
Prediction        C6 v ∃R4.C3                                          fifth research question was appropriate and that F1-score
Meaning           if something is a zone of superficial fascia,        might not be sufficient to evaluate whether or not we can
                  then there is a subcutaneous tissue of palmar        learn the shape of reasoning patterns.
                  area of little finger that it is PartOf                 A cause for the discrepancy may be the higher training
Correct Answer    C8 v ∃R3.C2                                          difficulty for reaching the completion versus reaching the
Meaning           if something is a infrapubic region of pelvis,       supports in the SNOMED data, which you can see in Fig-
                  then there is a perineum that it is PartOf           ures 5 and 6. We can speculate on the degree to which vari-
Prediction        C9 v ∃R3.C2                                          ous factors are contributing to this, but importantly, the fact
Meaning           if something is a zone of integument,                that this discussion is even possible in the first place is a con-
                  then there is a perineum that it is PartOf           firmation that we can answer the fourth research question in
                                                                       the affirmative.
                                                                          Another interesting result can be seen by examining the
                                                                       charts in Figures 7 - 12. As we have noticed before, the Deep
                                                                       architecture performs quite well on the synthetic data and
                                                                       is quite resistant to increased levels of noise. The flat sys-
                          Table 7: Average Precision Recall and F1-score For each Distance Evaluation

                                 Atomic Levenshtein Distance Character Levenshtein Distance Predicate Distance
                                Precision Recall F1-score Precision Recall F1-score Precision Recall F1-score
                                                                    Synthetic Data
           Piecewise Prediction 0.138663 0.142208 0.140412 0.138663 0.142208 0.140412 0.138646 0.141923 0.140264
           Deep Prediction      0.154398 0.156056 0.155222 0.154398 0.156056 0.155222 0.154258 0.155736 0.154993
           Flat Prediction      0.140410 0.142976 0.141681 0.140410 0.142976 0.141681 0.140375 0.142687 0.141521
           Random Prediction 0.010951 0.0200518 0.014166 0.006833 0.012401 0.008811 0.004352 0.007908 0.007908
                                                                   SNOMED Data
           Piecewise Prediction 0.010530 0.013554 0.011845 0.010530 0.013554 0.011845 0.010521 0.013554 0.011839
           Deep Prediction      0.015983 0.0172811 0.016595 0.015983 0.017281 0.016595 0.015614 0.017281 0.016396
           Flat Prediction      0.014414 0.018300 0.016112 0.0144140 0.018300 0.016112 0.013495 0.018300 0.015525
           Random Prediction 0.002807 0.006803 0.003975 0.001433 0.003444 0.002023 0.001769 0.004281 0.002504

                                     Table 8: Average Statement Edit Distances with Reasoner

                                Atomic Levenshtein Distance Character Levenshtein Distance     Predicate Distance
                                  From      To     Average From          To      Average   From        To      Average
                                                                    Synthetic Data
           Piecewise Prediction 1.336599 1.687640 1.512119 1.533115 1.812006 1.672560 2.633427 4.587382 3.610404
           Deep Prediction      1.256940 1.507150 1.382045 1.454787 1.559751 1.507269 2.504496 3.552074 3.028285
           Flat Prediction      1.344946 1.584674 1.464810 1.586281 1.660409 1.623345 2.517655 3.739770 3.128713
           Random Prediction 1.598016 1.906369 1.752192 1.970604 1.289533 1.630068 5.467918 10.57324 8.020583
                                                                    SNOMED Data
           Piecewise Prediction 1.704931 2.686562 2.195746 2.016249 2.862737 2.439493 6.556592 5.857769 6.207181
           Deep Prediction      1.759633 3.052080 2.405857 2.027190 3.328850 2.678020 4.577427 6.179389 5.378408
           Flat Prediction      1.691738 2.769542 2.230640 1.948757 2.991328 2.470042 5.548226 6.665659 6.106942
           Random Prediction 1.814656 3.599629 2.707143 2.094682 1.621700 1.858191 5.169093 12.392325 8.780709


tem gets similar results, but slightly worse. Surprisingly, the     statement strings and computes their edit distance.(Wiki-
piecewise model starts out mostly the same as the deep sys-         books contributors 2019 accessed November 19 2019) How-
tem on synthetic data but then very quickly falters. When we        ever, because some names in the namespace are one digit
examine the results for the harder SNOMED data, however,            numbers and other names are two digit numbers, we include
the opposite seems to be happening. The deep and flat sys-          a modified version of this function, called ”Atomic”, that
tems still track each other. But for the unbalanced data the        uniformly substitutes all two digit numbers in the strings
deep system crosses into the random range by 30-50% cor-            with symbols that do not occur in either. Since there cannot
ruption, while the piecewise system is able to skirt just above     be more than eight unique numbers in two decoded strings
random for the Character and Predicate distance functions           there are no issues with finding enough new symbols. By
until around 80-90%.                                                doing the substitutions we can see the impact that the num-
                                                                    ber digits were having on the edits from the Atomic Lev-
Methodology                                                         enshtein distance. Finally we devise a distance function that
Our system is trained using randomized 10-fold cross valida-        is based on our encoding scheme. The Predicate Distance
tion to 20000 epochs on the deep and flat systems and 10000         method disassembles each string into only its predicates.
epochs each for the parts of the piecewise system at a learn-       Then, for each position in the 4-tuple, a distance is calcu-
ing rate of 0.0001. The results from every fold are saved and       lated that yields zero for perfect matches, absolute value of
we report the average of all of the runs. Often there are better    guessed number - actual number for correct Class and Role
and worse runs in any given 10-fold validation but the aver-        guesses, and guessed number + actual number for incorrect
ages across multiple cross-validations with the same settings       Class and Role matches. So, for instance, guessing C1 when
remain basically the same. 10-fold cross-validations are per-       the answer is C2 will yield a Predicate Distance of 1, while
formed with an extra comparison against data that has been          a guess of R2 for a correct answer of C15 will yield 17.
corrupted at a fixed rate, starting with zero percent probabil-     Though this method is specific to our unique encoding, we
ity of corruption and moving upwards by ten percent each            believe it detects good and bad results quite well because
time. The data in Tables 7 and 8 is from the comparison with        perfect hits are 0, close misses are penalized a little, and
no corruption, so the error data is not reported (it is identical   large misses are penalized a lot.
with reasoner answers).                                                For each method we take every statement in a knowledge
   To perform our evaluations we use three unique distance          base completion and compare it with the best match in the
measurements. We have a naive “Character” Levenshtein               reasoner answer, random answers, and corrupted knowledge
distance function that takes two unaltered knowledge base           base answers. While we compute these distances we are able
to obtain precision, recall, and F1-score by counting the the     GiB DDR4, and a GeForce GTX 1060 6GB/PCIe/SSE2.
number of times the distance returns zero and treating the        Source code and experiment data is available on GitHub
statement predictions as classifications. Each time the sys-      https://github.com/aaronEberhart/PySynGenReas.
tem runs it can make any number of predictions, from zero
to the maximum size of the output tensor. This means that,        Acknowledgements This work was partially supported by
although the predictions and reasoner are usually around the      the Air Force Office of Scientific Research under award
same size and those comparisons are fine, we have to gen-         number FA9550-18-1-0386, and by the National Science
erate random data to compare against that is as big as could      Foundation under award number 1936677.
conceivably be needed by the system. Any artificial shaping
decisions we made to compensate for the variations between                               References
runs would invariably introduce their own bias in how we
selected them. Thus the need to use the biggest possible ran-     Besold, T. R.; d’Avila Garcez, A. S.; Bader, S.; Bowman,
dom data to compare against means the precision, recall, and      H.; Domingos, P. M.; Hitzler, P.; Kühnberger, K.; Lamb,
F1-score for random are low.                                      L. C.; Lowd, D.; Lima, P. M. V.; de Penning, L.; Pinkas,
                                                                  G.; Poon, H.; and Zaverucha, G. 2017. Neural-symbolic
                       Conclusion                                 learning and reasoning: A survey and interpretation. CoRR
                                                                  abs/1711.03902.
In this experiment demonstrate that EL+ reasoning can be
emulated with an LSTM. Although our example is merely             Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and
a prototype, we can definitively say yes to the first research    Yakhnenko, O. 2013. Translating embeddings for model-
question, a neural network can be trained to emulate com-         ing multi-relational data. In Advances in neural information
pletion reasoning behavior. And because we have answered          processing systems, 2787–2795.
this first research question without using name adjustments       Das, R.; Neelakantan, A.; Belanger, D.; and McCallum, A.
or embeddings, so we can also answer yes to the second and        2016. Chains of reasoning over entities, relations, and text
third questions. We have already discussed questions four         using recurrent neural networks. CoRR abs/1607.01426.
and five.                                                         Das, R.; Dhuliawala, S.; Zaheer, M.; Vilnis, L.; Durugkar, I.;
                                                                  Krishnamurthy, A.; Smola, A.; and McCallum, A. 2017. Go
Future Work                                                       for a walk and arrive at the answer: Reasoning over paths
There are a number of potential improvements that could           in knowledge bases using reinforcement learning. arXiv
build on the work presented here. A few of the choices we         preprint arXiv:1711.05851.
made might have had an impact of the results of the study
                                                                  De Silva, T. S.; MacDonald, D.; Paterson, G.; Sikdar, K. C.;
and we are curious if alternate strategies could improve the
                                                                  and Cochrane, B. 2011. Systematized nomenclature of
way that the system learns reasoning patterns. We might
                                                                  medicine clinical terms (snomed ct) to represent computed
have chosen a 3-tuple encoding pattern that was more dense
                                                                  tomography procedures. Comput. Methods Prog. Biomed.
but did not mirror the shape of the statements symmetrically.
                                                                  101(3):324–329.
We also considered adding the statements as an additional
dimension rather than concatenating them all together. This       Ebrahimi, M.; Sarker, M. K.; Bianchi, F.; Xie, N.; Doran, D.;
may have made it easier for the system to distinguish indi-       and Hitzler, P. 2018. Reasoning over rdf knowledge bases
vidual statements but would also add a lot of complexity in       using deep learning. arXiv preprint arXiv:1811.04132.
the extra dimension. We experimented with different neu-          Hochreiter, S., and Schmidhuber, J. 1997. Lstm can solve
ron types like gated recurrent unit (GRU) and basic RNN,          hard long time lag problems. In Advances in neural infor-
though these had no significant impact on the results. The        mation processing systems, 473–479.
type of neuron seems to have less effect than the overall         Kazakov, Y.; Krötzsch, M.; and Simančı́k, F. 2012. Elk: a
structure of the network. It would be interesting to see if       reasoner for owl el ontologies. System Description.
different ways of segmenting a network, besides just along
                                                                  Kulmanov, M.; Liu-Wei, W.; Yan, Y.; and Hoehndorf,
the supports as we have defined them, could improve results.
                                                                  R. 2019. El embeddings: Geometric construction of
   Along with these ideas to improve our current system, we
                                                                  models for the description logic el++. arXiv preprint
also speculate that with some effort we should be able to
                                                                  arXiv:1902.10499.
accomplish basically the same type of thing with a more ex-
pressive logic and a new encoding. EL++ for instance would        Makni, B., and Hendler, J. 2018. Deep Learning for Noise-
be an easy first attempt since it is so similar to EL+ . Though   tolerant RDFS Reasoning. Ph.D. Dissertation, Rensselaer
we should be able to adapt this strategy for any logic that       Polytechnic Institute.
uses completion rules for reasoning. Whatever we choose to        Wikibooks contributors. 2019 (accessed November 19,
do with it, we are optimistic that we can use this strategy       2019). Algorithm implementation/strings/levenshtein dis-
to advance understanding of how logic and neural networks         tance.
can work together.                                                Xiong, W.; Hoang, T.; and Wang, W. Y. 2017. Deeppath:
                                                                  A reinforcement learning method for knowledge graph rea-
Testing Environment                                               soning. arXiv preprint arXiv:1707.06690.
All testing was done on a computer running Ubuntu 19.10
64-bit with an Intel Core i7-9700K CPU@3.60GHz x 8, 47.1
(a) Synthetic Data Piecewise Architecture      (b) Synthetic Data Deep Architecture    (c) Synthetic Data Flat Architecture




(d) SNOMED Data Piecewise Architecture         (e) SNOMED Data Deep Architecture       (f) SNOMED Data Flat Architecture

                                            Figure 7: Character Levenshtein Distance




(a) Synthetic Data Piecewise Architecture      (b) Synthetic Data Deep Architecture    (c) Synthetic Data Flat Architecture




(d) SNOMED Data Piecewise Architecture         (e) SNOMED Data Deep Architecture       (f) SNOMED Data Flat Architecture

                        Figure 8: Character Levenshtein Distance Precision, Recall, and F1-score
(a) Synthetic Data Piecewise Architecture     (b) Synthetic Data Deep Architecture   (c) Synthetic Data Flat Architecture




(d) SNOMED Data Piecewise Architecture        (e) SNOMED Data Deep Architecture      (f) SNOMED Data Flat Architecture

                                            Figure 9: Atomic Levenshtein Distance




(a) Synthetic Data Piecewise Architecture     (b) Synthetic Data Deep Architecture   (c) Synthetic Data Flat Architecture




(d) SNOMED Data Piecewise Architecture        (e) SNOMED Data Deep Architecture      (f) SNOMED Data Flat Architecture

                         Figure 10: Atomic Levenshtein Distance Precision, Recall, and F1-score
(a) Synthetic Data Piecewise Architecture   (b) Synthetic Data Deep Architecture     (c) Synthetic Data Flat Architecture




(d) SNOMED Data Piecewise Architecture      (e) SNOMED Data Deep Architecture        (f) SNOMED Data Flat Architecture

                                              Figure 11: Predicate Distance




(a) Synthetic Data Piecewise Architecture   (b) Synthetic Data Deep Architecture     (c) Synthetic Data Flat Architecture




(d) SNOMED Data Piecewise Architecture      (e) SNOMED Data Deep Architecture        (f) SNOMED Data Flat Architecture

                               Figure 12: Predicate Distance Precision, Recall, and F1-score