<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Completion Reasoning Emulation for the Description Logic EL+</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aaron Eberhart</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Monireh Ebrahimi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lu Zhou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cogan Shimizu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascal Hitzler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Semantics Lab, Kansas State University</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>23</fpage>
      <lpage>25</lpage>
      <abstract>
        <p>We present a new approach to integrating deep learning with knowledge-based systems that we believe shows promise. Our approach seeks to emulate reasoning structure, which can be inspected part-way through, rather than simply learning reasoner answers, which is typical in many of the black-box systems currently in use. We demonstrate that this idea is feasible by training a long short-term memory (LSTM) artificial neural network to learn EL+ reasoning patterns with two different data sets. We also show that this trained system is resistant to noise by corrupting a percentage of the test data and comparing the reasoner's and LSTM's predictions on corrupt data with correct answers.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Machine learning and neural network techniques are often
contrasted with logic-based systems as opposite in many
regards. The challenge of integrating inductive learning
strategies with deductive logic has no easy or obvious solution.
Though there are doubtless many reasons for this, one major
roadblock is that solutions and evaluations which work well
for logic, or for deep learning and machine learning, do not
work well in the opposite, and likely do not further the goal
of integration. In this paper we present a new approach that
embraces the liminality of this specific integration task. Our
approach is by its very nature ill-suited to either machine
learning or logic alone. But it tries to avoid the pitfalls of
unintentionally favoring one paradigm over the other,
aiming instead to grasp at something new in the space between.
      </p>
      <p>
        A popular approach for training neural networks to
recognize logic is to use embedding techniques borrowed from
the field of natural language processing (NLP). For instance,
Makni and Hendler designed a system that layers Resource
Description Framework (RDF) graphs into adjacency
matrices and embeds them
        <xref ref-type="bibr" rid="ref10">(Makni and Hendler 2018)</xref>.
        There
is also considerable research into path-finding in
embedded triple data using recurrent neural networks (RNNs) or
LSTMs
        <xref ref-type="bibr" rid="ref12">(Xiong, Hoang, and Wang 2017; Das et al. 2016; 2017)</xref>.
        Memory networks also successfully operate over
triple embeddings and are capable of transferring training
to work over triple embeddings from completely different
vocabularies
        <xref ref-type="bibr" rid="ref6">(Ebrahimi et al. 2018)</xref>.
        There is even an attempt
to embed EL++ with TransE using a novel concept of
n-balls, though it currently does not consider the RBox or
certain EL++ axioms that do not translate into a triple
format (Kulmanov et al. 2019; Bordes et al. 2013).
      </p>
      <sec id="sec-1-1">
        <title>Discussion</title>
        <p>Many of the systems just mentioned are very successful, yet
have a tendency to adopt familiar evaluation strategies from
deep learning. Sure, it’s great to be able to get a 99%
F1-score. When we do not know the underlying structure of the
data that we are using, this is often the best we can hope for.
However, precision and recall do not make much sense in a
knowledge base evaluation. For an integrated system, some
new goals and evaluation strategies seem warranted that are
neither completely deductive nor entirely empirical.</p>
        <p>In a deductive reasoning system the semantics of the data
is known explicitly. Why then would we want to blindly
allow such a system to teach itself what is important when
we already know what and how it should learn? Possibly we
don’t know the best ways to guide a complex network to the
correct conclusions, but surely more, not less, transparency
is needed for integrating logic and deep learning.
Transparency often becomes difficult when we use extremely deep
or complex networks that cannot be reduced to components.
It also makes things difficult when we pre-process our data
to help the system train by doing embeddings or other
distance adjustments. When we embed all the similar things
close together and all the dissimilar things far apart the
system can appear to succeed, but we can’t tell if it was the
embedding or the system itself or one of a dozen other things
that might have caused this.</p>
        <p>In response to these and other concerns we have
developed some research questions that we hope to address in this
paper. Though we certainly will not be able to definitively
answer all of them, we do hope that by starting this
investigation we can demonstrate future potential for this type of
inquiry.</p>
        <p>Research Questions</p>
        <list list-type="order">
          <list-item><p>Can a neural network emulate reasoner behavior?</p></list-item>
          <list-item><p>Can we still achieve success without embeddings?</p></list-item>
          <list-item><p>Can learning happen with arbitrary numerical names?</p></list-item>
          <list-item><p>Will intermediate answers help our evaluation?</p></list-item>
          <list-item><p>Is F1-score sufficient to evaluate an integrated logic and
neural network system? If not, what would be?</p></list-item>
        </list>
        <p>Question one is the primary focus of this project, and the one
to which all the others are subordinate. It is the intention of
this paper and the prototype system we have implemented to
demonstrate that this is a feasible goal. Whatever
disagreements may arise concerning our later investigations, we will
argue that this is a necessary and plausible alternative course
of research.</p>
        <p>The second question we have already mentioned in the
last section. It is our intention to avoid any pre-processing
that will allow the system to give correct answers without
learning the reasoning patterns. In this way we can better
verify whether we have truly solved the reasoning problem
and not some other easier task.</p>
        <p>The third question is related to the second, but imposes
a broader restriction on what we will be allowed to do.
Not only do we want to avoid embedding the data in a way
that will help the system to learn the answers, we must also
take care when we go from logic to numerical data, and back
again, not to interfere with the latent semantics in the logic.
By avoiding distance comparisons in the naming convention
for the input we ensure that the names are arbitrary in both
systems. The only distances we intend to measure are
between predictions and answers.</p>
        <p>The fourth question is very important for our conception
of how future systems should work. In a logic-based system
there is transparency at any level of reasoning. We cannot,
of course, expect this in most neural networks. With a
network that aims to emulate reasoner behavior, however, we
can impose a degree of intermediate structure. This
intermediate structure allows us to inspect a network part-way
through and perform a sort of “debugging” since we know
exactly what it should have learned at that point. This is a
crucial break from current thinking that advocates more and
deeper opaque hidden layers in networks that improve
accuracy but detract from explainability. Inspection of
intermediate answers could indicate whether a proposed architecture
is actually learning what we intend, which should aid
development of more correct systems.</p>
        <p>The fifth and final question is a difficult but necessary
question, likely beyond the scope of this paper. For now, it is
primarily a reminder that the method we choose
intentionally does not try to find best answers for individual
statements. It tries to emulate reasoner behavior by matching the
shape of the reasoning patterns. That is not to say we don’t
care about right answers, or that they don’t matter. Those
values are reported for our system. And if reasoning
structure is matched well enough then correct answers should
follow. But we believe that the reasoning distance results we
report are far more compelling and tell a better story about
what is happening, as well as indicate the future potential of
this method.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <p>We present a method for learning the structure of EL+
reasoning patterns using an LSTM that tests the questions posed
above. In our system we avoid, as much as possible,
embeddings or any distance adjustments that might help the
system learn answers based on embedding similarity
without first learning the reasoning structure. In fact, a primary
feature of our network is its lack of complexity. We take
a very simple logic, reason over knowledge bases in that
logic, and then extract supports from the reasoning steps,
mapping the reasoner supports back to sets of the original
knowledge base axioms. This allows us to encode the
input data in terms of only knowledge base statements. It also
provides an intermediate answer that might improve results
when provided to the system. This logic data is fed into three
different LSTM architectures with identical input and output
dimensionalities. One architecture, which we call “Deep”,
does not train with support data but has a hidden layer the
same size as the supports we have defined. Another
architecture, called “Piecewise”, trains two separate half-systems,
one with knowledge bases and one with supports from the
reasoner. The last system, called “Flat”, simply learns to
map inputs directly to outputs for each reasoning step. We
will discuss in detail each part of the system in the
following sections, then provide some results.</p>
      <p>EL+</p>
      <p>EL+ is a lightweight and highly tractable description logic.
A typical reasoning task in EL+ is a sequential process
with a fixed endpoint, making it a perfect candidate for
sequence learning. Unlike RDF, which reasons over instance
data in triple format, EL+ reasoning occurs on the predicate
level. Thus we will have to train the system to actually learn
the reasoning patterns and logical structure of EL+ directly
from encoded knowledge bases.</p>
      <sec id="sec-2-1">
        <title>Syntax</title>
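        <sec id="sec-2-1-1">
          <title>The signature</title>
          <p>The signature Σ for EL+ is defined as:
            <disp-formula>Σ = ⟨N<sub>I</sub>, N<sub>C</sub>, N<sub>R</sub>⟩</disp-formula>
            with N<sub>I</sub>, N<sub>C</sub>, N<sub>R</sub> pairwise disjoint.</p>
          <list list-type="simple">
            <list-item><p>N<sub>I</sub> is a set of individual names.</p></list-item>
            <list-item><p>N<sub>C</sub> is a set of concept names that includes ⊤.</p></list-item>
            <list-item><p>N<sub>R</sub> is a set of role names.</p></list-item>
          </list>
          <p>Expressions in EL+ use the following grammar:
            <disp-formula>R ::= N<sub>R</sub></disp-formula>
            <disp-formula>C ::= N<sub>C</sub> | C ⊓ C | ∃R.C</disp-formula>
          </p>
        </sec>
        <sec id="sec-2-1-2">
          <title>Semantics</title>
          <p>An interpretation I = (Δ<sup>I</sup>, ·<sup>I</sup>) maps
N<sub>I</sub>, N<sub>C</sub>, N<sub>R</sub> to elements, sets, and relations in Δ<sup>I</sup> with
function ·<sup>I</sup>. For an interpretation I and C(i), R(i) ∈ Σ, the
function ·<sup>I</sup> is defined in Table 1.</p>
          <table-wrap>
            <label>Table 1</label>
            <table>
              <thead>
                <tr><th>Description</th><th>Expression</th><th>Semantics</th></tr>
              </thead>
              <tbody>
                <tr><td>Individual</td><td>a</td><td>a<sup>I</sup> ∈ Δ<sup>I</sup></td></tr>
                <tr><td>Top</td><td>⊤</td><td>Δ<sup>I</sup></td></tr>
                <tr><td>Bottom</td><td>⊥</td><td>∅</td></tr>
                <tr><td>Concept</td><td>C</td><td>C<sup>I</sup> ⊆ Δ<sup>I</sup></td></tr>
                <tr><td>Role</td><td>R</td><td>R<sup>I</sup> ⊆ Δ<sup>I</sup> × Δ<sup>I</sup></td></tr>
                <tr><td>Conjunction</td><td>C ⊓ D</td><td>C<sup>I</sup> ∩ D<sup>I</sup></td></tr>
                <tr><td>Existential Restriction</td><td>∃R.C</td><td>{a | there is b ∈ Δ<sup>I</sup> such that (a, b) ∈ R<sup>I</sup> and b ∈ C<sup>I</sup>}</td></tr>
                <tr><td>Concept Subsumption</td><td>C ⊑ D</td><td>C<sup>I</sup> ⊆ D<sup>I</sup></td></tr>
                <tr><td>Role Subsumption</td><td>R ⊑ S</td><td>R<sup>I</sup> ⊆ S<sup>I</sup></td></tr>
                <tr><td>Role Chain</td><td>R<sub>1</sub> ∘ ⋯ ∘ R<sub>n</sub> ⊑ R</td><td>R<sub>1</sub><sup>I</sup> ∘ ⋯ ∘ R<sub>n</sub><sup>I</sup> ⊆ R<sup>I</sup></td></tr>
              </tbody>
            </table>
            <table-wrap-foot>
              <p>with ∘ signifying standard binary composition</p>
            </table-wrap-foot>
          </table-wrap>
        </sec>
        <sec id="sec-2-1-3">
          <title>Reasoning</title>
          <p>
            It is a well established result that any EL+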
knowledge base has a least fixed point that can be
determined by repeatedly applying a finite set of axioms that
produce all entailments of a desired type
            <xref ref-type="bibr" rid="ref1 ref8">(Besold et al. 2017; Kazakov, Krötzsch, and Simančík 2012)</xref>.
            In other words,
we can say that reasoning in EL+ often amounts to a
sequence of applications of a set of pattern-matching rules.
One such set of rules, the set we have used in our
experiment, is given in Table 2. The reasoning reaches
completion when there are no new conclusions to be made. Because
people are usually interested most in concept inclusions and
restrictions, those are the types of statements we choose to
include in our reasoning.
          </p>
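          <p>The fixed-point computation described above can be sketched in a few lines. This is a minimal illustration only, assuming the six usual normal forms for EL+ and the standard completion rules for them; the tuple layout, the names <monospace>step</monospace> and <monospace>saturate</monospace>, and the rule numbering are assumptions made for this sketch and need not match Table 2 or our implementation.</p>
          <preformat><![CDATA[
```python
from itertools import product

# Hypothetical tuple layout for normalized axioms (an assumption of this sketch):
#   ("sub", A, B)        A ⊑ B
#   ("conj", A1, A2, B)  A1 ⊓ A2 ⊑ B
#   ("ex_r", A, R, B)    A ⊑ ∃R.B
#   ("ex_l", R, A, B)    ∃R.A ⊑ B
#   ("rsub", R, S)       R ⊑ S
#   ("chain", R1, R2, S) R1 ∘ R2 ⊑ S

def step(facts):
    """Apply every completion rule once and return only the new conclusions."""
    sub = {f[1:] for f in facts if f[0] == "sub"}
    conj = {f[1:] for f in facts if f[0] == "conj"}
    ex_r = {f[1:] for f in facts if f[0] == "ex_r"}
    ex_l = {f[1:] for f in facts if f[0] == "ex_l"}
    rsub = {f[1:] for f in facts if f[0] == "rsub"}
    chain = {f[1:] for f in facts if f[0] == "chain"}
    new = set()
    for (a, b), (c, d) in product(sub, sub):
        if b == c:                                   # A ⊑ B, B ⊑ D
            new.add(("sub", a, d))
    for (a, b1), (a2, b2), (c1, c2, d) in product(sub, sub, conj):
        if a == a2 and (b1, b2) == (c1, c2):         # A ⊑ B1, A ⊑ B2, B1 ⊓ B2 ⊑ D
            new.add(("sub", a, d))
    for (a, b), (c, r, d) in product(sub, ex_r):
        if b == c:                                   # A ⊑ B, B ⊑ ∃R.D
            new.add(("ex_r", a, r, d))
    for (a, r, b), (s, c, d) in product(ex_r, ex_l):
        if r == s and (b == c or (b, c) in sub):     # A ⊑ ∃R.B, B ⊑ C, ∃R.C ⊑ D
            new.add(("sub", a, d))
    for (a, r, b), (r2, s) in product(ex_r, rsub):
        if r == r2:                                  # A ⊑ ∃R.B, R ⊑ S
            new.add(("ex_r", a, s, b))
    for (a, r1, b), (b2, r2, c), (s1, s2, s) in product(ex_r, ex_r, chain):
        if b == b2 and (r1, r2) == (s1, s2):         # A ⊑ ∃R1.B, B ⊑ ∃R2.C, R1 ∘ R2 ⊑ S
            new.add(("ex_r", a, s, c))
    return new - facts

def saturate(kb):
    """Repeat step() until nothing new is derived; return all facts and the per-step conclusions."""
    facts, steps = set(kb), []
    while True:
        new = step(facts)
        if not new:
            return facts, steps
        steps.append(new)
        facts |= new
```
]]></preformat>
          <p>The list of per-step conclusions returned by <monospace>saturate</monospace> corresponds to the reasoning sequence that a network would be asked to emulate; the brute-force products are chosen for readability, not efficiency.</p>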
          <p>After the reasoning has finished we are able to recursively
define supports for each conclusion the reasoner reaches.
The first step, of course, only has supports from the
knowledge base. After this step supports are determined by
effectively running the reasoner in reverse, and replacing each
statement that is not in the original knowledge base with a
superset that is, as you can see by the colored substitutions
in Table 3. When the reasoner proved the last statement it did
not consider all of the supports, since it had already proved
them. It used the new facts it had learned in the last iteration
but we have drawn their supports back out so that we can
define a fixed set of inputs from the knowledge base.</p>
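          <p>The recursive definition of supports can be illustrated with a short sketch. It assumes the reasoner records, for every derived statement, the premises of the rule application that produced it; the <monospace>derivations</monospace> mapping and the function name <monospace>support</monospace> are hypothetical and introduced only for this example.</p>
          <preformat><![CDATA[
```python
def support(statement, kb, derivations):
    """Return the set of original knowledge base axioms that a statement rests on.

    kb          -- set of original axioms
    derivations -- hypothetical map: derived statement -> premises used to derive it
    """
    if statement in kb:
        # Original axioms support themselves.
        return {statement}
    result = set()
    for premise in derivations[statement]:
        # Replace each derived premise by its own support, recursively.
        result |= support(premise, kb, derivations)
    return result
```
]]></preformat>
          <p>For a statement derived in the first step this returns its premises unchanged; for later steps every intermediate conclusion is expanded until only knowledge base axioms remain.</p>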
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Synthetic Data</title>
        <p>To provide sufficient training input to our system we provide
a synthetic generation procedure that combines a structured
forced-lower-bound reasoning sequence with a connected
randomized knowledge base. This allows us to rapidly
generate many normal semi-random EL+ knowledge bases of
arbitrary reasoning difficulty. For this experiment we choose
a moderate difficulty setting so that it can compare with
non-synthetic data. An example of one iteration of the two-part
sequence is provided in Figure 1. To ensure that the
randomized statements do not shortcut this pattern, the random
statements are generated in a nearly disjoint space and
connected only to the initial seed term. This ensures that at least
one element of the random space will also produce random
entailments for the duration of the sequence, possibly longer.
Our procedure also guarantees that each completion rule will
be used at least once every iteration of the sequence so that
all reasoning patterns can potentially be learned by the
system.</p>
        <fig>
          <label>Figure 1</label>
          <preformat>seed0 = C1 ⊑ C2
C2 ⊑ C3
C3 ⊓ C4 ⊑ C5
C1 ⊑ ∃R1.C1
C1 ⊑ ∃R2.C3
C2 ⊑ ∃R2.C3
Which entails
Step 1:
C1 ⊑ C3
C1 ⊑ C4
C1 ⊑ ∃R1.C3
C1 ⊑ ∃R2.C1
C1 ⊑ ∃R4.C4
C2 ⊑ ∃R1.C3
C1 ⊑ ∃R3.C4
C2 ⊑ ∃R1.C3
∃R1.C2 ⊑ C4
R1 ⊑ R2
R1 ∘ R3 ⊑ R4
Step 2:
seed1 = C1 ⊑ C5</preformat>
        </fig>
      </sec>
      <sec id="sec-2-3">
        <title>SNOMED</title>
        <p>
          We have also imported data from the SNOMED 2012
ontology to ensure that our method is applicable to non-synthetic
data. SNOMED is a widely used, publicly available
ontology of medical terms and relationships
          <xref ref-type="bibr" rid="ref5">(De Silva et al. 2011)</xref>.
          SNOMED 2012 has 39,392 logical axioms, some of which
are complex, but these can be normalized in constant time to
a logically equivalent set of 124,428 axioms. It is expressed
in EL++, and newer versions utilize this complexity more
liberally. But the 2012 version contains exactly one
statement that is not in EL+ after normalization. We feel
confident that omitting ⊤ ⊑ ∃topPartOf.Self does not constitute
a substantial loss in the ontology.
        </p>
        <p>Because SNOMED is so large, we sample it to obtain
connected subsets that also produce a minimum number of
reasoning steps. We require that the samples be connected
because any normal knowledge base is connected, and it
improves the chances that the statements will have entailments.
Often, however, random connected samples do not entail
anything, and this is also unusual in a normal ontology, so
we require a minimum amount of reasoning activity in the
samples as well. The reasoning task for SNOMED is more
unbalanced than for the synthetic data. It is equally trivial for
the reasoner to solve either knowledge base type. However,
we observe that random connected sampling tends to favor
application of rules 3, 5, and 6 much more heavily than
others so the system will have a more difficult time learning the
overall reasoning patterns. This imbalance is likely an
artifact from SNOMED because it seems to recur in different
sample sizes with different settings, though we acknowledge
that it could be correlated somehow with the sample
generation procedure.</p>
      </sec>
      <sec id="sec-2-4">
        <title>LSTM</title>
        <p>
          LSTMs are a type of recurrent neural network that works well
with data that has long-term dependencies
          <xref ref-type="bibr" rid="ref7">(Hochreiter and Schmidhuber 1997)</xref>.
          We choose to use an LSTM to learn
the reasoning patterns of EL+ because LSTMs are designed
to learn sequence patterns. The network is kept as simple
as possible to achieve the outputs we require and maximize
transparency. For the piecewise system, depicted in Figure
2, we use two separate flat single LSTMs that have the
number of neurons matching first the support shape, and later
the output shape. The deep system shown in Figure 3 is
constructed to match this shape so that the two will be
comparable; it simply stacks the LSTMs together so that it can
train without supports. We also train a flat system, shown in
Figure 4, that has the same input and output shape but lacks
the internal supports. We have done this three ways so that
we can compare the behavior of the systems with support
against the flat system as a baseline, and then see whether the
piecewise training is helping. All designs treat the reasoning
steps as the sequence to learn, so the number of LSTM cells
is defined by the maximum reasoning sequence length. We
train our LSTMs using regression on mean squared error to
minimize the distances between predicted answers and
correct answers.
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>Overall System</title>
        <p>The individual parts of our system are not themselves novel.
Our result, however, lies rather in the unexpected success
of this unconventional general approach we have applied.
One of the crucial factors in making this work is the way
in which we have translated the data from its logical format
into something a neural network can understand. In the
following sections we will discuss how our system transforms
data from a knowledge base format to an LSTM format and
back again.</p>
        <p>Statement Encoding Every data set we use has a fixed
number of names that it can contain for both roles and
concepts, and every concept or role has an arbitrary number
name identifier. For the synthetic data these numbers are
randomly generated. For SNOMED data, all of the labels
are stripped and the concept and role numbers are shuffled
around and substituted with random integers to remove any
possibility of the names injecting a learning bias. These
labels are remembered for later retrieval for output but the
reasoner and LSTM do not see them. It can seem occasionally
like there is a bias in the naming if output for SNOMED
data is inspected because all the labels seem related. This
is merely a result of our sampling technique which requires
connected statements, so it is not a concern.</p>
        <p>Knowledge bases have a maximum number of role and
concept names that are used to scale all of the integer values
for names into a range of [-1, 1]. To enforce disjointness of
concepts and roles, we map all concepts to (0, 1], all roles to
[-1, 0). Each of the six possible normalized axiom forms
is encoded into a 4-tuple based on the logical encodings
defined in Table 4.</p>
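        <p>A minimal sketch of the name scaling follows. It assumes a linear map that divides each integer name by the maximum number of concept or role names, which places concepts in (0, 1] and roles in [-1, 0); the helper names <monospace>encode_name</monospace> and <monospace>decode_name</monospace>, and the treatment of 0.0 as an unused slot, are assumptions made for illustration rather than a description of our code.</p>
        <preformat><![CDATA[
```python
def encode_name(kind, number, max_concepts, max_roles):
    """Scale an integer name into (0, 1] for concepts or [-1, 0) for roles."""
    if kind == "C":
        return number / max_concepts
    if kind == "R":
        return -number / max_roles
    raise ValueError(f"unknown name kind: {kind!r}")

def decode_name(value, max_concepts, max_roles):
    """Invert encode_name; rounding absorbs small floating-point error."""
    if value > 0:
        return "C", round(value * max_concepts)
    if value < 0:
        return "R", round(-value * max_roles)
    return None  # assumption: 0.0 marks an unused position in the 4-tuple
```
]]></preformat>
        <p>With two concept names and two role names, for example, C1 and C2 become 0.5 and 1.0 while R1 and R2 become -0.5 and -1.0.</p>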
        <p>We would like to emphasize the distinction between our
method, which we prefer to call an encoding, and a
traditional embedding. This encoding adds no additional
information to the names beyond the transformation from integer
to scaled floating point number. It can be reversed quite
directly, with a slight loss of precision due to integer
conversion. The semantics of each encoded predicate name with
regard to its respective knowledge base is structural in
relation to its position in the 4-tuple, just like it would be in
an EL+ axiom. And its semantics is not at all related to the
other non-equal predicate names within the same 4-tuple.</p>
        <p>Knowledge Base Encoding In order to input knowledge
bases into the LSTM we first apply the encoding defined
in the previous section to each of its statements. Then we
concatenate each of these encodings end-to-end to obtain a
much longer vector. For instance,</p>
        <p>C1 ⊑ ∃R1.C2, R1 ⊑ R2
might translate to</p>
        <p>[0.0, 0.5, -0.5, 1.0, 0.0, -0.5, -1.0, 0.0].
This knowledge base vector is then copied the same number
of times as there are reasoning steps and stuck together once
again along a new dimension. For an experiment this is done
hundreds or thousands of times, and these fill up the tensor
that is given to the LSTM to learn. Because some of the
randomness can cause ragged data, any tensor dimensions
that are not the same length as the maximum are padded
with zeros.</p>
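        <p>The construction of one input sample can be sketched as follows, using plain Python lists for clarity. The function names and the choice to pad the step dimension with all-zero rows are assumptions of this sketch; the text only states that shorter dimensions are padded with zeros.</p>
        <preformat><![CDATA[
```python
def encode_kb(statements, n_steps, max_len):
    """Concatenate encoded 4-tuples, zero-pad to max_len, and repeat once per reasoning step."""
    flat = [x for quad in statements for x in quad]
    if len(flat) > max_len:
        raise ValueError("knowledge base is longer than max_len")
    flat = flat + [0.0] * (max_len - len(flat))
    return [list(flat) for _ in range(n_steps)]

def pad_steps(sample, max_steps):
    """Pad the step dimension with all-zero rows so every sample has max_steps rows."""
    width = len(sample[0])
    return sample + [[0.0] * width for _ in range(max_steps - len(sample))]

# The two-statement example: C1 ⊑ ∃R1.C2 and R1 ⊑ R2, with roles as negative values.
kb = [[0.0, 0.5, -0.5, 1.0], [0.0, -0.5, -1.0, 0.0]]
sample = pad_steps(encode_kb(kb, n_steps=3, max_len=12), max_steps=5)
```
]]></preformat>
        <p>Stacking many such samples along a further dimension gives the tensor that is handed to the LSTM.</p>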
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>Viewed in terms of the questions posed at the beginning of
this paper we feel that our results are solid, and they
validate a reexamination of how we should approach integrating
logic and neural networks. We have attempted to create a
system that emulates a reasoning task rather than reasoning
output.</p>
      <p>If we examine the example output from the synthetic data
inputs in Table 5, it is clear that the system is getting very close to
many correct answers. When it misses, it still appears to
be learning the shape, and this makes us optimistic about
its future potential. The SNOMED predictions are much
more dense and do not fit well into a table, but we have
included a few good examples with the original data labels
translated into English sentences. Because there are so many
near-misses we include edit distance evaluations that better
capture the degree to which each predicted statement misses
what it should have been.</p>
      <p>It is interesting to note that by comparing Table 7 with
Table 8 we can see that on the much harder SNOMED data
the deep system appears to have a better result because of
the higher F1 score, but the average edit distance, which is
our preferred alternative measure for evaluation, is not
obviously correlated with the F1-score. This confirms that our
fifth research question was appropriate and that F1-score
might not be sufficient to evaluate whether or not we can
learn the shape of reasoning patterns.</p>
      <p>A cause for the discrepancy may be the higher training
difficulty for reaching the completion versus reaching the
supports in the SNOMED data, which you can see in
Figures 5 and 6. We can speculate on the degree to which
various factors are contributing to this, but importantly, the fact
that this discussion is even possible in the first place is a
confirmation that we can answer the fourth research question in
the affirmative.</p>
      <p>Another interesting result can be seen by examining the
charts in Figures 7-12. As we have noticed before, the Deep
architecture performs quite well on the synthetic data and
is quite resistant to increased levels of noise. The flat
system gets similar results, but slightly worse. Surprisingly, the
piecewise model starts out mostly the same as the deep
system on synthetic data but then very quickly falters. When we
examine the results for the harder SNOMED data, however,
the opposite seems to be happening. The deep and flat
systems still track each other. But for the unbalanced data the
deep system crosses into the random range by 30-50%
corruption, while the piecewise system is able to skirt just above
random for the Character and Predicate distance functions
until around 80-90%.</p>
      <sec id="sec-3-1">
        <title>Methodology</title>
        <p>Our system is trained using randomized 10-fold cross
validation to 20000 epochs on the deep and flat systems and 10000
epochs each for the parts of the piecewise system at a
learning rate of 0.0001. The results from every fold are saved and
we report the average of all of the runs. Often there are better
and worse runs in any given 10-fold validation but the
averages across multiple cross-validations with the same settings
remain basically the same. 10-fold cross-validations are
performed with an extra comparison against data that has been
corrupted at a fixed rate, starting with zero percent
probability of corruption and moving upwards by ten percent each
time. The data in Tables 7 and 8 is from the comparison with
no corruption, so the error data is not reported (it is identical
with reasoner answers).</p>
        <p>
          To perform our evaluations we use three distance
measurements. We have a naive “Character” Levenshtein
distance function that takes two unaltered knowledge base
statement strings and computes their edit
distance
          <xref ref-type="bibr" rid="ref11">(Wikibooks contributors 2019, accessed November 19, 2019)</xref>.
          However, because some names in the namespace are one-digit
numbers and other names are two-digit numbers, we include
a modified version of this function, called “Atomic”, that
uniformly substitutes all two-digit numbers in the strings
with symbols that do not occur in either. Since there cannot
be more than eight unique numbers in two decoded strings
there are no issues with finding enough new symbols.
Comparing the Atomic Levenshtein distance with the Character
distance then shows the impact that the number of digits
was having on the edits. Finally we devise a distance function that
is based on our encoding scheme. The Predicate Distance
method disassembles each string into only its predicates.
Then, for each position in the 4-tuple, a distance is
calculated that yields zero for perfect matches,
|guessed number - actual number| when the guess has the
correct type (Class or Role), and guessed number + actual
number when the guess has the wrong type. So, for instance, guessing C1 when
the answer is C2 will yield a Predicate Distance of 1, while
a guess of R2 for a correct answer of C15 will yield 17.
Though this method is specific to our unique encoding, we
believe it detects good and bad results quite well because
perfect hits are 0, close misses are penalized a little, and
large misses are penalized a lot.
        </p>
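        <p>The three distance functions can be sketched as below. This is an illustrative reimplementation under stated assumptions, not our evaluation code: the Levenshtein routine is the textbook dynamic program, the Atomic variant uses Unicode private-use characters as the fresh symbols, and predicates are represented as hypothetical <monospace>(kind, number)</monospace> pairs.</p>
        <preformat><![CDATA[
```python
import re

def levenshtein(a, b):
    """Character edit distance computed with the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]

def atomic_levenshtein(a, b):
    """Edit distance after replacing each multi-digit number with one fresh symbol."""
    numbers = sorted(set(re.findall(r"\d{2,}", a + " " + b)))
    # Assumption: private-use code points never occur in statement strings.
    symbol = {n: chr(0xE000 + i) for i, n in enumerate(numbers)}

    def swap(s):
        return re.sub(r"\d{2,}", lambda m: symbol[m.group()], s)

    return levenshtein(swap(a), swap(b))

def predicate_distance(guess, actual):
    """Distance between two (kind, number) predicates, e.g. ("C", 1) and ("C", 2)."""
    (g_kind, g_num), (a_kind, a_num) = guess, actual
    if guess == actual:
        return 0                      # perfect match
    if g_kind == a_kind:
        return abs(g_num - a_num)     # right type, wrong number
    return g_num + a_num              # wrong type
```
]]></preformat>
        <p>For example, <monospace>predicate_distance(("C", 1), ("C", 2))</monospace> gives 1 and <monospace>predicate_distance(("R", 2), ("C", 15))</monospace> gives 17, matching the cases described above.</p>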
        <p>For each method we take every statement in a knowledge
base completion and compare it with the best match in the
reasoner answer, random answers, and corrupted knowledge
base answers. While we compute these distances we are able
to obtain precision, recall, and F1-score by counting the
number of times the distance returns zero and treating the
statement predictions as classifications. Each time the
system runs it can make any number of predictions, from zero
to the maximum size of the output tensor. This means that,
although the predictions and reasoner are usually around the
same size and those comparisons are fine, we have to
generate random data to compare against that is as big as could
conceivably be needed by the system. Any artificial shaping
decisions we made to compensate for the variations between
runs would invariably introduce their own bias in how we
selected them. Thus the need to use the biggest possible
random data to compare against means the precision, recall, and
F1-score for random are low.</p>
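        <p>One way to derive these scores from a distance function is sketched below. It assumes that a prediction counts as correct when its best match among the answers has distance zero, and that recall is counted symmetrically over the answers; the function name <monospace>prf</monospace> and these counting details are assumptions, since the exact bookkeeping is not spelled out here.</p>
        <preformat><![CDATA[
```python
def prf(predictions, answers, distance):
    """Precision, recall, and F1 from exact (zero-distance) matches."""
    # A prediction is a hit if some answer is at distance zero from it.
    hits_p = sum(1 for p in predictions if any(distance(p, a) == 0 for a in answers))
    # An answer is recovered if some prediction is at distance zero from it.
    hits_r = sum(1 for a in answers if any(distance(p, a) == 0 for p in predictions))
    precision = hits_p / len(predictions) if predictions else 0.0
    recall = hits_r / len(answers) if answers else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```
]]></preformat>
        <p>Because precision divides by the number of predictions, comparing against a random set as large as the output tensor allows drives precision, and with it F1-score, toward zero, which is the effect noted above.</p>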
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this experiment we demonstrate that EL+ reasoning can be
emulated with an LSTM. Although our example is merely
a prototype, we can definitively say yes to the first research
question: a neural network can be trained to emulate
completion reasoning behavior. And because we have answered
this first research question without using name adjustments
or embeddings, we can also answer yes to the second and
third questions. We have already discussed questions four
and five.</p>
      <sec id="sec-4-1">
        <title>Future Work</title>
        <p>There are a number of potential improvements that could
build on the work presented here. A few of the choices we
made might have had an impact on the results of the study
and we are curious if alternate strategies could improve the
way that the system learns reasoning patterns. We might
have chosen a 3-tuple encoding pattern that was more dense
but did not mirror the shape of the statements symmetrically.
We also considered adding the statements as an additional
dimension rather than concatenating them all together. This
may have made it easier for the system to distinguish
individual statements but would also add a lot of complexity in
the extra dimension. We experimented with different
neuron types like gated recurrent unit (GRU) and basic RNN,
though these had no significant impact on the results. The
type of neuron seems to have less effect than the overall
structure of the network. It would be interesting to see if
different ways of segmenting a network, besides just along
the supports as we have defined them, could improve results.</p>
        <p>Along with these ideas to improve our current system, we
also speculate that with some effort we should be able to
accomplish basically the same type of thing with a more
expressive logic and a new encoding. EL++ for instance would
be an easy first attempt since it is so similar to EL+, though
we should be able to adapt this strategy for any logic that
uses completion rules for reasoning. Whatever we choose to
do with it, we are optimistic that we can use this strategy
to advance understanding of how logic and neural networks
can work together.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Testing Environment</title>
        <p>All testing was done on a computer running Ubuntu 19.10
64-bit with an Intel Core i7-9700K CPU@3.60GHz x 8, 47.1
GiB DDR4, and a GeForce GTX 1060 6GB/PCIe/SSE2.
Source code and experiment data are available on GitHub at
https://github.com/aaronEberhart/PySynGenReas.</p>
        <p>Acknowledgements This work was partially supported by
the Air Force Office of Scientific Research under award
number FA9550-18-1-0386, and by the National Science
Foundation under award number 1936677.
</p>
        <p>Figure 7: Character Levenshtein Distance. Figure 9: Atomic Levenshtein Distance. Panels in each figure: (a) Synthetic Data Piecewise Architecture, (b) Synthetic Data Deep Architecture, (c) Synthetic Data Flat Architecture, (d) SNOMED Data Piecewise Architecture, (e) SNOMED Data Deep Architecture, (f) SNOMED Data Flat Architecture.</p>
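        <p>The two distance measures named in the figure captions can be illustrated with a short sketch. It assumes that the character distance compares statements symbol by symbol, while the atomic distance treats each whitespace-separated name as a single unit; that reading, the function name levenshtein, and the example statements are assumptions for illustration rather than the evaluation code itself.</p>
        <preformat>
```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j items of b.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical statements; C1, C3 and C25 are placeholder names.
predicted = "C1 SubClassOf C25"
expected = "C1 SubClassOf C3"

# Character level: every character is one symbol.
char_distance = levenshtein(predicted, expected)

# Atomic level (assumed meaning): every name is one symbol.
atomic_distance = levenshtein(predicted.split(), expected.split())

print(char_distance)    # 2
print(atomic_distance)  # 1
```
        </preformat>
        <p>In this example one wrong name costs two character edits but only one atomic edit, so the two measures can rank the same prediction differently.</p>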
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name><surname>Besold</surname>, <given-names>T. R.</given-names></string-name>;
          <string-name><surname>d'Avila Garcez</surname>, <given-names>A. S.</given-names></string-name>;
          <string-name><surname>Bader</surname>, <given-names>S.</given-names></string-name>;
          <string-name><surname>Bowman</surname>, <given-names>H.</given-names></string-name>;
          <string-name><surname>Domingos</surname>, <given-names>P. M.</given-names></string-name>;
          <string-name><surname>Hitzler</surname>, <given-names>P.</given-names></string-name>;
          <string-name><surname>Kühnberger</surname>, <given-names>K.</given-names></string-name>;
          <string-name><surname>Lamb</surname>, <given-names>L. C.</given-names></string-name>;
          <string-name><surname>Lowd</surname>, <given-names>D.</given-names></string-name>;
          <string-name><surname>Lima</surname>, <given-names>P. M. V.</given-names></string-name>;
          <string-name><surname>de Penning</surname>, <given-names>L.</given-names></string-name>;
          <string-name><surname>Pinkas</surname>, <given-names>G.</given-names></string-name>;
          <string-name><surname>Poon</surname>, <given-names>H.</given-names></string-name>; and
          <string-name><surname>Zaverucha</surname>, <given-names>G.</given-names></string-name>
          <year>2017</year>.
          <article-title>Neural-symbolic learning and reasoning: A survey and interpretation</article-title>.
          <source>CoRR abs/1711.03902</source>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name><surname>Bordes</surname>, <given-names>A.</given-names></string-name>;
          <string-name><surname>Usunier</surname>, <given-names>N.</given-names></string-name>;
          <string-name><surname>Garcia-Duran</surname>, <given-names>A.</given-names></string-name>;
          <string-name><surname>Weston</surname>, <given-names>J.</given-names></string-name>; and
          <string-name><surname>Yakhnenko</surname>, <given-names>O.</given-names></string-name>
          <year>2013</year>.
          <article-title>Translating embeddings for modeling multi-relational data</article-title>.
          In <source>Advances in neural information processing systems</source>,
          <fpage>2787</fpage>-<lpage>2795</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
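          <string-name><surname>Das</surname>, <given-names>R.</given-names></string-name>;
          <string-name><surname>Neelakantan</surname>, <given-names>A.</given-names></string-name>;
          <string-name><surname>Belanger</surname>, <given-names>D.</given-names></string-name>; and
          <string-name><surname>McCallum</surname>, <given-names>A.</given-names></string-name>
          <year>2016</year>.
          <article-title>Chains of reasoning over entities, relations, and text using recurrent neural networks</article-title>.
          <source>CoRR abs/1607.01426</source>.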
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name><surname>Das</surname>, <given-names>R.</given-names></string-name>;
          <string-name><surname>Dhuliawala</surname>, <given-names>S.</given-names></string-name>;
          <string-name><surname>Zaheer</surname>, <given-names>M.</given-names></string-name>;
          <string-name><surname>Vilnis</surname>, <given-names>L.</given-names></string-name>;
          <string-name><surname>Durugkar</surname>, <given-names>I.</given-names></string-name>;
          <string-name><surname>Krishnamurthy</surname>, <given-names>A.</given-names></string-name>;
          <string-name><surname>Smola</surname>, <given-names>A.</given-names></string-name>; and
          <string-name><surname>McCallum</surname>, <given-names>A.</given-names></string-name>
          <year>2017</year>.
          <article-title>Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning</article-title>.
          <source>arXiv preprint arXiv:1711.05851</source>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name><surname>De Silva</surname>, <given-names>T. S.</given-names></string-name>;
          <string-name><surname>MacDonald</surname>, <given-names>D.</given-names></string-name>;
          <string-name><surname>Paterson</surname>, <given-names>G.</given-names></string-name>;
          <string-name><surname>Sikdar</surname>, <given-names>K. C.</given-names></string-name>; and
          <string-name><surname>Cochrane</surname>, <given-names>B.</given-names></string-name>
          <year>2011</year>.
          <article-title>Systematized nomenclature of medicine clinical terms (SNOMED CT) to represent computed tomography procedures</article-title>.
          <source>Comput. Methods Prog. Biomed.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name><surname>Ebrahimi</surname>, <given-names>M.</given-names></string-name>;
          <string-name><surname>Sarker</surname>, <given-names>M. K.</given-names></string-name>;
          <string-name><surname>Bianchi</surname>, <given-names>F.</given-names></string-name>;
          <string-name><surname>Xie</surname>, <given-names>N.</given-names></string-name>;
          <string-name><surname>Doran</surname>, <given-names>D.</given-names></string-name>; and
          <string-name><surname>Hitzler</surname>, <given-names>P.</given-names></string-name>
          <year>2018</year>.
          <article-title>Reasoning over RDF knowledge bases using deep learning</article-title>.
          <source>arXiv preprint arXiv:1811.04132</source>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>1997</year>
          .
          <article-title>LSTM can solve hard long time lag problems</article-title>
          .
          In <source>Advances in neural information processing systems</source>
          ,
          <fpage>473</fpage>-<lpage>479</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name><surname>Kazakov</surname>, <given-names>Y.</given-names></string-name>;
          <string-name><surname>Krötzsch</surname>, <given-names>M.</given-names></string-name>; and
          <string-name><surname>Simančík</surname>, <given-names>F.</given-names></string-name>
          <year>2012</year>.
          <article-title>ELK: a reasoner for OWL EL ontologies</article-title>.
          <source>System Description.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name><surname>Kulmanov</surname>, <given-names>M.</given-names></string-name>;
          <string-name><surname>Liu-Wei</surname>, <given-names>W.</given-names></string-name>;
          <string-name><surname>Yan</surname>, <given-names>Y.</given-names></string-name>; and
          <string-name><surname>Hoehndorf</surname>, <given-names>R.</given-names></string-name>
          <year>2019</year>.
          <article-title>EL embeddings: Geometric construction of models for the description logic EL++</article-title>.
          <source>arXiv preprint arXiv:1902.10499</source>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Makni</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Deep Learning for Noise-tolerant RDFS Reasoning</article-title>
          .
          <source>Ph.D. Dissertation</source>
          , Rensselaer Polytechnic Institute.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <collab>Wikibooks contributors</collab>.
          <year>2019</year>
          (accessed November 19, 2019).
          <article-title>Algorithm implementation/strings/Levenshtein distance</article-title>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name><surname>Xiong</surname>, <given-names>W.</given-names></string-name>;
          <string-name><surname>Hoang</surname>, <given-names>T.</given-names></string-name>; and
          <string-name><surname>Wang</surname>, <given-names>W. Y.</given-names></string-name>
          <year>2017</year>.
          <article-title>DeepPath: A reinforcement learning method for knowledge graph reasoning</article-title>.
          <source>arXiv preprint arXiv:1707.06690</source>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>