Completion Reasoning Emulation for the Description Logic EL+

Aaron Eberhart, Monireh Ebrahimi, Lu Zhou, Cogan Shimizu, and Pascal Hitzler
Data Semantics Lab, Kansas State University, USA
{aaroneberhart,monireh,luzhou,coganmshimizu,hitzler}@ksu.edu

Abstract

We present a new approach to integrating deep learning with knowledge-based systems that we believe shows promise. Our approach seeks to emulate reasoning structure, which can be inspected part-way through, rather than simply learning reasoner answers, which is typical in many of the black-box systems currently in use. We demonstrate that this idea is feasible by training a long short-term memory (LSTM) artificial neural network to learn EL+ reasoning patterns with two different data sets. We also show that this trained system is resistant to noise by corrupting a percentage of the test data and comparing the reasoner's and the LSTM's predictions on corrupt data with correct answers.

Copyright (c) 2020 held by the author(s). In A. Martin, K. Hinkelmann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.), Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020). Stanford University, Palo Alto, California, USA, March 23-25, 2020. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Introduction

Machine learning and neural network techniques are often contrasted with logic-based systems as opposites in many regards. The challenge of integrating inductive learning strategies with deductive logic has no easy or obvious solution. Though there are doubtless many reasons for this, one major roadblock is that solutions and evaluations which work well for logic, or for deep learning and machine learning, do not work well for the other, and likely do not further the goal of integration. In this paper we present a new approach that embraces the liminality of this specific integration task. Our approach is by its very nature ill-suited to either machine learning or logic alone, but it tries to avoid the pitfalls of unintentionally favoring one paradigm over the other, aiming instead to grasp at something new in the space between.

Related Work

A popular approach for training neural networks to recognize logic is to use embedding techniques borrowed from the field of natural language processing (NLP). For instance, Makni and Hendler designed a system that layers Resource Description Framework (RDF) graphs into adjacency matrices and embeds them (Makni and Hendler 2018). There is also considerable research into path-finding in embedded triple data using recurrent neural networks (RNNs) or LSTMs (Xiong, Hoang, and Wang 2017; Das et al. 2016; 2017). Memory networks also successfully operate over triple embeddings and are capable of transferring training to work over triple embeddings from completely different vocabularies (Ebrahimi et al. 2018). There is even an attempt to embed EL++ with TransE using a novel concept of n-balls, though it currently does not consider the RBox or certain EL++ axioms that do not translate into a triple format (Kulmanov et al. 2019; Bordes et al. 2013).
Discussion

Many of the systems just mentioned are very successful, yet they have a tendency to adopt familiar evaluation strategies from deep learning. Sure, it is great to be able to get a 99% F1-score, and when we do not know the underlying structure of the data that we are using, this is often the best we can hope for. However, precision and recall do not make much sense in a knowledge base evaluation. For an integrated system, some new goals and evaluation strategies seem warranted that are neither completely deductive nor entirely empirical.

In a deductive reasoning system the semantics of the data is known explicitly. Why then would we want to blindly allow such a system to teach itself what is important when we already know what and how it should learn? Possibly we don't know the best ways to guide a complex network to the correct conclusions, but surely more, not less, transparency is needed for integrating logic and deep learning. Transparency often becomes difficult when we use extremely deep or complex networks that cannot be reduced to components. It also becomes difficult when we pre-process our data to help the system train by doing embeddings or other distance adjustments. When we embed all the similar things close together and all the dissimilar things far apart, the system can appear to succeed, but we cannot tell whether it was the embedding, the system itself, or one of a dozen other things that caused this.

In response to these and other concerns we have developed some research questions that we hope to address in this paper. Though we certainly will not be able to definitively answer all of them, we do hope that by starting this investigation we can demonstrate future potential for this type of inquiry.

Research Questions

1. Can a neural network emulate reasoner behavior?
2. Can we still achieve success without embeddings?
3. Can learning happen with arbitrary numerical names?
4. Will intermediate answers help our evaluation?
5. Is F1-score sufficient to evaluate an integrated logic and neural network system? If not, what would be?

Question one is the primary focus of this project, and the one to which all the others are subordinate. It is the intention of this paper and the prototype system we have implemented to demonstrate that this is a feasible goal. Whatever disagreements may arise concerning our later investigations, we will argue that this is a necessary and plausible alternative course of research.

The second question we have already mentioned in the last section. It is our intention to avoid any pre-processing that would allow the system to give correct answers without learning the reasoning patterns. In this way we can better verify whether we have truly solved the reasoning problem and not some other, easier task.
The third question is related to the second, but imposes a broader restriction on what we allow ourselves to do. Not only do we want to avoid embedding the data in a way that would help the system learn the answers; we must also take care, when we go from logic to numerical data and back again, not to interfere with the latent semantics in the logic. By avoiding distance comparisons in the naming convention for the input we ensure that the names are arbitrary in both systems. The only distances we intend to measure are between predictions and answers.

The fourth question is very important for our conception of how future systems should work. In a logic-based system there is transparency at any level of reasoning. We cannot, of course, expect this in most neural networks. With a network that aims to emulate reasoner behavior, however, we can impose a degree of intermediate structure. This intermediate structure allows us to inspect a network part-way through and perform a sort of "debugging", since we know exactly what it should have learned at that point. This is a crucial break from current thinking that advocates ever more and deeper opaque hidden layers in networks, which improve accuracy but detract from explainability. Inspection of intermediate answers could indicate whether a proposed architecture is actually learning what we intend, which should aid development of more correct systems.

The fifth and final question is a difficult but necessary one, likely beyond the scope of this paper. For now, it is primarily a reminder that the method we choose intentionally does not try to find the best answers for individual statements. It tries to emulate reasoner behavior by matching the shape of the reasoning patterns. That is not to say we don't care about right answers, or that they don't matter; those values are reported for our system, and if the reasoning structure is matched well enough then correct answers should follow. But we believe that the reasoning distance results we report are far more compelling and tell a better story about what is happening, as well as indicating the future potential of this method.

Approach

We present a method for learning the structure of EL+ reasoning patterns using an LSTM that tests the questions posed above. In our system we avoid, as much as possible, embeddings or any distance adjustments that might help the system learn answers based on embedding similarity without first learning the reasoning structure. In fact, a primary feature of our network is its lack of complexity. We take a very simple logic, reason over knowledge bases in that logic, and then extract supports from the reasoning steps, mapping the reasoner supports back to sets of the original knowledge base axioms. This allows us to encode the input data in terms of only knowledge base statements. It also provides an intermediate answer that might improve results when provided to the system. This logic data is fed into three different LSTM architectures with identical input and output dimensionalities. One architecture, which we call "Deep", does not train with support data but has a hidden layer the same size as the supports we have defined. Another architecture, called "Piecewise", trains two separate half-systems, one with knowledge bases and one with supports from the reasoner. The last system, called "Flat", simply learns to map inputs directly to outputs for each reasoning step. We will discuss each part of the system in detail in the following sections, then provide some results.

EL+

EL+ is a lightweight and highly tractable description logic. A typical reasoning task in EL+ is a sequential process with a fixed endpoint, making it a perfect candidate for sequence learning. Unlike RDF, which reasons over instance data in triple format, EL+ reasoning occurs on the predicate level. Thus we will have to train the system to actually learn the reasoning patterns and logical structure of EL+ directly from encoded knowledge bases.

Syntax

The signature Σ for EL+ is defined as Σ = ⟨NI, NC, NR⟩ with NI, NC, NR pairwise disjoint, where

NI is a set of individual names,
NC is a set of concept names that includes ⊤, and
NR is a set of role names.

Expressions in EL+ use the following grammar:

R ::= NR
C ::= NC | C ⊓ C | ∃R.C

Semantics

An interpretation I = (Δ^I, ·^I) maps NI, NC, and NR to elements, sets, and relations in Δ^I with the function ·^I.
For an interpretation I and C_i, R_i ∈ Σ, the function ·^I is defined in Table 1.

Table 1: EL+ Semantics

Description                Expression             Semantics
Individual                 a                      a^I ∈ Δ^I
Top                        ⊤                      Δ^I
Bottom                     ⊥                      ∅
Concept                    C                      C^I ⊆ Δ^I
Role                       R                      R^I ⊆ Δ^I × Δ^I
Conjunction                C ⊓ D                  C^I ∩ D^I
Existential Restriction    ∃R.C                   {a | there is b ∈ Δ^I such that (a, b) ∈ R^I and b ∈ C^I}
Concept Subsumption        C ⊑ D                  C^I ⊆ D^I
Role Subsumption           R ⊑ S                  R^I ⊆ S^I
Role Chain                 R1 ∘ ··· ∘ Rn ⊑ R      R1^I ∘ ··· ∘ Rn^I ⊆ R^I

with ∘ signifying standard binary composition.

Reasoning

It is a well established result that any EL+ knowledge base has a least fixed point that can be determined by repeatedly applying a finite set of completion rules that produce all entailments of a desired type (Besold et al. 2017; Kazakov, Krötzsch, and Simančík 2012). In other words, reasoning in EL+ often amounts to a sequence of applications of a set of pattern-matching rules. One such set of rules, the set we have used in our experiment, is given in Table 2. The reasoning reaches completion when there are no new conclusions to be made. Because people are usually most interested in concept inclusions and restrictions, those are the types of statements we choose to include in our reasoning.

Table 2: EL+ Completion Rules

(1) A ⊑ C,      C ⊑ D                       ⊨  A ⊑ D
(2) A ⊑ C1,     A ⊑ C2,     C1 ⊓ C2 ⊑ D     ⊨  A ⊑ D
(3) A ⊑ C,      C ⊑ ∃R.D                    ⊨  A ⊑ ∃R.D
(4) A ⊑ ∃R.B,   B ⊑ C,      ∃R.C ⊑ D        ⊨  A ⊑ D
(5) A ⊑ ∃S.D,   S ⊑ R                       ⊨  A ⊑ ∃R.D
(6) A ⊑ ∃R1.C,  C ⊑ ∃R2.D,  R1 ∘ R2 ⊑ R     ⊨  A ⊑ ∃R.D

After the reasoning has finished we are able to recursively define supports for each conclusion the reasoner reaches. The first step, of course, only has supports from the knowledge base. After this step, supports are determined by effectively running the reasoner in reverse and replacing each statement that is not in the original knowledge base with supports that are, as can be seen in the substitutions in Table 3. When the reasoner proved the last statement it did not consider all of these supports, since it had already proved them; it used the new facts it had learned in the previous iteration. We have drawn their supports back out so that we can define a fixed set of inputs from the knowledge base.

Table 3: Support Generation

New Fact          Rule    Support
Step 1
C1 ⊑ C3           (1)     C1 ⊑ C2, C2 ⊑ C3
C1 ⊑ C4           (4)     C1 ⊑ C2, C1 ⊑ ∃R1.C1, ∃R1.C2 ⊑ C4
C1 ⊑ ∃R1.C3       (3)     C1 ⊑ C2, C2 ⊑ ∃R1.C3
C1 ⊑ ∃R2.C1       (5)     C1 ⊑ ∃R1.C1, R1 ⊑ R2
C1 ⊑ ∃R4.C4       (6)     C1 ⊑ ∃R1.C1, R1 ∘ R3 ⊑ R4, C1 ⊑ ∃R3.C4
Step 2
C1 ⊑ C5           (2)     C3 ⊓ C4 ⊑ C5, C1 ⊑ C2, C2 ⊑ C3, C1 ⊑ C2, C1 ⊑ ∃R1.C1, ∃R1.C2 ⊑ C4
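To make the completion procedure concrete, the following Python sketch applies the six rules of Table 2 to a normalized knowledge base until the least fixed point is reached. It is an illustration only, not the implementation used in our experiments; the tagged-tuple axiom representation and the function name are assumptions introduced here.

    from itertools import product

    def completion(kb):
        """Apply the six EL+ completion rules (Table 2) to a set of tagged
        axiom tuples until no new concept inclusions or existential
        restrictions can be derived; return the conclusions of each step."""
        known = set(kb)
        steps = []
        while True:
            new = set()
            sub = [a for a in known if a[0] == "subClass"]       # A ⊑ B
            ex = [a for a in known if a[0] == "subEx"]           # A ⊑ ∃R.B
            conj = [a for a in known if a[0] == "subConj"]       # A ⊓ B ⊑ C
            exsub = [a for a in known if a[0] == "exSub"]        # ∃R.A ⊑ B
            roles = [a for a in known if a[0] == "subRole"]      # R ⊑ S
            chains = [a for a in known if a[0] == "roleChain"]   # R ∘ S ⊑ T
            for (_, a, c), (_, c2, d) in product(sub, sub):              # rule 1
                if c == c2:
                    new.add(("subClass", a, d))
            for (_, a, c1), (_, a2, c2) in product(sub, sub):            # rule 2
                for (_, d1, d2, e) in conj:
                    if a == a2 and {c1, c2} == {d1, d2}:
                        new.add(("subClass", a, e))
            for (_, a, c), (_, c2, r, d) in product(sub, ex):            # rule 3
                if c == c2:
                    new.add(("subEx", a, r, d))
            for (_, a, r, b), (_, b2, c) in product(ex, sub):            # rule 4
                for (_, r2, c2, d) in exsub:
                    if b == b2 and r == r2 and c == c2:
                        new.add(("subClass", a, d))
            for (_, a, s, d), (_, s2, r) in product(ex, roles):          # rule 5
                if s == s2:
                    new.add(("subEx", a, r, d))
            for (_, a, r1, c), (_, c2, r2, d) in product(ex, ex):        # rule 6
                for (_, q1, q2, r) in chains:
                    if c == c2 and r1 == q1 and r2 == q2:
                        new.add(("subEx", a, r, d))
            new -= known
            if not new:
                return steps
            steps.append(sorted(new))
            known |= new

Here ("subClass", 1, 2) stands for C1 ⊑ C2, ("subEx", 1, 1, 3) for C1 ⊑ ∃R1.C3, ("subConj", 3, 4, 5) for C3 ⊓ C4 ⊑ C5, ("exSub", 1, 2, 4) for ∃R1.C2 ⊑ C4, ("subRole", 1, 2) for R1 ⊑ R2, and ("roleChain", 1, 3, 4) for R1 ∘ R3 ⊑ R4. Run on the example knowledge base whose supports are listed in Table 3, this loop should reproduce the two reasoning steps shown there.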
Synthetic Data

To provide sufficient training input to our system we provide a synthetic generation procedure that combines a structured, forced-lower-bound reasoning sequence with a connected randomized knowledge base. This allows us to rapidly generate many normal, semi-random EL+ knowledge bases of arbitrary reasoning difficulty. For this experiment we choose a moderate difficulty setting so that the synthetic data can be compared with the non-synthetic data. An example of one iteration of the two-part sequence is provided in Figure 1. To ensure that the randomized statements do not shortcut this pattern, the random statements are generated in a nearly disjoint space and connected only to the initial seed term. This ensures that at least one element of the random space will also produce random entailments for the duration of the sequence, possibly longer. Our procedure also guarantees that each completion rule will be used at least once in every iteration of the sequence so that all reasoning patterns can potentially be learned by the system.

Figure 1: First Iteration of Sequence. The seed statement seed0 = C1 ⊑ C2, together with the knowledge base axioms C2 ⊑ C3, C2 ⊑ ∃R1.C3, C1 ⊑ ∃R3.C4, C3 ⊓ C4 ⊑ C5, C1 ⊑ ∃R1.C1, ∃R1.C2 ⊑ C4, C1 ⊑ ∃R2.C3, C2 ⊑ ∃R2.C3, R1 ⊑ R2, and R1 ∘ R3 ⊑ R4, entails in Step 1 the statements C1 ⊑ C3, C1 ⊑ C4, C1 ⊑ ∃R1.C3, C1 ⊑ ∃R2.C1, and C1 ⊑ ∃R4.C4, and in Step 2 the statement seed1 = C1 ⊑ C5.

SNOMED

We have also imported data from the SNOMED 2012 ontology to ensure that our method is applicable to non-synthetic data. SNOMED is a widely-used, publicly available ontology of medical terms and relationships (De Silva et al. 2011). SNOMED 2012 has 39,392 logical axioms, some of which are complex, but these can be normalized in constant time to a logically equivalent set of 124,428 axioms. It is expressed in EL++, and newer versions utilize this expressivity more liberally, but the 2012 version contains exactly one statement that is not in EL+ after normalization. We feel confident that omitting ⊤ ⊑ ∃topPartOf.Self does not constitute a substantial loss in the ontology.

Because SNOMED is so large, we sample it to obtain connected subsets that also produce a minimum number of reasoning steps. We require that the samples be connected because any normal knowledge base is connected, and connectedness improves the chances that the statements will have entailments. Often, however, random connected samples do not entail anything, which is also unusual in a normal ontology, so we require a minimum amount of reasoning activity in the samples as well. The reasoning task for SNOMED is more unbalanced than for the synthetic data. It is equally trivial for the reasoner to solve either knowledge base type; however, we observe that random connected sampling tends to favor application of rules 3, 5, and 6 much more heavily than the others, so the system will have a more difficult time learning the overall reasoning patterns. This imbalance is likely an artifact of SNOMED, because it seems to recur in different sample sizes with different settings, though we acknowledge that it could be correlated somehow with the sample generation procedure.

LSTM

LSTMs are a type of recurrent neural network that works well with data that has long-term dependencies (Hochreiter and Schmidhuber 1997). We choose to use an LSTM to learn the reasoning patterns of EL+ because LSTMs are designed to learn sequence patterns. The network is kept as simple as possible to achieve the outputs we require and to maximize transparency. For the piecewise system, depicted in Figure 2, we use two separate flat single LSTMs that have the number of neurons matching first the support shape and later the output shape. The deep system, shown in Figure 3, is constructed to match this shape so that the two will be comparable; it simply stacks the LSTMs together so that it can train without supports. We also train a flat system, shown in Figure 4, that has the same input and output shape but lacks the internal supports. We have done this three ways so that we can compare the behavior of the systems with supports against the flat system as a baseline, and then see whether the piecewise training is helping. All designs treat the reasoning steps as the sequence to learn, so the number of LSTM cells is defined by the maximum reasoning sequence length. We train our LSTMs using regression on mean squared error to minimize the distances between predicted answers and correct answers.

Figure 2: Piecewise Architecture
Figure 3: Deep Architecture
Figure 4: Flat Architecture
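To illustrate how the three designs relate, here is a minimal Keras sketch assembled purely from the descriptions above. It is not our actual training code: the sequence length and the vector dimensions below are placeholder assumptions, and only the mean squared error loss and the 0.0001 learning rate reported in the Methodology section are taken from the text.

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import LSTM
    from tensorflow.keras.optimizers import Adam

    STEPS = 8       # assumed maximum reasoning sequence length
    KB_DIM = 400    # assumed length of the flattened knowledge base vector
    SUP_DIM = 24    # assumed length of the flattened support vector
    OUT_DIM = 40    # assumed length of the flattened output (completion) vector

    def flat_model():
        # Flat: map the repeated knowledge base vector directly to the
        # predicted statements of each reasoning step.
        return Sequential([
            LSTM(OUT_DIM, return_sequences=True, input_shape=(STEPS, KB_DIM)),
        ])

    def deep_model():
        # Deep: same input and output shapes, with a hidden recurrent layer
        # sized like the supports, trained end-to-end without support data.
        return Sequential([
            LSTM(SUP_DIM, return_sequences=True, input_shape=(STEPS, KB_DIM)),
            LSTM(OUT_DIM, return_sequences=True),
        ])

    def piecewise_models():
        # Piecewise: two separately trained half-systems,
        # knowledge bases -> supports and supports -> conclusions.
        kb_to_support = Sequential([
            LSTM(SUP_DIM, return_sequences=True, input_shape=(STEPS, KB_DIM)),
        ])
        support_to_output = Sequential([
            LSTM(OUT_DIM, return_sequences=True, input_shape=(STEPS, SUP_DIM)),
        ])
        return kb_to_support, support_to_output

    model = deep_model()
    model.compile(optimizer=Adam(learning_rate=0.0001), loss="mse")

Each sketch emits one prediction per LSTM cell, reflecting the view of the reasoning steps as the sequence to learn.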
Overall System

The individual parts of our system are not themselves novel. Our result, rather, lies in the unexpected success of this unconventional general approach. One of the crucial factors in making this work is the way in which we have translated the data from its logical format into something a neural network can understand. In the following sections we discuss how our system transforms data from a knowledge base format to an LSTM format and back again.

Statement Encoding

Every data set we use has a fixed number of names that it can contain for both roles and concepts, and every concept or role has an arbitrary numerical name identifier. For the synthetic data these numbers are randomly generated. For the SNOMED data, all of the labels are stripped and the concept and role numbers are shuffled around and substituted with random integers to remove any possibility of the names injecting a learning bias. These labels are remembered for later retrieval for output, but the reasoner and LSTM do not see them. It can occasionally seem as though there is a bias in the naming when output for SNOMED data is inspected, because all the labels seem related. This is merely a result of our sampling technique, which requires connected statements, so it is not a concern.

Knowledge bases have a maximum number of role and concept names that are used to scale all of the integer values for names into a range of [−1, 1]. To enforce disjointness of concepts and roles, we map all concepts to (0, 1] and all roles to [−1, 0). Each of the six possible normalized axiom forms is encoded into a 4-tuple based on the logical encodings defined in Table 4.

Table 4: Translation Rules

KB statement          Vectorization
CX ⊑ CY          →    [0.0, X/c, Y/c, 0.0]
CX ⊓ CY ⊑ CZ     →    [X/c, Y/c, Z/c, 0.0]
CX ⊑ ∃RY.CZ      →    [0.0, X/c, −Y/r, Z/c]
∃RX.CY ⊑ CZ      →    [−X/r, Y/c, Z/c, 0.0]
RX ⊑ RY          →    [0.0, −X/r, −Y/r, 0.0]
RX ∘ RY ⊑ RZ     →    [−X/r, −Y/r, −Z/r, 0.0]

c = number of possible concept names, r = number of possible role names

We would like to emphasize the distinction between our method, which we prefer to call an encoding, and a traditional embedding. This encoding adds no additional information to the names beyond the transformation from integer to scaled floating point number. It can be reversed quite directly, with a slight loss of precision due to integer conversion. The semantics of each encoded predicate name with regard to its respective knowledge base is structural, in relation to its position in the 4-tuple, just as it would be in an EL+ axiom, and its semantics is not at all related to the other, non-equal predicate names within the same 4-tuple.
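As a concrete reading of Table 4, the following Python sketch encodes a single normalized statement. The tagged-tuple axiom representation and the function name are assumptions of this illustration, while the 4-tuple layouts and the (0, 1] and [−1, 0) scaling come directly from the table.

    def encode_statement(axiom, c, r):
        """Encode one normalized EL+ axiom as a 4-tuple of floats (Table 4).

        The axiom is a tagged tuple of integer identifiers:
          ("subClass", X, Y)      for  CX ⊑ CY
          ("subConj", X, Y, Z)    for  CX ⊓ CY ⊑ CZ
          ("subEx", X, Y, Z)      for  CX ⊑ ∃RY.CZ
          ("exSub", X, Y, Z)      for  ∃RX.CY ⊑ CZ
          ("subRole", X, Y)       for  RX ⊑ RY
          ("roleChain", X, Y, Z)  for  RX ∘ RY ⊑ RZ
        c and r are the numbers of possible concept and role names, so that
        concepts scale into (0, 1] and roles into [-1, 0)."""
        con = lambda n: n / c
        rol = lambda n: -n / r
        kind, *ids = axiom
        if kind == "subClass":
            x, y = ids
            return [0.0, con(x), con(y), 0.0]
        if kind == "subConj":
            x, y, z = ids
            return [con(x), con(y), con(z), 0.0]
        if kind == "subEx":
            x, y, z = ids
            return [0.0, con(x), rol(y), con(z)]
        if kind == "exSub":
            x, y, z = ids
            return [rol(x), con(y), con(z), 0.0]
        if kind == "subRole":
            x, y = ids
            return [0.0, rol(x), rol(y), 0.0]
        if kind == "roleChain":
            x, y, z = ids
            return [rol(x), rol(y), rol(z), 0.0]
        raise ValueError("unknown axiom form: " + kind)

With c = r = 2, encode_statement(("subEx", 1, 1, 2), 2, 2) gives [0.0, 0.5, -0.5, 1.0] and encode_statement(("subRole", 1, 2), 2, 2) gives [0.0, -0.5, -1.0, 0.0], which matches the example vector in the next subsection if that knowledge base has two concept names and two role names.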
Knowledge Base Encoding

In order to input knowledge bases into the LSTM, we first apply the encoding defined in the previous section to each of the statements. Then we concatenate each of these encodings end-to-end to obtain a much longer vector. For instance,

C1 ⊑ ∃R1.C2, R1 ⊑ R2

might translate to

[0.0, 0.5, −0.5, 1.0, 0.0, −0.5, −1.0, 0.0].

This knowledge base vector is then copied the same number of times as there are reasoning steps, and the copies are stacked together once again along a new dimension. For an experiment this is done hundreds or thousands of times, and these fill up the tensor that is given to the LSTM to learn. Because some of the randomness can cause ragged data, any tensor dimensions that are not the same length as the maximum are padded with zeros.
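A NumPy sketch of this assembly, under our own assumptions about array layout and padding, might look as follows; encode_statement is the illustrative helper from the previous subsection and the size arguments are hypothetical.

    import numpy as np

    def knowledge_base_vector(kb, c, r, max_statements):
        """Concatenate the 4-tuple encodings of one knowledge base
        end-to-end and zero-pad up to a fixed number of statements."""
        flat = [value for axiom in kb for value in encode_statement(axiom, c, r)]
        flat += [0.0] * (4 * max_statements - len(flat))
        return np.array(flat, dtype=np.float32)

    def input_tensor(knowledge_bases, c, r, max_statements, max_steps):
        """Repeat each knowledge base vector once per reasoning step and stack
        the results into a (num_kbs, max_steps, 4 * max_statements) tensor."""
        rows = [np.tile(knowledge_base_vector(kb, c, r, max_statements), (max_steps, 1))
                for kb in knowledge_bases]
        return np.stack(rows)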
Results

Viewed in terms of the questions posed at the beginning of this paper, we feel that our results are solid, and they validate a reexamination of how we should approach integrating logic and neural networks. We have attempted to create a system that emulates a reasoning task rather than reasoning output.

If we examine the example output from the synthetic data inputs in Table 5, it is clear that the system gets very close to many correct answers. When it misses, it still appears to be learning the shape, and this makes us optimistic about its future potential. The SNOMED predictions are much more dense and do not fit well into a table, but we have included a few good examples in Table 6, with the original data labels translated into English sentences. Because there are so many near-misses we include edit distance evaluations that better capture the degree to which each predicted statement misses what it should have been.

Table 5: Example Synthetic Output (correct answers and predicted statements for Steps 0 through 2 of one synthetic knowledge base)

Figure 5: Synthetic Training
Figure 6: SNOMED Training

Table 6: Example SNOMED Outputs

Correct Answer   C6 ⊑ ∃R4.C1
Meaning          if something is a zone of superficial fascia, then there is a subcutaneous tissue that it is PartOf
Prediction       C6 ⊑ ∃R4.C3
Meaning          if something is a zone of superficial fascia, then there is a subcutaneous tissue of palmar area of little finger that it is PartOf

Correct Answer   C8 ⊑ ∃R3.C2
Meaning          if something is an infrapubic region of pelvis, then there is a perineum that it is PartOf
Prediction       C9 ⊑ ∃R3.C2
Meaning          if something is a zone of integument, then there is a perineum that it is PartOf

It is interesting to note, by comparing Table 7 with Table 8, that on the much harder SNOMED data the deep system appears to have a better result because of its higher F1-score, but the average edit distance, which is our preferred alternative measure for evaluation, is not obviously correlated with the F1-score. This confirms that our fifth research question was appropriate and that F1-score might not be sufficient to evaluate whether or not we can learn the shape of reasoning patterns.

A cause for the discrepancy may be the higher training difficulty for reaching the completion versus reaching the supports in the SNOMED data, which can be seen in Figures 5 and 6. We can speculate on the degree to which various factors contribute to this, but importantly, the fact that this discussion is even possible in the first place is a confirmation that we can answer the fourth research question in the affirmative.

Another interesting result can be seen by examining the charts in Figures 7-12. As we have noticed before, the Deep architecture performs quite well on the synthetic data and is quite resistant to increased levels of noise. The flat system gets similar results, but slightly worse. Surprisingly, the piecewise model starts out mostly the same as the deep system on synthetic data but then very quickly falters. When we examine the results for the harder SNOMED data, however, the opposite seems to be happening. The deep and flat systems still track each other, but for the unbalanced data the deep system crosses into the random range by 30-50% corruption, while the piecewise system is able to skirt just above random for the Character and Predicate distance functions until around 80-90%.

Table 7: Average Precision, Recall, and F1-score for Each Distance Evaluation

                        Atomic Levenshtein Distance           Character Levenshtein Distance        Predicate Distance
                        Precision  Recall     F1-score        Precision  Recall     F1-score        Precision  Recall     F1-score
Synthetic Data
Piecewise Prediction    0.138663   0.142208   0.140412        0.138663   0.142208   0.140412        0.138646   0.141923   0.140264
Deep Prediction         0.154398   0.156056   0.155222        0.154398   0.156056   0.155222        0.154258   0.155736   0.154993
Flat Prediction         0.140410   0.142976   0.141681        0.140410   0.142976   0.141681        0.140375   0.142687   0.141521
Random Prediction       0.010951   0.0200518  0.014166        0.006833   0.012401   0.008811        0.004352   0.007908   0.007908
SNOMED Data
Piecewise Prediction    0.010530   0.013554   0.011845        0.010530   0.013554   0.011845        0.010521   0.013554   0.011839
Deep Prediction         0.015983   0.0172811  0.016595        0.015983   0.017281   0.016595        0.015614   0.017281   0.016396
Flat Prediction         0.014414   0.018300   0.016112        0.0144140  0.018300   0.016112        0.013495   0.018300   0.015525
Random Prediction       0.002807   0.006803   0.003975        0.001433   0.003444   0.002023        0.001769   0.004281   0.002504

Table 8: Average Statement Edit Distances with Reasoner

                        Atomic Levenshtein Distance           Character Levenshtein Distance        Predicate Distance
                        From       To         Average         From       To         Average         From       To         Average
Synthetic Data
Piecewise Prediction    1.336599   1.687640   1.512119        1.533115   1.812006   1.672560        2.633427   4.587382   3.610404
Deep Prediction         1.256940   1.507150   1.382045        1.454787   1.559751   1.507269        2.504496   3.552074   3.028285
Flat Prediction         1.344946   1.584674   1.464810        1.586281   1.660409   1.623345        2.517655   3.739770   3.128713
Random Prediction       1.598016   1.906369   1.752192        1.970604   1.289533   1.630068        5.467918   10.57324   8.020583
SNOMED Data
Piecewise Prediction    1.704931   2.686562   2.195746        2.016249   2.862737   2.439493        6.556592   5.857769   6.207181
Deep Prediction         1.759633   3.052080   2.405857        2.027190   3.328850   2.678020        4.577427   6.179389   5.378408
Flat Prediction         1.691738   2.769542   2.230640        1.948757   2.991328   2.470042        5.548226   6.665659   6.106942
Random Prediction       1.814656   3.599629   2.707143        2.094682   1.621700   1.858191        5.169093   12.392325  8.780709

Methodology

Our system is trained using randomized 10-fold cross-validation to 20,000 epochs on the deep and flat systems and 10,000 epochs each for the parts of the piecewise system, at a learning rate of 0.0001. The results from every fold are saved and we report the average of all of the runs. Often there are better and worse runs in any given 10-fold validation, but the averages across multiple cross-validations with the same settings remain basically the same. The 10-fold cross-validations are performed with an extra comparison against data that has been corrupted at a fixed rate, starting with zero percent probability of corruption and moving upwards by ten percent each time. The data in Tables 7 and 8 is from the comparison with no corruption, so the error data is not reported (it is identical with the reasoner answers).

To perform our evaluations we use three distance measurements. We have a naive "Character" Levenshtein distance function that takes two unaltered knowledge base statement strings and computes their edit distance (Wikibooks contributors 2019). However, because some names in the namespace are one-digit numbers and other names are two-digit numbers, we include a modified version of this function, called "Atomic", that uniformly substitutes all two-digit numbers in the strings with symbols that do not occur in either. Since there cannot be more than eight unique numbers in two decoded strings, there are no issues with finding enough new symbols. By doing the substitutions we can see the impact that the digits of the numbers were having on the edits via the Atomic Levenshtein distance.
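To make the "Atomic" variant concrete, the sketch below replaces every two-digit number in the two decoded statement strings with placeholder symbols before computing a plain Levenshtein distance; the placeholder alphabet and the regular expression are assumptions of this illustration.

    import re

    def levenshtein(a, b):
        """Standard dynamic-programming Levenshtein edit distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                  # deletion
                               cur[j - 1] + 1,               # insertion
                               prev[j - 1] + (ca != cb)))    # substitution
            prev = cur
        return prev[-1]

    def atomic_levenshtein(a, b):
        """Substitute each distinct two-digit number in either statement with
        a single symbol occurring in neither string, then take the Levenshtein
        distance, so that name length does not inflate the edit count."""
        placeholders = iter("!@#$%^&*")    # at most eight distinct numbers occur
        mapping = {}

        def substitute(s):
            def replace(match):
                if match.group(0) not in mapping:
                    mapping[match.group(0)] = next(placeholders)
                return mapping[match.group(0)]
            return re.sub(r"\d{2}", replace, s)

        return levenshtein(substitute(a), substitute(b))

For example, atomic_levenshtein("C9 ⊑ C11", "C9 ⊑ C2") is 1, whereas the Character distance between the same strings is 2, because the two-digit name no longer costs an extra edit.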
Finally, we devise a distance function that is based on our encoding scheme. The Predicate Distance method disassembles each string into only its predicates. Then, for each position in the 4-tuple, a distance is calculated that yields zero for perfect matches, the absolute value of the guessed number minus the actual number for correct Class and Role guesses, and the guessed number plus the actual number for incorrect Class and Role matches. So, for instance, guessing C1 when the answer is C2 will yield a Predicate Distance of 1, while a guess of R2 for a correct answer of C15 will yield 17. Though this method is specific to our unique encoding, we believe it detects good and bad results quite well, because perfect hits are 0, close misses are penalized a little, and large misses are penalized a lot. A sketch of this distance appears at the end of this section.

For each method we take every statement in a knowledge base completion and compare it with the best match in the reasoner answer, the random answers, and the corrupted knowledge base answers. While we compute these distances we are able to obtain precision, recall, and F1-score by counting the number of times the distance returns zero and treating the statement predictions as classifications. Each time the system runs it can make any number of predictions, from zero to the maximum size of the output tensor. This means that, although the predictions and the reasoner answers are usually around the same size and those comparisons are fine, we have to generate random data to compare against that is as big as could conceivably be needed by the system. Any artificial shaping decisions we made to compensate for the variations between runs would invariably introduce their own bias in how we selected them. Thus the need to use the biggest possible random data to compare against means that the precision, recall, and F1-score for random are low.
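A sketch of the Predicate Distance follows. We assume each decoded statement has already been disassembled into (kind, number) pairs, one per occupied position of its 4-tuple, with kind 'C' for concepts and 'R' for roles; only the per-position arithmetic is taken from the description above.

    def predicate_distance(guess, actual):
        """Sum the per-position distances between two disassembled statements.

        guess and actual are sequences of (kind, number) pairs, one pair per
        occupied 4-tuple position. A perfect match costs 0, the right kind
        with the wrong number costs |guessed - actual|, and the wrong kind
        costs guessed + actual."""
        total = 0
        for (guess_kind, guess_num), (actual_kind, actual_num) in zip(guess, actual):
            if guess_kind == actual_kind:
                total += abs(guess_num - actual_num)
            else:
                total += guess_num + actual_num
        return total

    print(predicate_distance([("C", 1)], [("C", 2)]))    # 1:  C1 guessed for C2
    print(predicate_distance([("R", 2)], [("C", 15)]))   # 17: R2 guessed for C15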
Conclusion

In this experiment we demonstrate that EL+ reasoning can be emulated with an LSTM. Although our example is merely a prototype, we can definitively say yes to the first research question: a neural network can be trained to emulate completion reasoning behavior. And because we have answered this first research question without using name adjustments or embeddings, we can also answer yes to the second and third questions. We have already discussed questions four and five.

Future Work

There are a number of potential improvements that could build on the work presented here. A few of the choices we made might have had an impact on the results of the study, and we are curious whether alternate strategies could improve the way that the system learns reasoning patterns. We might have chosen a 3-tuple encoding pattern that was more dense but did not mirror the shape of the statements symmetrically. We also considered adding the statements as an additional dimension rather than concatenating them all together. This might have made it easier for the system to distinguish individual statements, but it would also add a lot of complexity in the extra dimension. We experimented with different neuron types like gated recurrent units (GRUs) and basic RNNs, though these had no significant impact on the results; the type of neuron seems to have less effect than the overall structure of the network. It would be interesting to see if different ways of segmenting a network, besides just along the supports as we have defined them, could improve results. Along with these ideas to improve our current system, we also speculate that with some effort we should be able to accomplish basically the same type of thing with a more expressive logic and a new encoding. EL++, for instance, would be an easy first attempt since it is so similar to EL+, though we should be able to adapt this strategy for any logic that uses completion rules for reasoning. Whatever we choose to do with it, we are optimistic that we can use this strategy to advance understanding of how logic and neural networks can work together.

Testing Environment

All testing was done on a computer running Ubuntu 19.10 64-bit with an Intel Core i7-9700K CPU @ 3.60GHz x 8, 47.1 GiB of DDR4 memory, and a GeForce GTX 1060 6GB/PCIe/SSE2. Source code and experiment data are available on GitHub at https://github.com/aaronEberhart/PySynGenReas.

Acknowledgements

This work was partially supported by the Air Force Office of Scientific Research under award number FA9550-18-1-0386, and by the National Science Foundation under award number 1936677.

References

Besold, T. R.; d'Avila Garcez, A. S.; Bader, S.; Bowman, H.; Domingos, P. M.; Hitzler, P.; Kühnberger, K.; Lamb, L. C.; Lowd, D.; Lima, P. M. V.; de Penning, L.; Pinkas, G.; Poon, H.; and Zaverucha, G. 2017. Neural-symbolic learning and reasoning: A survey and interpretation. CoRR abs/1711.03902.

Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, 2787-2795.

Das, R.; Neelakantan, A.; Belanger, D.; and McCallum, A. 2016. Chains of reasoning over entities, relations, and text using recurrent neural networks. CoRR abs/1607.01426.

Das, R.; Dhuliawala, S.; Zaheer, M.; Vilnis, L.; Durugkar, I.; Krishnamurthy, A.; Smola, A.; and McCallum, A. 2017. Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning. arXiv preprint arXiv:1711.05851.

De Silva, T. S.; MacDonald, D.; Paterson, G.; Sikdar, K. C.; and Cochrane, B. 2011. Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) to represent computed tomography procedures. Computer Methods and Programs in Biomedicine 101(3):324-329.

Ebrahimi, M.; Sarker, M. K.; Bianchi, F.; Xie, N.; Doran, D.; and Hitzler, P. 2018. Reasoning over RDF knowledge bases using deep learning. arXiv preprint arXiv:1811.04132.

Hochreiter, S., and Schmidhuber, J. 1997. LSTM can solve hard long time lag problems. In Advances in Neural Information Processing Systems, 473-479.

Kazakov, Y.; Krötzsch, M.; and Simančík, F. 2012. ELK: A reasoner for OWL EL ontologies. System description.

Kulmanov, M.; Liu-Wei, W.; Yan, Y.; and Hoehndorf, R. 2019. EL embeddings: Geometric construction of models for the description logic EL++. arXiv preprint arXiv:1902.10499.

Makni, B., and Hendler, J. 2018. Deep Learning for Noise-tolerant RDFS Reasoning. Ph.D. Dissertation, Rensselaer Polytechnic Institute.
Wikibooks contributors. 2019. Algorithm Implementation/Strings/Levenshtein distance. Accessed November 19, 2019.

Xiong, W.; Hoang, T.; and Wang, W. Y. 2017. DeepPath: A reinforcement learning method for knowledge graph reasoning. arXiv preprint arXiv:1707.06690.

Figures 7-12 each show six panels: Synthetic Data and SNOMED Data results for the Piecewise, Deep, and Flat architectures.

Figure 7: Character Levenshtein Distance
Figure 8: Character Levenshtein Distance Precision, Recall, and F1-score
Figure 9: Atomic Levenshtein Distance
Figure 10: Atomic Levenshtein Distance Precision, Recall, and F1-score
Figure 11: Predicate Distance
Figure 12: Predicate Distance Precision, Recall, and F1-score