<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Neuro-Symbolic Deductive Reasoning for Cross-Knowledge Graph Entailment</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Monireh</forename><surname>Ebrahimi</surname></persName>
							<email>monireh@ksu.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science &amp; Engineering</orgName>
								<orgName type="institution">Kansas State University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><roleName>Md</roleName><forename type="first">Kamruzzaman</forename><surname>Sarker</surname></persName>
							<email>mdkamruzzamansarker@ksu.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science &amp; Engineering</orgName>
								<orgName type="institution">Kansas State University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Federico</forename><surname>Bianchi</surname></persName>
							<email>f.bianchi@unibocconi.it</email>
							<affiliation key="aff1">
								<orgName type="institution">Bocconi University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ning</forename><surname>Xie</surname></persName>
							<affiliation key="aff2">
								<orgName type="department">Department of Computer Science &amp; Engineering</orgName>
								<orgName type="institution">Wright State University and Bosch Research &amp; Technology Center</orgName>
								<address>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Aaron</forename><surname>Eberhart</surname></persName>
							<email>aaroneberhart@ksu.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science &amp; Engineering</orgName>
								<orgName type="institution">Kansas State University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Derek</forename><surname>Doran</surname></persName>
							<email>derek.doran@wright.edu</email>
							<affiliation key="aff2">
								<orgName type="department">Department of Computer Science &amp; Engineering</orgName>
								<orgName type="institution">Wright State University and Bosch Research &amp; Technology Center</orgName>
								<address>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hyeongsik</forename><surname>Kim</surname></persName>
							<email>hyeongsik.kim@us.bosch.com</email>
						</author>
						<author>
							<persName><forename type="first">Pascal</forename><surname>Hitzler</surname></persName>
							<email>hitzler@ksu.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science &amp; Engineering</orgName>
								<orgName type="institution">Kansas State University</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="institution">Stanford University</orgName>
								<address>
									<addrLine>March 22-24, 2021</addrLine>
									<settlement>Palo Alto</settlement>
									<region>California</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Neuro-Symbolic Deductive Reasoning for Cross-Knowledge Graph Entailment</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">46A43F4E36EA4911B383B1ADA427EE46</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T08:16+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>deep learning</term>
					<term>deductive reasoning</term>
					<term>knowledge graph entailment</term>
					<term>neuro-symbolic</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>A significant recent development in neural-symbolic learning is the emergence of deep neural networks that can reason over symbolic knowledge graphs (KGs). A particular task of interest is KG entailment: inferring the set of all facts that are a logical consequence of the current and potential facts of a KG. Initial neural-symbolic systems that can deduce the entailment of a KG have been presented, but they are limited: current systems learn fact relations and entailment patterns specific to a particular KG, hence do not truly generalize, and must be retrained for each KG they are tasked with entailing. In this paper we propose a neural-symbolic system to address this limitation. It is designed as a differentiable end-to-end deep memory network that learns over abstract, generic symbols to discover entailment patterns common to any reasoning task. A key component of the system is a simple but highly effective normalization process for continuous representation learning of KG entities within memory networks. Our results show how the model, trained over a set of KGs, can effectively entail facts from KGs excluded from training, even when the vocabulary or the domain of the test KGs is completely different from that of the training KGs.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>For many years, reasoning has been tackled as the task of building systems capable of inferring new crisp symbolic logical rules. However, those traditional methods are too brittle to apply to noisy, automatically created KGs. <ref type="bibr" target="#b0">[1]</ref> provides a taxonomy of noise types in web KGs with respect to their effects on reasoning and shows the detrimental impact of noise on the results of traditional reasoners. With the recent revival of interest in artificial neural networks, more robust neural link prediction models have been widely applied to the completion of KGs. These methods <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6]</ref> rely heavily on subsymbolic representations of entities and relations, learned by maximizing a scoring objective function over valid factual triples. Thus, the success of such models hinges primarily on the power of those subsymbolic representations to encode the similarity/relatedness of entities and relations. Recent attempts have focused on neural multi-hop reasoners <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref> to equip models for more complex reasoning where multi-hop inference is required. More recently, a Neural Theorem Prover <ref type="bibr" target="#b8">[9]</ref> has been proposed in an attempt to take advantage of both symbolic and sub-symbolic reasoning.</p><p>Despite their success, the main restriction common to machine learning-based reasoners is that they are unable to recognize and generalize to different domains or tasks. This inherent limitation follows from both the representations used and the learning process. 
The major issue stems from these models' reliance on representations of entities learned during training, or in a pre-training phase, and stored in a lookup table. Consequently, these models usually have difficulty dealing with out-of-vocabulary (OOV) entities. Although the OOV problem has been partly addressed in the natural language processing (NLP) domain by means of character-level embeddings <ref type="bibr" target="#b9">[10]</ref>, subword units <ref type="bibr" target="#b10">[11]</ref>, Byte-Pair Encoding <ref type="bibr" target="#b11">[12]</ref>, learning embeddings on the fly from text descriptions or spelling <ref type="bibr" target="#b12">[13]</ref>, copy mechanisms <ref type="bibr" target="#b13">[14]</ref>, or pointer networks <ref type="bibr" target="#b14">[15]</ref>, these solutions remain insufficient for transfer in reasoning. <ref type="bibr" target="#b15">[16]</ref> shows that the success of natural language inference (NLI) methods is heavily benchmark-specific. An even greater source of concern is that reasoning in most of the above sub-symbolic approaches hinges more on notions of similarity and geometric proximity of real-valued vectors (induction) than on performing transitive reasoning (deduction) over them. Nevertheless, recent years have seen some progress in zero-shot relation learning in the sub-symbolic reasoning domain <ref type="bibr" target="#b16">[17]</ref>. Zero-shot learning refers to the ability of a model to infer new relations that have not been seen in the training set <ref type="bibr" target="#b17">[18]</ref>. This generalization capability is still quite limited, and fundamentally different from our work in terms of both methodology and purpose.</p><p>Inspired by these observations, we take a different approach in this work by investigating the emulation of deductive symbolic reasoning using memory networks. 
Memory networks <ref type="bibr" target="#b18">[19]</ref> are a class of learning models capable of conducting multiple computational steps over an explicit memory component before returning an answer. Their sequential nature corresponds, conceptually, to the sequential process underlying some deductive reasoning algorithms. The attention modeling corresponds to pulling in only the relevant information (logical axioms) necessary for the next reasoning step. Besides, since attention can be traced over a run of a memory network, we furthermore gain insight into the "reasoning" underlying the network's output.</p><p>This paper contributes a recipe involving a simple but effective normalization of KG triples before learning their representation within an end-to-end memory network. To perform logical inference at a more abstract level, and thereby facilitate the transfer of reasoning expertise from one KG to another, the normalization maps entities and predicates in a KG to a generic vocabulary. Facts in additional KGs are normalized using the same vocabulary, so that the network does not overfit to entity and predicate names in a specific KG. This emulates symbolic reasoning by neural embeddings: the actual names (as strings) of entities from the underlying logic, such as variables, constants, functions, and predicates, are insubstantial for logical entailment, in the sense that a consistent renaming across a theory does not change the set of entailed formulas (under the same renaming). Thanks to the term-agnostic nature of our representation, we are able to create a reasoning system capable of reasoning over an unseen vocabulary in the test phase.</p><p>Our contributions are threefold: (i) We present the construction of memory networks for emulating symbolic deductive reasoning. (ii) We propose an optimization to this architecture, using a normalization approach, to enhance its transfer capability. 
We show that in an unnormalized setting, they fail to perform well across KGs. (iii) We examine the efficacy of our model for cross-domain and cross-KG deductive reasoning. We also show the scalability of our model (in terms of reduced time and space complexity) for large datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Related Work</head><p>On the issue of doing logical reasoning using deep networks, we mention the following selected recent contributions: Tensor-based approaches have been used <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b20">21]</ref>, following <ref type="bibr" target="#b2">[3]</ref>. However, these approaches are restricted in terms of logical expressivity and/or to toy examples <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b19">20]</ref>. <ref type="bibr" target="#b0">[1]</ref> performs Resource Description Framework (RDF) reasoning based on KG embeddings. <ref type="bibr" target="#b22">[23]</ref> considers OWL RL reasoning <ref type="bibr" target="#b23">[24]</ref>. There is a fundamental difference between these contributions and our approach, though: we train our model once, and the model then transfers to all other RDF KGs with good performance. In the above-mentioned publications, training is either done on (a part of) the KG that is also used for evaluation, or is explicitly done on KGs that are similar in topic. More precisely, <ref type="bibr" target="#b22">[23]</ref> requires re-training to obtain embeddings for new vocabularies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Problem Formulation</head><p>To explain what we are setting out to do, let us first re-frame the deductive reasoning problem as a classification task. Any given logic ℒ comes with an entailment relation ⊧ ⊆ 𝑇ℒ × 𝐹ℒ, where 𝐹ℒ is a subset of the set of all logical formulas (or axioms) over ℒ, and 𝑇ℒ is the set of all theories (i.e., sets of logical formulas) over ℒ. If 𝑇 ⊧ 𝐹, then we say that 𝐹 is entailed by 𝑇. Re-framed as a classification task, we can ask whether a given pair (𝑇, 𝐹) ∈ 𝑇ℒ × 𝐹ℒ should be classified as a valid entailment (i.e., 𝑇 ⊧ 𝐹) or as the opposite (i.e., 𝑇 ̸⊧ 𝐹). We would like to train a model on sets of examples (𝑇, 𝐹), such that it learns to correctly classify them as valid or invalid inferences.</p><p>We wish to train a neural model that will learn to reason over one set of theories, and can then transfer that learning to new theories over the same logic. This way, our results will demonstrate that the reasoning principles (entailment under the model-theoretic semantics) that underlie the logic have been learned. If we were to train a model such that it learns only to reason over one theory, or a few very similar theories, this could hardly be demonstrated. One of the key obstacles we face is how to represent training and test data so that they can be used in deep learning settings. To use standard deep learning approaches, formulas, or even theories, have to be represented over the real coordinate space as vectors, matrices, or tensors. Many embeddings for RDF (i.e., KGs) have been proposed <ref type="bibr" target="#b24">[25,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b25">26</ref>], but we are not aware of an existing embedding that captures what seems important for the deductive reasoning scenario. 
Indeed, the prominent use case explored for KG embeddings is not deductive in nature; rather, it concerns the discovery or suggestion of additional links or edges in the graph, together with appropriate edge labels. In this link discovery setting, the actual labels of nodes and edges, and as such their commonsense meanings, are likely important, and most existing embeddings reflect this. However, for deductive reasoning the names of entities are insubstantial and should not be captured by an embedding. Another inherent problem in using such representations across KGs is the OOV problem. While a word lookup table can be initialized with vectors learned in an unsupervised task or during training of the reasoner, it still cannot generate vector representations for unseen terms. It is further impractical to store vectors for all words when the vocabulary is huge <ref type="bibr" target="#b9">[10]</ref>. Similarly, memory networks usually rely on word-level embedding lookup tables, learned with the underlying rationale that words occurring in similar supervised scenarios should be represented by similar vectors in the real coordinate space. That is why they are known to have difficulty dealing with OOV words: a word lookup table cannot provide a representation for the unseen, and thus cannot be applied to NLI over new words <ref type="bibr" target="#b12">[13]</ref>; for us this would pose a challenge in the transfer to new KGs.</p><p>We thus need embeddings that are agnostic to the terms (i.e., strings) used as primitives in the KG. To build such an embedding, we use syntactic normalization: a renaming of primitives from the logical language (variables, constants, functions, predicates) to a set of predefined entity names that are used across different normalized theories. 
By randomly assigning the mapping for the renaming, the network's learning will be based on the structural information within the theories, and not on the actual names of the primitives. Note that this normalization not only plays the role of "forgetting" irrelevant label names, but also makes it possible to transfer learning from one KG to another. Indeed, for the approach to work, the network should be trained on many KGs, and then subsequently tested on completely new ones that were not encountered during training. Our results show that this simple but very effective normalization yields a term-agnostic system capable of deductive reasoning over previously unseen RDF KGs containing new vocabulary.</p></div>
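The renaming step described above can be sketched in a few lines of Python. The generic symbol names, the vocabulary size, and the per-KG random shuffling below are illustrative assumptions, not the paper's exact implementation:

```python
import random

# URIs in these namespaces carry the logical semantics and are kept intact.
RDF_NS = ("http://www.w3.org/1999/02/22-rdf-syntax-ns#",
          "http://www.w3.org/2000/01/rdf-schema#")

def normalize(triples, vocab_size=1000, seed=None):
    """Rename all non-RDF(S) URIs in a KG to generic symbols a1..an.

    The URI-to-symbol mapping is randomized per KG, so the network can
    only learn from structure, never from specific entity names.
    """
    pool = [f"a{i}" for i in range(1, vocab_size + 1)]
    rng = random.Random(seed)
    rng.shuffle(pool)  # a fresh random assignment for this KG
    mapping, out = {}, []
    for s, p, o in triples:
        renamed = []
        for term in (s, p, o):
            if term.startswith(RDF_NS):   # keep RDF/RDFS vocabulary as-is
                renamed.append(term)
            else:
                if term not in mapping:   # assign the next free generic symbol
                    mapping[term] = pool[len(mapping)]
                renamed.append(mapping[term])
        out.append(tuple(renamed))
    return out, mapping
```

Applying the same function with a different seed to another KG yields a second theory over the same generic vocabulary, which is what enables cross-KG transfer.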
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Model Architecture</head><p>We consider a model architecture that adapts the end-to-end memory network proposed by <ref type="bibr" target="#b18">[19]</ref> with fundamental alterations necessary for abstract reasoning. A high-level view of our model is shown in Figure <ref type="figure" target="#fig_0">1</ref>. It takes a discrete set 𝐺 of normalized RDF statements (called triples) 𝑡 1 , ..., 𝑡 𝑛 that are stored in memory and a query 𝑞, and outputs a "yes" or "no" answer determining whether 𝑞 is entailed by 𝐺. Each normalized 𝑡 𝑖 and 𝑞 contains symbols from a general dictionary of 𝑉 normalized words shared among all normalized RDF theories in both the training and test sets. The model writes all triples to memory and then calculates a continuous embedding for 𝐺 and 𝑞. Through multiple hops of attention over those continuous representations, the model then classifies the query. The model is trained by back-propagating the error from the output to the input through multiple memory accesses. We discuss the components of the architecture in more detail below. Model Description The model is augmented with an external memory component that stores the embeddings of the normalized triples in our KG. This memory is defined as an 𝑛 × 𝑑 tensor, where 𝑛 denotes the number of triples in the KG and 𝑑 is the dimensionality of the embeddings. The KG is stored in memory via two continuous representations 𝑚 𝑖 and 𝑐 𝑖 , obtained from input and output embedding matrices 𝐴 and 𝐶 of size 𝑑 × 𝑉 , where 𝑉 is the size of the vocabulary. Similarly, the query 𝑞 is embedded via a matrix 𝐵 to obtain an internal state 𝑢. In each reasoning step, the memory slots useful for finding the correct answer should have their contents retrieved. 
To enable this, we use an attention mechanism for 𝑞 over the memory input representations by taking an inner product followed by a softmax:</p><formula xml:id="formula_0">𝑝 𝑖 = Softmax(𝑢 𝑇 𝑚 𝑖 )<label>(1)</label></formula><p>where Softmax(𝑎 𝑖 ) = 𝑒 (𝑎 𝑖 ) ∑ 𝑗 𝑒 (𝑎 𝑗 ) .</p><p>Equation (1) calculates a probability vector 𝑝 over the memory inputs; the output vector 𝑜 is then computed as the weighted sum of the transformed memory contents 𝑐 𝑖 with respect to their corresponding probabilities 𝑝 𝑖 , i.e., 𝑜 = ∑ 𝑖 𝑝 𝑖 𝑐 𝑖 . This describes the computation within a single hop. The internal state of the query vector is updated for the next hop as 𝑢 𝑘+1 = 𝑢 𝑘 + 𝑜 𝑘 .</p><p>The process repeats 𝐾 times, where 𝐾 is the number of computational hops. The output of the 𝐾 𝑡ℎ hop is used to predict the label 𝑎 ̂by passing 𝑜 𝐾 and 𝑢 𝐾 through a weight matrix of size 𝑉 × 𝑑 and a softmax:</p><formula xml:id="formula_1">𝑎 ̂= Softmax(𝑊 𝑢 𝐾+1 ) = Softmax(𝑊 (𝑢 𝐾 + 𝑜 𝐾 )).</formula><p>Figure <ref type="figure" target="#fig_0">1</ref> shows the model for 𝐾 = 1 (1 hop). The learning parameters are the matrices 𝐴, 𝐵, 𝐶, and 𝑊 .</p></div>
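Under the notation above, the hop computation and the final prediction can be sketched as a minimal NumPy illustration. The shapes and hop count are arbitrary, and the layer-wise weight tying described later is omitted for brevity:

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(a - a.max())
    return e / e.sum()

def memory_hops(m, c, u, W, K=3):
    """Run K attention hops over memory and predict a label distribution.

    m, c : (n, d) input/output memory embeddings of the n triples
    u    : (d,)   internal query state (the embedded query B q)
    W    : (V, d) answer prediction matrix
    """
    for _ in range(K):
        p = softmax(m @ u)   # Eq. (1): attention over memory slots
        o = p @ c            # weighted sum of output representations
        u = u + o            # state update u_{k+1} = u_k + o_k
    return softmax(W @ u)    # predicted label distribution
```

In the actual model the output distribution ranges over the answer labels ("yes"/"no" plus the padding class), and the matrices are learned by back-propagation through all hops.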
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Memory Content</head><p>There is a plethora of logics that could be used for our investigation. Here we use RDF. RDF <ref type="bibr" target="#b26">[27]</ref> is an established and widely used W3C standard for expressing KGs. An RDF KG is a collection of statements stored as triples (𝑒1, 𝑟, 𝑒2), where 𝑒1 and 𝑒2 are called the subject and object, respectively, while 𝑟 is a relation binding 𝑒1 and 𝑒2 together. Statements can constitute base facts (logically speaking, 𝑒1 and 𝑒2 would then be constants, and 𝑟 a binary predicate) or simple logical axioms (e.g., 𝑒1 and 𝑒2 could identify unary predicates, or classes, and 𝑟 would be class subsumption or material implication). Every entity in an RDF KG is represented by a unique Uniform Resource Identifier (URI). We normalize these triples by systematically renaming all URIs that are not in the RDF/RDFS (Schema) namespaces, as discussed previously. Each such URI is mapped to an arbitrary string in a predefined set 𝒜 = {𝑎 1 , ..., 𝑎 𝑛 }, where 𝑛 is a training hyper-parameter giving an upper bound on the largest number of entities in a KG the system will be able to handle. Note that URIs in the RDF/RDFS namespaces are not renamed, as they are important for deductive reasoning according to the RDF model-theoretic semantics. Consequently, each normalized RDF KG is a collection of facts stored as triples {(𝑎 𝑖 , 𝑎 𝑗 , 𝑎 𝑘 )}.</p><p>It is important to note that each symbol is mapped to an element of 𝒜 regardless of its position in the triple, i.e., whether it is a subject, an object, or a predicate. Yet the position of an element within a triple is an important feature to consider. Thus we employ a positional encoding (PE) <ref type="bibr" target="#b18">[19]</ref> to encode the position of each element within the triple. Let the 𝑗th element of the 𝑖th triple be 𝑡 𝑖,𝑗 . 
This gives the memory vector representation of each triple as 𝑚 𝑖 = ∑ 𝑗 𝑙 𝑗 • 𝑡 𝑖,𝑗 , where • is the Hadamard (element-wise) product and 𝑙 𝑗 is a column vector with entries 𝑙 𝑘,𝑗 = (1 − 𝑗/3) − (𝑘/𝑑)(1 − 2𝑗/3) (assuming 1-based indexing), where 𝑑 is the size of the embedding vectors in the memory embedding matrix and the 3 corresponds to the number of elements in an RDF triple. Each memory slot thus holds a position-weighted summation of a triple. The positional encoding ensures that the order of the elements affects the encoding of each memory slot.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Evaluation</head><p>The RDF semantics standard specification <ref type="bibr" target="#b27">[28]</ref> describes a procedural semantics based on 13 completion rules, which can be used to algorithmically compute logical consequences. The completion of an RDF KG is in general infinite because, by definition, there is an infinite set of facts (related to RDF encodings of lists) that is always entailed; however, for practical reasons, and as recommended in the standard specification, only certain finite subsets are computed as completions of RDF KGs, and we do the same. Dataset There are many RDF KGs available on the World Wide Web that can be used to create our own dataset. For this purpose, we have collected RDF datasets from the Linked Data Cloud<ref type="foot" target="#foot_0">1</ref> and the Data Hub<ref type="foot" target="#foot_1">2</ref> to create our datasets.<ref type="foot" target="#foot_2">3</ref> Our training set (which by coincidence was based on RDF data also conforming to the OWL standard <ref type="bibr" target="#b23">[24]</ref>, and which we call the "OWL-Centric" dataset) comprises a set of RDF KGs, each of size 1,000 triples, sampled by populating around 20 OWL ontologies with different data. In order to test our model's ability to generalize to completely different datasets, we collected another dataset, which we call the OWL-Centric Test Set. Furthermore, to ensure our evaluation represents real-world RDF data completely independent of the training data, we used almost all RDF KGs listed in a recent RDF quality survey <ref type="bibr" target="#b28">[29]</ref>; we call this the Linked Data test set. 
Further, to test the limitations of our model on artificially difficult data, we created a small synthetic dataset that requires long reasoning chains if processed by a symbolic reasoner.</p><p>For each KG we created the finite set of inferred triples using the Apache Jena<ref type="foot" target="#foot_3">4</ref> API. These inferred triples comprise our positive class instances. For generating invalid instances we used the following two methods. In the first, we generated non-inferred triples by random permutation of triple entities, removing those triples that were entailed. In the second, which serves as our final quality check against including trivially invalid triples in our dataset, we created invalid instances using the rdf:type predicate. More specifically, for each valid triple in the dataset, we replaced one of its elements (chosen randomly) with another random element that qualifies for that position based on its rdf:type relationships. The datasets created by this strategy are marked with superscript "a" in Table <ref type="table">1</ref>.</p><p>Training Details Training was done over 10 epochs using the Adam optimizer with a learning rate of 𝜂 = 0.005, a learning rate decay of 𝜂/2, and a batch size of 100 triples. The final batches of queries for each KG were zero-padded to the maximum batch size of 100. The capacity of the external memory is 1,000, which is also the maximum size of our KGs. We used a linear start of 1 epoch, during which the softmax was removed from each memory layer except the final one. L2 norm clipping with a maximum of 40 was applied to the gradient. The memory input/output embeddings are vectors of size 20. The embedding matrices 𝐴, 𝐵, and 𝐶 are therefore of size |𝑉 | × 𝑑 = 3,033 × 20, where 3,033 is the size of the normalized generic vocabulary plus the RDF(S) namespace vocabulary. Unless otherwise mentioned, we used 𝐾 = 10. 
Adjacent weight sharing was used, where the output embedding of one layer is the input embedding of the next, i.e., 𝐴 𝑘+1 = 𝐶 𝑘 . Similarly, the answer prediction weight matrix 𝑊 is tied to the final output embedding 𝐶 𝐾 , and the query embedding equals the first-layer input embedding, 𝐵 = 𝐴 1 . All weights are initialized from a Gaussian distribution with 𝜇 = 0 and 𝜎 = 0.1. We would like to emphasize again that one and the same trained model was used in the evaluation over the different test sets. We did not retrain, e.g., on Linked Data for the Linked Data test set.</p></div>
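The adjacent weight-sharing scheme can be sketched as follows. The initializer matches the stated Gaussian (𝜇 = 0, 𝜎 = 0.1), while the helper's name and returning 𝑊 as a copy of 𝐶 𝐾 are our illustrative assumptions:

```python
import numpy as np

def build_tied_weights(V, d, K, rng):
    """Adjacent weight sharing: A_{k+1} = C_k, B = A_1, W tied to C_K.

    V: vocabulary size, d: embedding size, K: number of hops.
    Returns the per-hop embedding lists A and C plus B and W."""
    C = [rng.normal(0.0, 0.1, size=(V, d)) for _ in range(K)]  # C_1..C_K
    A = [rng.normal(0.0, 0.1, size=(V, d))] + C[:-1]           # A_1 free; A_{k+1} = C_k
    B = A[0]                                                   # query embedding B = A_1
    W = C[-1].copy()                                           # answer matrix tied to C_K
    return A, B, C, W
```

Sharing adjacent layers roughly halves the number of free embedding matrices compared to untied layers, which matters at K = 10 hops.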
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Quantitative Results</head><p>We now present and discuss our evaluation results. Our evaluation metrics are the averages of precision, recall, and F-measure over all KGs in the test set, computed for both the valid and invalid triple classes. We also report recall for the negative class (specificity) in order to interpret the results more carefully by counting true negatives. Additionally, as mentioned earlier, we zero-pad each batch of queries of size less than 100. This necessitates introducing another class label for such zero paddings in both the training and test phases. We did not consider the zero-padding class in the calculation of precision, recall, and F-measure; through our evaluations, however, we observed some misclassifications from/to this class. Thus, we report accuracy as well.</p><p>To the best of our knowledge, there is no other architecture capable of conducting deductive reasoning on completely unseen RDF KGs. In addition, NTPs and LTNs appear to have severe scalability issues, which means we cannot compare them to our system at scale. Neighbourhood-approximated Neural Theorem Provers <ref type="bibr" target="#b29">[30]</ref> rely heavily on entity embeddings, making them unsuitable for our goal, as discussed. That is why we consider the non-normalized embedding version of our memory network as a baseline. Similarly, the Graph-to-Graph learning architecture <ref type="bibr" target="#b0">[1]</ref> is an ontology-specific model: after training such a model on one domain, one needs to adapt the model hyper-parameters for another domain and restart training from scratch on a model of different width. Besides that, the Graph-to-Graph model is not scalable to large ontologies like DBpedia; instead it restricts the vocabulary to small, restricted-domain datasets. 
These inherent limitations for cross-ontology adaptation, together with the generative nature of the model (as opposed to the classification in our setup), make a direct comparison impossible.</p><p>Our technique shows a significant advantage over the baseline, as shown in Table <ref type="table">1</ref>. A further, even more important, benefit of our normalization model is its training time. This considerable difference in time complexity results from the remarkable difference in the sizes of the embedding matrices in the original and normalized cases. For instance, the embedding matrices to be learned for the normalized OWL-Centric dataset have size 3,033 × 20, as opposed to 811,261 × 20 for the non-normalized one (and 1,974,062 × 20 for Linked Data, which is prohibitively big). This yields a remarkable decrease in training time and space complexity and consequently helps the scalability of our memory networks. In the case of the OWL-Centric dataset, for instance, the space required to save the normalized model is 80 times less than for the non-normalized model (≈ 4 GB after compression). Moreover, the normalized model is almost 40 times faster to train for this dataset: it trained in just a day on the OWL-Centric data while achieving better accuracy, whereas training on the same non-normalized dataset took more than a week on a 12-core machine. Hence, the importance of normalization cannot be emphasized enough.</p><p>To further understand how our model performs on different data sources, we applied our approach to multiple datasets with various characteristics. The results across all variations are given in Table <ref type="table">1</ref>. From this table we can see that, apart from our strikingly good performance compared to the baseline, there are a number of other interesting points: our model achieves even better results on the Linked Data task although it was trained on the OWL-Centric dataset. 
We hypothesize that this may be due to a generally simpler structure of Linked Data, but validating this will require further research.</p><p>A large portion of our few false negatives comes from the inability of our model to infer that every class is a subclass of itself. Another interesting observation is the poor performance of our algorithm when trained on the OWL-Centric dataset and tested on a tricky version of the Linked Data. In that case our model classified most triples into the "yes" class, which led to a low specificity (recall for the "no" class) of 16%. This seems inevitable, because in this case the negative instances bear close resemblance to the positive ones, making differentiation more challenging. However, training the model on the tricky OWL-Centric dataset improved this by a substantial margin (more than threefold). On our particularly challenging synthetic data, performance is not as good; this may be due to the very different nature of this dataset, which requires much longer reasoning chains than the non-synthetic data. Our training so far has only been done on real-world datasets; it may </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Experimental results of proposed model be interesting to more closely investigate our approach when trained on synthetic data, but that was not the purpose of our study.</p><p>It appears natural to analyze the reasoning depth acquired by our network. We conjecture that reasoning depth acquired by the network will correspond both to (1) the number of layers in the deep network, and (2) the ratio of deep versus shallow reasoning required to perform the deductive reasoning. Forward-chaining reasoners iteratively apply inference rules in order to derive new entailed facts. In subsequent iterations, the previously derived facts need to be taken into account. To gain a first understanding of what our model has learned in this respect, we have mimicked this symbolic reasoner behavior in creating our test set. We first started from our input KG 𝐾 0 in hop 0. We then produced, subsequently, KGs of 𝐾 1 ,..., 𝐾 𝑛 until no new triples are added (i.e. 𝐾 𝑛+1 is empty) by applying the RDF inference rules from the specification: The hop 0 dataset contains the original KG's triples in the inferred axioms, hop 1 contains the RDF(S) axiomatic triples. The real inference steps start with 𝐾 𝑛 where 𝑛 &gt;= 2. Table <ref type="table" target="#tab_2">2</ref> summarizes our results in this setup. Unsurprisingly, we observe that result over our synthetic data is poor. This may be because of the huge gap between the distribution of our training data over reasoning hops and the synthetic data reasoning hop length distribution as shown in the first row of Table <ref type="table" target="#tab_2">2</ref>. From that, one can see how the distribution of our training set affects the learning capability of our model. 
Beyond our own observations, previous studies <ref type="bibr" target="#b30">[31,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b31">32]</ref> also corroborate that reasoning chain lengths in real-world KGs are limited to 3 or 4. Hence, a synthetic toy training set would have to be built as part of follow-up work.</p></div>
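The hop-construction procedure described above can be sketched in a few lines. This is a minimal illustration, not the actual data-generation code: only two RDFS rules (rdfs9 and rdfs11) are included, whereas the datasets in the paper use the full RDF(S) rule set from the specification, and the example KG and `ex:` names are invented for illustration.

```python
# Forward-chaining sketch: triples first derived in iteration n belong to
# reasoning hop n; the process stops once no new triples appear.

def apply_rules(kg):
    """One forward-chaining step: triples derivable from kg, minus kg."""
    derived = set()
    sub = [(s, o) for s, p, o in kg if p == "rdfs:subClassOf"]
    # rdfs11: rdfs:subClassOf is transitive
    for a, b in sub:
        for c, d in sub:
            if b == c:
                derived.add((a, "rdfs:subClassOf", d))
    # rdfs9: class membership propagates to superclasses
    for s, p, o in kg:
        if p == "rdf:type":
            for a, b in sub:
                if o == a:
                    derived.add((s, "rdf:type", b))
    return derived - kg

def reasoning_hops(k0):
    """Return the list [K1, K2, ...] of newly derived triples per hop."""
    kg, hops = set(k0), []
    while True:
        new = apply_rules(kg)
        if not new:          # K_{n+1} is empty: fixpoint reached
            return hops
        hops.append(new)
        kg |= new

# Toy KG: tom is a Cat, Cat ⊑ Mammal ⊑ Animal
k0 = {("ex:Cat", "rdfs:subClassOf", "ex:Mammal"),
      ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
      ("ex:tom", "rdf:type", "ex:Cat")}
hops = reasoning_hops(k0)
print(len(hops))   # 2: tom's Animal membership needs a second iteration
```

On this toy input, hop 1 yields the transitive subclass axiom and tom's Mammal membership, while tom's Animal membership only appears at hop 2, mirroring how deeper entailments require more iterations.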
<div xmlns="http://www.tei-c.org/ns/1.0"><head>General Embeddings Visualization</head><p>In order to gain some insight into the nature of our normalized embeddings, we have plotted a two-dimensional Principal Component Analysis (PCA) visualization of the embeddings computed for the RDF(S) terms and all normalized words in the KGs, shown in Figure <ref type="figure" target="#fig_1">2</ref>. The embeddings were fetched from the matrix B (the embedding lookup table for queries) in hop 1 of our model trained over the OWL-Centric dataset. Words are positioned in the plot based on the similarity of their embedding vectors. As anticipated, all the normalized words tend to form one cluster as opposed to multiple ones. The PCA projection illustrates the ability of our model to automatically organize RDF(S) concepts and to implicitly learn the relationships between them. For instance, rdfs:domain and rdfs:range are located very close together and far from the normalized entities. The vectors of rdf:subject, rdf:predicate and rdf:object are very similar, as are those of rdf:seeAlso and rdf:isDefinedBy. Likewise, rdfs:container, rdf:bag, rdf:seq, and rdf:alt are in the vicinity of each other. rdf:langString is the only RDF(S) entity inside the cluster of normalized entities. We believe this is because rdf:langString's domain and range are strings, and consequently it mainly co-occurs with normalized instances in the KGs. Another possible reason is its low frequency in our data.</p></div>
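A projection of this kind can be produced along the following lines. This is only a sketch: the variable B stands in for the hop-1 query embedding lookup table described in the text, and the random matrix here merely mimics its shape (vocabulary size × embedding dimension).

```python
# Minimal PCA sketch for visualizing an embedding lookup table.
import numpy as np

def pca_2d(embeddings):
    """Project row vectors onto their top two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Right singular vectors of the centered matrix are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
B = rng.normal(size=(3033, 20))   # stand-in for the learned lookup table
coords = pca_2d(B)                # one (x, y) point per vocabulary entry
print(coords.shape)               # (3033, 2)
```

Each row of `coords` can then be scattered and labeled with its vocabulary term to obtain a plot like Figure 2, where nearby points indicate similar embedding vectors.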
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions and Future Work</head><p>We have demonstrated that a deep learning architecture based on memory networks and pre-embedding normalization is capable of learning how to perform deductive reasoning over previously unseen RDF KGs with high accuracy. We believe that we have thus provided the first deep learning approach capable of high-accuracy RDF deductive reasoning over previously unseen KGs. Normalization appears to be a critical component for the high performance of our system. We plan to investigate its scalability and to adapt it to other, more complex, logics.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Diagram of the proposed model, for K=1</figDesc><graphic coords="5,89.29,84.19,416.69,164.77" type="bitmap" /></figure>
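The core attention mechanism of the memory-network architecture the model builds on (in the style of Sukhbaatar et al.'s end-to-end memory networks) can be sketched for a single hop as follows. The matrix names A, B, C follow the usual convention for the input, query, and output lookup tables; the sizes and toy data are illustrative assumptions only.

```python
# One attention hop of an end-to-end memory network over embedded triples.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_hop(story_ids, query_ids, A, B, C):
    """story_ids: list of triples, each a list of vocabulary indices;
    query_ids: indices of the query triple; A/C: memory input/output
    lookup tables; B: query lookup table. Returns the hop output o + u."""
    m = np.stack([A[ids].sum(axis=0) for ids in story_ids])  # memory slots
    c = np.stack([C[ids].sum(axis=0) for ids in story_ids])  # output slots
    u = B[query_ids].sum(axis=0)                             # query vector
    p = softmax(m @ u)                                       # attention weights
    return c.T @ p + u

rng = np.random.default_rng(0)
vocab, dim = 50, 20
A, B, C = (rng.normal(scale=0.1, size=(vocab, dim)) for _ in range(3))
story = [[1, 2, 3], [3, 4, 5]]        # two triples as index lists
out = memory_hop(story, [1, 4, 5], A, B, C)
print(out.shape)                      # (20,)
```

Stacking K such hops (feeding each hop's output back in as the next query) gives the multi-hop reading of the KG over which the classification layer then decides entailment.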
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: PCA projection of embeddings for the vocabulary</figDesc><graphic coords="10,172.63,196.83,250.01,218.44" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>.56 33 3.09 33 6.03 33 11.46 31 20.48 31 31.25 28 23.65%</figDesc><table><row><cell>Dataset</cell><cell cols="14">Hop 1 F% D% F% D% F% D% F% D% F% D% F% D% F% Hop 2 Hop 3 Hop 4 Hop 5 Hop 6 Hop 7 D%</cell><cell cols="2">Hop 8 F% D%</cell><cell cols="2">Hop 9 F% D%</cell><cell cols="2">Hop 10 F% D%</cell></row><row><cell cols="2">OWL-Centric a -</cell><cell>8</cell><cell>-</cell><cell>67</cell><cell>-</cell><cell>24</cell><cell>-</cell><cell>1</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row><row><cell cols="2">OWL-Centric b 42</cell><cell>5</cell><cell>78</cell><cell>64</cell><cell>44</cell><cell>30</cell><cell>6</cell><cell>1</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row><row><cell>Linked Data c</cell><cell>88</cell><cell>31</cell><cell>93</cell><cell>50</cell><cell>86</cell><cell>19</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row><row><cell cols="2">Linked Data d 86</cell><cell>34</cell><cell>93</cell><cell>46</cell><cell>88</cell><cell>20</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row><row><cell>Synthetic</cell><cell cols="5">38 0.03 44 1.42 32</cell><cell>1</cell><cell>33 1</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table><note>a Training set 
b Completely different domain c LemonUby Ontology d Agrovoc Ontology</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>F-measure and Data Distribution over each reasoning hop</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://lod-cloud.net/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://datahub.io/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://github.com/Monireh2/kg-deductive-reasoner</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://jena.apache.org/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Acknowledgements</head><p>This work was supported by the Air Force Office of Scientific Research under award number FA9550-18-1-0386 and by the National Science Foundation (NSF) under award OIA-2033521 "KnowWhereGraph: Enriching and Linking Cross-Domain Knowledge Graphs using Spatially-Explicit AI Technologies."</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Deep Learning for Noise-tolerant RDFS Reasoning</title>
		<author>
			<persName><forename type="first">B</forename><surname>Makni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hendler</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
		<respStmt>
			<orgName>Rensselaer Polytechnic Institute</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Relation extraction with matrix factorization and universal schemas</title>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mccallum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">M</forename><surname>Marlin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="74" to="84" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Reasoning with neural tensor networks for knowledge base completion</title>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="926" to="934" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-T</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Deng</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6575</idno>
		<title level="m">Embedding entities and relations for learning and inference in knowledge bases</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Representing text for joint embedding of text and knowledge bases</title>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pantel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Poon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Choudhury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gamon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2015 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1499" to="1509" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Complex embeddings for simple link prediction</title>
		<author>
			<persName><forename type="first">T</forename><surname>Trouillon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Welbl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">É</forename><surname>Gaussier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bouchard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="2071" to="2080" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-F</forename><surname>Wong</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1508.05508</idno>
		<title level="m">Towards neural network-based reasoning</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Belanger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mccallum</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1607.01426</idno>
		<title level="m">Chains of reasoning over entities, relations, and text using recurrent neural networks</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">End-to-end differentiable proving</title>
		<author>
			<persName><forename type="first">T</forename><surname>Rocktäschel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="3788" to="3800" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">W</forename><surname>Ling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Luís</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Marujo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">F</forename><surname>Astudillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Amir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Dyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">W</forename><surname>Black</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Trancoso</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1508.02096</idno>
		<title level="m">Finding function in form: Compositional character models for open vocabulary word representation</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Subword regularization: Improving neural network translation models with multiple subword candidates</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kudo</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1804.10959</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Sennrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Haddow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Birch</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1508.07909</idno>
		<title level="m">Neural machine translation of rare words with subword units</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Bosc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jastrzębski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grefenstette</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vincent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1706.00286</idno>
		<title level="m">Learning to compute word embeddings on the fly</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Eric</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1701.04024</idno>
		<title level="m">A copy-augmented sequence-to-sequence architecture gives good performance on task-oriented dialogue</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Hierarchical pointer memory network for task oriented dialogue</title>
		<author>
			<persName><forename type="first">D</forename><surname>Raghu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Gupta</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1805.01216</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Talman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chatzikyriakidis</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.09774</idno>
		<title level="m">Testing the generalization power of neural network models across nli benchmarks</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">W</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hoang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Y</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1707.06690</idno>
		<title level="m">Deeppath: A reinforcement learning method for knowledge graph reasoning</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Learning structured embeddings of knowledge bases</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bordes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Collobert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<publisher>AAAI, AAAI Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">End-to-end memory networks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sukhbaatar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="2440" to="2448" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Can neural networks understand logical entailment?</title>
		<author>
			<persName><forename type="first">R</forename><surname>Evans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Saxton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kohli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grefenstette</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1802.08535</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Learning and reasoning with logic tensor networks</title>
		<author>
			<persName><forename type="first">L</forename><surname>Serafini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S D</forename><surname>Garcez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference of the Italian Association for Artificial Intelligence</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="334" to="348" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">On the capabilities of logic tensor networks for deductive reasoning</title>
		<author>
			<persName><forename type="first">F</forename><surname>Bianchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hitzler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Hohenecker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lukasiewicz</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1808.07980</idno>
		<title level="m">Ontology reasoning with deep neural networks</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">OWL 2 Web Ontology Language: Primer (Second Edition)</title>
		<ptr target="http://www.w3.org/TR/owl2-primer/" />
	</analytic>
	<monogr>
		<title level="m">W3C Recommendation</title>
				<editor>
			<persName><forename type="first">P</forename><surname>Hitzler</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Krötzsch</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Parsia</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Patel-Schneider</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Rudolph</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2012-12-11">11 December 2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Translating embeddings for modeling multi-relational data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bordes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Usunier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Garcia-Duran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Yakhnenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="2787" to="2795" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Knowledge graph embedding by translating on hyperplanes</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">AAAI</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="1112" to="1119" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Hitzler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krotzsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rudolph</surname></persName>
		</author>
		<title level="m">Foundations of semantic web technologies</title>
				<imprint>
			<publisher>Chapman and Hall/CRC</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">RDF 1.1 Semantics</title>
		<ptr target="http://www.w3.org/TR/rdf11-mt/" />
		<editor>P. J. Hayes, P. F. Patel-Schneider</editor>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">On the quality of vocabularies for linked dataset papers published in the semantic web journal</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hitzler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Janowicz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="207" to="220" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<title level="m" type="main">Scalable neural theorem proving on knowledge bases and natural language</title>
		<author>
			<persName><forename type="first">P</forename><surname>Minervini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bosnjak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rocktäschel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grefenstette</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dhuliawala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zaheer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Vilnis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Durugkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Krishnamurthy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Smola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mccallum</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1711.05851</idno>
		<title level="m">Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Differentiable learning of logical rules for knowledge base reasoning</title>
		<author>
			<persName><forename type="first">F</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">W</forename><surname>Cohen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="2319" to="2328" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
