Should All Semantic Web Reasoners Include a Neural Network?

Jeff Heflin1,*,†, Yifan Zhang1,†

1 Lehigh University, Computer Science and Engineering, 113 Research Dr., Bethlehem, PA, 18015

Abstract
This position poster makes the case that deep neural nets can be incorporated into traditional symbolic reasoners to enable reasoning at Web scale. This will maintain the correctness guarantees of the reasoner while offering efficiency by exploring fruitful paths first. We suggest a first-level decomposition of the problem into representation, learning mechanism, and control strategy, and describe some preliminary results.

Keywords: scalable reasoning, neurosymbolic, knowledge base, deep learning

Posters, Demos, and Industry Tracks at ISWC 2024, November 13–15, 2024, Baltimore, USA
* Corresponding author.
† These authors contributed equally.
Email: heflin@cse.lehigh.edu (J. Heflin); yiz521@lehigh.edu (Y. Zhang)
URL: http://www.cse.lehigh.edu/~heflin/ (J. Heflin)
ORCID: 0000-0002-7290-1495 (J. Heflin); 0009-0002-2219-0514 (Y. Zhang)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Since the early days of Semantic Web research, one of the core questions has been how to perform inference at Web scale. For example, the Scalable Semantic Web Knowledge Base Systems workshop has been held 13 times since 2005. In recent years, deep learning has been used to find solutions to problems that were previously unsolvable. This position poster asks whether deep learning could be a vital component of research into scalable reasoning. Neural networks are capable of quickly making predictions over large amounts of data, are robust to noise (such as exceptions and contradictions), and can capture knowledge that is difficult to describe formally. However, it is difficult to explain how a trained model came to its conclusion, the training set requires very careful curation to avoid biases, and predictions can be very wrong on the long tail of inputs seldom seen during training. Due to this opaqueness and the unpredictability of predictions on out-of-distribution data, we do not suggest that all reasoning be performed by a deep neural net. Instead, we propose to build traditional symbolic systems that use machine learning models in the decision-making process.

Generally, all machine reasoning is a search for a proof, and at each step the system has many options to choose from. In classical reasoning systems, there is often nothing to distinguish the options, so the choice amounts to a random guess. If the guess turns out to be wrong and leads to a dead end, the reasoner will back up and try a different choice. Deep learning can be used to train a meta-reasoner that uses knowledge about the question at hand and the structure of the axioms to make more informed choices. In essence, the meta-reasoner will function as “intuition,” guiding the reasoner to a solution with fewer false starts. Since the reasoner takes discrete steps, each of which has a justification, the result is reliable and explainable. Admittedly, this is not a new idea. However, we have two reasons to believe that success is possible where previous approaches have failed: first, our proposal for a new way to represent logical statements as vectors that capture their logical semantics has shown early promise [1, 2]. Second, new deep learning architectures are pushing the boundaries of what problems can be solved by neural networks.
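To make the decision point concrete, below is a minimal propositional sketch of backward chaining in Python (the knowledge base, names, and depth bound are purely illustrative, and it omits the unification and variable bindings a real reasoner needs). The choose parameter marks exactly the spot where a classical reasoner guesses arbitrarily and where a trained meta-reasoner would intervene.

    import random

    def backward_chain(goal, facts, rules, choose=random.choice, depth=0):
        """Prove `goal` over a propositional KB by backward chaining.
        `choose` is the decision point: here an arbitrary guess, in our
        proposal a call to a learned meta-reasoner."""
        if goal in facts:
            return True
        if depth > 8:                        # crude guard against cycles
            return False
        candidates = [r for r in rules if r[0] == goal]
        while candidates:
            head, body = choose(candidates)  # the choice point
            candidates.remove((head, body))
            if all(backward_chain(g, facts, rules, choose, depth + 1) for g in body):
                return True                  # this branch yields a proof
        return False                         # dead end: caller backtracks

    # Toy KB: mortal :- human.  human :- greek.  Fact: greek.
    rules = [("mortal", ("human",)), ("human", ("greek",))]
    print(backward_chain("mortal", {"greek"}, rules))  # True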
There are two key research areas relevant to this topic: neurosymbolic AI and learning for automated theorem provers. Neurosymbolic AI seeks “to integrate neural network-based methods with symbolic knowledge-based approaches” [3]. This can include a broad range of topics, from generating embeddings of knowledge graphs to training a neural network to predict whether one logical statement entails another. In the field of automated theorem proving, there has been a variety of work on proof guidance. Wang et al. [4] proposed to use Graph Convolutional Networks to identify which mathematical statements were relevant to a given conjecture. Jakubův and Urban’s ENIGMA is a learning-based method to train a classification model to identify “useful and un-useful” clauses for proof search [5]. Crouse et al. combine various embedding strategies, including those from ENIGMA, with a deep reinforcement learning approach to proof search [6]. More recently, Google has combined large language models and symbolic reasoners to enable automated proving of several Olympiad-level geometry problems [7].

2. Challenges

We propose to augment a traditional symbolic reasoner with a learned meta-reasoning component. In order to train the meta-reasoner, there are three main problems that must be addressed: representation, learning mechanism, and control strategy. Representation is the determination of how to express symbolic information in a form that can be input to a neural architecture, since such architectures require inputs to be vectors of numbers. There are many possible choices, and the ability of the choice to capture the semantics of the input can have a significant impact on neural reasoning. The learning mechanism is the approach by which we train the model(s) used by the reasoner to help guide decisions for future queries. It considers the type of learning, the architecture of the neural model, and, for supervised learning, how the training data is produced. Control strategy determines how the knowledge is utilized. In particular, the control strategy dictates which choices the algorithm will make and when these choices will be made. Let us consider each of these challenges in turn.

2.1. Representation

Crucial to the problem of creating an intuitive symbolic reasoner is the choice of good representations for the logical statements. There are many ways to translate a sequence of symbols into a vector of numbers, but not all of them are useful. The objective is to find a representation that places logically similar statements close together in vector space; thus, similar statements will have similar outcomes. We have had initial success by training an embedding using triplet loss to place atoms that unify closer together [1]. Representations for more complex statements can be composed from the embeddings of their constituent atoms. For example, to create the representation of a Horn Logic rule, one can sum the embeddings of the atoms in the body and concatenate the result with the embedding of the head. Such a representation matches our objective well: if two atoms differ only by a variable renaming, their representations should be close; if two rules differ only in the order of their body atoms, their representations will be identical. Designing a representation for a fundamental unit like an atom is appealing. This could become the building block for representations of more complex statements, much in the same way that Word2Vec is used as an initial representation for words by some Large Language Models. Several strategies might be used to create the more complex representations, such as long short-term memory networks, graph convolutional networks, or transformers.
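As a concrete illustration of the composition just described, here is a minimal sketch in Python/NumPy. The vectors and their dimensionality are invented for illustration; in practice the atom embeddings would come from a model trained with triplet loss so that unifiable atoms lie close together [1].

    import numpy as np

    def embed_rule(head_vec, body_vecs):
        """Compose a Horn rule embedding: sum the body atom embeddings
        (order-invariant) and concatenate the head atom embedding."""
        return np.concatenate([head_vec, np.sum(body_vecs, axis=0)])

    # Toy 4-dimensional atom embeddings (a trained model would place the
    # two ancestor/2 atoms close together because they unify).
    ancestor_xy = np.array([0.10, 0.90, 0.30, 0.20])  # ancestor(X, Y)
    parent_xz   = np.array([0.40, 0.10, 0.80, 0.50])  # parent(X, Z)
    ancestor_zy = np.array([0.12, 0.88, 0.31, 0.19])  # ancestor(Z, Y)

    # ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
    rule_vec = embed_rule(ancestor_xy, [parent_xz, ancestor_zy])

    # Reordering the body atoms yields an identical representation,
    # exactly the invariance argued for above.
    assert np.array_equal(rule_vec, embed_rule(ancestor_xy, [ancestor_zy, parent_xz]))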
2.2. Learning Mechanism

After the representation question has been solved, the next question is how to train the intuitive component. The first choice is between supervised learning and reinforcement learning. For early explorations, supervised learning has the advantage that one can more consistently compare different alternatives, but it does require the collection of suitable training data. In general, training data can be produced by using traditional reasoning algorithms to answer a suite of queries. There are different types of functions that we can try to predict: e.g., given a choice, how many steps will be required to reach a solution, or how likely is it that a solution will be found? Different reasoning algorithms will require different choices. For example, in backward-chaining, at each step the algorithm has to choose which subgoal to attempt next, and then which rule to apply to that subgoal. In the presence of cycles, a poor choice can lead to needless search or even infinite loops. We can train one or more scoring models to rank the choices. We can produce the training data automatically by running traditional reasoners (with randomized choices) on random queries. We have had initial success by using the scoring model to choose the hardest goal first, and then trying to prove it using the most promising rule first.

Alternatively, reinforcement learning (RL) provides a strong framework that can replace supervised learning, especially when feedback (rewards or penalties) guides decisions. In RL, the reasoner acts as an agent making decisions step by step, such as choosing subgoals or rules. Unlike supervised learning, RL does not need pre-labeled data; the agent learns through attempts to answer queries, getting feedback on whether a solution is found and how efficiently. Each decision is treated as an action, with rewards based on minimizing steps or avoiding loops. Over time, the RL agent refines its strategy to maximize overall rewards.

2.3. Control Strategy

The final piece is the control strategy: how will the trained intuitive component impact the reasoning process? The specific options will depend on the reasoning algorithm, but in all cases we can imagine a spectrum of strategies. At one extreme, the valid options are determined by the reasoner, and the meta-reasoner merely ranks these choices. At the other extreme, the meta-reasoner determines both the available options and the order in which to evaluate them. As the design approaches this latter extreme, we go from intuition that merely makes traditional reasoners more efficient to intuition that enables new capabilities, such as making intuitive leaps and selectively ignoring contradictory information.
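As a concrete illustration of the ranking end of this spectrum, the following sketch (all names hypothetical; score_model stands in for a scorer trained as in Section 2.2 over representations like those of Section 2.1) shows a meta-reasoner that only orders the rules the symbolic reasoner has already deemed applicable. Because the candidate set comes from the reasoner, a badly trained scorer can only cost extra search, never produce an incorrect answer.

    import numpy as np

    def rank_choices(goal_vec, rule_vecs, score_model):
        """Return rule indices ordered most promising first: the learned
        model ranks, but the symbolic reasoner still controls validity."""
        scores = np.array([score_model(np.concatenate([goal_vec, rv]))
                           for rv in rule_vecs])
        return np.argsort(-scores)

    # Stand-in scorer; a real one would be a network predicting, e.g.,
    # the likelihood that applying the rule leads to a proof.
    toy_model = lambda x: float(x.sum())

    goal = np.array([0.2, 0.7])
    rules = [np.array([0.9, 0.1]), np.array([0.3, 0.8]), np.array([0.4, 0.2])]
    print(rank_choices(goal, rules, toy_model))  # [1 0 2]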
3. Preliminary Results

We conducted a preliminary study to assess our approach using the Lehigh University Benchmark (LUBM), which was designed to evaluate the performance of Semantic Web databases over simulated university data. In 62% of queries, our proposed system reduced the search space by several orders of magnitude compared to a traditional backward-chaining reasoner. Depending on query complexity, it found an answer by exploring between 5x and 60x fewer nodes.

4. Conclusion

If the three core problems can be solved, then this approach could become an essential component of any scalable knowledge base system. Future work includes extending the approach to description logics and tableau-based reasoning, finding efficient ways to encode statements in knowledge bases that have billions of constants, learning models that remain useful as the knowledge base evolves, and determining how best to handle outliers.

References

[1] A. Arnold, J. Heflin, Learning a more efficient backward-chaining reasoner, in: Tenth Annual Conference on Advances in Cognitive Systems (ACS-2022), Cognitive Systems Foundation, Arlington, VA, 2022.

[2] Y.-B. Jia, G. Johnson, A. Arnold, J. Heflin, An evaluation of strategies to train more efficient backward-chaining reasoners, in: Proceedings of the 12th Knowledge Capture Conference 2023, 2023, pp. 206–213.

[3] A. Sheth, K. Roy, M. Gaur, Neurosymbolic AI: why, what, and how, arXiv preprint arXiv:2305.00813 (2023).

[4] M. Wang, Y. Tang, J. Wang, J. Deng, Premise selection for theorem proving by deep graph embedding, in: Proc. of the 31st Int’l Conf. on Neural Information Processing Systems, NIPS’17, Curran Associates Inc., Red Hook, NY, USA, 2017, pp. 2783–2793.

[5] J. Jakubův, J. Urban, ENIGMA: Efficient learning-based inference guiding machine, in: H. Geuvers, M. England, O. Hasan, F. Rabe, O. Teschke (Eds.), Intelligent Computer Mathematics, Springer International Publishing, Cham, 2017, pp. 292–302.

[6] M. Crouse, I. Abdelaziz, B. Makni, S. Whitehead, C. Cornelio, P. Kapanipathi, K. Srinivas, V. Thost, M. Witbrock, A. Fokoue, A deep reinforcement learning approach to first-order logic theorem proving, in: Proc. of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021), 2021, pp. 6279–6287. arXiv:1911.02065.

[7] T. Trinh, Y. Wu, Q. Le, H. He, T. Luong, Solving olympiad geometry without human demonstrations, Nature 625 (2024) 476–482. doi:10.1038/s41586-023-06747-5.