Softening Ontological Reasoning with Large Language Models Teodoro Baldazzi1 , Davide Benedetto1 , Luigi Bellomarini2 , Emanuel Sallinger3,4 and Adriano Vlad3 1 Università Roma Tre, Italy 2 Banca d’Italia, Italy 3 TU Wien, Austria 4 University of Oxford, United Kingdom Abstract Logic-based Knowledge Graphs (KGs) and Knowledge Representation and Reasoning (KRR) have emerged as fundamental methodologies in many data-intensive areas, fostering trust and accountability for effective decision- making. However, the knowledge captured by such approaches is often restricted by the rigidity of their structured rule-based formalisms. More recently, the rising adoption of Large Language Models (LLMs) has introduced a new layer of semantic understanding and flexibility in human-data interaction. Yet, these models are inherently limited in reasoning capabilities and lack systematic and explainable outcomes due to their opaque nature. To address today’s challenge of combining the strengths of both technologies, we propose a novel neurosymbolic solution that leverages the power of LLMs to “soften” rule activations, enhancing adaptability in ontological reasoning while preserving robustness and transparency of KRR systems. Keywords Ontological reasoning, Language models, Knowledge graphs 1. Introduction In recent years, the widespread interest in querying and exploiting large volumes of data has catalyzed the development of increasingly mature, efficient, and scalable solutions capable of capturing and reasoning over real-world scenarios. In this context, ensuring the transparency of data-driven processes is paramount to provide high levels of trustworthiness and accountability in decision-making, especially over critical domains such as finance and biomedicine [1, 2]. Powered by logic-based Knowledge Representation and Reasoning (KRR) formalisms, such intelligent systems are fully explainable [3], as they provide factual conclusions augmented with the consequentially logical steps that led to them through the inference. Among these formalisms, logic programming-based database query languages, such as Datalog and its extensions [4], are a yardstick, thanks to their effective trade-off between expressive power and computational complexity. Leveraging such languages, factual data from corporate databases can be combined with business-level definitions as ontologies in Knowledge Graphs (KGs), and further augmented via ontological reasoning [5]. However, ontological reasoning systems are constrained by the rigid nature of KRR formalisms at their foundation, which limits their adaptability to the complexities of real-world data. Indeed, these systems typically rely on query-based interactions, operating at a low level and often proving challenging for non-specialists to use effectively. Moreover, all inputs and outputs are confined to structured formats such as facts, n-tuples, or triples, and the generation of new knowledge through rule activation is restricted to what can be syntactically captured by predefined logical predicates and via RuleML+RR’24: Companion Proceedings of the 8th International Joint Conference on Rules and Reasoning, September 16–22, 2024, Bucharest, Romania $ teodoro.baldazzi@uniroma3.it (T. Baldazzi); davide.benedetto93@gmail.com (D. Benedetto); luigi.bellomarini@bancaditalia.it (L. Bellomarini); sallinger@dbai.tuwien.ac.at (E. Sallinger); adriano.vlad@gmail.com (A. Vlad)  0000-0002-1762-1431 (T. Baldazzi); 0000-0001-6079-4250 (D. Benedetto); 0000-0001-6863-0162 (L. Bellomarini); 0000-0001-7441-129X (E. Sallinger) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings precise bindings to actual values. This rigidity fundamentally clashes with the inherent ambiguity of unstructured or raw data that may not fit into predefined categories. Together with the incompleteness, inconsistencies, and inaccuracies that might affect such data, these issues inhibit the applicability of KRR in real-world scenarios where understanding the semantic meaning of information is crucial. Consequently, we are observing a critical need for solutions that enable more semantic-aware and flexible reasoning capabilities in such systems. As an intuitive example, let us consider the natural language (NL) sentence “Through a series of five transactions, E. Musk has acquired 52% of Twitter in October 2022, after previously expressing interest in the platform during several interviews.” and the logical rule Owns(owner,owned,shares), shares > 0.5 → Controls(owner,owned), stating that “a financial entity owning more than 50% of the shares of another one controls it. While a human would readily understand that Elon Musk now controls Twitter, automatically inferring this result presents significant challenges for a KRR system. Indeed, it should first recognize that, despite the unstructured nature of the input, the rule’s body could bind to it, given the close semantic relationship between acquisition and ownership. Then, it would need to correctly map the arguments of the Owns predicate to the corresponding portions of the input, i.e., identifying E. Musk as the entity owning, Twitter as the entity owned, and 52% as the shares involved in the ownership. Moreover, information such as the number of transactions and the time-frame, while not affecting rule activation, still provides relevant context in the financial domain the example belongs to and should not be filtered out, whereas details like Musk’s prior expressions of interest can be omitted. Finally, the rule would be activated, producing as output Controls(E. Musk,Twitter), ideally augmented with such contextually-relevant details as additional metadata to enrich explainability. The demand for solutions that enable more adaptable and flexible reasoning to navigate the intricacies and ambiguities of real-world data has gained further traction with the recent breakthrough of AI-based chatbots and Large Language Models (LLMs) [6], which has marked a significant turning point in the field of Natural Language Processing (NLP) and a pivotal shift in the access to data and knowledge towards more human-friendly and high-level paradigms. Today, LLMs such as OpenAI’s GPT [7] and Meta’s Llama [8] are effectively adopted to address a plethora of tasks across multiple domains [9]. Following the development of techniques such as chain-of-thought prompting [10], recent attempts have been carried out to employ LLMs for complex data analyses as well as multi-step reasoning [11, 12]. Yet, despite the advancements in the field and the proficiency of these models in handling semantic relationships within natural language, concerns persist due to their intrinsic opacity and unpredictabil- ity [13]. Indeed, they often fall short in providing systematic, explainable outcomes necessary for big data processing and robust decision-making in high-stakes domains [14, 15]. This paper addresses the challenge of synergistically combining the robustness and transparency of KRR systems with the power of LLMs in understanding the semantic meaning of NL knowledge. We propose a neurosymbolic solution that leverages LLMs to augment the ontological reasoning process with real-world semantic flexibility, injecting “softness” into rule activations. Specifically, we operate in the context of the Vadalog [16] system, a Datalog-based reasoning engine for KGs, that finds many industrial applications [17]. The semantics of a Vadalog set Σ of rules can be defined in an operational way via the well-known chase [18] procedure. Given a database in input, this algorithmic tool expands it with new facts entailed via the application of the rules in Σ, until all of them are satisfied. Intuitively, a rule is applied when an exact binding is identified, i.e., a set of mappings of the variables in the rule’s body to the constants of structured facts in the database. With the goal of extending the traditional chase mechanism to address the complexities of unstruc- tured data, our approach leverages a pre-trained Llama 3 model to act as a semantic unifier, responsible for identifying bindings in the chase between rules and such data. In practice, given the next rule to be applied via the reasoner, both the rule in its natural language form and the candidate facts to activate it on are passed to the LLM. The model leverages its semantic understanding capabilities to generate bindings as sets of mappings from the variables of the rule body to the proper excerpts of the NL facts. These mappings then undergo a validation phase, which includes a feedback loop to confirm their correctness and coherence and to address potential hallucinations. Once validated, the resulting D Data Rule & facts Semantic Binding selection identification Σ unifier Rules Validator Rule G activation Glossary tupnI Ontological reasoner Verbalizer tuptuO Σ(D) Chase Termination Semantic Fact check unifier verbalization Figure 1: Neurosymbolic reasoning pipeline for LLM-powered soft chase. 𝐷 represents input data collected from relational databases and natural language sources connected to the ontological reasoning system. Σ denotes the set of logic rules to be applied on 𝐷. Σ(𝐷) refers to the original data augmented with new knowledge inferred by applying the rules in Σ throughout the reasoning process. binding is provided to the reasoner, which employs it to attempt rule activation. If all the conditions in the rule are satisfied, a new fact is inferred and additional details of the parents NL facts are preserved as chase metadata. Finally, the newly produced fact is verbalized into natural language via a dedicated module and a termination check is performed, leveraging again the LLM to ensure that the knowledge it provides had not already been generated at a previous step of the reasoning. If that is the case, the fact is added as new input in the chase, and the procedure continues until no more bindings can be identified. A high-level summary of the pipeline, illustrated in Figure 1, will guide our discussion. More in detail, our contributions can be summarized as follows. • We present a novel soft chase technique that extends logic rule bindings and termination control of traditional chase methodologies to unstructured data, leveraging the semantic awareness of LLMs and a deterministic verbalization of logic facts into NL. • We deliver such an approach in a new neurosymbolic KRR-centered architecture (powered by Vadalog, but compatible with any chase-based reasoner) to enable more adaptable and flexible ontological reasoning while preserving robustness and explainability. • We discuss a preliminary experimental evaluation confirming the validity of our approach and comparing standard chase with its soft counterpart, powered by pre-trained and Retrieval- Augmented Generation (RAG) [19]-enriched versions of the LLM. Overview. In Section 2 we provide essential background notions. In Section 3 we present our proposed neurosymbolic architecture. A preliminary experimental evaluation is provided in Section 4. Section 5 discusses related work. We draw our conclusions in Section 6. 2. Chase-based Ontological Reasoning in the Vadalog System To guide the rest of our discussion, we first lay out some preliminary notions on ontological reasoning over KGs, with a specific focus on the Vadalog system and the chase procedure at its foundation. Relational foundations. Let C and V be disjoint countably infinite sets of constants and variables, respectively. A (relational) schema S is a finite set of relation symbols (or predicates) with associated arity. A term is either a constant or a variable. An atom over S is an expression of the form 𝑅(𝑣¯ ), where 𝑅 ∈ S is of arity 𝑛 > 0 and 𝑣¯ is an 𝑛-tuple of terms. A database (instance) over S associates to each symbol in S a relation of the respective arity over the domain of constants. The members of the relations are called tuples or facts. Given two conjunctions of atoms ς1 and ς2 , we define a homomorphism from ς1 to ς2 as a mapping ℎ : C ∪ V → C ∪ V s.t. ℎ(𝑡) = 𝑡 if 𝑡 ∈ C and for each atom 𝑎(𝑡1 , . . . , 𝑡 𝑛 ) ∈ ς1 , then ℎ(𝑎(𝑡1 , . . . , 𝑡 𝑛 )) = 𝑎(ℎ(𝑡1 ), . . . , ℎ(𝑡 𝑛 )) ∈ ς2 . Vadalog syntax. Vadalog is a declarative language for ontological reasoning. It is based on Warded Datalog± , a member of the Datalog family that, at the price of very mild syntactic restrictions, extends Datalog with existential quantifiers and guarantees PTIME data complexity for query answering [20]. A Warded Datalog± program consists of a set of tuples (or facts) and tuple-generating dependencies (TGDs) of the form ∀𝑥¯∀𝑦¯ (𝜑(𝑥¯ , 𝑦¯ )→∃𝑧¯ 𝜓(𝑥¯ , ¯𝑧)), where 𝜑(𝑥¯ , 𝑦¯ ) (the body) and 𝜓(𝑥¯ , ¯𝑧) (the head) are conjunctions of atoms over the respective predicates, 𝑥¯ , 𝑦¯ are vectors of universally quantified variables and constants, and ¯𝑧 is a vector of existentially quantified variables. Quantifiers can be omitted and conjunction is denoted by comma. In this context, Vadalog extends the Warded fragment with features of practical utility to address real-world scenarios [16]. Support for aggregate functions, namely sum, prod, min, max and count, is achieved by means of monotonic aggregations [21]. Other relevant extensions include negations and negative constraints of the form 𝜑(𝑥¯ , 𝑦¯ ) →⊥, where 𝜑(𝑥¯ , 𝑦¯ ) is a conjunction of atoms and ⊥ denotes the truth constant false to model disjointness or non-membership, as well as expressions in rule bodies, modelled with comparison (>, <, ≥, ≤, ≠) and algebraic (+, −, ∗, /, etc.) operators. Chase Procedure. KRR approaches model KGs as the combination Σ(𝐷) of an extensional component, essentially the ground business data in a database 𝐷, and an intensional component, which formally describes the business knowledge as a set Σ of rules in a declarative language such as Vadalog. Performing ontological reasoning over the KG augments it with new inferred knowledge derived from the application of the rules over the input data. Specifically, the semantics of a Vadalog program can be defined in an operational way with the chase procedure [18]. It enforces the satisfaction of a set Σ of rules over a database 𝐷, incrementally augmenting 𝐷 with facts entailed via the application of the rules over 𝐷, until fixpoint. While Vadalog guarantees that such fixpoint exists when only the core features are used [16], the joint presence of algebraic operations and recursion must be carefully handled, as even simple Datalog programs can be in general non-terminating [22]. A TGD 𝜎 : 𝜑(𝑥¯ , 𝑦¯ )→𝜓(𝑥¯ , ¯𝑧) is applicable to 𝐷 if: (i) there exists a homomorphism 𝜃 (technically known as binding) such that 𝜃 (𝜑(𝑥¯ , 𝑦¯ )) ⊆ 𝐷, that is, if there exists a set of mappings from the terms of 𝜑(𝑥¯ , 𝑦¯ ) to the constants of facts in 𝐷 such that each term maps to exactly one constant, and (ii) 𝜃 (𝜓(𝑥¯ , ¯𝑧)) is a fact not already present in 𝐷. If such a binding 𝜃 exists, then 𝜃 (𝜓(𝑥¯ , ¯𝑧)), derived by applying these mappings to the conclusion of the TGD, is added to 𝐷 via a chase step. The chase graph G(𝐷, Σ) is the directed acyclic graph with the facts from the chase Σ(𝐷) as nodes and an edge from a node 𝑛 to a node 𝑚 if 𝑚 derives from 𝑛 (and possibly other facts) via a chase step [4]. Dedicated works [23, 24] have thoroughly explored chase termination [22] in Vadalog in the presence of recursion and algebraic operations. Vadalog reasoner. The Vadalog system is a state-of-the-art ontological reasoning engine that leverages the theoretical underpinnings of the chase procedure and the vast experience of the database community on provenance to power efficient, scalable, and explainable reasoning tasks over critical business domains and large KGs [16]. To achieve this, it adopts a streaming data processing architecture based on the pipes and filters style [16, 25]. Here, the set of rules Σ and the queries are translated into active data scans (linear scans for linear TGDs, join scans for join TGDs, and an output scan for the query), connected by intermediate buffers in a processing pipeline. The reasoning process is performed as a data stream along the pipeline, where each filter (i.e., scan) reads tuples from the respective parent, from the output scan down to the external data stores that inject ground facts into the pipeline. Interactions between scans occur by means of primitives such as next(), which fetches facts from the parent stream, if present. Since, for each filter, multiple parent filters may be available, Vadalog selects which one to invoke by employing specific routing strategies (round-robin, shortest path, etc.) that manage a priority queue of the sources. This methodology allows Vadalog to keep track of the provenance of each result, derived from one or more chase steps. Unlike traditional semi-naive approaches [22], Vadalog generalizes the volcano iterator model [26], operating in a pull-based query-driven fashion in which, ideally, facts are materialized only at the end of the chase and if they contributed to the reasoning task. 3. Neurosymbolic Reasoning by Softening the Chase The input blocks of the soft chase pipeline in Figure 1 are a set 𝐷 of data, a set Σ of reasoning rules expressed in Vadalog, and a glossary 𝐺. Without loss of generality, we define 𝐷 as the collection of structured data from relational databases 𝐷 𝑠 and unstructured data from natural language sources 𝐷 𝑢 , all connected to Vadalog for the reasoning task. The glossary 𝐺 lists the predicates in Σ and their corresponding natural language descriptions. Let us first introduce our running example. Here, 𝐷 contains a collection of acquisitions and ownerships of companies’ shares by financial entities in the market, both persons and other companies. Example 1. The following set Σ contains the Vadalog rules governing who has decision power in a financial entity, based on who owns, directly or indirectly via intermediaries, a significant amount of shares of the financial entity [27] Owns(𝑥, 𝑦, 𝑠) → OwnedShares(𝑥, 𝑦, 𝑦, 𝑠) (𝜎1 ) SignificantShares(𝑥, 𝑧), Owns(𝑧, 𝑦, 𝑠) → OwnedShares(𝑥, 𝑧, 𝑦, 𝑠) (𝜎2 ) OwnedShares(𝑥, _, 𝑦, 𝑠), 𝑡𝑠 = msum(𝑠), 𝑡𝑠 > 0.3 → SignificantShares(𝑥, 𝑦) (𝜎3 ) A financial entity 𝑥 directly owning 𝑠 shares of another financial entity 𝑦, owns such shares via 𝑦 itself (rule 𝜎1 ). If 𝑥 owns significant shares of a financial entity 𝑧 and 𝑧 owns 𝑠 shares of 𝑦, then 𝑥 owns 𝑠 shares of 𝑦 via 𝑧 (rule 𝜎2 ). Finally, if 𝑥 owns, directly or indirectly, a total amount of 𝑦’s shares greater than 0.3, then 𝑥 owns a significant portion of 𝑦’s shares (rule 𝜎3 ). Consider the following subset of data 𝐷 𝑠 = {Owns(Elon Musk, Tesla, 0.19), Owns(Google LLC, DeepMind, 0.7), Owns(BlackRock, Google, 0.4)}, and the query 𝑄: “what are all the entailed significant shares?” as ontological reasoning task. Note that the example is not intended to reflect real-world dynamics. In pure KRR settings, the set Σ(𝐷 𝑠 ) is computed via the standard chase: starting from Σ(𝐷 𝑠 ) = 𝐷 𝑠 , it augments Σ(𝐷 𝑠 ) with facts derived from the application of the rules in Σ up to fixpoint. Figure 2 illustrates the chase graph derived from the activation of Σ over 𝐷 𝑠 . Specifically, rule 𝜎1 generates OwnedShares(Elon Musk, Tesla, Tesla, 0.19), OwnedShares(Google LLC, DeepMind, DeepMind, 0.7), and OwnedShares(BlackRock, Google, Google, 0.4) representing the direct ownership entailed from the input facts. Then, SignificantShares(Google LLC, DeepMind) and SignificantShares(BlackRock, Google) are inferred via rule 𝜎3 , whereas Elon Musk does not own significant shares of Tesla directly. Note that we cannot automatically derive, via rule 𝜎2 and rule 𝜎3 , that BlackRock owns significant shares of DeepMind indirectly through Google, as rule 𝜎2 does not activate on the join argument ⟨Google LLC, Google⟩. Reasoning with the soft chase. Let us now extend Example 1 by taking into account an additional source of information apart from 𝐷 𝑠 . For instance, consider the following input NL data 𝐷 𝑢 = {“E. Musk bought 21% additional shares of Tesla in 2023”, “Andy Jassy is CEO of Amazon since 2021”}. Indeed, in this instance relevant information would be lost via the standard chase due to the absence of syntactic bindings from the rule bodies to NL knowledge in 𝐷 𝑢 . Thus, we extend binding identification by introducing the soft chase, in which an LLM acts as a semantic unifier between rule bodies and Σ(𝐷), 𝐷 = 𝐷 𝑠 ∪ 𝐷 𝑢 , injecting NL understanding capabilities into the reasoning process. Owns(Elon Musk,Tesla,0.19) Owns(BlackRock,Google,0.4) σ1 σ1 OwnedShares OwnedShares (Elon Musk,Tesla,Tesla,0.19) (BlackRock,Google,Google,0.4) σ3 SignificantShares Owns(Google LLC,DeepMind,0.7) (BlackRock,Google) σ1 OwnedShares (Google LLC,DeepMind,DeepMind,0.7) σ3 SignificantShares (Google LLC,DeepMind) Figure 2: Instance of standard chase graph for Example 1. Algorithm 1 Soft Chase Procedure. 1: function soft_chase(𝐷, Σ, 𝐺, model) 2: Σ(𝐷) ← 𝐷 ⊲ initialize chase facts to 𝐷 3: while Vadalog.hasNext() do ⊲ continue until all rules and facts are processed 4: 𝜎, i ← Vadalog.next() ⊲ fetch next rule and facts to process according to routing strategy 5: imappings , attempts ← ∅, 0 6: if linear(𝜎) then 7: imappings ← model.bindLinear(𝜎, i) ⊲ get mappings via LLM for linear rules 8: else 9: if join(𝜎) then 10: imappings ← model.bindAndMatchJoin(𝜎, i) ⊲ get mappings via LLM and check join conditions 11: while attempts < limit do 12: feedback ← validate(imappings , model) ⊲ validate mappings 13: if feedback == “OK” then 14: break ⊲ exit loop if feedback is positive 15: else 16: imappings , attempts ← ∅, attempts + 1 17: if attempts < limit then 18: imappings ← model.refineMappings(𝜎, i, feedback) ⊲ refine mappings based on feedback 19: if imappings ≠ ∅ then 20: i′ .logic ← Vadalog.apply(𝜎, imappings ) ⊲ activate rule via Vadalog 21: i′ .metadata ← storeMetadata(imappings ) ⊲ preserve additional NL details as metadata 22: i′ .nl ← verbalize(i′ ,G) ⊲ verbalize the new fact into NL 23: if model.checkTermination(Σ(𝐷), i′ .nl) then ⊲ check termination via LLM 24: Σ(𝐷) = Σ(𝐷) ∪ i′ ⊲ add newly generated fact to the chase 25: return Σ(𝐷) Specifically, the soft chase can be distinguished into five distinct phases, discussed below for Exam- ple 1 with the aid of Algorithm 1 and Figure 1. 1. Initialization and rule selection. As in a standard chase procedure, we begin by initializing the set Σ(𝐷) of chase facts to the ground ones in 𝐷 (line 2, in the algorithm). Next, we consider the data from Σ(𝐷) to activate the rules in Σ and generate new knowledge. Current rule and data to check for bindings are fetched via next() primitive in Vadalog, if present, according to a routing strategy. Let us assume that we are employing the default round-robin strategy. Let us also assume that each rule features both its logical form and a natural language description, easily produced as a preprocessing step by deterministically verbalizing the atoms in body and head into a “Since {body}, then {head}” sentence [28] according to select-project-join semantics and looking up the glossary 𝐺. Similarly, if the input facts belong to the 𝐷 𝑠 database, they are verbalized as well. D () ne xt ne xt( ) ) bi d( nd n check_ bi () 𝝈1 termination() 𝝈2 bi () ne nd xt () xt ne () + jo in () 𝝈3 () next() nd bi o Figure 3: Vadalog processing pipeline of soft chase for Example 1. Green nodes are linear rules, the blue one is a join rule, and the red one is the output of the reasoning task. Solid edges are logical dependencies between the rules, and dashed ones denote an interaction with the semantic unifier of the type specified in the label (bind(), join(), check_termination()). 2. Binding identification. The goal of this step is identifying the possible binding of the current rule body with the input facts. To achieve this, the LLM is employed, acting as semantic unifier to generate a set of variable-to-constant mappings. Specifically, we operate with a pre-trained model, augmented only with some manually defined examples of mappings in a few-shot learning fashion to increase accuracy and limit hallucinations, both in the actual task and in the output format. Here we observe distinct behaviours according to the type of the rule. Indeed, if the rule is linear, i.e., it features a single atom in the body (such as 𝜎1 in Example 1), the model only verifies whether there exists a set of mappings from the verbalized atom to excerpts of the NL fact (line 7). For instance, the NL fact “E. Musk bought 21% additional shares of Tesla in 2023 ” maps to the verbalized form of atom Owns(𝑥,𝑦,𝑠), that is, “A financial entity 𝑥 owns 𝑠% shares of another financial entity 𝑦”. If a possible binding is identified, the LLM returns as output the structured set of mappings from the rule body to the fact, e.g., { 𝑥 → E. Musk, 𝑦 → Tesla, 𝑠 → 0.21 }, together with the details around the time-frame as additional metadata. Otherwise, it returns the empty set. If instead the rule involves a join, first the model performs the same binding identification as in the linear case, for each individual atom in the body. Then, it further processes the resulting sets of mappings to check whether the values corresponding to the join variables match semantically (line 10), in which case the mappings are returned as output. For instance, the input fact “Google LLC owns 70% shares of DeepMind” (the NL version of Owns(Google LLC, DeepMind, 0.7)) and “BlackRock owns significant shares of Google” (the NL version of SignificantShares(BlackRock, Google)) match on the join argument ⟨Google LLC, Google⟩, unlike the standard chase approach discussed above. 3. Binding validation. After generating the candidate mappings 𝑖mappings , a validation step occurs. Specifically, it first performs a deterministic check to ensure that all the variables in the body have been mapped to exactly one constant (e.g., an excerpt of the NL fact). This step is required to comply with the definition of binding as a homomorphism introduced in Section 2. Then, a separate LLM is employed as well, acting as a validator to confirm the response of the binding identification phase in a feedback loop fashion (lines 11-18). Indeed, if the candidate mappings do not pass the check, the cause of the issue is provided to the semantic unifier, which is tasked with repeating the step. A limit is enforced on the maximum number of attempts before considering the rule as unable to be bound to the current data. 4. Rule activation. If the set of mappings is not empty after validation, the logic rule can be de- terministically activated via the Vadalog reasoner according to the binding. Before this, standard applicability checks of the rule occur, verifying the pre-existence of the unified head in Σ(𝐷). If that is not the case, and if additional conditions that might be present in the rule, such as selections and negations, are satisfied, the rule is activated and the new logic fact 𝑖 ′ is inferred (line 20). Then, the fact is verbalized via the dedicated module and according to the glossary (line 22). For instance, from the binding { 𝑥 → E. Musk, 𝑦 → Tesla, 𝑠 → 0.21 }, rule 𝜎1 generates the fact OwnedShares(E. Musk, Tesla, Tesla, 0.21), verbalized into “E. Musk owns 21% shares of Tesla directly”, with the specific time-frame of the parent fact as additional chase metadata, i.e., “acquisition occurred in 2023 ”. 5. Termination check. Finally, the resulting fact 𝑖 ′ undergoes a semantic termination check to ensure that it is not already present in the chase instance Σ(𝐷). This step, essential to prevent loops in recursive settings, in the soft chase version goes beyond standard applicability checks as it limits redundancy of inferred knowledge throughout the reasoning by pruning facts whose semantic meaning has already been derived in a previous step. Thus, the semantic unifier is employed once again and the verbalized version of 𝑖 ′ is semantically compared with the ones of the facts in Σ(𝐷) (line 23). Such a phase needs to be properly handled to prevent the removal of relevant facts. For instance, in our running example the fact “E. Musk owns 21% shares of Tesla directly” must be added to Σ(𝐷), thus it must not be pruned due to “Elon Musk owns 19% shares of Tesla”. To address this, the LLM is enriched with specific examples, and the chase metadata of the compared facts is taken into account as well. If 𝑖 ′ passes the check, it is added to Σ(𝐷) and the soft chase begins a new iteration, until fixpoint. Extending soft chase with RAG. To further specialize the LLM into the domain of interest for the reasoning task, thus enabling a more accurate semantic unification throughout the procedure, we can also make available additional knowledge and terminology via RAG mechanisms. RAG enhances the model’s contextual understanding by retrieving relevant documents or data points that contain specific information related to the concepts (i.e., the atoms and the facts) involved in the binding at hand. As further discussed in the next section, this proved to have a relevant impact in practical settings. For instance, in pure soft chase the fact significantShares(Andy Jassy, Amazon) is inferred from the NL input “Andy Jassy is CEO of Amazon since 2021” via rule 𝜎1 , due to the mapping { 𝑥 → Andy Jassy, 𝑦 → Amazon, 𝑠 → 0.51 }, and then rule 𝜎3 . In this instance, the LLM is incorrectly assuming that being CEO of a company entails owning the majority of its shares. We can prevent this incorrect inference by explicitly specifying, in the domain knowledge provided via RAG, that, in the absence of additional information, a CEO does not necessarily own any shares of the company at all. Figure 4 illustrates the soft chase graph for Example 1. As further discussed in the next section, it can be observed how the soft chase variants augment the resulting chase instance with multiple relevant facts derived from LLM’s semantic understanding of the domain. 4. Preliminary Experimental Evaluation We integrated our proposed pipeline with the Vadalog system, although it can be integrated with any chase-based ontological reasoner. A full-scale evaluation of the architecture is beyond the scope of this work. Conversely, in this section we provide a preliminary comparison of standard and soft chase (in its pure and RAG-powered versions) over an instance of Example 1. Setup. The experiments were conducted over a KG comprising ownership relationships between companies and persons as financial entities, represented using various nomenclatures such as full names, stock symbols, phrases, or common abbreviations. The KG featured inherent ambiguities and synonymous terms, reflecting real-world complexities and inconsistencies typical of semi-structured and unstructured corporate data. Moreover, natural language sentences describing ownership and acquisition facts were provided separately as input, simulating the scenario introduced in the previous section. We employed a pre-trained Llama 3 70B model as the semantic unifier. "E. Musk bought 21% additional Owns(Elon Musk,Tesla,0.19) Owns(BlackRock,Google,0.4) shares of Tesla in 2023" σ1 σ1 σ1 OwnedShares OwnedShares OwnedShares (E. Musk,Tesla,Tesla,0.21) (Elon Musk,Tesla,Tesla,0.19) (BlackRock,Google,Google,0.4) σ3 σ3 SignificantShares Owns(Google LLC,DeepMind,0.7) (BlackRock,Google) SignificantShares σ2 (Elon Musk,Tesla) σ1 σ2 "Andy Jassy is CEO of OwnedShares OwnedShares Amazon since 2021" (Google LLC,DeepMind,DeepMind,0.7) (BlackRock,Google,DeepMind,0.7) σ1 σ3 σ3 OwnedShares SignificantShares SignificantShares (Andy Jassy,Amazon,Amazon,0.51) (Google LLC,DeepMind) (BlackRock,DeepMind) σ3 SignificantShares (Andy Jassy,Amazon) Figure 4: Instance of soft chase graph for Example 1. Red nodes and edges denote an incorrect derivation due to LLM hallucination, prevented in the version featuring RAG support. Goal and Metrics. The primary goal of this evaluation is to assess the extent to which the injection of “softness” enhances the standard chase by recognizing similar entities and relationships according to real-world semantics. This enables augmenting the inference capabilities of the traditional approach while also preventing the generation of redundant data that represents the same knowledge in different syntactic forms. We conducted the experiments both before and after integrating the LLM with detailed knowledge of the domain of interest via RAG, with the purpose of further improving the model’s accuracy in recognizing domain-specific entities and relationships. We compared the three distinct approaches (standard chase, soft chase, soft chase with RAG) according to the following metrics: • precision, i.e., the fraction of inferred significant shares that are correct; • recall, i.e., the fraction of correct significant shares that are inferred; • F1 score, i.e., the harmonic mean of precision and recall; • false positive (FP) shares, i.e., the fraction of incorrect significant shares that are inferred. For this evaluation, the correct instances of significant shares were determined through a manually curated golden set, where domain experts verified the correctness of the inferred relationships. Discussion. Results are illustrated in Figure 5. The standard chase featured full precision by definition, as it derived significant shares solely through strict logical binding with structured facts. However, this precision came at the cost of recall, as the standard chase was unable to bind rules to unstructured input, leading to missed inferences throughout the reasoning process. For instance, it failed to derive the direct relationship OwnedShares (E. Musk,Tesla,Tesla,0.21) from the input knowledge “E. Musk bought 21% additional shares of Tesla in 2023”, and consequently did not infer that Elon Musk holds significant shares of Tesla. On the other hand, the soft chase demonstrated lower precision but higher recall, as it leveraged the LLM to recognize and semantically unify unstructured concepts with structured relationships. Furthermore, the introduction of RAG significantly improved both precision and recall, reducing the generation of incorrect facts such as significantShares(Andy Jassy, Amazon) from the input “Andy Jassy is CEO of Amazon since 2021”. Indeed, the domain-specific knowledge provided by RAG effectively mitigated LLM hallucinations, reducing false positive bindings and consequently the incorrect inference of significant shares in the soft chase approach. Text 140 Standard Chase Soft Chase Text 120 Soft Chase with RAG 100 Percentage 80 60 40 20 0 Precision Recall F1 Score FP Shares Figure 5: Comparison of precision, recall, F1 score, and FP shares for standard chase, soft chase, and soft chase with RAG, evaluated on instance of Example 1. 5. Related Work Neurosymbolic methodologies are currently at the forefront of both academic and industrial research due to their potential in developing more intelligent, versatile, and explainable AI applications [29]. In this context, the integration of logic-based KGs and, more broadly, KRR approaches with LLMs has shown significant promise [30, 31]. Among the distinct forms of hybrid interactions between the two paradigms [32], studies have primarily focused on enriching LLMs with domain-specific knowledge encapsulated in KGs [28], as well as employing these models for tasks such as KG construction from unstructured text [33] and exploration [34]. A recent line of research involves integrating LLMs with foundational reasoning skills, modeling implicit structure information within the text and performing explicit logical reasoning over them to deduce the conclusion [35]. However, while these approaches improve reasoning capabilities, they often lack the robust, transparent reasoning structures that KRR systems inherently provide. To address this, frameworks like LOGIC-LM have been introduced, which first translates natural language problems into symbolic formulations using LLMs, and then employs a deterministic symbolic solver for inference [36]. To the best of our knowledge, this is the first approach that goes beyond the pure combination of LLMs with symbolic solvers to translate and solve specific logical problems. Our proposal is designed to seamlessly integrate LLMs within a KRR-centric framework to enhance ontological reasoning with semantic understanding throughtout the whole process, injecting human-like flexibility for complex real-world tasks while also preserving the inherent transparency of the paradigm. 6. Conclusion In this paper, we addressed the limitations of traditional ontological reasoning systems, particularly their inherent rigidity in managing the intricacies and ambiguities of natural language data. We proposed a novel neurosymbolic approach that integrates Large Language Models as semantic interpreters between logic rules and such unstructured knowledge, enhancing the flexibility and robustness of rule activations. Our preliminary experiments demonstrate the effectiveness of our solution in preserving correctness and explainability while significantly improving adaptability. As future work, we aim to further refine the underlying formalism of our proposal and tackle challenges related to accuracy and scalability, particularly critical when processing large amounts of text as input knowledge for complex reasoning tasks. We believe this approach lays the foundation for deeper and more synergistic interactions between KRR systems and LLMs, fostering human-like reasoning in real-world contexts. Acknowledgments The work on this paper was partially supported by the Vienna Science and Technology Fund (WWTF) [10.47379/ICT2201, 10.47379/ VRG18013, 10.47379/NXT22018]; and the Christian Doppler Research Association (CDG) JRC LIVE. References [1] L. Bellomarini, L. Bencivelli, C. Biancotti, L. Blasi, F. P. Conteduca, A. Gentili, et al., Reasoning on company takeovers: From tactic to strategy, Data Knowl. Eng. 141 (2022) 102073. [2] O. P. Dwyer, T. Baldazzi, J. Davies, E. Sallinger, A. Vlad, Reasoning over health records with Vadalog: a rule-based approach to patient pathways (2023). [3] L. Caroprese, E. Vocaturo, E. Zumpano, Argumentation approaches for explanaible ai in medical informatics, Intelligent Systems with Applications 16 (2022) 200109. URL: https://doi.org/10.1016/j. iswa.2022.200109. [4] A. Calì, G. Gottlob, T. Lukasiewicz, A general datalog-based framework for tractable query answering over ontologies, J. Web Semant. 14 (2012) 57–83. doi:10.1016/j.websem.2012.03. 001. [5] L. Bellomarini, G. Gottlob, A. Pieris, E. Sallinger, Swift logic for big data and knowledge graphs: Overview of requirements, language, and system, in: SOFSEM 2018: Theory and Practice of Computer Science: 44th International Conference on Current Trends in Theory and Practice of Computer Science, Krems, Austria, January 29-February 2, 2018, Proceedings 44, Springer, 2018, pp. 3–16. [6] D. K. Kanbach, L. Heiduk, G. Blueher, M. Schreiter, A. Lahmann, The genai is out of the bottle: gen- erative artificial intelligence from a business model innovation perspective, Review of Managerial Science (2023) 1–32. [7] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al., Improving language understanding by generative pre-training, . (2018). [8] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhar- gava, S. Bhosale, et al., Llama 2: Open foundation and fine-tuned chat models, arXiv preprint arXiv:2307.09288 (2023). [9] Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang, et al., A survey on evaluation of large language models, ACM Transactions on Intelligent Systems and Technology 15 (2024) 1–45. [10] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, Y. Iwasawa, Large language models are zero-shot reasoners, NIPS 35 (2022) 22199–22213. [11] D. Shu, T. Chen, M. Jin, Y. Zhang, M. Du, Y. Zhang, Knowledge graph large language model (kg-llm) for link prediction, arXiv preprint arXiv:2403.07311 (2024). [12] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou, et al., Chain-of-thought prompting elicits reasoning in large language models, NIPS 35 (2022) 24824–24837. [13] H. Zhao, H. Chen, F. Yang, N. Liu, H. Deng, H. Cai, S. Wang, D. Yin, M. Du, Explainability for large language models: A survey, arXiv preprint arXiv:2309.01029 (2023). [14] J. Fandinno, C. Schulz, Answering the “why” in answer set programming–a survey of explanation approaches, Theory and Practice of Logic Programming 19 (2019) 114–203. [15] J. I. Hong, Teaching the fate community about privacy, Commun. ACM 66 (2023) 10–11. [16] L. Bellomarini, D. Benedetto, G. Gottlob, E. Sallinger, Vadalog: A modern architecture for automated reasoning with large knowledge graphs, IS 105 (2022). [17] L. Bellomarini, D. Fakhoury, G. Gottlob, E. Sallinger, Knowledge graphs and enterprise AI: the promise of an enabling technology, in: ICDE, 2019, pp. 26–37. [18] C. Beeri, M. Y. Vardi, A proof procedure for data dependencies, Journal of the ACM (JACM) 31 (1984) 718–741. [19] P. Lewis, E. Perez, A. Piktus, F. Petroni, et al., Retrieval-augmented generation for knowledge- intensive nlp tasks, NeurIPS 33 (2020) 9459–9474. [20] G. Gottlob, A. Pieris, Beyond sparql under owl 2 ql entailment regime: Rules to the rescue, in: Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015. [21] A. Shkapsky, M. Yang, C. Zaniolo, Optimizing recursive queries with monotonic aggregates in deals, in: 2015 IEEE 31st International Conference on Data Engineering, IEEE, 2015, pp. 867–878. [22] S. Abiteboul, R. Hull, V. Vianu, Foundations of databases, volume 8, Addison-Wesley Reading, 1995. [23] L. Bellomarini, E. Sallinger, G. Gottlob, The vadalog system: Datalog-based reasoning for knowl- edge graphs, Proc. VLDB Endow. 11 (2018) 975–987. URL: https://doi.org/10.14778/3213880.3213888. doi:10.14778/3213880.3213888. [24] T. Baldazzi, L. Bellomarini, E. Sallinger, P. Atzeni, Reasoning in warded datalog+/-with harmful joins., in: SEBD, 2022, pp. 292–299. [25] T. Baldazzi, L. Bellomarini, M. Favorito, E. Sallinger, Ontological reasoning over shy and warded datalog+/–for streaming-based architectures, in: International Symposium on Practical Aspects of Declarative Languages, Springer, 2024, pp. 169–185. [26] G. Graefe, W. J. McKenna, The volcano optimizer generator: Extensibility and efficient search, in: ICDE, IEEE Computer Society, 1993, pp. 209–218. [27] A. Gulino, S. Ceri, G. Gottlob, E. Sallinger, L. Bellomarini, Distributed company control in company shareholding graphs, in: IEEE 37th International Conference on Data Engineering (ICDE), Los Alamitos, CA, USA, 2021, pp. 2637–2648. [28] T. Baldazzi, L. Bellomarini, S. Ceri, A. Colombo, A. Gentili, E. Sallinger, Fine-tuning large enterprise language models via ontological reasoning, in: International Joint Conference on Rules and Reasoning, Springer, 2023, pp. 86–94. [29] A. d. Garcez, L. C. Lamb, Neurosymbolic ai: The 3 rd wave, Artificial Intelligence Review 56 (2023) 12387–12406. [30] X. L. Dong, Generations of knowledge graphs: The crazy ideas and the business impact, arXiv preprint arXiv:2308.14217 (2023). [31] K. Hamilton, A. Nayak, B. Bozic, L. Longo, Is neuro-symbolic AI meeting its promise in natural language processing? A structured review, CoRR abs/2202.12205 (2022). URL: https://arxiv.org/ abs/2202.12205. arXiv:2202.12205. [32] S. Pan, L. Luo, Y. Wang, C. Chen, J. Wang, X. Wu, Unifying large language models and knowledge graphs: A roadmap, arXiv preprint arXiv:2306.08302 (2023). [33] M. Trajanoska, R. Stojanov, D. Trajanov, Enhancing knowledge graph construction using large language models, arXiv preprint arXiv:2305.04676 (2023). [34] T. Baldazzi, L. Bellomarini, S. Ceri, A. Colombo, A. Gentili, E. Sallinger, " please, vadalog, tell me why": Interactive explanation of datalog-based reasoning., in: EDBT, 2024, pp. 834–837. [35] S. Wang, Z. Wei, J. Xu, T. Li, Z. Fan, Unifying structure reasoning and language model pre-training for complex reasoning, arXiv preprint arXiv:2301.08913 (2023). [36] L. Pan, A. Albalak, X. Wang, W. Y. Wang, Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning, arXiv preprint arXiv:2305.12295 (2023).