New Perspectives for Fuzzy Datalog (Extended Abstract) Matthias Lanzinger1 , Stefano Sferrazza1,2 and Georg Gottlob1 1 University of Oxford, Department of Computer Science, Wolfson Building, Parks Road, Oxford OX1 3QD, United Kingdom 2 TU Wien, Institut fΓΌr Logic and Computation, Favoritenstraße 9, A-1040 Wien, Austria 1. Introduction Today, new trends in data management introduce complex datasets and knowledge graphs where data is obtained via crowd-sourcing or from observations made by artificial intelligence systems. Moreover, existing knowledge graphs (KGs) are now commonly enriched using link prediction and KG-completion techniques [1]. Such data points are often associated with a degree of certainty, expressing the level of confidence of the human or AI source in the truth of the datum. These developments raise new challenges for data management and the need for formalisms that can take into account degrees of certainty in data1 . Fuzzy logic has a long history as a tool for combining logical reasoning with the different types of uncertainty that are encountered in real-world settings by interpreting degrees of certainty as degrees of truth. This naturally motivates the study of Datalog with fuzzy logic semantics as a reasoning formalism for large databases and KGs with uncertainty. Many variants of fuzzy logic programming have been proposed in the literature [2, 3, 4, 5, 6, 7], with the most active research focusing on complex multi-adjoint settings [8, 7], or Prolog- derived semantics based on fuzzy similarity of constants and fuzzy unification procedures [5]. Alternative frameworks for reasoning with uncertainty like Markov Logic Networks [9] and Probabilistic Soft Logic [10] require extensive grounding before inference that can quickly become prohibitive when reasoning over large amounts of data. Our proposed language 𝑑- Datalog aims to be a simpler alternative focused on effective reasoning in large databases and KGs with uncertainty and aims to be the fuzzy analogue of standard Datalog. In this paper, we present the 𝑑-Datalog formalism and report on ongoing research. In particular, we present a simple and efficient fixpoint procedure for computing minimal fuzzy models for 𝑑-Datalog. Furthermore, we show how Datalog with fuzzy semantics relates to Datalog 2.0 2022: 4th International Workshop on the Resurgence of Datalog in Academia and Industry, September 05, 2022, Genova - Nervi, Italy $ matthias.lanzinger@cs.ox.ac.uk (M. Lanzinger); stefano.sferrazza@cs.ox.ac.uk (S. Sferrazza); georg.gottlob@cs.ox.ac.uk (G. Gottlob)  0000-0002-7601-3727 (M. Lanzinger) Β© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop CEUR Workshop Proceedings (CEUR-WS.org) Proceedings http://ceur-ws.org ISSN 1613-0073 1 Uncertainty is generally not the same as a probability and typical settings mentioned in the introduction often do not fit the probabilistic database framework. 42 Matthias Lanzinger et al. CEUR Workshop Proceedings 42–47 the recently proposed Datalog∘ formalism [11]. 2. Datalog over 𝑑-norms A t-norm is a commutative, monotone, and associative function βŠ™ : [0, 1] Γ— [0, 1] β†’ [0, 1] with identity element 1. They generalise Boolean conjunction to the interval [0, 1] of truth degrees. Commonly studied t-norms include min (the GΓΆdel t-norm), the Łukasiewicz t-norm π‘Ž βŠ™πΏ 𝑏 = max{0, π‘Ž + 𝑏 βˆ’ 1}, or the real product. But t-norms can also be significantly more 1 complex, e.g., all functions 𝑓𝑝 (π‘₯, 𝑦) = (π‘₯𝑝 + 𝑦 𝑝 βˆ’ 1) 𝑝 for 𝑝 < 0 are t-norms (part of the Schweizer-Sklar family of t-norms, see [12]). For an extensive overview of t-norms and fuzzy logic, we refer to HΓ‘jek [13]. The following presentation extends recent work on a restricted version of 𝑑-Datalog that allows only the Łukasiewicz t-norm [6]. A Datalog over t-norms (𝑑-Datalog) program Ξ  is a finite set of rules where each rule 𝜌 is of the form 𝑅1 (x1 ) βŠ™πœŒ Β· Β· Β· βŠ™πœŒ π‘…π‘˜ (xk ) β†’ π‘…β„Ž (xh ) (1) where βŠ™πœŒ is some t-norm2 . Note that a program is not limited to one specific t-norm but rather each rule can use a different t-norm. This is important to express different fuzzy behaviour in different rules. That is, in some rules, it may be natural to use the pessimistic interpretation of the Łukasiewicz t-norm βŠ™πΏ , while other rules in the same program may be more natural under the relatively optimistic interpretation of the GΓΆdel t-norm βŠ™πΊ . Similar formalisms have been studied extensively in more general settings, see [14, 15]. To simplify presentation, we assume some fixed global signature 𝜎 and domain Dom through- out this paper. We write GAtoms for the set of all ground atoms with respect to 𝜎 and Dom. A truth assignment is a function 𝜈 : GAtoms β†’ [0, 1], intuitively assigning a degree of truth in the real interval [0, 1] to every ground atom. We extend the application of a truth assignment 𝜈 to ground formulas 𝛾, 𝛾 β€² inductively as follows3 . 𝜈(𝛾 βŠ™ 𝛾 β€² ) = βŠ™(𝜈(𝛾), 𝜈(𝛾 β€² )) 𝜈(𝛾 β†’ 𝛾 β€² ) = min{1, 1 βˆ’ 𝜈(𝛾) + 𝜈(𝛾 β€² )} With truth degrees for atoms, it also becomes interesting to consider degrees of satisfaction. For rational 𝐾 ∈ (0, 1] we say that a rule 𝜌 is 𝐾-satisfied by 𝜈 if for every grounding 𝛾 of 𝜌 over Dom it holds that 𝜈(𝛾) β‰₯ 𝐾. In particular, a rule 𝛾 β†’ 𝛾 β€² is 1-satisfied by truth assignment 𝜈 exactly when 𝜈(𝛾) ≀ 𝜈(𝛾 β€² ). For a program Ξ , we say that a truth assignment 𝜈 is a πΎβˆ’fuzzy model if all formulas in Ξ  are 𝐾-satisfied by 𝜈. In place of the database in the classical setting, we have (finite) partial truth assignments, that is, partial functions 𝜏 : GAtoms β†’ (0, 1] that are defined for a finite number of ground atoms. Let (Ξ , 𝜏 ) be a pair where Ξ  is a set of formulas and 𝜏 is a partial truth assignment, a 2 By slight abuse of notation we use the same symbol for the t-norm and the corresponding connective in a Datalog rule 3 Note that for technical reasons there is only a single β†’ connective and it can not vary by rule. Note also that all standard implications in fuzzy logic behave the same when considering 1-satisfiability. 43 Matthias Lanzinger et al. CEUR Workshop Proceedings 42–47 𝐾-fuzzy model of (Ξ , 𝜏 ) is a 𝐾-fuzzy model 𝜈 of Ξ  where 𝜈(𝐺) = 𝜏 (𝐺) for every ground atom 𝐺 for which 𝜏 is defined. For a partial truth assignment 𝜏 , we write 𝜏 * for the induced truth assignment where 𝜏 * (𝐺) = 𝜏 (𝐺) if 𝜏 is defined for 𝐺, and 𝜏 * (𝐺) = 0 otherwise. The natural query task, given Ξ , 𝜏 and 𝑐, 𝐾 ∈ (0, 1] is whether a fact 𝐺 is at least true to a target degree 𝑐 in all 𝐾-fuzzy models of (Ξ , 𝜏 ). For truth assignments 𝜈, 𝜈 β€² , we say 𝜈 ≀ 𝜈 β€² if for every fact 𝐺, 𝜈(𝐺) ≀ 𝜈 β€² (𝐺). As in Datalog, if (Ξ , 𝜏 ) is consistent, then it has a unique minimal 𝐾-fuzzy model that can be used to evaluate the query task described above. 3. An Efficient Fixpoint Procedure for Datalog over t-norms A step-wise evaluation of 𝑑-Datalog programs that follows standard procedures for Datalog directly may have to revise the truth of a fact multiple times, as different paths of inference may imply different degrees of truth. In this section, we show that with the right kind of procedure this is not necessary and evaluation of 𝑑-Datalog requires at most as many steps of inference as classical Datalog. For program Ξ , we first define πœ‡Ξ  (𝜈) = max{ 𝜈(body(𝛾)) | 𝛾 ∈ GRules Ξ  , 𝜈(head(𝛾)) < 𝜈(body(𝛾)) + 𝐾 βˆ’ 1} where GRules Ξ  is the set of all groundings of rules in Ξ . The immediate maximal 𝐾-consequence operator 𝑇Π,𝐾 is then defined as 𝑇Π,𝐾 (𝜈) = 𝜈 β€² , where 𝜈 is a truth assignment and 𝜈 β€² is the truth assignment determined by the following: 1. if 𝐺 is the head of ground rule 𝛾 with 𝜈(body(𝛾)) = πœ‡Ξ  (𝜈) > 0 that is not 𝐾-satisfied by 𝜈, then set 𝜈 β€² (𝐺) = 𝜈(body(𝛾)) + 𝐾 βˆ’ 1, 2. otherwise set 𝜈 β€² (𝐺) = 𝜈(𝐺). (𝛼) Clearly the operator is monotone, i.e., if 𝜈1 ≀ 𝜈2 , then also 𝑇Π,𝐾 (𝜈1 ) ≀ 𝑇Π,𝐾 (𝜈2 ). Let 𝑇Π,𝐾 denote the 𝛼-fold application of 𝑇Π,𝐾 . It is known that 𝑇Π,𝐾 always reaches a fixpoint in a (∞) finite number of steps, we write 𝑇Π,𝐾 for the application of the operator until a fixpoint is (𝛼) (𝛼+1) reached. We call the smallest 𝛼 where 𝑇Π,𝐾 (𝜈) = 𝑇Π,𝐾 (𝜈) the fixpoint index of Ξ , 𝜈 (for 𝐾). Observe that when 𝜈 maps only to {0, 1} (i.e., a classical database), 𝑇Π,1 is precisely the standard immediate consequence operator for Datalog. It is not difficult to see that the procedure can be used for the evaluation of 𝑑-Datalog. Theorem 1. Let Ξ  be a 𝑑-Datalog program and let 𝜏 be a finite partial truth assignment. Then (∞) 𝑇Π,𝐾 (𝜏 * ) is the minimal 𝐾-fuzzy model of (Ξ , 𝜏 ). Simpler immediate consequence operators have been studied in similar settings [7, 15] but without a computational focus. The important distinction from a practical perspective is the restriction to only considering the immediate maximal 𝐾-consequences, i.e., only those immediate consequences with maximum truth degree. This awareness of the truth degree reveals an important property of the procedure: the truth value of each 𝐺 ∈ GAtoms changes at most once during the computation of 𝑇Π,𝐾∞ . In consequence, the computation of a fuzzy minimal model for a 𝑑-Datalog program requires at most as many steps as the standard fixpoint 44 Matthias Lanzinger et al. CEUR Workshop Proceedings 42–47 procedure in the classical setting. In the following we write πœˆΞ” to refer to the truth assignment that sets πœˆΞ” (𝐺) = 1 if 𝜈(𝐺) > 0, and πœˆΞ” (𝐺) = 0 otherwise. Theorem 2. Let Ξ  be a 𝑑-Datalog program and let 𝜏 be a finite partial truth assignment. The fixpoint index of Ξ , 𝜏 * for 𝐾 is less or equal to the fixpoint index of Ξ , πœΞ” * for 𝐾. Theorem 2 also holds when we adapt the operator 𝑇Π,𝐾 to only change one atom at a time. That is, 𝑑-Datalog reasoning requires the materialisation of at most as many atoms as reasoning in Datalog over the corresponding crisp database. This is a significant improvement over the previously mentioned Markov Logic Networks [9] and Probabilistic Soft Logic [10] which require extensive grounding before inference and suggests that reasoning in 𝑑-Datalog can be feasible even for large datasets by efficiently maintaining a list of unsatisfied rules ordered by truth degree of the body4 . We see that the semantics of 𝑑-Datalog closely follow classical Datalog semantics, both in terms of minimal models as well as fixpoint semantics. This provides a tight link to existing Datalog literature and thus opens up clear directions for extensions analogous to work for classical Datalog. Among them, we are particularly interested in fuzzy extensions of the DatalogΒ± family of languages, 𝑑-Datalog with stratified negation, as well as integration with existing Datalog reasoning systems where implementations are based on the immediate consequence operator. 4. t-norms and Datalog∘ Very recently, Abo Khamis, et al. [11] introduced Datalog∘ as an extension of Datalog over partially ordered pre-semirings. This is motivated by the desire to express numerical recursive tasks, such as linear regression or shortest path computations in a Datalog-style language. Beyond numerical applications, Fitting’s three-valued logic and Belnap’s four-valued logic are also explored as interesting applications of Datalog∘ . Here we note that this connection to many- valued logics can be significantly expanded. Observe that for any t-norm βŠ™, ([0, 1], max, βŠ™, 0, 1) is a semiring (that also enjoys all necessary other technical properties required by Datalog∘ ). One can then show that for a 𝑑-Datalog program Ξ  that mentions the same t-norm βŠ™ in all rules, the minimal 1-fuzzy models of (Ξ , 𝜏 ) are exactly the same as the least fixpoints considered in the semantics of Datalog∘ over the semiring corresponding to βŠ™ with the natural partial-ordering of [0, 1] by ≀5 . This raises the question of whether the Datalog∘ framework and in particular its convergence conditions, can be extended to consider individual semirings per rule (even with a shared addition monoid). This would allow full expression of 𝑑-Datalog in Datalog∘ , while also opening up the possibility of combining complex fuzzy reasoning with various forms of aggregation. 4 Note that a practical implementation of 𝑇Π,𝐾 does not actually require the materialisation of the full set GRules Ξ  to compute πœ‡Ξ  . 5 It remains unclear if the same is also possible for 𝐾-fuzzy models where 𝐾 < 1 45 Matthias Lanzinger et al. CEUR Workshop Proceedings 42–47 5. Conclusion & Outlook We reported on new motivations and ongoing research in the intersection between fuzzy logic and Datalog. The procedure introduced in Section 3 provides an important foundation for future work. We are in the process of extending the Vadalog system [16] to support arbitrary t-norms in rule bodies following the ideas presented there. Following the implementation, we plan for a large-scale experimental evaluation to verify the feasibility of 𝑑-Datalog over large uncertain datasets. Furthermore, the procedure provides a foundation for fuzzy extensions of DatalogΒ± languages as it can naturally be extended to a chase procedure. Additionally, the observed connections to Datalog∘ present further promising directions for future research. Acknowledgements Stefano Sferrazza was supported by the Austrian Science Fund (FWF):P30930. Georg Gott- lob is a Royal Society Research Professor and acknowledges support by the Royal Society in this role through the β€œRAISON DATA” project (Reference No. RP\R1\201074). Matthias Lanzinger acknowledges support by the Royal Society β€œRAISON DATA” project (Reference No. RP\R1\201074). References [1] A. Rossi, D. Barbosa, D. Firmani, A. Matinata, P. Merialdo, Knowledge graph embedding for link prediction: A comparative analysis, ACM Trans. Knowl. Discov. Data 15 (2021) 14:1–14:49. URL: https://doi.org/10.1145/3424672. doi:10.1145/3424672. [2] Á. Achs, A. Kiss, Fuzzy extension of datalog, Acta Cybernetica 12 (1995) 153–166. [3] R. Ebrahim, Fuzzy logic programming, Fuzzy Sets Syst. 117 (2001) 215–230. URL: https: //doi.org/10.1016/S0165-0114(98)00300-5. doi:10.1016/S0165-0114(98)00300-5. [4] P. Eklund, F. Klawonn, Neural fuzzy logic programming, IEEE Trans. Neural Networks 3 (1992) 815–818. URL: https://doi.org/10.1109/72.159071. doi:10.1109/72.159071. [5] P. J. Iranzo, F. SΓ‘enz-PΓ©rez, A fuzzy datalog deductive database system, IEEE Trans. Fuzzy Syst. 26 (2018) 2634–2648. URL: https://doi.org/10.1109/TFUZZ.2018.2806923. doi:10. 1109/TFUZZ.2018.2806923. [6] M. Lanzinger, S. Sferrazza, G. Gottlob, MV-Datalog+-: Effective Rule-based Reasoning with Uncertain Observations, 2022. [7] J. Medina, M. Ojeda-Aciego, P. VojtΓ‘s, A procedural semantics for multi-adjoint logic programming, in: Proc. EPIA, volume 2258 of Lecture Notes in Computer Science, Springer, 2001, pp. 290–297. URL: https://doi.org/10.1007/3-540-45329-6_29. doi:10.1007/ 3-540-45329-6\_29. [8] M. E. Cornejo, D. Lobo, J. Medina, Syntax and semantics of multi-adjoint normal logic programming, Fuzzy Sets Syst. 345 (2018) 41–62. URL: https://doi.org/10.1016/j.fss.2017.12. 009. doi:10.1016/j.fss.2017.12.009. [9] M. Richardson, P. M. Domingos, Markov logic networks, Mach. Learn. 62 (2006) 107–136. URL: https://doi.org/10.1007/s10994-006-5833-1. doi:10.1007/s10994-006-5833-1. 46 Matthias Lanzinger et al. CEUR Workshop Proceedings 42–47 [10] S. H. Bach, M. Broecheler, B. Huang, L. Getoor, Hinge-Loss Markov Random Fields and Probabilistic Soft Logic, J. Mach. Learn. Res. 18 (2017) 109:1–109:67. URL: http: //jmlr.org/papers/v18/15-631.html. [11] M. A. Khamis, H. Q. Ngo, R. Pichler, D. Suciu, Y. R. Wang, Convergence of datalog over (pre-) semirings, in: Proc. PODS, ACM, 2022, pp. 105–117. doi:10.1145/3517804.3524140. [12] X. Zhang, H. He, Y. Xu, A fuzzy logic system based on Schweizer-Sklar t-norm, Sci. China Ser. F Inf. Sci. 49 (2006) 175–188. URL: https://doi.org/10.1007/s11432-006-0175-y. doi:10.1007/s11432-006-0175-y. [13] P. HΓ‘jek, Metamathematics of Fuzzy Logic, volume 4 of Trends in Logic, Kluwer, 1998. URL: https://doi.org/10.1007/978-94-011-5300-3. doi:10.1007/978-94-011-5300-3. [14] J. Medina, M. Ojeda-Aciego, P. VojtΓ‘s, Multi-adjoint logic programming with contin- uous semantics, in: Proc. LPNMR, volume 2173 of Lecture Notes in Computer Science, Springer, 2001, pp. 351–364. URL: https://doi.org/10.1007/3-540-45402-0_26. doi:10.1007/ 3-540-45402-0\_26. [15] P. VojtΓ‘s, Fuzzy logic programming, Fuzzy Sets Syst. 124 (2001) 361–370. URL: https: //doi.org/10.1016/S0165-0114(01)00106-3. doi:10.1016/S0165-0114(01)00106-3. [16] L. Bellomarini, E. Sallinger, G. Gottlob, The vadalog system: Datalog-based reasoning for knowledge graphs, Proc. VLDB Endow. 11 (2018) 975–987. URL: http://www.vldb.org/ pvldb/vol11/p975-bellomarini.pdf. doi:10.14778/3213880.3213888. 47