Reverse Engineering of Temporal Queries with and without LTL Ontologies: First Steps Extended Abstract Marie Fortin1 , Boris Konev1 , Vladislav Ryzhikov2 , Yury Savateev2 , Frank Wolter1 and Michael Zakharyaschev2,3 1 Department of Computer Science, University of Liverpool, UK 2 Department of Computer Science and Information Systems, Birkbeck, University of London, UK 3 School of Data Analysis and Artificial Intelligence, HSE University, Moscow, Russia Abstract In reverse engineering of database queries, one aims to construct a query from a set of positively and negatively labelled answers and non-answers. The query can then be used to explore the data further or as an explanation of the answers and non-answers. We consider this reverse engineering problem for queries formulated in various fragments of positive linear temporal logic LTL over data instances given by timestamped atomic concepts. We focus on the design of suitable query languages and the complexity of the separability problem: β€˜does there exist a query in the given query language that separates the given answers from the non-answers?’. We deal with both plain LTL queries and those that are mediated by ontologies providing background knowledge and formulated in fragments of clausal LTL. Keywords Reverse engineering of queries, query-by-example, explanation, linear temporal logic, ontology-mediated query, computational complexity. 1. Introduction Supporting users of databases by constructing a query from examples of answers and non- answers to the query has been a major research area for many years [1]. In the database community, research has focussed on standard query languages such as (fragments of) SQL, graph query languages, and SPARQL [2, 3, 4, 5, 6, 7, 8, 9]. The KR community has been concerned with constructing queries from examples under the open world semantics and with background knowledge given by an ontology [10, 11, 12, 13, 14]. In both cases, the focus has been on general multi-purpose query languages. A fundamental problem that has been investigated by both communities is known as separability or query-by-example: given sets 𝐸 + and 𝐸 βˆ’ of pairs (π’Ÿ, 𝑑) with a database π’Ÿ and a tuple 𝑑 in π’Ÿ, and a query language 𝒬, does there exist a query π‘ž ∈ 𝒬 that separates (𝐸 + , 𝐸 βˆ’ ) in the sense that π’Ÿ |= π‘ž(𝑑) for all (π’Ÿ, 𝑑) ∈ 𝐸 + and π’Ÿ ΜΈ|= π‘ž(𝑑) for all (π’Ÿ, 𝑑) ∈ 𝐸 βˆ’ (or π’ͺ, π’Ÿ |= π‘ž(𝑑) for all (π’Ÿ, 𝑑) ∈ 𝐸 + and π’ͺ, π’Ÿ ΜΈ|= π‘ž(𝑑) for all DL 2022: 35th International Workshop on Description Logics, August 7–10, 2022, Haifa, Israel $ marie.fortin@liverpool.ac.uk (M. Fortin); boris.konev@liverpool.ac.uk (B. Konev); vlad@dcs.bbk.ac.uk (V. Ryzhikov); yury@dcs.bbk.ac.uk (Y. Savateev); wolter@liverpool.ac.uk (F. Wolter); michael@dcs.bbk.ac.uk (M. Zakharyaschev) Β© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) (π’Ÿ, 𝑑) ∈ 𝐸 βˆ’ if an ontology π’ͺ is present)?1 There are various strategies to ensure that the query π‘ž is a generalisation of the positive examples and does not overfit. For instance, one can ask for the existence of a small separating query in 𝒬 or one can choose a query language that enforces generalisation by not admitting disjunction. In the latter case, query-by-example is often very hard computationally: it is coNExpTime-complete for conjunctive queries (CQs) over standard relational databases [15, 16] and even undecidable for CQs under ℰℒℐ or π’œβ„’π’ž ontologies [17]. In many applications, the input data is timestamped and queries are naturally formulated in languages with temporal operators. Taking into account the prohibitive complexity of many query-by-example problems already in the static case, it does not seem wise to start an investi- gation of the temporal case by considering temporal extensions of standard query languages (which can only lead to computationally even harder problems). Instead, we investigate the simpler but still very useful case in which data, π’Ÿ, is a set of timestamped atomic concepts. Our query languages are positive fragments of linear temporal logic LTL with the temporal operators β—‡ (eventually), β—‹ (next), and U (until) interpreted under the strict semantics [18]. To avoid overfitting, we only consider such fragments without ∨. The most expressive query language we deal with, 𝒬[U], is thus defined as the set of formulas constructed from atoms using ∧ and U (via which β—‹ and β—‡ can be defined). The fragments 𝒬[β—‡], 𝒬[β—‹], and 𝒬[β—‹, β—‡] are defined analogously. Within this temporal setting, we take a broad view of the potential applications of the reverse engineering of queries and the query-by-example problem. On the one hand, there are non- expert end-users who would like to explore data via queries but are not familiar with temporal logic. They usually are, however, capable of providing data examples illustrating the queries they are after. Query-by-example supports such users in the construction of those queries. On the other hand, the positive and negative data examples might come from an application, and the user is interested in possible explanations of the examples. Such an explanation is then provided by a temporal query separating the positive examples from the negative ones. In this case, our goal is similar to recent work on learning linear temporal logic formulas and, more generally, explainable AI [19, 20, 21, 22, 23]. The following example illustrates this point. Example 1. Imagine an engineer whose task is to explain the behaviour of the monitored equipment (say, why an engine stops) in terms of qualitative sensor data such as β€˜low tempera- ture’, which can be represented by the atomic concept 𝑇 , β€˜strong vibration’, 𝑉 , etc. Suppose the engine stopped after the runs π’Ÿ1+ and π’Ÿ2+ shown below but did not stop after the runs π’Ÿ1βˆ’ , π’Ÿ2βˆ’ , π’Ÿ3βˆ’ , where we assume the runs to start at 0 and measurements to be recorded at 0, 1, 2, . . . : π’Ÿ1+ = {𝑇 (2), 𝑉 (4)}, π’Ÿ2+ = {𝑇 (1), 𝑉 (3)}, π’Ÿ1βˆ’ = {𝑇 (1)}, π’Ÿ2βˆ’ = {𝑉 (4)}, π’Ÿ3βˆ’ = {𝑉 (1), 𝑇 (2)}. The β—‡-query π‘ž = β—‡(𝑇 ∧ ◇◇𝑉 ) is true at 0 in the positive data instances π’Ÿπ‘–+ , false in the negative ones π’Ÿπ‘–βˆ’ , and so provides a possible natural explanation of what could cause the engine failure. The example set ({π’Ÿ3+ , π’Ÿ4+ }, {π’Ÿ4βˆ’ }) with π’Ÿ3+ = {𝑇 (1), 𝑉 (2)}, π’Ÿ4+ = {𝑇 (1), 𝑇 (2), 𝑉 (3)}, π’Ÿ4βˆ’ = {𝑇 (1), 𝑉 (3)} can be explained by the U-query 𝑇 U 𝑉 . Using background knowledge of the domain, we can 1 If such a π‘ž exists, then (𝐸 + , 𝐸 βˆ’ ) is often called satisfiable w.r.t. 𝒬 and the construction of π‘ž is called learning. compensate for sensor failures, which result in incomplete data. To illustrate, suppose that Β―+ π’Ÿ 1 = {𝐻(3), 𝑉 (4)}, where 𝐻 stands for β€˜heater is on’. If a background ontology π’ͺ contains the axiom ○𝐻 β†’ 𝑇 saying that a heater can only be triggered by the low temperature at the previous moment, then the same π‘ž will separate {π’ŸΒ― + , π’Ÿ+ } from {π’Ÿβˆ’ , π’Ÿβˆ’ , π’Ÿβˆ’ } under π’ͺ. ⊣ 1 2 1 2 3 The queries used in Example 1 are of a particular β€˜linear’ form and suggest a restriction to path queries in which the order of the atoms is fixed and not left open as in ◇𝐴 ∧ ◇𝐡. More precisely, path β—‹β—‡-queries in the class 𝒬𝑝 [β—‹, β—‡] take the form π‘ž = 𝜌0 ∧ π‘œ1 (𝜌1 ∧ π‘œ2 (𝜌2 ∧ Β· Β· Β· ∧ π‘œπ‘› πœŒπ‘› )), (1) where π‘œπ‘– ∈ {β—‹, β—‡} and πœŒπ‘– is a conjunction of atoms; 𝒬𝑝 [β—‡] and 𝒬𝑝 [β—‹] restrict π‘œπ‘– to {β—‡} and {β—‹}, respectively; and path U-queries in the class 𝒬𝑝 [U] take the form π‘ž = 𝜌0 ∧ (πœ†1 U (𝜌1 ∧ (πœ†2 U (. . . (πœ†π‘› U πœŒπ‘› ) . . . )))), (2) where πœ†π‘– is a conjunction of atoms or βŠ₯. Path queries are motivated by two observations. First, if a query language 𝒬 allows conjunctions of queries, then, dually to the overfitting problem for disjunction, the admission of multiple negative examplesβ‹€οΈ€ becomes trivialised: if queries π‘ž π’Ÿ separate (𝐸 + , {π’Ÿ}) for π’Ÿ ∈ 𝐸 βˆ’ , then the conjunction π’ŸβˆˆπΈ βˆ’ π‘ž π’Ÿ separates (𝐸 + , 𝐸 βˆ’ ). In particular, the query-by-example problem becomes polynomially reducible to its version with a single negative example. This is clearly not the case for path queries. Example 2. Let π’Ÿ1 = {𝐴(1), 𝐡(2)}, π’Ÿ2 = {𝐡(1), 𝐴(2)}, π’Ÿ3 = {𝐴(1)} and π’Ÿ4 = {𝐡(1)}. Then ({π’Ÿ1 , π’Ÿ2 }, π’Ÿ3 ) and ({π’Ÿ1 , π’Ÿ2 }, π’Ÿ4 ) are separated in 𝒬𝑝 [β—‡] by ◇𝐡 and ◇𝐴, respectively; ({π’Ÿ1 , π’Ÿ2 }, {π’Ÿ3 , π’Ÿ4 }) is separated in 𝒬[β—‡] by ◇𝐡 ∧ ◇𝐴, but it is not 𝒬𝑝 [β—‡]-separable. ⊣ Second, numerous natural types of query classes from applications are represented by path queries. For example, the existence of a common subsequence of the positive examples that is not a subsequence of any negative example corresponds to the existence of a separating query in 𝒬𝑝 [β—‡] with 𝜌0 = ⊀ and πœŒπ‘– ΜΈ= ⊀ for 𝑖 > 0, and the existence of a common subword of the positive examples that is not a subword of any negative example corresponds to the existence of a separating query of the form β—‡(𝜌1 ∧ β—‹(𝜌2 ∧ Β· Β· Β· ∧ β—‹πœŒπ‘› )). The unique characterisability and learnability of path queries is investigated in [24]. Except for 𝒬𝑝 [β—‹] = 𝒬[β—‹] (modulo logical equivalence), no nontrivial inclusion relations hold between the separation capabilities of the query languages introduced above, as illustrated by the following example. Example 3. (1) Let π’Ÿ1 = {𝐴(1)}, π’Ÿ2 = {𝐴(2)} and 𝐸 = ({π’Ÿ1 }, {π’Ÿ2 }). Then ○𝐴 separates 𝐸 but no query in 𝒬[β—‡] does. On the other hand, 𝐸 is not 𝒬-separable under π’ͺ = {○𝐴 β†’ 𝐴}, for any class 𝒬 defined above, as π’ͺ, π’Ÿ1 |= π‘ž(0) implies π’ͺ, π’Ÿ2 |= π‘ž(0) for all π‘ž ∈ 𝒬. (2) Let 𝐸 = ({π’Ÿ1 , π’Ÿ2 }, βˆ…) with π’Ÿ1 and π’Ÿ2 as in (1). Then ◇𝐴 separates 𝐸 but no 𝒬[β—‹]-query does. Observe that at least two positive examples are needed to achieve this effect. However, ○○○𝐡 separates 𝐸 under π’ͺ = {𝐴 β†’ ░𝐡}. (3) Let 𝐸 = ({{𝐴(1), 𝐡(2)}, {𝐴(2), 𝐡(3)}}, {{𝐴(3), 𝐡(5)}}). Then β—‡(𝐴 ∧ ○𝐡) separates 𝐸 but no query in 𝒬[β—‹] or 𝒬[β—‡] does. (4) 𝐴 U 𝐡 separates ({{𝐡(1)}, {𝐴(1), 𝐡(2)}}, {{𝐡(2)}}) but no 𝒬[β—‹, β—‡]-query does. ⊣ 2. Our Contribution We now briefly present our initial results on the complexity of the separability problem for LTL queries, both plain and mediated by an LTL-ontology. Ontology-free LTL queries. Separability in 𝒬[β—‹] is almost trivial as it corresponds to the existence of a time point where some atom holds in all positive examples but in no negative example, which is decidable in polynomial time. For the query languages ranging from 𝒬𝑝 [β—‡] and 𝒬[β—‡] to 𝒬𝑝 [β—‹, β—‡] and 𝒬[β—‹, β—‡] and also 𝒬𝑝 [U], separability turns out to be NP-complete. The upper bound is proved by observing that, in any of these languages, every separable example set can be separated by a query of polynomial size. The matching lower bound is established by reduction of the NP-hard problem of deciding whether the words in a given set contain a common subsequence of a given length [25]. Separability by 𝒬[U]-queries turns out to be trickier because of the interplay of ∧, the left- and the right-hand sides of the U-operator. Example 4. The example set below, where the instance on the right is negative, is separated by 𝐴2 , 𝐡 1 𝐡2 𝐴1 , 𝐡 2 𝐡1 𝐡1 𝐡2 0 1 2 3 4 5 0 1 2 3 0 1 2 3 4 the 𝒬[U]-query β—‡ (𝐴1 U 𝐡1 ) ∧ (𝐴2 U 𝐡2 ) but is not separable in any other class of queries. (οΈ€ )οΈ€ We give a separability criterion in terms of U-simulations between subsets of the disjoint union of the positive examples and points of a negative example (cf. [26]). Then, using a game- theoretic variant of U-simulations, we show that a separating 𝒬[U]-query can be constructed in PSpace. However, at the moment, we only have an NP-lower bound for separability. Separability under LTL Ontologies. Apart from full LTL, we consider its fragment LTLβ–‘β—‡ that only uses the operators β–‘ and β—‡, also known as the Prior logic [27, 28, 29, 30], and the Horn fragment LTLβ–‘ β—‹ horn containing axioms of the form 𝐢1 ∧ Β· Β· Β· ∧ πΆπ‘˜ β†’ πΆπ‘˜+1 , where the 𝐢𝑖 are atoms possibly prefixed by β–‘ and β—‹ for 𝑖 ≀ π‘˜ + 1, and also by β—‡ for 𝑖 ≀ π‘˜. The ontology axioms are supposed to hold at all times. Separability by (path) β—‡-queries is Σ𝑝2 -complete under LTLβ–‘β—‡ ontologies and PSpace- complete under LTLβ–‘ β—‹ horn ontologies. For LTL ontologies, we have a NExpTime upper bound. We conjecture that exactly the same bounds can be proved for (path) β—‹β—‡-queries. As concerns 𝒬𝑝 [U]-queries, separability under LTLβ–‘ β—‹ horn ontologies is shown to be between ExpSpace and NExpTime; for β€˜branching’ 𝒬[Uβˆ’ ]-queries without nesting U-operators on the left of U, it can be decided in ExpTime using U-simulations. We establish the upper bounds by constructing two exponential-size transition systems 𝑆 + and 𝑆 βˆ’ from (π’ͺ, 𝐸 + ) and (π’ͺ, 𝐸 βˆ’ ) such that (𝑖) there is a trace-based simulation of 𝑆 + by 𝑆 βˆ’ iff (𝐸 + , 𝐸 βˆ’ ) is separated in 𝒬𝑝 [U] and (𝑖𝑖) there is a tree-based simulation of 𝑆 + by 𝑆 βˆ’ iff (𝐸 + , 𝐸 βˆ’ ) is separated in 𝒬[Uβˆ’ ]. The existence of trace-based and tree-based simulations can be decided in PSpace- and P, respectively [31]. The obtained complexity results are summarised in the table below: QBE(β„’, 𝒬) LTL LTLβ–‘ β—‹ horn LTLβ–‘β—‡ QBE(𝒬) 𝒬[U] ? β‰₯ NP, ≀ PSpace 𝒬[Uβˆ’ ] ≀ ExpTime 𝒬𝑝 [U] ? β‰₯ NExpTime, ≀ ExpSpace ? 𝒬[β—‹, β—‡] ≀ ExpTime 𝒬𝑝 [β—‹, β—‡] ≀ ExpSpace = NP 𝒬[β—‡] ≀ NExpTime = PSpace = Σ𝑝2 𝒬𝑝 [β—‡] Acknowledgments This research was supported by the EPSRC UK grants EP/S032207 and EP/S032282 for the joint project β€˜quantMD : Ontology-Based Management for Many-Dimensional Quantitative Data’ and by the RSF grant 22-11-00323 when M. Zakharyaschev was visiting the HSE University. References [1] D. M. L. Martins, Reverse engineering database queries from examples: State-of-the-art, challenges, and research opportunities, Inf. Syst. 83 (2019) 89–100. [2] M. Zhang, H. Elmeleegy, C. M. Procopiuc, D. Srivastava, Reverse engineering complex join queries, in: Proc. of SIGMOD, 2013, pp. 809–820. [3] Y. Y. Weiss, S. Cohen, Reverse engineering SPJ-queries from examples, in: Proc. of PODS, 2017, pp. 151–166. [4] D. V. Kalashnikov, L. V. Lakshmanan, D. Srivastava, Fastqre: Fast query reverse engineering, in: Proc. of SIGMOD, 2018, pp. 337–350. [5] D. Deutch, A. Gilad, Reverse-engineering conjunctive queries from provenance examples., in: Proc. of EDBT, 2019, pp. 277–288. [6] S. Staworko, P. Wieczorek, Learning twig and path queries, in: Proc. of ICDT, 2012, pp. 140–154. [7] P. BarcelΓ³, M. Romero, The complexity of reverse engineering problems for conjunctive queries, in: Proc. of ICDT, 2017, pp. 7:1–7:17. [8] S. Cohen, Y. Y. Weiss, The complexity of learning tree patterns from example graphs, ACM Trans. Database Syst. 41 (2016) 14:1–14:44. URL: https://doi.org/10.1145/2890492. doi:10.1145/2890492. [9] M. Arenas, G. I. Diaz, E. V. Kostylev, Reverse engineering SPARQL queries, in: Proc. of WWW, 2016, pp. 239–249. [10] V. GutiΓ©rrez-Basulto, J. C. Jung, L. Sabellek, Reverse engineering queries in ontology- enriched systems: The case of expressive Horn description logic ontologies, in: Proc. of IJCAI-ECAI, 2018. [11] M. Ortiz, Ontology-mediated queries from examples: a glimpse at the DL-Lite case, in: Proc. of GCAI, 2019, pp. 1–14. [12] J. C. Jung, C. Lutz, H. Pulcini, F. Wolter, Logical separability of incomplete data under ontologies, in: Proc. of KR, 2020. [13] G. Cima, F. Croce, M. Lenzerini, Query definability and its approximations in ontology- based data management, in: G. Demartini, G. Zuccon, J. S. Culpepper, Z. Huang, H. Tong (Eds.), CIKM ’21: The 30th ACM International Conference on Information and Knowl- edge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021, ACM, 2021, pp. 271–280. URL: https://doi.org/10.1145/3459637.3482466. doi:10.1145/3459637. 3482466. [14] J. C. Jung, C. Lutz, H. Pulcini, F. Wolter, Separating data examples by description logic concepts with restricted signatures, in: Proc. of KR, 2021. [15] R. Willard, Testing expressibility is hard, in: D. Cohen (Ed.), Principles and Practice of Constraint Programming - CP 2010 - 16th International Conference, CP 2010, St. Andrews, Scotland, UK, September 6-10, 2010. Proceedings, volume 6308 of Lecture Notes in Computer Science, Springer, 2010, pp. 9–23. URL: https://doi.org/10.1007/978-3-642-15396-9_4. doi:10. 1007/978-3-642-15396-9\_4. [16] B. ten Cate, V. Dalmau, The product homomorphism problem and applications, in: Proc. of ICDT, 2015, pp. 161–176. [17] M. Funk, J. C. Jung, C. Lutz, H. Pulcini, F. Wolter, Learning description logic concepts: When can positive and negative examples be separated?, in: Proc. of IJCAI, 2019, pp. 1682–1688. [18] S. Demri, V. Goranko, M. Lange, Temporal Logics in Computer Science, Cambridge Tracts in Theoretical Computer Science, Cambridge University Press, 2016. [19] C. Lemieux, D. Park, I. Beschastnikh, General LTL specification mining (T), in: Proc. of ASE, IEEE, 2015, pp. 81–92. [20] D. Neider, I. Gavran, Learning linear temporal properties, in: Proc. of FMCAD 2018, IEEE, 2018, pp. 1–10. URL: https://doi.org/10.23919/FMCAD.2018.8603016. doi:10.23919/ FMCAD.2018.8603016. [21] A. Camacho, S. A. McIlraith, Learning interpretable models expressed in linear temporal logic, in: Proc. of ICAPS 2018, AAAI Press, 2019, pp. 621–630. URL: https://aaai.org/ojs/ index.php/ICAPS/article/view/3529. [22] N. Fijalkow, G. Lagarde, The complexity of learning linear temporal formulas from examples, in: J. Chandlee, R. Eyraud, J. Heinz, A. Jardine, M. Zaanen (Eds.), Proceedings of the 15th International Conference on Grammatical Inference, 23-27 August 2021, Virtual Event, volume 153 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 237–250. URL: https://proceedings.mlr.press/v153/fijalkow21a.html. [23] R. Raha, R. Roy, N. Fijalkow, D. Neider, Scalable anytime algorithms for learning fragments of linear temporal logic, in: D. Fisman, G. Rosu (Eds.), Tools and Algorithms for the Construction and Analysis of Systems - 28th International Conference, TACAS 2022, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022, Munich, Germany, April 2-7, 2022, Proceedings, Part I, volume 13243 of Lecture Notes in Computer Science, Springer, 2022, pp. 263–280. URL: https://doi.org/10. 1007/978-3-030-99524-9_14. doi:10.1007/978-3-030-99524-9\_14. [24] M. Fortin, B. Konev, V. Ryzhikov, Y. Savateev, F. Wolter, M. Zakharyaschev, Unique characterisability and learnability of temporal instance queries, in: Proc. of KR, 2022. [25] D. Maier, The complexity of some problems on subsequences and supersequences, J. ACM 25 (1978) 322–336. URL: https://doi.org/10.1145/322063.322075. doi:10.1145/322063. 322075. [26] N. Kurtonina, M. de Rijke, Bisimulations for temporal logic, J. Log. Lang. Inf. 6 (1997) 403– 425. URL: https://doi.org/10.1023/A:1008223921944. doi:10.1023/A:1008223921944. [27] A. Prior, Time and Modality, Oxford University Press, 1956. [28] H. Ono, A. Nakamura, On the size of refutation Kripke models for some linear modal and tense logics, Studia Logica (1980) 325–333. [29] J. P. Burgess, Basic tense logic, in: Handbook of Philosophical Logic: Volume II: Extensions of Classical Logic, Reidel, Dordrecht, 1984, pp. 89–133. [30] M. Y. Vardi, From Church and Prior to PSL, in: 25 Years of Model Checking - History, Achievements, Perspectives, volume 5000 of LNCS, Springer, 2008, pp. 150–171. [31] O. Kupferman, M. Y. Vardi, Verification of fair transisiton systems, in: R. Alur, T. A. Henzinger (Eds.), Computer Aided Verification, 8th International Conference, CAV ’96, New Brunswick, NJ, USA, July 31 - August 3, 1996, Proceedings, volume 1102 of Lecture Notes in Computer Science, Springer, 1996, pp. 372–382. URL: https://doi.org/10.1007/3-540-61474-5_ 84. doi:10.1007/3-540-61474-5\_84.