=Paper=
{{Paper
|id=Vol-3263/abstract-8
|storemode=property
|title=Reverse Engineering of Temporal Queries with and without LTL Ontologies: First Steps (Extended Abstract)
|pdfUrl=https://ceur-ws.org/Vol-3263/abstract-8.pdf
|volume=Vol-3263
|authors=Marie Fortin,Boris Konev,Vladislav Ryzhikov,Yury Savateev,Frank Wolter,Michael Zakharyaschev
|dblpUrl=https://dblp.org/rec/conf/dlog/FortinKRSWZ22
}}
==Reverse Engineering of Temporal Queries with and without LTL Ontologies: First Steps (Extended Abstract)==
Reverse Engineering of Temporal Queries with and without LTL Ontologies: First Steps Extended Abstract Marie Fortin1 , Boris Konev1 , Vladislav Ryzhikov2 , Yury Savateev2 , Frank Wolter1 and Michael Zakharyaschev2,3 1 Department of Computer Science, University of Liverpool, UK 2 Department of Computer Science and Information Systems, Birkbeck, University of London, UK 3 School of Data Analysis and Artificial Intelligence, HSE University, Moscow, Russia Abstract In reverse engineering of database queries, one aims to construct a query from a set of positively and negatively labelled answers and non-answers. The query can then be used to explore the data further or as an explanation of the answers and non-answers. We consider this reverse engineering problem for queries formulated in various fragments of positive linear temporal logic LTL over data instances given by timestamped atomic concepts. We focus on the design of suitable query languages and the complexity of the separability problem: βdoes there exist a query in the given query language that separates the given answers from the non-answers?β. We deal with both plain LTL queries and those that are mediated by ontologies providing background knowledge and formulated in fragments of clausal LTL. Keywords Reverse engineering of queries, query-by-example, explanation, linear temporal logic, ontology-mediated query, computational complexity. 1. Introduction Supporting users of databases by constructing a query from examples of answers and non- answers to the query has been a major research area for many years [1]. In the database community, research has focussed on standard query languages such as (fragments of) SQL, graph query languages, and SPARQL [2, 3, 4, 5, 6, 7, 8, 9]. The KR community has been concerned with constructing queries from examples under the open world semantics and with background knowledge given by an ontology [10, 11, 12, 13, 14]. In both cases, the focus has been on general multi-purpose query languages. A fundamental problem that has been investigated by both communities is known as separability or query-by-example: given sets πΈ + and πΈ β of pairs (π, π) with a database π and a tuple π in π, and a query language π¬, does there exist a query π β π¬ that separates (πΈ + , πΈ β ) in the sense that π |= π(π) for all (π, π) β πΈ + and π ΜΈ|= π(π) for all (π, π) β πΈ β (or πͺ, π |= π(π) for all (π, π) β πΈ + and πͺ, π ΜΈ|= π(π) for all DL 2022: 35th International Workshop on Description Logics, August 7β10, 2022, Haifa, Israel $ marie.fortin@liverpool.ac.uk (M. Fortin); boris.konev@liverpool.ac.uk (B. Konev); vlad@dcs.bbk.ac.uk (V. Ryzhikov); yury@dcs.bbk.ac.uk (Y. Savateev); wolter@liverpool.ac.uk (F. Wolter); michael@dcs.bbk.ac.uk (M. Zakharyaschev) Β© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) (π, π) β πΈ β if an ontology πͺ is present)?1 There are various strategies to ensure that the query π is a generalisation of the positive examples and does not overfit. For instance, one can ask for the existence of a small separating query in π¬ or one can choose a query language that enforces generalisation by not admitting disjunction. In the latter case, query-by-example is often very hard computationally: it is coNExpTime-complete for conjunctive queries (CQs) over standard relational databases [15, 16] and even undecidable for CQs under β°ββ or πβπ ontologies [17]. In many applications, the input data is timestamped and queries are naturally formulated in languages with temporal operators. Taking into account the prohibitive complexity of many query-by-example problems already in the static case, it does not seem wise to start an investi- gation of the temporal case by considering temporal extensions of standard query languages (which can only lead to computationally even harder problems). Instead, we investigate the simpler but still very useful case in which data, π, is a set of timestamped atomic concepts. Our query languages are positive fragments of linear temporal logic LTL with the temporal operators β (eventually), β (next), and U (until) interpreted under the strict semantics [18]. To avoid overfitting, we only consider such fragments without β¨. The most expressive query language we deal with, π¬[U], is thus defined as the set of formulas constructed from atoms using β§ and U (via which β and β can be defined). The fragments π¬[β], π¬[β], and π¬[β, β] are defined analogously. Within this temporal setting, we take a broad view of the potential applications of the reverse engineering of queries and the query-by-example problem. On the one hand, there are non- expert end-users who would like to explore data via queries but are not familiar with temporal logic. They usually are, however, capable of providing data examples illustrating the queries they are after. Query-by-example supports such users in the construction of those queries. On the other hand, the positive and negative data examples might come from an application, and the user is interested in possible explanations of the examples. Such an explanation is then provided by a temporal query separating the positive examples from the negative ones. In this case, our goal is similar to recent work on learning linear temporal logic formulas and, more generally, explainable AI [19, 20, 21, 22, 23]. The following example illustrates this point. Example 1. Imagine an engineer whose task is to explain the behaviour of the monitored equipment (say, why an engine stops) in terms of qualitative sensor data such as βlow tempera- tureβ, which can be represented by the atomic concept π , βstrong vibrationβ, π , etc. Suppose the engine stopped after the runs π1+ and π2+ shown below but did not stop after the runs π1β , π2β , π3β , where we assume the runs to start at 0 and measurements to be recorded at 0, 1, 2, . . . : π1+ = {π (2), π (4)}, π2+ = {π (1), π (3)}, π1β = {π (1)}, π2β = {π (4)}, π3β = {π (1), π (2)}. The β-query π = β(π β§ ββπ ) is true at 0 in the positive data instances ππ+ , false in the negative ones ππβ , and so provides a possible natural explanation of what could cause the engine failure. The example set ({π3+ , π4+ }, {π4β }) with π3+ = {π (1), π (2)}, π4+ = {π (1), π (2), π (3)}, π4β = {π (1), π (3)} can be explained by the U-query π U π . Using background knowledge of the domain, we can 1 If such a π exists, then (πΈ + , πΈ β ) is often called satisfiable w.r.t. π¬ and the construction of π is called learning. compensate for sensor failures, which result in incomplete data. To illustrate, suppose that Β―+ π 1 = {π»(3), π (4)}, where π» stands for βheater is onβ. If a background ontology πͺ contains the axiom βπ» β π saying that a heater can only be triggered by the low temperature at the previous moment, then the same π will separate {πΒ― + , π+ } from {πβ , πβ , πβ } under πͺ. β£ 1 2 1 2 3 The queries used in Example 1 are of a particular βlinearβ form and suggest a restriction to path queries in which the order of the atoms is fixed and not left open as in βπ΄ β§ βπ΅. More precisely, path ββ-queries in the class π¬π [β, β] take the form π = π0 β§ π1 (π1 β§ π2 (π2 β§ Β· Β· Β· β§ ππ ππ )), (1) where ππ β {β, β} and ππ is a conjunction of atoms; π¬π [β] and π¬π [β] restrict ππ to {β} and {β}, respectively; and path U-queries in the class π¬π [U] take the form π = π0 β§ (π1 U (π1 β§ (π2 U (. . . (ππ U ππ ) . . . )))), (2) where ππ is a conjunction of atoms or β₯. Path queries are motivated by two observations. First, if a query language π¬ allows conjunctions of queries, then, dually to the overfitting problem for disjunction, the admission of multiple negative examplesβοΈ becomes trivialised: if queries π π separate (πΈ + , {π}) for π β πΈ β , then the conjunction πβπΈ β π π separates (πΈ + , πΈ β ). In particular, the query-by-example problem becomes polynomially reducible to its version with a single negative example. This is clearly not the case for path queries. Example 2. Let π1 = {π΄(1), π΅(2)}, π2 = {π΅(1), π΄(2)}, π3 = {π΄(1)} and π4 = {π΅(1)}. Then ({π1 , π2 }, π3 ) and ({π1 , π2 }, π4 ) are separated in π¬π [β] by βπ΅ and βπ΄, respectively; ({π1 , π2 }, {π3 , π4 }) is separated in π¬[β] by βπ΅ β§ βπ΄, but it is not π¬π [β]-separable. β£ Second, numerous natural types of query classes from applications are represented by path queries. For example, the existence of a common subsequence of the positive examples that is not a subsequence of any negative example corresponds to the existence of a separating query in π¬π [β] with π0 = β€ and ππ ΜΈ= β€ for π > 0, and the existence of a common subword of the positive examples that is not a subword of any negative example corresponds to the existence of a separating query of the form β(π1 β§ β(π2 β§ Β· Β· Β· β§ βππ )). The unique characterisability and learnability of path queries is investigated in [24]. Except for π¬π [β] = π¬[β] (modulo logical equivalence), no nontrivial inclusion relations hold between the separation capabilities of the query languages introduced above, as illustrated by the following example. Example 3. (1) Let π1 = {π΄(1)}, π2 = {π΄(2)} and πΈ = ({π1 }, {π2 }). Then βπ΄ separates πΈ but no query in π¬[β] does. On the other hand, πΈ is not π¬-separable under πͺ = {βπ΄ β π΄}, for any class π¬ defined above, as πͺ, π1 |= π(0) implies πͺ, π2 |= π(0) for all π β π¬. (2) Let πΈ = ({π1 , π2 }, β ) with π1 and π2 as in (1). Then βπ΄ separates πΈ but no π¬[β]-query does. Observe that at least two positive examples are needed to achieve this effect. However, βββπ΅ separates πΈ under πͺ = {π΄ β β‘π΅}. (3) Let πΈ = ({{π΄(1), π΅(2)}, {π΄(2), π΅(3)}}, {{π΄(3), π΅(5)}}). Then β(π΄ β§ βπ΅) separates πΈ but no query in π¬[β] or π¬[β] does. (4) π΄ U π΅ separates ({{π΅(1)}, {π΄(1), π΅(2)}}, {{π΅(2)}}) but no π¬[β, β]-query does. β£ 2. Our Contribution We now briefly present our initial results on the complexity of the separability problem for LTL queries, both plain and mediated by an LTL-ontology. Ontology-free LTL queries. Separability in π¬[β] is almost trivial as it corresponds to the existence of a time point where some atom holds in all positive examples but in no negative example, which is decidable in polynomial time. For the query languages ranging from π¬π [β] and π¬[β] to π¬π [β, β] and π¬[β, β] and also π¬π [U], separability turns out to be NP-complete. The upper bound is proved by observing that, in any of these languages, every separable example set can be separated by a query of polynomial size. The matching lower bound is established by reduction of the NP-hard problem of deciding whether the words in a given set contain a common subsequence of a given length [25]. Separability by π¬[U]-queries turns out to be trickier because of the interplay of β§, the left- and the right-hand sides of the U-operator. Example 4. The example set below, where the instance on the right is negative, is separated by π΄2 , π΅ 1 π΅2 π΄1 , π΅ 2 π΅1 π΅1 π΅2 0 1 2 3 4 5 0 1 2 3 0 1 2 3 4 the π¬[U]-query β (π΄1 U π΅1 ) β§ (π΄2 U π΅2 ) but is not separable in any other class of queries. (οΈ )οΈ We give a separability criterion in terms of U-simulations between subsets of the disjoint union of the positive examples and points of a negative example (cf. [26]). Then, using a game- theoretic variant of U-simulations, we show that a separating π¬[U]-query can be constructed in PSpace. However, at the moment, we only have an NP-lower bound for separability. Separability under LTL Ontologies. Apart from full LTL, we consider its fragment LTLβ‘β that only uses the operators β‘ and β, also known as the Prior logic [27, 28, 29, 30], and the Horn fragment LTLβ‘ β horn containing axioms of the form πΆ1 β§ Β· Β· Β· β§ πΆπ β πΆπ+1 , where the πΆπ are atoms possibly prefixed by β‘ and β for π β€ π + 1, and also by β for π β€ π. The ontology axioms are supposed to hold at all times. Separability by (path) β-queries is Ξ£π2 -complete under LTLβ‘β ontologies and PSpace- complete under LTLβ‘ β horn ontologies. For LTL ontologies, we have a NExpTime upper bound. We conjecture that exactly the same bounds can be proved for (path) ββ-queries. As concerns π¬π [U]-queries, separability under LTLβ‘ β horn ontologies is shown to be between ExpSpace and NExpTime; for βbranchingβ π¬[Uβ ]-queries without nesting U-operators on the left of U, it can be decided in ExpTime using U-simulations. We establish the upper bounds by constructing two exponential-size transition systems π + and π β from (πͺ, πΈ + ) and (πͺ, πΈ β ) such that (π) there is a trace-based simulation of π + by π β iff (πΈ + , πΈ β ) is separated in π¬π [U] and (ππ) there is a tree-based simulation of π + by π β iff (πΈ + , πΈ β ) is separated in π¬[Uβ ]. The existence of trace-based and tree-based simulations can be decided in PSpace- and P, respectively [31]. The obtained complexity results are summarised in the table below: QBE(β, π¬) LTL LTLβ‘ β horn LTLβ‘β QBE(π¬) π¬[U] ? β₯ NP, β€ PSpace π¬[Uβ ] β€ ExpTime π¬π [U] ? β₯ NExpTime, β€ ExpSpace ? π¬[β, β] β€ ExpTime π¬π [β, β] β€ ExpSpace = NP π¬[β] β€ NExpTime = PSpace = Ξ£π2 π¬π [β] Acknowledgments This research was supported by the EPSRC UK grants EP/S032207 and EP/S032282 for the joint project βquantMD : Ontology-Based Management for Many-Dimensional Quantitative Dataβ and by the RSF grant 22-11-00323 when M. Zakharyaschev was visiting the HSE University. References [1] D. M. L. Martins, Reverse engineering database queries from examples: State-of-the-art, challenges, and research opportunities, Inf. Syst. 83 (2019) 89β100. [2] M. Zhang, H. Elmeleegy, C. M. Procopiuc, D. Srivastava, Reverse engineering complex join queries, in: Proc. of SIGMOD, 2013, pp. 809β820. [3] Y. Y. Weiss, S. Cohen, Reverse engineering SPJ-queries from examples, in: Proc. of PODS, 2017, pp. 151β166. [4] D. V. Kalashnikov, L. V. Lakshmanan, D. Srivastava, Fastqre: Fast query reverse engineering, in: Proc. of SIGMOD, 2018, pp. 337β350. [5] D. Deutch, A. Gilad, Reverse-engineering conjunctive queries from provenance examples., in: Proc. of EDBT, 2019, pp. 277β288. [6] S. Staworko, P. Wieczorek, Learning twig and path queries, in: Proc. of ICDT, 2012, pp. 140β154. [7] P. BarcelΓ³, M. Romero, The complexity of reverse engineering problems for conjunctive queries, in: Proc. of ICDT, 2017, pp. 7:1β7:17. [8] S. Cohen, Y. Y. Weiss, The complexity of learning tree patterns from example graphs, ACM Trans. Database Syst. 41 (2016) 14:1β14:44. URL: https://doi.org/10.1145/2890492. doi:10.1145/2890492. [9] M. Arenas, G. I. Diaz, E. V. Kostylev, Reverse engineering SPARQL queries, in: Proc. of WWW, 2016, pp. 239β249. [10] V. GutiΓ©rrez-Basulto, J. C. Jung, L. Sabellek, Reverse engineering queries in ontology- enriched systems: The case of expressive Horn description logic ontologies, in: Proc. of IJCAI-ECAI, 2018. [11] M. Ortiz, Ontology-mediated queries from examples: a glimpse at the DL-Lite case, in: Proc. of GCAI, 2019, pp. 1β14. [12] J. C. Jung, C. Lutz, H. Pulcini, F. Wolter, Logical separability of incomplete data under ontologies, in: Proc. of KR, 2020. [13] G. Cima, F. Croce, M. Lenzerini, Query definability and its approximations in ontology- based data management, in: G. Demartini, G. Zuccon, J. S. Culpepper, Z. Huang, H. Tong (Eds.), CIKM β21: The 30th ACM International Conference on Information and Knowl- edge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021, ACM, 2021, pp. 271β280. URL: https://doi.org/10.1145/3459637.3482466. doi:10.1145/3459637. 3482466. [14] J. C. Jung, C. Lutz, H. Pulcini, F. Wolter, Separating data examples by description logic concepts with restricted signatures, in: Proc. of KR, 2021. [15] R. Willard, Testing expressibility is hard, in: D. Cohen (Ed.), Principles and Practice of Constraint Programming - CP 2010 - 16th International Conference, CP 2010, St. Andrews, Scotland, UK, September 6-10, 2010. Proceedings, volume 6308 of Lecture Notes in Computer Science, Springer, 2010, pp. 9β23. URL: https://doi.org/10.1007/978-3-642-15396-9_4. doi:10. 1007/978-3-642-15396-9\_4. [16] B. ten Cate, V. Dalmau, The product homomorphism problem and applications, in: Proc. of ICDT, 2015, pp. 161β176. [17] M. Funk, J. C. Jung, C. Lutz, H. Pulcini, F. Wolter, Learning description logic concepts: When can positive and negative examples be separated?, in: Proc. of IJCAI, 2019, pp. 1682β1688. [18] S. Demri, V. Goranko, M. Lange, Temporal Logics in Computer Science, Cambridge Tracts in Theoretical Computer Science, Cambridge University Press, 2016. [19] C. Lemieux, D. Park, I. Beschastnikh, General LTL specification mining (T), in: Proc. of ASE, IEEE, 2015, pp. 81β92. [20] D. Neider, I. Gavran, Learning linear temporal properties, in: Proc. of FMCAD 2018, IEEE, 2018, pp. 1β10. URL: https://doi.org/10.23919/FMCAD.2018.8603016. doi:10.23919/ FMCAD.2018.8603016. [21] A. Camacho, S. A. McIlraith, Learning interpretable models expressed in linear temporal logic, in: Proc. of ICAPS 2018, AAAI Press, 2019, pp. 621β630. URL: https://aaai.org/ojs/ index.php/ICAPS/article/view/3529. [22] N. Fijalkow, G. Lagarde, The complexity of learning linear temporal formulas from examples, in: J. Chandlee, R. Eyraud, J. Heinz, A. Jardine, M. Zaanen (Eds.), Proceedings of the 15th International Conference on Grammatical Inference, 23-27 August 2021, Virtual Event, volume 153 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 237β250. URL: https://proceedings.mlr.press/v153/fijalkow21a.html. [23] R. Raha, R. Roy, N. Fijalkow, D. Neider, Scalable anytime algorithms for learning fragments of linear temporal logic, in: D. Fisman, G. Rosu (Eds.), Tools and Algorithms for the Construction and Analysis of Systems - 28th International Conference, TACAS 2022, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022, Munich, Germany, April 2-7, 2022, Proceedings, Part I, volume 13243 of Lecture Notes in Computer Science, Springer, 2022, pp. 263β280. URL: https://doi.org/10. 1007/978-3-030-99524-9_14. doi:10.1007/978-3-030-99524-9\_14. [24] M. Fortin, B. Konev, V. Ryzhikov, Y. Savateev, F. Wolter, M. Zakharyaschev, Unique characterisability and learnability of temporal instance queries, in: Proc. of KR, 2022. [25] D. Maier, The complexity of some problems on subsequences and supersequences, J. ACM 25 (1978) 322β336. URL: https://doi.org/10.1145/322063.322075. doi:10.1145/322063. 322075. [26] N. Kurtonina, M. de Rijke, Bisimulations for temporal logic, J. Log. Lang. Inf. 6 (1997) 403β 425. URL: https://doi.org/10.1023/A:1008223921944. doi:10.1023/A:1008223921944. [27] A. Prior, Time and Modality, Oxford University Press, 1956. [28] H. Ono, A. Nakamura, On the size of refutation Kripke models for some linear modal and tense logics, Studia Logica (1980) 325β333. [29] J. P. Burgess, Basic tense logic, in: Handbook of Philosophical Logic: Volume II: Extensions of Classical Logic, Reidel, Dordrecht, 1984, pp. 89β133. [30] M. Y. Vardi, From Church and Prior to PSL, in: 25 Years of Model Checking - History, Achievements, Perspectives, volume 5000 of LNCS, Springer, 2008, pp. 150β171. [31] O. Kupferman, M. Y. Vardi, Verification of fair transisiton systems, in: R. Alur, T. A. Henzinger (Eds.), Computer Aided Verification, 8th International Conference, CAV β96, New Brunswick, NJ, USA, July 31 - August 3, 1996, Proceedings, volume 1102 of Lecture Notes in Computer Science, Springer, 1996, pp. 372β382. URL: https://doi.org/10.1007/3-540-61474-5_ 84. doi:10.1007/3-540-61474-5\_84.