Reverse Engineering of Temporal Queries with and
without LTL Ontologies: First Steps
Extended Abstract

Marie Fortin1 , Boris Konev1 , Vladislav Ryzhikov2 , Yury Savateev2 , Frank Wolter1 and
Michael Zakharyaschev2,3
1
  Department of Computer Science, University of Liverpool, UK
2
  Department of Computer Science and Information Systems, Birkbeck, University of London, UK
3
  School of Data Analysis and Artificial Intelligence, HSE University, Moscow, Russia


                                         Abstract
                                         In reverse engineering of database queries, one aims to construct a query from a set of positively and
                                         negatively labelled answers and non-answers. The query can then be used to explore the data further or
                                         as an explanation of the answers and non-answers. We consider this reverse engineering problem for
                                         queries formulated in various fragments of positive linear temporal logic LTL over data instances given
                                         by timestamped atomic concepts. We focus on the design of suitable query languages and the complexity
                                         of the separability problem: ‘does there exist a query in the given query language that separates the
                                         given answers from the non-answers?’. We deal with both plain LTL queries and those that are mediated
                                         by ontologies providing background knowledge and formulated in fragments of clausal LTL.

                                         Keywords
                                         Reverse engineering of queries, query-by-example, explanation, linear temporal logic, ontology-mediated
                                         query, computational complexity.


1. Introduction
Supporting users of databases by constructing a query from examples of answers and non-
answers to the query has been a major research area for many years [1]. In the database
community, research has focussed on standard query languages such as (fragments of) SQL,
graph query languages, and SPARQL [2, 3, 4, 5, 6, 7, 8, 9]. The KR community has been concerned
with constructing queries from examples under the open world semantics and with background
knowledge given by an ontology [10, 11, 12, 13, 14]. In both cases, the focus has been on
general multi-purpose query languages. A fundamental problem that has been investigated
by both communities is known as separability or query-by-example: given sets 𝐸 + and 𝐸 − of
pairs (𝒟, 𝑑) with a database 𝒟 and a tuple 𝑑 in 𝒟, and a query language 𝒬, does there exist
a query 𝑞 ∈ 𝒬 that separates (𝐸 + , 𝐸 − ) in the sense that 𝒟 |= 𝑞(𝑑) for all (𝒟, 𝑑) ∈ 𝐸 + and
𝒟 ̸|= 𝑞(𝑑) for all (𝒟, 𝑑) ∈ 𝐸 − (or 𝒪, 𝒟 |= 𝑞(𝑑) for all (𝒟, 𝑑) ∈ 𝐸 + and 𝒪, 𝒟 ̸|= 𝑞(𝑑) for all

    DL 2022: 35th International Workshop on Description Logics, August 7–10, 2022, Haifa, Israel
$ marie.fortin@liverpool.ac.uk (M. Fortin); boris.konev@liverpool.ac.uk (B. Konev); vlad@dcs.bbk.ac.uk
(V. Ryzhikov); yury@dcs.bbk.ac.uk (Y. Savateev); wolter@liverpool.ac.uk (F. Wolter); michael@dcs.bbk.ac.uk
(M. Zakharyaschev)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
(𝒟, 𝑑) ∈ 𝐸 − if an ontology 𝒪 is present)?1 There are various strategies to ensure that the query
𝑞 is a generalisation of the positive examples and does not overfit. For instance, one can ask for
the existence of a small separating query in 𝒬 or one can choose a query language that enforces
generalisation by not admitting disjunction. In the latter case, query-by-example is often very
hard computationally: it is coNExpTime-complete for conjunctive queries (CQs) over standard
relational databases [15, 16] and even undecidable for CQs under ℰℒℐ or 𝒜ℒ𝒞 ontologies [17].
   In many applications, the input data is timestamped and queries are naturally formulated in
languages with temporal operators. Taking into account the prohibitive complexity of many
query-by-example problems already in the static case, it does not seem wise to start an investi-
gation of the temporal case by considering temporal extensions of standard query languages
(which can only lead to computationally even harder problems). Instead, we investigate the
simpler but still very useful case in which data, 𝒟, is a set of timestamped atomic concepts.
Our query languages are positive fragments of linear temporal logic LTL with the temporal
operators ◇ (eventually), ○ (next), and U (until) interpreted under the strict semantics [18].
To avoid overfitting, we only consider such fragments without ∨. The most expressive query
language we deal with, 𝒬[U], is thus defined as the set of formulas constructed from atoms
using ∧ and U (via which ○ and ◇ can be defined). The fragments 𝒬[◇], 𝒬[○], and 𝒬[○, ◇]
are defined analogously.
   Within this temporal setting, we take a broad view of the potential applications of the reverse
engineering of queries and the query-by-example problem. On the one hand, there are non-
expert end-users who would like to explore data via queries but are not familiar with temporal
logic. They usually are, however, capable of providing data examples illustrating the queries
they are after. Query-by-example supports such users in the construction of those queries. On
the other hand, the positive and negative data examples might come from an application, and
the user is interested in possible explanations of the examples. Such an explanation is then
provided by a temporal query separating the positive examples from the negative ones. In this
case, our goal is similar to recent work on learning linear temporal logic formulas and, more
generally, explainable AI [19, 20, 21, 22, 23]. The following example illustrates this point.
Example 1. Imagine an engineer whose task is to explain the behaviour of the monitored
equipment (say, why an engine stops) in terms of qualitative sensor data such as ‘low tempera-
ture’, which can be represented by the atomic concept 𝑇 , ‘strong vibration’, 𝑉 , etc. Suppose the
engine stopped after the runs 𝒟1+ and 𝒟2+ shown below but did not stop after the runs 𝒟1− , 𝒟2− ,
𝒟3− , where we assume the runs to start at 0 and measurements to be recorded at 0, 1, 2, . . . :

𝒟1+ = {𝑇 (2), 𝑉 (4)}, 𝒟2+ = {𝑇 (1), 𝑉 (3)}, 𝒟1− = {𝑇 (1)}, 𝒟2− = {𝑉 (4)}, 𝒟3− = {𝑉 (1), 𝑇 (2)}.

The ◇-query 𝑞 = ◇(𝑇 ∧ ◇◇𝑉 ) is true at 0 in the positive data instances 𝒟𝑖+ , false in the
negative ones 𝒟𝑖− , and so provides a possible natural explanation of what could cause the engine
failure. The example set ({𝒟3+ , 𝒟4+ }, {𝒟4− }) with

                𝒟3+ = {𝑇 (1), 𝑉 (2)}, 𝒟4+ = {𝑇 (1), 𝑇 (2), 𝑉 (3)}, 𝒟4− = {𝑇 (1), 𝑉 (3)}

can be explained by the U-query 𝑇 U 𝑉 . Using background knowledge of the domain, we can
    1
        If such a 𝑞 exists, then (𝐸 + , 𝐸 − ) is often called satisfiable w.r.t. 𝒬 and the construction of 𝑞 is called learning.
compensate for sensor failures, which result in incomplete data. To illustrate, suppose that
¯+
𝒟 1 = {𝐻(3), 𝑉 (4)}, where 𝐻 stands for ‘heater is on’. If a background ontology 𝒪 contains
the axiom ○𝐻 → 𝑇 saying that a heater can only be triggered by the low temperature at the
previous moment, then the same 𝑞 will separate {𝒟¯ + , 𝒟+ } from {𝒟− , 𝒟− , 𝒟− } under 𝒪. ⊣
                                                   1    2           1   2    3

  The queries used in Example 1 are of a particular ‘linear’ form and suggest a restriction to
path queries in which the order of the atoms is fixed and not left open as in ◇𝐴 ∧ ◇𝐵. More
precisely, path ○◇-queries in the class 𝒬𝑝 [○, ◇] take the form

                           𝑞 = 𝜌0 ∧ 𝑜1 (𝜌1 ∧ 𝑜2 (𝜌2 ∧ · · · ∧ 𝑜𝑛 𝜌𝑛 )),                      (1)

where 𝑜𝑖 ∈ {○, ◇} and 𝜌𝑖 is a conjunction of atoms; 𝒬𝑝 [◇] and 𝒬𝑝 [○] restrict 𝑜𝑖 to {◇} and
{○}, respectively; and path U-queries in the class 𝒬𝑝 [U] take the form

                      𝑞 = 𝜌0 ∧ (𝜆1 U (𝜌1 ∧ (𝜆2 U (. . . (𝜆𝑛 U 𝜌𝑛 ) . . . )))),               (2)

where 𝜆𝑖 is a conjunction of atoms or ⊥. Path queries are motivated by two observations. First,
if a query language 𝒬 allows conjunctions of queries, then, dually to the overfitting problem
for disjunction, the admission of multiple negative examples⋀︀ becomes trivialised: if queries
𝑞 𝒟 separate (𝐸 + , {𝒟}) for 𝒟 ∈ 𝐸 − , then the conjunction 𝒟∈𝐸 − 𝑞 𝒟 separates (𝐸 + , 𝐸 − ). In
particular, the query-by-example problem becomes polynomially reducible to its version with a
single negative example. This is clearly not the case for path queries.
Example 2. Let 𝒟1 = {𝐴(1), 𝐵(2)}, 𝒟2 = {𝐵(1), 𝐴(2)}, 𝒟3 = {𝐴(1)} and 𝒟4 = {𝐵(1)}.
Then ({𝒟1 , 𝒟2 }, 𝒟3 ) and ({𝒟1 , 𝒟2 }, 𝒟4 ) are separated in 𝒬𝑝 [◇] by ◇𝐵 and ◇𝐴, respectively;
({𝒟1 , 𝒟2 }, {𝒟3 , 𝒟4 }) is separated in 𝒬[◇] by ◇𝐵 ∧ ◇𝐴, but it is not 𝒬𝑝 [◇]-separable.     ⊣
   Second, numerous natural types of query classes from applications are represented by path
queries. For example, the existence of a common subsequence of the positive examples that is
not a subsequence of any negative example corresponds to the existence of a separating query
in 𝒬𝑝 [◇] with 𝜌0 = ⊤ and 𝜌𝑖 ̸= ⊤ for 𝑖 > 0, and the existence of a common subword of the
positive examples that is not a subword of any negative example corresponds to the existence
of a separating query of the form ◇(𝜌1 ∧ ○(𝜌2 ∧ · · · ∧ ○𝜌𝑛 )). The unique characterisability
and learnability of path queries is investigated in [24].
   Except for 𝒬𝑝 [○] = 𝒬[○] (modulo logical equivalence), no nontrivial inclusion relations
hold between the separation capabilities of the query languages introduced above, as illustrated
by the following example.
Example 3. (1) Let 𝒟1 = {𝐴(1)}, 𝒟2 = {𝐴(2)} and 𝐸 = ({𝒟1 }, {𝒟2 }). Then ○𝐴 separates
𝐸 but no query in 𝒬[◇] does. On the other hand, 𝐸 is not 𝒬-separable under 𝒪 = {○𝐴 → 𝐴},
for any class 𝒬 defined above, as 𝒪, 𝒟1 |= 𝑞(0) implies 𝒪, 𝒟2 |= 𝑞(0) for all 𝑞 ∈ 𝒬.
  (2) Let 𝐸 = ({𝒟1 , 𝒟2 }, ∅) with 𝒟1 and 𝒟2 as in (1). Then ◇𝐴 separates 𝐸 but no 𝒬[○]-query
does. Observe that at least two positive examples are needed to achieve this effect. However,
○○○𝐵 separates 𝐸 under 𝒪 = {𝐴 → □𝐵}.
  (3) Let 𝐸 = ({{𝐴(1), 𝐵(2)}, {𝐴(2), 𝐵(3)}}, {{𝐴(3), 𝐵(5)}}). Then ◇(𝐴 ∧ ○𝐵) separates
𝐸 but no query in 𝒬[○] or 𝒬[◇] does.
  (4) 𝐴 U 𝐵 separates ({{𝐵(1)}, {𝐴(1), 𝐵(2)}}, {{𝐵(2)}}) but no 𝒬[○, ◇]-query does. ⊣
2. Our Contribution
We now briefly present our initial results on the complexity of the separability problem for LTL
queries, both plain and mediated by an LTL-ontology.
Ontology-free LTL queries. Separability in 𝒬[○] is almost trivial as it corresponds to the existence
of a time point where some atom holds in all positive examples but in no negative example,
which is decidable in polynomial time. For the query languages ranging from 𝒬𝑝 [◇] and 𝒬[◇]
to 𝒬𝑝 [○, ◇] and 𝒬[○, ◇] and also 𝒬𝑝 [U], separability turns out to be NP-complete. The upper
bound is proved by observing that, in any of these languages, every separable example set
can be separated by a query of polynomial size. The matching lower bound is established
by reduction of the NP-hard problem of deciding whether the words in a given set contain
a common subsequence of a given length [25]. Separability by 𝒬[U]-queries turns out to be
trickier because of the interplay of ∧, the left- and the right-hand sides of the U-operator.

Example 4. The example set below, where the instance on the right is negative, is separated by
                           𝐴2 , 𝐵 1   𝐵2              𝐴1 , 𝐵 2   𝐵1                𝐵1           𝐵2

  0     1      2      3       4       5    0     1       2       3    0      1     2      3      4

the 𝒬[U]-query ◇ (𝐴1 U 𝐵1 ) ∧ (𝐴2 U 𝐵2 ) but is not separable in any other class of queries.
                (︀                      )︀

   We give a separability criterion in terms of U-simulations between subsets of the disjoint
union of the positive examples and points of a negative example (cf. [26]). Then, using a game-
theoretic variant of U-simulations, we show that a separating 𝒬[U]-query can be constructed
in PSpace. However, at the moment, we only have an NP-lower bound for separability.
Separability under LTL Ontologies. Apart from full LTL, we consider its fragment LTL□◇ that
only uses the operators □ and ◇, also known as the Prior logic [27, 28, 29, 30], and the Horn
fragment LTL□    ○
               horn containing axioms of the form 𝐶1 ∧ · · · ∧ 𝐶𝑘 → 𝐶𝑘+1 , where the 𝐶𝑖 are atoms
possibly prefixed by □ and ○ for 𝑖 ≤ 𝑘 + 1, and also by ◇ for 𝑖 ≤ 𝑘. The ontology axioms are
supposed to hold at all times.
   Separability by (path) ◇-queries is Σ𝑝2 -complete under LTL□◇ ontologies and PSpace-
complete under LTL□     ○
                      horn ontologies. For LTL ontologies, we have a NExpTime upper bound. We
conjecture that exactly the same bounds can be proved for (path) ○◇-queries. As concerns
𝒬𝑝 [U]-queries, separability under LTL□      ○
                                           horn ontologies is shown to be between ExpSpace and
NExpTime; for ‘branching’ 𝒬[U− ]-queries without nesting U-operators on the left of U, it can
be decided in ExpTime using U-simulations. We establish the upper bounds by constructing
two exponential-size transition systems 𝑆 + and 𝑆 − from (𝒪, 𝐸 + ) and (𝒪, 𝐸 − ) such that (𝑖)
there is a trace-based simulation of 𝑆 + by 𝑆 − iff (𝐸 + , 𝐸 − ) is separated in 𝒬𝑝 [U] and (𝑖𝑖) there
is a tree-based simulation of 𝑆 + by 𝑆 − iff (𝐸 + , 𝐸 − ) is separated in 𝒬[U− ]. The existence of
trace-based and tree-based simulations can be decided in PSpace- and P, respectively [31].
   The obtained complexity results are summarised in the table below:
   QBE(ℒ, 𝒬)           LTL                LTL□  ○
                                              horn               LTL□◇         QBE(𝒬)
     𝒬[U]                                    ?
                                                                           ≥ NP, ≤ PSpace
     𝒬[U− ]                              ≤ ExpTime
     𝒬𝑝 [U]             ?          ≥ NExpTime, ≤ ExpSpace           ?
    𝒬[○, ◇]                              ≤ ExpTime
    𝒬𝑝 [○, ◇]                           ≤ ExpSpace                               = NP
     𝒬[◇]
                  ≤ NExpTime                = PSpace              = Σ𝑝2
     𝒬𝑝 [◇]

Acknowledgments
This research was supported by the EPSRC UK grants EP/S032207 and EP/S032282 for the joint
project ‘quantMD : Ontology-Based Management for Many-Dimensional Quantitative Data’ and
by the RSF grant 22-11-00323 when M. Zakharyaschev was visiting the HSE University.


References
 [1] D. M. L. Martins, Reverse engineering database queries from examples: State-of-the-art,
     challenges, and research opportunities, Inf. Syst. 83 (2019) 89–100.
 [2] M. Zhang, H. Elmeleegy, C. M. Procopiuc, D. Srivastava, Reverse engineering complex
     join queries, in: Proc. of SIGMOD, 2013, pp. 809–820.
 [3] Y. Y. Weiss, S. Cohen, Reverse engineering SPJ-queries from examples, in: Proc. of PODS,
     2017, pp. 151–166.
 [4] D. V. Kalashnikov, L. V. Lakshmanan, D. Srivastava, Fastqre: Fast query reverse engineering,
     in: Proc. of SIGMOD, 2018, pp. 337–350.
 [5] D. Deutch, A. Gilad, Reverse-engineering conjunctive queries from provenance examples.,
     in: Proc. of EDBT, 2019, pp. 277–288.
 [6] S. Staworko, P. Wieczorek, Learning twig and path queries, in: Proc. of ICDT, 2012, pp.
     140–154.
 [7] P. Barceló, M. Romero, The complexity of reverse engineering problems for conjunctive
     queries, in: Proc. of ICDT, 2017, pp. 7:1–7:17.
 [8] S. Cohen, Y. Y. Weiss, The complexity of learning tree patterns from example graphs,
     ACM Trans. Database Syst. 41 (2016) 14:1–14:44. URL: https://doi.org/10.1145/2890492.
     doi:10.1145/2890492.
 [9] M. Arenas, G. I. Diaz, E. V. Kostylev, Reverse engineering SPARQL queries, in: Proc. of
     WWW, 2016, pp. 239–249.
[10] V. Gutiérrez-Basulto, J. C. Jung, L. Sabellek, Reverse engineering queries in ontology-
     enriched systems: The case of expressive Horn description logic ontologies, in: Proc. of
     IJCAI-ECAI, 2018.
[11] M. Ortiz, Ontology-mediated queries from examples: a glimpse at the DL-Lite case, in:
     Proc. of GCAI, 2019, pp. 1–14.
[12] J. C. Jung, C. Lutz, H. Pulcini, F. Wolter, Logical separability of incomplete data under
     ontologies, in: Proc. of KR, 2020.
[13] G. Cima, F. Croce, M. Lenzerini, Query definability and its approximations in ontology-
     based data management, in: G. Demartini, G. Zuccon, J. S. Culpepper, Z. Huang, H. Tong
     (Eds.), CIKM ’21: The 30th ACM International Conference on Information and Knowl-
     edge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021, ACM,
     2021, pp. 271–280. URL: https://doi.org/10.1145/3459637.3482466. doi:10.1145/3459637.
     3482466.
[14] J. C. Jung, C. Lutz, H. Pulcini, F. Wolter, Separating data examples by description logic
     concepts with restricted signatures, in: Proc. of KR, 2021.
[15] R. Willard, Testing expressibility is hard, in: D. Cohen (Ed.), Principles and Practice of
     Constraint Programming - CP 2010 - 16th International Conference, CP 2010, St. Andrews,
     Scotland, UK, September 6-10, 2010. Proceedings, volume 6308 of Lecture Notes in Computer
     Science, Springer, 2010, pp. 9–23. URL: https://doi.org/10.1007/978-3-642-15396-9_4. doi:10.
     1007/978-3-642-15396-9\_4.
[16] B. ten Cate, V. Dalmau, The product homomorphism problem and applications, in: Proc.
     of ICDT, 2015, pp. 161–176.
[17] M. Funk, J. C. Jung, C. Lutz, H. Pulcini, F. Wolter, Learning description logic concepts:
     When can positive and negative examples be separated?, in: Proc. of IJCAI, 2019, pp.
     1682–1688.
[18] S. Demri, V. Goranko, M. Lange, Temporal Logics in Computer Science, Cambridge Tracts
     in Theoretical Computer Science, Cambridge University Press, 2016.
[19] C. Lemieux, D. Park, I. Beschastnikh, General LTL specification mining (T), in: Proc. of
     ASE, IEEE, 2015, pp. 81–92.
[20] D. Neider, I. Gavran, Learning linear temporal properties, in: Proc. of FMCAD 2018,
     IEEE, 2018, pp. 1–10. URL: https://doi.org/10.23919/FMCAD.2018.8603016. doi:10.23919/
     FMCAD.2018.8603016.
[21] A. Camacho, S. A. McIlraith, Learning interpretable models expressed in linear temporal
     logic, in: Proc. of ICAPS 2018, AAAI Press, 2019, pp. 621–630. URL: https://aaai.org/ojs/
     index.php/ICAPS/article/view/3529.
[22] N. Fijalkow, G. Lagarde, The complexity of learning linear temporal formulas from
     examples, in: J. Chandlee, R. Eyraud, J. Heinz, A. Jardine, M. Zaanen (Eds.), Proceedings of
     the 15th International Conference on Grammatical Inference, 23-27 August 2021, Virtual
     Event, volume 153 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 237–250.
     URL: https://proceedings.mlr.press/v153/fijalkow21a.html.
[23] R. Raha, R. Roy, N. Fijalkow, D. Neider, Scalable anytime algorithms for learning fragments
     of linear temporal logic, in: D. Fisman, G. Rosu (Eds.), Tools and Algorithms for the
     Construction and Analysis of Systems - 28th International Conference, TACAS 2022,
     Held as Part of the European Joint Conferences on Theory and Practice of Software,
     ETAPS 2022, Munich, Germany, April 2-7, 2022, Proceedings, Part I, volume 13243 of
     Lecture Notes in Computer Science, Springer, 2022, pp. 263–280. URL: https://doi.org/10.
     1007/978-3-030-99524-9_14. doi:10.1007/978-3-030-99524-9\_14.
[24] M. Fortin, B. Konev, V. Ryzhikov, Y. Savateev, F. Wolter, M. Zakharyaschev, Unique
     characterisability and learnability of temporal instance queries, in: Proc. of KR, 2022.
[25] D. Maier, The complexity of some problems on subsequences and supersequences, J. ACM
     25 (1978) 322–336. URL: https://doi.org/10.1145/322063.322075. doi:10.1145/322063.
     322075.
[26] N. Kurtonina, M. de Rijke, Bisimulations for temporal logic, J. Log. Lang. Inf. 6 (1997) 403–
     425. URL: https://doi.org/10.1023/A:1008223921944. doi:10.1023/A:1008223921944.
[27] A. Prior, Time and Modality, Oxford University Press, 1956.
[28] H. Ono, A. Nakamura, On the size of refutation Kripke models for some linear modal and
     tense logics, Studia Logica (1980) 325–333.
[29] J. P. Burgess, Basic tense logic, in: Handbook of Philosophical Logic: Volume II: Extensions
     of Classical Logic, Reidel, Dordrecht, 1984, pp. 89–133.
[30] M. Y. Vardi, From Church and Prior to PSL, in: 25 Years of Model Checking - History,
     Achievements, Perspectives, volume 5000 of LNCS, Springer, 2008, pp. 150–171.
[31] O. Kupferman, M. Y. Vardi, Verification of fair transisiton systems, in: R. Alur, T. A.
     Henzinger (Eds.), Computer Aided Verification, 8th International Conference, CAV ’96, New
     Brunswick, NJ, USA, July 31 - August 3, 1996, Proceedings, volume 1102 of Lecture Notes in
     Computer Science, Springer, 1996, pp. 372–382. URL: https://doi.org/10.1007/3-540-61474-5_
     84. doi:10.1007/3-540-61474-5\_84.