=Paper= {{Paper |id=Vol-3409/paper3 |storemode=property |title=Puzzling over Subsequence-Query Extensions: Disjunction and Generalised Gaps |pdfUrl=https://ceur-ws.org/Vol-3409/paper3.pdf |volume=Vol-3409 |authors=André Frochaux,Sarah Kleest-Meißner |dblpUrl=https://dblp.org/rec/conf/amw/FrochauxK23 }} ==Puzzling over Subsequence-Query Extensions: Disjunction and Generalised Gaps== https://ceur-ws.org/Vol-3409/paper3.pdf
Puzzling over Subsequence-Query Extensions:
Disjunction and Generalised Gaps
André Frochaux1,† , Sarah Kleest-Meißner1,†
1
    Humboldt-Universität zu Berlin, Unter den Linden 6, 10117 Berlin, Germany


                                         Abstract
                                         A query model for sequence data was introduced in [1] in the form of subsequence-queries with wildcards
                                         and gap-size constraints (swg-queries, for short). These queries consist of a pattern over an alphabet
                                         of variables and types, as well as a global window size and a number of local gap-size constraints. We
                                         propose two new extensions of swg-queries, which both enrich the expressive power of swg-queries
                                         in different ways: subsequence-queries with generalised gap-size constraints (swgg-queries, for short)
                                         and disjunctive subsequence-queries (dswg-queries, for short). We discuss a suitable characterisation
                                         of containment, a classical property considered in database theory, and adapt results concerning the
                                         discovery of swg-queries to both, swgg-queries and dswg-queries.

                                         Keywords
                                         subsequence-queries, disjunction, gap-constraints, learning descriptive queries, subsequences, embed-
                                         dings




1. Introduction
Applications in different domains like cluster monitoring [2], urban transportation [3], and
in finance[4], use models for sequence data, which define an order for a set of data items
[5]. Respective systems enable the definition of queries which detect patterns of data items
describing a situation of interest (soi for short), for example error occurence, in a specific order
and temporal context.
   Finding a suitable query is a non-trivial task. A user may know the time at which a certain
job fails execution, but does not exactly conceive a situation which forcasts the failure. It
was therefore suggested to automatically discover a query from historic sequence data which
describes the soi. Such a query may then be used in pro-active applications where they shall
anticipate a soi to prepare for it accordingly [6].
   In [1] a formal model (referred to as swg-queries) was proposed, which covers the essence
of discovering a query from sequence data. In a nutshell, an swg-query consists of a pattern
over an alphabet of variables and types, a global window size and a tuple of local gap-size
constraints. Syntactically, swg-queries are so-called Angluin-style patterns with variables, but
with a semantics adapted to sequence data: each variable in the query string ranges only over a
AMW’23: 15th Alberto Mendelzon International Workshop on Foundations of Data Management, May 22–26, 2023,
Santiago, Chile
†
  These authors contributed equally.
$ andre.frochaux@informatik.hu-berlin.de (A. Frochaux); kleemeis@informatik.hu-berlin.de (S. Kleest-Meißner)
 0009-0001-7918-8725 (A. Frochaux); 0000-0002-4133-7975 (S. Kleest-Meißner)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
single symbol and the query matches if, after replacing the variables by single data items, it
occurs as a subsequence that satisfies the window size and local gap-size constraints. Angluin-
style patterns were introduced in [7] and play a central role for inductive inference, in formal
language theory and combinatorics on words (see [8], [9], [10]). Concepts and algorithms
from inductive inference of so-called pattern languages, that can be described by Angluin-
style patterns, can be adapted to swg-queries. Especially the notion of descriptive patterns
(already introduced in [7], see also [11], [12]) forms a key concept and enables the adaptation
of Shinohara’s algorithm [13] for Angluin-style patterns. This algorithm computes a descriptive
Angluin-style pattern upon input of a finite set of sequences of data items. The corresponding
adaptation to swg-queries including some extensions were presented in [1], and liftetd to a
multi-dimensional data model in [14].
   Subsequences in general have extensively been studied both in a purely combinatorical
sense (in formal language theory, logic and combinatorics on words) and algorithmically (in
string algorithms and bioinformatics); see the introductions of the recent papers [15], [16] for
a comprehensive list of relevant pointers. The problem of matching subsequences with gap-
constraints (and analysis problems with respect to the set of all gap-constrained subsequences
of given strings) has been investigated in the recent papers [17], [18] (see also [19] for a survey).
   Queries defined for complex event recognition (CER, for short) usually use operators such as
sequencing, conjunction and disjunction, Kleene closure, negation and variables which may be
bound to data items in a stream [20]. Inspired by the generalised gap-size constraints described
in [17] that are defined over strings other than patterns over variables and types, and the use of
disjunction in CER languages, we introduce two new notions of subsequence-queries, which
both extend the expressive power of swg-queries in different ways:
    • disjunctive subsequence-queries with wildcards and local gap-size constraints, for short:
      dswg-queries, and
    • subsequence-queries with wildcards and generalised gap-size constraints, for short: swgg-
      queries
Improving the expressive power of the underlying language used for an automatically discovered
decriptive query leads to results of increased precision. Enabling disjunction is a natural and
effective way to reach this. Our second approach of generalised gap-size constraints allows
detecting temporal contexts not only between consecutive data items, but between any data
items in the query string.
   The remainder of this paper is structured as follows. Section 2 introduces both, swgg-queries
and dswg-queries, and discusses the relation to swg-queries. In Section 3, we provide a solution
for the query discovery problem for both kinds of queries. Section 4 concludes the paper. Due
to space limitations, proof details had to be deferred to the paper’s full version [21].


2. Traces and Queries
This section introduces the syntax and semantics of both, swgg-query (Section 2.1), and dswg-
queries (Section 2.2). For a better understanding we consider each extension individually.
  By Z, N, N⩾1 we denote the set of integers, non-negative integers, and positive integers,
respectively. For every set 𝑀 we denote the powerset by 𝒫(𝑀 ), i.e. the set of all subsets
of 𝑀 , and 𝒫fin (𝑀 ) := {𝑋 ∈ 𝒫(𝑀 ) : 𝑋 is finite} is the set of all finite subsets from 𝑀 .
Moreover, we write 𝒫fin  +
                            (𝑀 ) for (𝒫fin (𝑀 ) ∖ ∅) and for every 𝑘 ∈ N⩾1 , we define 𝒫 𝑘 (𝑀 ) :=
{𝑚 ∈ 𝒫fin (𝑀 ) | |𝑚| = 𝑘}. For ℓ ∈ N we let [ℓ] = {𝑖 ∈ N⩾1 : 1 ≤ 𝑖 ≤ ℓ}. For a non-empty set
𝐴 we write 𝐴* (and 𝐴+ ) for the set of all (non-empty) strings built from symbols in 𝐴. By |𝑠|
we denote the length of a string 𝑠, and for a position 𝑖 ∈ [|𝑠|] we write 𝑠[𝑖] to denote the letter
at position 𝑖 in 𝑠. A factor of a string 𝑠 ∈ 𝐴* is a string 𝑡 ∈ 𝐴* such that 𝑠 is of the form 𝑠1 𝑡𝑠2
for 𝑠1 , 𝑠2 ∈ 𝐴* .
   An embedding is a mapping 𝑒 : [ℓ] → [𝑛] with ℓ ≤ 𝑛 such that 𝑖 < 𝑗 implies 𝑒(𝑖) < 𝑒(𝑗) for
all 𝑖, 𝑗 ∈ [ℓ]. Let 𝑠 and 𝑡 be two strings with |𝑠| ≤ |𝑡|. We say that 𝑠 is a subsequence of 𝑡 with
embedding 𝑒 : [|𝑠|] → [|𝑡|], if 𝑒 is an embedding and 𝑠[𝑖] = 𝑡[𝑒(𝑖)] for every 𝑖 ∈ [|𝑠|]. We write
𝑠 ≼𝑒 𝑡 to indicate that 𝑠 is a subsequence of 𝑡 with embedding 𝑒; and we write 𝑠 ≼ 𝑡 to indicate
that there exists an embedding 𝑒 such that 𝑠 ≼𝑒 𝑡.
   We model traces as finite, non-empty strings over some (finite or infinite) alphabet Γ of
types. It will be reasonable to assume that |Γ| ≥ 2. A trace (over Γ) is a string 𝑡 ∈ Γ+ . We
write types(𝑡) for the set of types that occur in 𝑡. Finally, we fix a countably infinite set Vars of
variables, and we will always assume that Vars is disjoint with the set Γ of considered types.

2.1. Syntax and semantics of swgg-queries
Definition 1. An swgg-query 𝑞 = (𝑠, 𝑤, 𝐶) (over Vars and Γ) is specified by
    • a query string 𝑠 ∈ (Vars ∪ Γ)+ ,
    • a global window size 𝑤 ∈ N⩾1 ∪ {∞} with 𝑤 ⩾ |𝑠| and
    • a finite set 𝐶 of generalised gap-size constraints (for |𝑠| and 𝑤) of form
                              (𝑐− , 𝑐+ , 𝑟)𝑗 ∈ (N × N ∪ {∞} × N⩾1 )N⩾1

       for 𝑗 ∈ [|𝑠| − 1], 𝑟 ≤ 𝑐− ≤ 𝑐+ and 𝑗 + 𝑟 ≤ |𝑠|.
The semantics of swgg-queries is defined as follows: each variable in 𝑠 represents an arbitrary
type from Γ. A query 𝑞 = (𝑠, 𝑤, 𝐶) matches in a trace 𝑡 (in symbols: 𝑡 |= 𝑞), if the wildcards in
s can be replaced by types in Γ in such a way that the resulting string 𝑠′ satifies the fowllowing:
𝑡 contains a factor 𝑡′ of length at most 𝑤 such that 𝑠′ occurs as a subsequence in 𝑡′ and for
each (𝑐− , 𝑐+ , 𝑟)𝑗 ∈ 𝐶 the gap between 𝑠[𝑗] and 𝑠[𝑗 + 𝑟] in 𝑡′ has length at least 𝑐− and at most
𝑐+ . Gaps of range 𝑟 for 𝑟 ∈ [𝑤] without any constraints are implicitly set to the most general
constraint (0, ∞, 𝑟).
   A more formal description of these semantics relies on the following additional notation: An
embedding 𝑒 : [ℓ] → [𝑛] satisfies a global window size 𝑤, if 𝑒(ℓ) − 𝑒(1) + 1 ≤ 𝑤. Furthermore, it
satisfies a set of generalised gap-size constraints 𝐶 (for |ℓ| and 𝑤) if 𝑐− ≤ 𝑒(𝑗 +𝑟)−1−𝑒(𝑗) ≤ 𝑐+ ,
for each (𝑐− , 𝑐+ , 𝑟)𝑗 ∈ 𝐶.
   A substitution is a mapping 𝜇 : (Vars ∪ Γ) → (Vars ∪ Γ) with 𝜇(𝛾) = 𝛾 for all 𝛾 ∈ Γ. We
lift substitutions to mappings (Vars ∪ Γ)+ → (Vars ∪ Γ)+ in the obvious way, i.e. 𝜇(𝑠) =
𝜇(𝑠[1])𝜇(𝑠[2]) . . . 𝜇(𝑠[ℓ]) for 𝑠 ∈ (Vars ∪ Γ)+ and ℓ := |𝑠|.
   An swgg-query 𝑞 = (𝑠, 𝑤, 𝐶) matches in a trace 𝑡 ∈ Γ+ (or 𝑡 matches 𝑞), if and only if there
are a substitution 𝜇 : (Vars ∪ Γ) → Γ and an embedding 𝑒 : [ℓ] → [𝑛] that satisfies 𝑤 and 𝐶,
such that 𝜇(𝑠) ≼𝑒 𝑡. We call (𝜇, 𝑒) a witness for 𝑡 |= 𝑞.
   The model set of a query 𝑞 w.r.t. to a type set Δ ⊆ Γ is ModΔ (𝑞) := {𝑡 ∈ Δ+ : 𝑡 |= 𝑞}.
Note that there exist swgg-queries 𝑞 such that ModΓ (𝑞) = ∅. They have in common that
either their generalised gap-size constraints conflict with the global window size or some gap-
size constraints are in conflict among themselves. Lemma 3 characterises swgg-queries with
compatible constraints.

Example 2. Let Γ = {a, b}. Let 𝑞1 = (𝑠, 𝑤, 𝐶1 ) and 𝑞2 = (𝑠, 𝑤, 𝐶2 ) be the two swgg-queries over
Vars, where 𝑠 = a a a a a a and 𝑤 = 10 for both queries, and 𝐶1 = ((7, 7, 3)1 , (6, 6, 3)2 , (0, 0, 1)5 )
and 𝐶2 = ((4, 4, 5)1 , (2, 5, 2)3 ).
              1           2          3    4      5                6                        1          2       3          4     5   6
𝑠      =      a           a          a    a      a                a           𝑠    =       a          a       a          a     a   a
                         ⏟     ⏞                      ⏟ ⏞                                                          ⏟ ⏞
𝐶1     :                  (7, 7, 3)1                 (0, 0, 1)5               𝐶2    :                             (2, 5, 2)3
                                 ⏟     ⏞                                                                  ⏟        ⏞
                                  (6, 6, 3)2                                                                  (4, 4, 5)1
                          ⏟                  ⏞                                                    ⏟                ⏞
𝑤      :                                   10                                 𝑤     :                             10

A shortest trace over Γ which satisfies 𝐶1 is 𝑡 = a b a b b b b a a a a. But 𝑡 ̸|= 𝑞1 since 𝑡 does
not satisfy 𝑤 = 10. Since 𝑡 is a shortest trace there exists no trace satisfying both, 𝑤 and 𝐶. (The
shortest trace is not unique since the sequences of bs could be replaced by arbitrary types from Γ.)
   Note that ModΓ (𝑞2 ) = ∅ holds as well since (4, 4, 5)1 and (2, 5, 3)3 are incompatible: For each
trace 𝑡, substitution 𝜇 and embedding 𝑒 such that 𝜇(𝑠) ≼𝑒 𝑡 and 𝑒 satisfies (4, 4, 5)1 , the second
gap-size constraint is not satisfied, since it demands at least one further type between 𝑒(3) and
𝑒(4), contradicting (4, 4, 5)1 .

Lemma 3. An swgg-query 𝑞 = (𝑠, 𝑤, 𝐶) (over Vars and Γ) is satisfiable, i.e. ModΓ (𝑞) ̸= ∅, iff
there are no two sequences
                    (︁                                                                          )︁
              𝐶 ′ = (𝑐′−
                       1 , 𝑐′+ ′
                            1 , 𝑟1 ) 𝑗1
                                       ′ , (𝑐
                                             ′− ′+ ′
                                             2 , 𝑐2 , 𝑟2 )𝑗
                                                                          ′−    ′+    ′
                                                            ′ , . . . , (𝑐 ′ , 𝑐 ′ , 𝑟 ′ )𝑗 ′
                                                            2             |𝐶 | |𝐶 | |𝐶 |      ′                   |𝐶 |


and
                        (︁                                                                                            )︁
                  𝐶 ′′ = (𝑐′′−  ′′+ ′′           ′′− ′′+ ′′                    ′′−        ′′+        ′′
                           1 , 𝑐1 , 𝑟1 )𝑗1′′ , (𝑐2 , 𝑐2 , 𝑟2 )𝑗2′′ , . . . , (𝑐|𝐶 ′′ | , 𝑐|𝐶 ′′ | , 𝑟|𝐶 ′′ | )𝑗 ′′ ′′|𝐶 |


from 𝐶 ∪ {(0, ∞, 1)1 , . . . , (0, ∞, 1)|𝑠|−1 } where 𝑗1′ = 𝑗1′′ , 𝑗|𝐶
                                                                    ′         ′         ′′         ′′
                                                                       ′ | + 𝑟|𝐶 ′ | = 𝑗|𝐶 ′′ | + 𝑟|𝐶 ′′ | , and
 ′      ′    ′      ′′         ′′   ′′               ′
𝑗𝑖+1 = 𝑗𝑖 + 𝑟𝑖 and 𝑗𝑖+1 = 𝑗𝑖 + 𝑟𝑖 for all 𝑖 ∈ [|𝐶 | − 1], with
                              |𝐶 ′
                               ∑︀|
           (i) |𝑠| +                 𝑐′−   ′
                                      𝑖 − 𝑟𝑖 + 1     >        𝑤        , or
                              𝑖=1
                  |𝐶 ′                                  ′′
                   ∑︀|                               |𝐶
                                                      ∑︀|
           (ii)          𝑐′−   ′
                          𝑖 − 𝑟𝑖 + 1             >           𝑐′′+  ′′
                                                              𝑖 − 𝑟𝑖 + 1.
                  𝑖=1                                𝑖=1

     For the rest of this paper, we only focus on queries with a non-empty model set.
2.2. Syntax and semantics of dswg-queries
Definition 4. A dswg-query 𝑞 = (𝑠, 𝑤, 𝑐) (over Vars {︃       and Γ) is specified by
                                                              𝑥 ∈ Vars
   • a query string 𝑠 = 𝑠[1] . . . 𝑠[ℓ], whereby 𝑠[𝑖] =              +        ,
                                                              𝜒 ∈ 𝒫fin (Γ)
   • a global window size 𝑤 ∈ N⩾1 ∪ {∞} with 𝑤 ⩾ |𝑠| and
   • a tuple 𝑐 = (𝑐1 , 𝑐2 , . . . , 𝑐|𝑠|−1 ) of local gap-size constraints (for |𝑠| and 𝑤), where 𝑐𝑖 =
                                                                                         ∑︀|𝑠| −
     (𝑐−    +                                     −     +
       𝑖 , 𝑐𝑖 ) ∈ N×(N∪{∞}), such that 𝑐𝑖 ≤ 𝑐𝑖 for every 𝑖 ∈ [|𝑠|−1] and |𝑠|+               𝑖=1 𝑐𝑖 ≤ 𝑤.
   Note that setting all gap-size constraints of a dswg-query 𝑞 to (0, ∞) corresponds to a query
without gap-size constraints.
   The semantics of dswg-queries is defined as follows: Again, variables in 𝑠 represent an
arbitrary type, and each set 𝜒 stands for a disjunction. Intuitively, a trace 𝑡 matches a query
𝑞 = (𝑠, 𝑤, 𝑐) if the variables in 𝑠 can be replaced by types and each occuring of an 𝜒 can be
mapped to a single type 𝛾 ∈ 𝜒, such that the resulting string 𝑠′ occurs as a subsequence in 𝑡
that spans at most 𝑤 types and the gap between 𝑠′ [𝑖] and 𝑠′ [𝑖+1] in 𝑡 has length between 𝑐−        𝑖
and 𝑐+𝑖 , for  all 𝑖 < ℓ := |𝑠|.
   An alternative description of these semantics, which will be more convenient for our formal
proofs, involves a bit more notation: We say that an embedding 𝑒 : [ℓ] → [𝑛] satisfies a global
window size 𝑤, if 𝑒(ℓ) − 𝑒(1) + 1 ≤ 𝑤; and we say that 𝑒 satisfies a tuple 𝑐 = (𝑐1 , 𝑐2 , . . . , 𝑐ℓ−1 )
of local gap-size constraints (for ℓ and 𝑤), if 𝑐−𝑖 ≤ 𝑒(𝑖+1)−1 − 𝑒(𝑖) ≤ 𝑐𝑖 for all 𝑖 < ℓ.
                                                                              +

   A substitution of size ℓ is a mapping 𝜇ℓ : ([ℓ] × Vars ∪ 𝒫fin
                                                               +                  +
                                                                 (Γ)) → (Vars ∪ 𝒫fin (Γ)) with:

                       ⎨𝑥 ∈ (Vars ∪ 𝒫fin (Γ)) , 𝑖 = 1 and 𝑧 ∈ Vars
                       ⎧              +
                       ⎪
           𝜇ℓ (𝑖, 𝑧) = 𝜇ℓ (1, 𝑧)                , 𝑖 > 1 and 𝑋 ∈ Vars
                                                , ∅ ⊂ 𝑧 ′ ⊆ 𝑧 for all non-empty 𝑧 ⊆fin Γ.
                       ⎪
                       ⎩ ′
                          𝑧
We extend substitutions of size ℓ to mappings ([ℓ] × Vars ∪ 𝒫fin                +
                                                                                  (Γ))+ → (Vars ∪
𝒫fin (Γ)) for strings 𝑠 ∈ (Vars ∪ 𝒫fin (Γ)) of size ℓ in the obvious way, i.e., 𝜇(𝑠) =
  +       +                                     +     +

𝜇ℓ (1, 𝑠[1])𝜇ℓ (2, 𝑠[2]) · · · 𝜇ℓ (ℓ, 𝑠[ℓ]). Since the size of the string must match the size of the
substitution, we can omit the index ℓ if we apply it to a string. Particularly, we can omit the
parameter 𝑖 for the position if the second parameter is a variable, as we have for all variables 𝑧
that 𝜇(𝑖, 𝑧) = 𝜇(𝑖′ , 𝑧) for all 𝑖, 𝑖′ ∈ [𝑙], or if the position is given by the context, i.e. we write
𝜇(𝑠[𝑖]) instead of 𝜇(𝑖, 𝑠[𝑖]).
   A dswg-query 𝑞 = (𝑠, 𝑤, 𝑐) matches in a trace 𝑡 ∈ Γ+ (or, 𝑡 matches 𝑞, in symbols: 𝑡 |= 𝑞), if
and only if there are a substitution 𝜇 : ([ℓ] × Vars ∪ 𝒫fin  +
                                                               (Δ)) → Γ (i.e., there are only singeltons
in the co-domain of 𝜇 and every type of 𝜇(𝑠) is the unique element of its singleton) and an
embedding 𝑒 : [|𝑠|] → [|𝑡|] that satisfies 𝑤 and 𝑐, such that 𝜇(𝑠) ≼𝑒 𝑡. We call (𝜇, 𝑒) a witness
for 𝑡 |= 𝑞.
Example 5. Let 𝑥1 , 𝑥2 , 𝑥3 ∈ Vars and Γ = {a, b, c}. We consider a query 𝑞 = (𝑠, 𝑤, 𝑐),
where 𝑠 = 𝑥1 {a, b}𝑥1 𝑥2 {c}𝑥3 {a, c}𝑥1 , 𝑤 = 25 and 𝑐 = ((0, 1), (2, ∞), (3, ∞), (0, 5), (0, 5),
(1, 5), (1, 2)). For 𝑡1 , 𝑡2 ∈ Γ* we consider the trace 𝑡 = 𝑡1 c a b b c a b a c a b a c b c b b a c 𝑡2 . We
observe that 𝑡 |= 𝑞, and a witness substitution and embedding can be illustrated as follows:
   𝑠    =          𝑥1   {a, b}          𝑥1            𝑥2   {c}    𝑥3         {a, b}        𝑥1 ,
   𝑡    =    𝑡1    c      a       bb    c     aca     a      c    b     c      b      b    c      𝑡2 .
We close this subsection with two little observations. First, it is reasonable to assume that 𝜒 is a
proper subset of Γ, otherwise we could also use a wildcard instead of a disjunction. Second, in
the case of a finite alphabet Γ, we obtain the possibility to express a simple kind of negation. We
can build a query where 𝑠 contains a substring 𝑠′ = a 𝜒 b with 𝜒 = Γ ∖ {𝑐} and corresponding
conditions 𝑐 = ((0, 0), (0, 0)) that is only matched by traces where in between of the relevant a
and b occurs exactly one letter that is not c. Unfortunately, it is not possible to express negation
in general, so we cannot express the following: a and b have one or two letters in between,
none is c.

2.3. About containment
This section is dedicated to a characterisation of containment, a classical property considered
in database theory.
   An swgg-query 𝑞 is called an (ℓ, 𝑤, 𝐶)-swgg-query (over Vars and Γ) if 𝑞 = (𝑠, 𝑤, 𝐶) with
|𝑠| = ℓ, (ℓ, 𝑤, 𝑐)-dswg-queries are analogously defined. The parameter ℓ will be called string
length. If the maximal size of typesets occuring in an (ℓ, 𝑤, 𝑐)-dswg-query 𝑞 is bounded by a
number 𝑘 ≥ 1, we call 𝑞 an (ℓ, 𝑤, 𝑐, 𝑘)-dswg-query (for swgg-queries 𝑘 always equals 1).
   Given an swgg-query or dswg-query 𝑞 we omit the prefix and call 𝑞 a query, if 𝑞 may be both,
or it is clear from the context whether 𝑞 is an swgg-query or dswg-query. We use (ℓ, 𝑤, ˜𝑐)-
query (or (ℓ, 𝑤, ˜𝑐, 𝑘)-query) as notation for queries which might be swgg-query or dswg-query
and assume that the gap-size constraints ˜𝑐 are compatibile with ℓ and 𝑤 or satisfy Lemma 3,
respectively.
   We write types(𝑞) (or types(𝑠)), typesets(𝑞) (or typesets(𝑠)) and vars(𝑞) (or vars(𝑠)) for the
set of types, the set of all typesets and the set of variables, respectively, that occur in 𝑞’s query
string 𝑠. I.e.,

                       types(𝑞) := {𝛾 ∈ Γ | there ex. 𝑖 ∈ [|𝑠|] : 𝛾 ∈ 𝑠[𝑖]}
                     typesets(𝑞) := {𝜒 ⊆ Γ | there ex. 𝑖 ∈ [|𝑠|] : 𝑠[𝑖] = 𝜒}
                        vars(𝑞) := {𝑥 ∈ Vars | there ex. 𝑖 ∈ [|𝑠|] : 𝑠[𝑖] = 𝑥}

For the reason of readability we omit braces in query strings if the set consists only of
an unique element. Therefore, we write for example 𝑠 = 𝑥1 a b 𝑥2 a{a, c} b instead of
𝑠 = 𝑥1 {a}{b}𝑥2 {a}{a, c}{b}. Vice versa, we can consider a query string 𝑠 over Vars ∪ Γ
as a string over Vars ∪ 𝒫 1 (Γ) where every 𝑠[𝑖] is a singleton.
   A query 𝑞 is said to be contained in a query 𝑞 ′ w.r.t. to a set Δ ⊆ Γ (we write 𝑞 ⊆Δ 𝑞 ′ ) if
ModΔ (𝑞) ⊆ ModΔ (𝑞 ′ ).

Definition 6. A homomorphism from 𝑞 ′ to 𝑞 is a substitution ℎ such that ℎ(𝑠′ ) = 𝑠 and the
following property holds:
      For every 𝑧 ∈ Vars that occurs at least twice in the query string 𝑠′ of 𝑞 ′ and is mapped
      to a subset of Γ via ℎ, we have ℎ(𝑧) is a singelton.
             hom
We write 𝑞 ′ −→ 𝑞 to express that there exists a homomorphism from 𝑞 ′ to 𝑞.
This additional property of homomorphisms feels arbitrary or artificial, but it is perfectly tailored
to our discovery algorithm and if the considered class of queries in Definition 6 is the class of
all (ℓ, 𝑤, 𝐶)-swgg-query, then in any way, we have 𝑠[𝑖] is a singelton for all 𝑖 ∈ [ℓ]. Now, the
following theorem gives a characterisation of containment.

Theorem 7. Given some sufficiently large Γ. Let 𝑞 and 𝑞 ′ be (𝑠, 𝑤, ˜𝑐)-queries over Vars and Γ. If
𝑞 and 𝑞 ′ are satisfiable, it holds, that:
                                                           hom
                                       𝑞 ⊆Γ 𝑞 ′ ⇐⇒ 𝑞 ′ −→ 𝑞.

Sufficiently large, in the context of Theorem 7 means, |Γ| ≥ 2 for the case of dswg-queries.
In the case of swgg-queries the size of Γ depends on gap-size constraints with range greater
than 1. Intuitively, the necessary size of Γ depicts how much structure information of 𝑞 ′ can be
hidden in the gaps of 𝑞. For further information and an example consider the theorems proof in
the paper’s full version [21].

2.4. Correlation to swg-queries
Note that an swgg-query query 𝑞 = (𝑠, 𝑤, 𝐶) with 𝐶 = {𝑐1 , . . . , 𝑐ℓ−1 } and 𝑟𝑖 = 1 for all
𝑖 ∈ [ℓ − 1] precisely corresponds to the notion of swg-queries introduced in [1]. Let 𝑞 be an
(ℓ, 𝑤, 𝑐, 𝑘)-query containing typesets over 𝒫fin
                                             +
                                                 (Γ). If 𝑘 = 1 this correpsonds to the syntax and
semantics of swg-queries as well.
    In [14] a mapping between one-dimensional and multi-dimensional sequence data was
introduced, such that a multi-dimensional trace matches a multi-dimensional query if and only
if the corresponding one-dimensional trace matches the corresponding one-dimensional query.
This mapping can be adapted to swgg-queries and dswg-queries.


3. Discovery
The question of how meaningful swgg-queries and dswg-queries can be discovered from a given
set of traces is of peculiar interest and was answered algorithmically in [1] for swg-queries. We
adapt these results to swgg-queries and dswg-queries.
   A sample is a finite, non-empty    set 𝒮 of traces over Γ. Given a sample 𝒮, let Γ𝒮 be the set of
all types occurring in 𝒮, i.e. 𝑡∈𝒮 types(𝑡). The support supp(𝑞, 𝒮) of a query 𝑞 in 𝒮 is defined
                                ⋃︀

as the fraction of traces in the sample that match 𝑞, i.e. supp(𝑞, 𝒮) := |{𝑡∈𝒮|𝒮|     : 𝑡|=𝑞}|
                                                                                               . A support
threshold is a rational number sp with 0 < sp ≤ 1. A query 𝑞 is said to cover a sample 𝒮 with
support sp if supp(𝑞, 𝒮) ≥ sp. Let 𝒮 be a sample, sp be a support threshold and 𝑘 ∈ [|Γ𝒮 | − 1].
An (ℓ, 𝑤, ˜𝑐, 𝑘)-query 𝑞 is called descriptive for 𝒮 w.r.t (sp, (ℓ, 𝑤, ˜𝑐, 𝑘)) if supp(𝑞, 𝒮) ≥ sp, and
there is no other (ℓ, 𝑤, ˜𝑐, 𝑘)-query 𝑞 ′ with supp(𝑞 ′ , 𝒮) ≥ sp and 𝑞’ ⊂Γ 𝑞. A type 𝛾 ∈ Γ (or a
typeset 𝜒 ∈ 𝒫fin+
                  (Γ)) satisfies sp w.r.t. to 𝒮, if the fraction of traces containing 𝛾 (or some 𝛾 ∈ 𝜒)
is greater than or equal to sp. The set of all types (or typesets) that satisfy sp w.r.t. to 𝒮 is

                            +              |{𝑡 ∈ 𝒮 : ex. 𝛾 ∈ 𝜒 s.t. 𝛾 ∈ types(𝑡)}|
          Δ(𝒮, sp) := {𝜒 ∈ 𝒫fin (Γ) :                                              ≥ sp}.
                                                            |𝒮|
We omit 𝒮 and sp, if they are clear from the context. For 𝑖 ∈ [|Γ𝒮 | − 1] we write Δ𝑖 to denote
the subset of Δ which contains only typesets of size 𝑖. For a descriptive (ℓ, 𝑤, ˜𝑐, 𝑘)-query 𝑞,
typesets(𝑞) ⊆ Δ1 ∪˙ . . .∪˙ Δ𝑘 holds. This corresponds to Δ = Δ1 = {𝛾 ∈ Γ : |{𝑡∈𝒮 : 𝛾∈types(𝑡)}|
                                                                                         |𝒮|        ≥
sp} if the considered query is an swgg-query.
   Given an (ℓ, 𝑤, ˜𝑐, 𝑘)-query 𝑞 = (𝑠, 𝑤, ˜𝑐) and a symbol 𝑧 from Vars∪𝒫fin +
                                                                               (Γ) we let pos(𝑞, 𝑧) =
pos(𝑠, 𝑧) = {𝑖 ∈ [ℓ] : 𝑠[𝑖] = 𝑧} be set of all positions 𝑖 in 𝑠 that carry 𝑧. Given a set of positions
𝑃 ⊆ [ℓ], and a symbol 𝑧 ∈ Vars ∪ 𝒫fin    +
                                            (Γ) we write 𝑠⟨𝑃 ↦→ 𝑧⟩ to denote the query string 𝑠′
which is obtained from 𝑠 by setting 𝑠[𝑖] to 𝑧, for all 𝑖 ∈ 𝑃 . Let 𝑥 ∈ Vars. We write 𝑠⟨𝑥 ↦→ 𝑧⟩
as an abbreviation for 𝑠⟨pos(𝑞, 𝑥) ↦→ 𝑧⟩. Next, we present the algorithmical idea for query
discovery:
   Compute Descriptive Query Problem (CompDescQuery): On input of a sample 𝒮 over Γ,
a support threshold sp, a string length ℓ ∈ N, a global window size 𝑤 ≥ ℓ, a tuple ˜𝑐 of gap-size
constraints, and 𝑘 ∈ [|Γ𝒮 | − 1], the task is to compute an (ℓ, 𝑤, ˜𝑐, 𝑘)-query 𝑞 that is descriptive
for 𝒮 w.r.t. (sp, (ℓ, 𝑤, ˜𝑐, 𝑘)).
  Pseudocode of an algorithm solving CompDescQuery is provided in Algorithm 1. Given 𝒮,
sp and query parameters (ℓ, 𝑤, ˜𝑐, 𝑘) as input, the algorithm first builds the most general query
𝑞 = 𝑞mg for (ℓ, 𝑤, ˜𝑐, 𝑘). Its query string consists of ℓ distinct variables, i.e. 𝑠mg = 𝑥1 . . . 𝑥ℓ , and
𝑞mg is most general in the sense that 𝑞 ′ ⊆Γ 𝑞mg for each (ℓ, 𝑤, ˜𝑐, 𝑘)-query 𝑞 ′ . If supp(𝑞, 𝒮) < sp

 ALGORITHM 1: DescrQuery(𝒮,sp,(ℓ, 𝑤, ˜𝑐, 𝑘))
   Input : sample 𝒮; support threshold sp with 0 < sp ≤ 1; (ℓ, 𝑤, ˜𝑐, 𝑘)
   Returns : descriptive query 𝑞 for 𝒮 w.r.t. (sp, (ℓ, 𝑤, ˜𝑐, 𝑘)) or error message ⊥
 1 𝑠 := 𝑠mg ; 𝑞 := (𝑠mg , 𝑤, ˜𝑐)                                       // query string and query
 2 if supp(𝑞, 𝒮) < sp then stop and return ⊥
 3 Δ := Δ1 ∪˙ . . . ∪˙ Δ𝑘                                           // typesets to be considered
 4 𝑈 := vars(𝑞); 𝑉 := ∅                                   // unvisited and available variables
 5 while 𝑈 ̸= ∅ do
 6     select an arbitrary 𝑥 ∈ 𝑈 and let 𝑈 := 𝑈 ∖ {𝑥} and Δ1 := Δ1 ∪ 𝑉
 7     for 𝑖 = 1 to 𝑘 do
 8         replace := False
 9         while Δ𝑖 ̸= ∅ do
10             select an arbitrary 𝑦 ∈ Δ𝑖 and let Δ𝑖 := Δ𝑖 ∖ {𝑦}
11             𝑞 ′ := (𝑠⟨𝑥 ↦→ 𝑦⟩, 𝑤, ˜𝑐)
12             if supp(𝑞 ′ , 𝒮) ≥ sp then
13                  𝑠 := 𝑠⟨𝑥 ↦→ 𝑦⟩; replace := True                                // ReplaceOp
14                  break for loop
15     if replace is False then 𝑉 := 𝑉 ∪ {𝑥}                                     // NoChangeOp
16 stop and return 𝑞 := (𝑠, 𝑤, ˜   𝑐)

the algorithm stops and returns ⊥ (line 2), because no other query 𝑞 ′ with 𝑞 ′ ⊆Γ 𝑞 = 𝑞mg can
describe 𝒮 with sufficient support.
  Otherwise, the algorithm searches for an admissable replacement operation for each variable
𝑥 ∈ 𝑈 := vars(𝑠) = {𝑥1 , . . . , 𝑥ℓ } during the main loop (Line 5). A replacement operation
replaces 𝑥 by an element 𝑦 ∈ Δ𝑖 (during the 𝑖-th iteration of the for-loop) which may be
a typeset or an available variable 𝑦 ∈ 𝑉 (if i=1). The replacement operation is stored in 𝑞
and called admissable if the resulting query satisfies the support threshold (lines 11–13). If
supp(⟨pos(𝑞, 𝑥) ↦→ 𝑦⟩, 𝒮) < sp for all 𝑦 ∈ Δ𝑖 and all 𝑖 ∈ [𝑘] the query string remains unchanged
and 𝑥 gets available (line 15). After each variable in vars(𝑠mg ) has been considered, the algorithm
terminates and produces the current query as output (line 16).
   Next we depict an exemplaric run of Algorithm 1. We refer to the paper’s full version [21]
for a brief discussion, why Δ is passed through incrementally in Line 7.

Example 8. Let Γ = {a, b, c} and 𝑥1 , 𝑥2 , 𝑥3 ∈ Vars. Consider the sample 𝒮 = {a b b, a c c},
sp = 1.0, ℓ = 𝑤 = 3, 𝑐 = ((0, 0), (0, 0)) and 𝑘 = 2.
   On input (𝒮, sp, (ℓ, 𝑤, 𝑐, 𝑘)) the algorithm first generates 𝑞 = (𝑥1 𝑥2 𝑥3 , 𝑤, 𝑐). Since 𝑞 satisfies
the support threshold the algorithm proceeds by computing Δ = {{a}} ∪˙ {{a, b}, {a, c}, {b, c}}.
Assume the algorithm selects 𝑥 := 𝑥3 during the first iteration of the main loop. It turns out
that Δ1 = {{a}} does not contain a typeset for an admissable replacement of 𝑥3 . Hence, the
algorithm considers Δ2 during the second transition of the for-loop in Line 7. The only admissable
replacement is 𝑠⟨𝑥3 ↦→ {b, c}⟩, and 𝑠 is replaced by 𝑥1 𝑥2 {b, c} (𝑉 remains empty).
   Let us assume that during the second transition through the main loop the algorithm selects
𝑥 := 𝑥1 and 𝑦 := {a} ∈ Δ1 . The replacement of 𝑥1 by {a} is admissible (as it has support 1 on
𝒮). Therefore, 𝑠 is replaced by {a}𝑥2 {b, c} and 𝑉 remains unchanged again.
   In its last iteration (during the second transition through the for-loop), 𝑠⟨𝑥2 ↦→ {b, c}⟩ is the
only admissible replacement operation. The run terminates after this iteration and outputs the
query 𝑞 = (𝑠, 𝑤, 𝑐) with 𝑠 = {a}{b, c}{b, c}.

                                                    {𝑎, 𝑏, 𝑐, 𝑑}

                              {𝑎, 𝑏, 𝑐}     {𝑎, 𝑏, 𝑑}        {𝑎, 𝑐, 𝑑}     {𝑏, 𝑐, 𝑑}

                     {𝑎, 𝑏}        {𝑎, 𝑐}    {𝑎, 𝑑}           {𝑏, 𝑐}     {𝑏, 𝑑}        {𝑐, 𝑑}

                                {𝑎}           {𝑏}                  {𝑐}       {𝑑}


Figure 1: Let Γ = {a, b, c, d}, 𝒮 = {c a b b c a c b, c b b b a c c b, c c b b c c c b}, sp = 1.0 and 𝑘 = 2.
                                       +
Depicted is a top-down walk through 𝒫fin (Γ) for 𝒮, starting with typesets of size 𝑘 = 2. The typesets
marked in blue represent Δ.

  Note that algorithm 1 computes an swgg-query if 𝑘 = 1 and generalised gap-size constraints
are given. Furthermore, during each iteration of the main loop, the for-loop is transisted only
once. It remains to discuss how Δ can be calculated in case that 𝑘 > 1. Starting with all subsets
𝜒 of 𝒫fin
       +
          (Γ) with |𝜒| = 𝑘 it suffices to explore 𝒫fin
                                                   +
                                                       (Γ) in a top-down manner: we walk through
the search space level-wise and check whether the current typesets satisfy sp w.r.t. 𝒮. If this is
not the case for a typeset 𝜒, all typesets 𝜒′ ⊂ 𝜒 can be deleted from the search space, since they
do not satisfy sp. An example is depicted in Figure 1.
Theorem 9. Given some sufficiently large Γ. Let 𝒮 be a sample, let sp be a support threshold with
0 < sp ≤ 1, let (ℓ, 𝑤, ˜𝑐, 𝑘) be query parameters with 𝑘 = 1 if ˜𝑐 = 𝐶.

(a) If there does not exist any (ℓ, 𝑤, ˜𝑐, 𝑘)-swgg– or dswg–query that is descriptive for 𝒮 w.r.t.
    (sp, (ℓ, 𝑤, ˜𝑐, 𝑘)) then there is only one run of Algorithm 1 upon the defined input, and it stops
    in line 2 with output ⊥.

(b) Otherwise, every run of Algorithm 1 upon input (𝒮, sp, (ℓ, 𝑤, ˜𝑐, 𝑘)) terminates and outputs an
    swgg-query or dswg-query 𝑞 (depending on 𝑘), with |𝜒| ≤ 𝑘 for all 𝜒 ∈ typesets(𝑞), that is
    descriptive for 𝒮 w.r.t. (sp, (ℓ, 𝑤, ˜𝑐, 𝑘)).

   We refer to the paper’s full version [21] for the proof. Analysing the complexity of the
algorithm, identifies two bottle necks. First the Δ-calculation in the case of dswg-queries. This
can be handeled by adjusting the parameter 𝑘, i.e. by bounding the size of the disjunctive clauses
in the query string. The second is already known from [1] and is caused by the recurring calls
of a matching subroutine. The refered results imply NP-hardness for our algorithm as we can
use it as well for swg-queries from [1]. Membership can be obtained by guessing a witness.


4. Conclusion and Future Work
We model sequence data as traces and discover descriptive queries over traces to find a character-
istic template for situations of interests. Since an increased expressive power of the underlying
query language leads to a more detailed picture of sois, we extended swg-queries, introduced
in [1], in two different ways. First, by generalising the gap size constraints (Section 2.1) and
second, by adding the possibilty of disjunctions (Section 2.2). We adopted and extended the
discovery algorithm to our approach and ensured that the essential complexity properties are
preserved (Section 3). Note that the extended approach can be applied to the multi-dimensional
setting, analogously to [14].
   For future work we will merge both extensions to one query language. We are interested in a
more general notion of disjunction and negation, and a more generous possibilty to describe
gaps. For the latter [17] is a good yardstick. An in-depth (parameterised) complexity analysis is
intended as well. Since the crucial point is the inherent complexity of the matching problem,
we are working on data strcutures to improve the computation in practical application. In the
long run we will investigate containment for relaxed query parameters (ℓ, 𝑤, 𝑐, 𝑘).


Acknowledgments
We thank Markus L. Schmid for useful discussions. Sarah Kleest-Meißner was supported by
the German Research Foundation (DFG), CRC 1404: “FONDA: Foundation of Workflows for
Large-Scale Scientific Data Analysis”.
References
 [1] S. Kleest-Meißner, R. Sattler, M. L. Schmid, N. Schweikardt, M. Weidlich, Discovering
     event queries from traces: Laying foundations for subsequence-queries with wildcards
     and gap-size constraints, in: 25th International Conference on Database Theory, ICDT
     2022, volume 220 of LIPIcs, 2022, pp. 18:1–18:21. doi:10.4230/LIPIcs.ICDT.2022.18.
 [2] A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, J. Wilkes, Large-scale cluster
     management at google with borg, in: L. Réveillère, T. Harris, M. Herlihy (Eds.), Proceedings
     of the Tenth European Conference on Computer Systems, EuroSys 2015, Bordeaux, France,
     April 21-24, 2015, ACM, 2015, pp. 18:1–18:17. URL: https://doi.org/10.1145/2741948.2741964.
     doi:10.1145/2741948.2741964.
 [3] A. Artikis, M. Weidlich, F. Schnitzler, I. Boutsis, T. Liebig, N. Piatkowski, C. Bockermann,
     K. Morik, V. Kalogeraki, J. Marecek, A. Gal, S. Mannor, D. Gunopulos, D. Kinane, Hetero-
     geneous stream processing and crowdsourcing for urban traffic management, in: S. Amer-
     Yahia, V. Christophides, A. Kementsietsidis, M. N. Garofalakis, S. Idreos, V. Leroy (Eds.),
     Proceedings of the 17th International Conference on Extending Database Technology,
     EDBT 2014, Athens, Greece, March 24-28, 2014, OpenProceedings.org, 2014, pp. 712–723.
     URL: https://doi.org/10.5441/002/edbt.2014.77. doi:10.5441/002/edbt.2014.77.
 [4] K. Teymourian, M. Rohde, A. Paschke, Knowledge-based processing of complex stock
     market events, in: 15th International Conference on Extending Database Technology,
     EDBT ’12, Berlin, Germany, March 27-30, 2012, Proceedings, ACM, 2012, pp. 594–597.
     doi:10.1145/2247596.2247674.
 [5] B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom, Models and issues in data stream
     systems, in: L. Popa, S. Abiteboul, P. G. Kolaitis (Eds.), Proceedings of the Twenty-first
     ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June
     3-5, Madison, Wisconsin, USA, ACM, 2002, pp. 1–16. URL: https://doi.org/10.1145/543613.
     543615. doi:10.1145/543613.543615.
 [6] A. Artikis, C. Baber, P. Bizarro, C. Canudas-de-Wit, O. Etzion, F. Fournier, P. Goulart,
     A. Howes, J. Lygeros, G. Paliouras, A. Schuster, I. Sharfman, Scalable proactive event-
     driven decision making, IEEE Technol. Soc. Mag. 33 (2014) 35–41. URL: https://doi.org/10.
     1109/MTS.2014.2345131. doi:10.1109/MTS.2014.2345131.
 [7] D. Angluin, Inductive inference of formal languages from positive data, Inf. Control.
     45 (1980) 117–135. URL: https://doi.org/10.1016/S0019-9958(80)90285-5. doi:10.1016/
     S0019-9958(80)90285-5.
 [8] T. Shinohara, S. Arikawa, Pattern inference, in: Algorithmic Learning for Knowledge-Based
     Systems, GOSLER Final Report, 1995, pp. 259–291. doi:10.1007/3-540-60217-8\_13.
 [9] F. Manea, M. L. Schmid, Matching patterns with variables, in: R. Mercas, D. Reidenbach
     (Eds.), Combinatorics on Words - 12th International Conference, WORDS 2019, Lough-
     borough, UK, September 9-13, 2019, Proceedings, volume 11682 of Lecture Notes in Com-
     puter Science, Springer, 2019, pp. 1–27. URL: https://doi.org/10.1007/978-3-030-28796-2_1.
     doi:10.1007/978-3-030-28796-2\_1.
[10] G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Volume 1: Word,
     Language, Grammar, Springer, 1997. URL: https://doi.org/10.1007/978-3-642-59136-5.
     doi:10.1007/978-3-642-59136-5.
[11] D. D. Freydenberger, D. Reidenbach, Existence and nonexistence of descriptive patterns,
     Theor. Comput. Sci. 411 (2010) 3274–3286. URL: https://doi.org/10.1016/j.tcs.2010.05.033.
     doi:10.1016/j.tcs.2010.05.033.
[12] D. D. Freydenberger, D. Reidenbach, Inferring descriptive generalisations of formal lan-
     guages, J. Comput. Syst. Sci. 79 (2013) 622–639. URL: https://doi.org/10.1016/j.jcss.2012.10.
     001. doi:10.1016/j.jcss.2012.10.001.
[13] T. Shinohara, Polynomial time inference of pattern languages and its application, in:
     Proceedings of the 7th IBM Symposium on Mathematical Foundations of Computer Science,
     MFCS, 1982, pp. 191–209.
[14] S. Kleest-Meißner, R. Sattler, M. L. Schmid, N. Schweikardt, M. Weidlich, Discover-
     ing multi-dimensional subsequence queries from traces - from theory to practice, in:
     B. König-Ries, S. Scherzinger, W. Lehner, G. Vossen (Eds.), Datenbanksysteme für Busi-
     ness, Technologie und Web (BTW 2023), 20. Fachtagung des GI-Fachbereichs „Daten-
     banken und Informationssysteme" (DBIS), 06.-10, März 2023, Dresden, Germany, Pro-
     ceedings, volume P-331 of LNI, Gesellschaft für Informatik e.V., 2023, pp. 511–533. URL:
     https://doi.org/10.18420/BTW2023-24. doi:10.18420/BTW2023-24.
[15] P. Gawrychowski, M. Kosche, T. Koß, F. Manea, S. Siemer, Efficiently testing simon’s
     congruence, in: M. Bläser, B. Monmege (Eds.), 38th International Symposium on The-
     oretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbrücken,
     Germany (Virtual Conference), volume 187 of LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum
     für Informatik, 2021, pp. 34:1–34:18. URL: https://doi.org/10.4230/LIPIcs.STACS.2021.34.
     doi:10.4230/LIPIcs.STACS.2021.34.
[16] J. D. Day, P. Fleischmann, M. Kosche, T. Koß, F. Manea, S. Siemer, The edit distance to k-
     subsequence universality, in: M. Bläser, B. Monmege (Eds.), 38th International Symposium
     on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbrücken,
     Germany (Virtual Conference), volume 187 of LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum
     für Informatik, 2021, pp. 25:1–25:19. URL: https://doi.org/10.4230/LIPIcs.STACS.2021.25.
     doi:10.4230/LIPIcs.STACS.2021.25.
[17] J. D. Day, M. Kosche, F. Manea, M. L. Schmid, Subsequences with gap constraints:
     Complexity bounds for matching and analysis problems,               CoRR abs/2206.13896
     (2022). URL: https://doi.org/10.48550/arXiv.2206.13896. doi:10.48550/arXiv.2206.
     13896. arXiv:2206.13896.
[18] M. Kosche, T. Koß, F. Manea, V. Pak, Subsequences in bounded ranges: Matching and
     analysis problems, CoRR abs/2207.09201 (2022). URL: https://doi.org/10.48550/arXiv.2207.
     09201. doi:10.48550/arXiv.2207.09201. arXiv:2207.09201.
[19] M. Kosche, T. Koß, F. Manea, S. Siemer, Combinatorial algorithms for subsequence match-
     ing: A survey, 2022. arXiv:2208.14722.
[20] N. Giatrakos, E. Alevizos, A. Artikis, A. Deligiannakis, M. N. Garofalakis, Complex
     event recognition in the big data era: a survey, VLDB J. 29 (2020) 313–352. URL: https:
     //doi.org/10.1007/s00778-019-00557-w. doi:10.1007/s00778-019-00557-w.
[21] A. Frochaux, S. Kleest-Meißner, Puzzling over subsequence-query extensions: Disjunction
     and generalised gaps, CoRR 2305.08236 (2023). URL: https://doi.org/10.48550/arXiv.2305.
     08236. doi:10.48550/arXiv.2305.08236. arXiv:2305.08236.