<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Monotonic Aggregation for Temporal Datalog</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luigi Bellomarini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Nissl</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emanuel Sallinger</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Banca d'Italia</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TU Wien</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Understanding time-based e ects has become an important aspect for the analysis of Knowledge Graphs (KGs). We have seen this in di erent areas such as IoT or economics. Scalable solutions for using Datalog-based KGs with time are in their infancy and the usage together with aggregation has not been considered so far. Yet, one needs both aggregation and time-based analysis when analysing KGs such as those of economic phenomena. In this paper, we analyze monotonic aggregation over DatalogMTL, establishing the rst work that covers full recursion like in Datalog, aggregation, and temporal reasoning.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graphs</kwd>
        <kwd>Datalog</kwd>
        <kwd>Temporal Reasoning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Understanding time-based e ects has become a central aspect of reasoning on
Knowledge Graphs (KGs), particularly in speci c but prominent application
settings and business domains. They include: IoT, where context awareness requires
aggregating temporal data from continuous (and heterogeneous) sources [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ];
declarative business process management, where activities and tasks need
careful scheduling and prioritization [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]; state-of-the-art security information and
event management systems (SIEM) with time-based alert and log events [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
More traditionally, a standard application for time-based reasoning lies in the
analysis of economic phenomena, where time is a natural dimension of
analysis (e.g., for time series). It is our experience that reasoning on KGs is being
increasingly applied to economic settings, often characterized by a complex
network of intertwined entities. It has been shown that a Datalog-based KG with
monotonic aggregation (i.e., aggregation that is only increasing/decreasing with
new values) provides su cient expressive power and scalability in many
applications of the nancial realm, including company ownership, fraud detection or
prevention of potential takeovers [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], from which a number of paradigmatic use
cases of temporal reasoning and aggregations emerge:
{ UC1: Revenue Calculation. Shareholder are interested in the revenue of
a company per week/month/year/over the complete lifespan.
{ UC2: Number of Trades. Financial analysts are interested in the number
of trades in the last hour/day.
{ UC3: Company Ownership/Shares. Detecting hidden company
ownerships is important to study and, if the case, prevent company takeovers.
Having this information not only for a single snapshot provides deeper
insights on the takeover determinants and improves accuracy in prediction.
{ UC4: Change of Control. Analysts wish to analyse ownership structures
over time, e.g., monotonically increasing or decreasing shareholding.
These four use cases highlight di erent aggregation scenarios. While UC1
requires xed intervals (which may vary in size, e.g., a month has 28 to 31 days),
UC2 has a moving window of xed size (e.g., an hour). UC3 requires to aggregate
temporal information recursively over structures (e.g., over paths of arbitrary
length). We call the kind of aggregation used in the previous use cases
timepoint aggregation as aggregation is applied on single time points, or time points
in an interval. Di erently from the previous cases, UC4 requires aggregating
potentially along the entire time axis and is not bounded by any pre-determined
interval; thus, we talk about time-axis aggregation.
      </p>
      <p>
        In order to support such use cases in Datalog-based reasoning, a temporal
extension of Datalog for aggregation over time is required. MTL (Metric
Temporal Logic) for Datalog (short DatalogMTL) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] has been proposed as a
suitable extension to reason over the temporal domain. DatalogMTL extends
Datalog with rules such as ⊟[x;y]A → B or x[x;y]A → B, with x, y being positive
rational numbers, where B holds at time t if A holds at each (resp. some) time
in the interval [t − y; t − x].
      </p>
      <p>
        To support the mentioned use cases, temporal reasoning must be enriched
with the possibility to compute aggregations over time. Monotonic aggregation
has received a great deal of attention in the literature [
        <xref ref-type="bibr" rid="ref13 ref16">13, 16</xref>
        ] and has been shown
to be a suitable extension of Datalog for aggregation. For example, Shkapsky et
al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] introduced rules of the form A(g; x) → B(g; msum(x)), to express the
monotonic sum of x values in A grouped by g. However, to the best of our
knowledge, monotonic aggregation (or any other kind of temporal aggregation)
in a temporal extension of Datalog has not been discussed so far.
Challenges. Monotonic aggregation over the temporal domain raises a
number of technical challenges. Monotonic aggregation as de ned for non-temporal
reasoning is ine cient in memory and runtime performance over the temporal
domain. In addition, forms of temporal-speci c aggregation along the time axis
are required, for example to retrieve the intervals where a value is monotonically
increasing. Such operations should be non-blocking (i.e., return intermediate
results without waiting for the entire aggregation to be complete) to sustain
reactivity of the system.
      </p>
      <p>Our contribution. In this paper, we investigate aggregation for temporal
Datalog languages. Our main contributions are:
{ We provide the rst reasoning language that covers at the same time (1) full
recursion (i.e., it comprises Datalog); (2) aggregation capabilities, and
(3) temporal reasoning (i.e., it comprises MTL as in DatalogMTL).
{ We provide a principled integration of state-of-the-art semantics of (1-3),
overcoming critical semantic di erences while sustaining e cient evaluation.
{ We propose tractable, non-blocking algorithms for calculating monotonic
time-point aggregation in Datalog. To this end, we augment speci c
fragments of DatalogMTL with monotonic aggregation, by using advanced
interval tracking and enabling several time-based optimisations.
{ We introduce speci c, novel monotonic aggregation over the time axis
without any pre-determined bounds, which requires special care as time is the
aggregation dimension. In particular, we introduce, as an example, an
operator to analyse monotonic trends.
{ We provide real-world examples and show the practical relevance of our
extensions within settings of industrial relevance, such as those in the nancial
settings experienced in our work with the Central Bank of Italy.
Organization. The remainder of this paper is organized as follows: In Section 2
we introduce some preliminaries for DatalogMTL and non-temporal monotonic
aggregation. In Section 3 we discuss the requirements for temporal aggregation
in Datalog. In Section 4 we present time-aware algorithms for time-point
aggregation and discuss aggregation along the time axis in Section 5. We provide
related work in Section 6 and conclude the paper in Section 7. We provide proofs
and additional explanations in the appendix.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        In this section, we introduce DatalogMTLFP under continuous semantics [
        <xref ref-type="bibr" rid="ref17 ref18">17,
18</xref>
        ] as well as non-temporal aggregation as de ned by Shkapsky et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. FP
stands for forward propagating and refers to the fragment that does not allow
reasoning backwards in time, as this is not required for most applications and
has signi cant negative impact on complexity [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>Syntax of DatalogMTLFP. Let C and V be disjoint sets of constants, and
variables, respectively. A term is either a constant or a variable. An atom is an
expression of the form P ( ), where P is a predicate of arity n ≥ 0 and is a
n-tuple of terms. A rule is an expression of the form</p>
      <p>A1 ∧ ⋅ ⋅ ⋅ ∧ Ak → B
for k ≥ 0
where all Ai and B are literals that follow the grammar:</p>
      <p>A ∶= P ( ) S ⊟ A S x A</p>
      <p>B ∶= P ( )
The conjunction of Ai is the rule body, whereas B is the rule head. Instead of
∧ we sometimes use \;" to denote conjunction. A predicate p ∈ P is intensional
(IDB), if p occurs in the head of the rule whose body is not empty, otherwise
p is extensional (EDB). A rule is ground if it contains no variables, and safe if
each variable in the head is also contained in the body. A program is a nite
set of safe rules and is in normal form if it contains only rules of the form:
P1( 1) ∧ ⋅ ⋅ ⋅ ∧ Pn( n) →P0( 0)
(n ≥ 0)
x P1( 1) →P0( 0)
⊟</p>
      <p>P1( 1) →P0( 0)
(1)
(2)
(3)</p>
      <sec id="sec-2-1">
        <title>An interval</title>
        <p>= ⟨x; y⟩ is de ned over the domain Q≥0 ∪ {+∞} where
⟨x; y⟩ = {t ∈ Q≥0 S x ≤ t ≤ y and x; y ≥ 0 and x ≠ t if ⟨ is `(', and y ≠ t if ⟩ is `)'}:
We use ⟨, (resp. ⟩) depending on the context for unspeci ed or value-speci c, (,
resp. ) for open and [, resp. ] for closed intervals. An interval is bounded if
bxy;yt∈. QTh≥0e, lpeuftncetnudapl oifinitt iiss odfetnhoetefodr mby[t;−t],ani.de.,thhaesraiglhetngetnhdpofoi0n,tabnyd is+d,ewnohteerde
+ = ∞ if the interval is unbounded to the right. The length is de ned as S S = ∞
if + = ∞, otherwise S S = + − −. For two non-empty intervals 1 and 2, we
de ne the union 1 ∪ 2 and intersection 1 ∩ 2 as usual. Note that, a union
operation creates one or two intervals, whereas an intersection of two intervals is
either the empty set or exactly one interval. We de ne 1 + 2 as ⟨ 1− + 2−; 1+ + 2+⟩,
where ⟨ (resp. ⟩) is closed if 1 and 2 are left-closed (resp. right-closed), as well
as 1 ⊕ 2 as ⟨ 1− + 2+; 1+ + 2−⟩, where ⟨ (resp. ⟩) is closed if 1 is left-closed
and 2 is right-open (resp. if 1 is right-closed and 2 is left-open). We de ne
an interval as an empty set if − &gt; +.</p>
        <p>is a ground atom and t a punctual interval.</p>
        <p>Semantics of DatalogMTLFP. Let M be an interpretation based on a domain
≠ ; for the variables and constants that speci es, for each ground atom and
each time point t ∈ Q≥0, whether is true at t. In this case, we write M; t ⊧ .
Let be an assignment of elements of to terms such that (c) = c for each
constant c. We then de ne inductively:</p>
        <p>M; t ⊧ P ( )
M; t ⊧ ⊟ A
M; t ⊧ x A
if M; t ⊧ P ( ( ))
if M; s ⊧ A for all s with t − s ∈
We say that M satis es a rule under an assignment if M; t ⊧ B, whenever
M; t ⊧ Ai, for each 1 ≤ i ≤ k, and satis es a program if it satis es all its rules.
The interpretation M is a model of and D if M satis es for all possible
assignments (i.e., model of ), and if M; t ⊧ , for all facts a@t ∈ D (i.e.,
model of D). Finally, a program and a dataset D entail a fact @t, written
( ; D) ⊧ @t, if M; t ⊧ for each model M of and D.</p>
        <p>In the following, we provide as example the rules for detecting spamming traders
to illustrate the usage of DatalogMTL. We de ne spamming traders as trader
that send at least every 5 seconds a trading signal for a duration of one hour
(assuming seconds as granularity), where X is the trader identi er.</p>
        <p>
          ⊟[0;3600) x[0;5)tradingSignal (X) → spammingTrader (X)
Monotonic Aggregation. Monotonic aggregation so far has only been
introduced for non-temporal Datalog. Here we summarize the aggregation operators
mcount, msum, mmax, mmin, as introduced by Shkapsky et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The authors
de ne head atoms of the form P (k1; : : : ; km; aggr ⟨x; cX⟩), where ki are zero or
more group-by terms and aggr is one of the four mentioned aggregation
operations, where x is the aggregate term and cX, de ned only for sum and count, is
a sequence of contributor terms for the aggregation. The semantics of aggregates
are operationally de ned as the application of a mapping function from an input
set or multiset G that represents the values for a single group-by key to an output
set D. Given G, for mmin (resp. mmax) each element g ∈ G is put into D, if g is
smaller (greater) than the previously computed value. For mcount and msum,
g is of the form (NJ ; J ), where NJ indicates the partial count/sum contributed
by J , that is, NJ maps to x and J maps to cX. The aggregate functions now
map the input set or multiset G to D by computing the sum over all maximum
partial count NJ for all J and put the resulting value into D, in case the value
is higher than the previously computed value. That is, the output set D consists
of numbers (the (intermediary) results of the aggregation) that where inserted
in a monotonically increasing order.
        </p>
        <p>
          In the following, we provide as example the rules for path-counting in a DAG [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]
to illustrate the usage of monotonic aggregation. The rst rule counts each edge
between two nodes as one path. The second rule counts the number of distinct
paths between two nodes X and Y through every Z. The last rule derives the
maximum value for each pair of nodes. For details of the execution of the rules,
we refer to Shkapsky et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>edge(X; Y ) → cpaths(X; Y; mcount⟨(1; X)⟩)
edge(Z; Y ); cpaths(X; Z; C) → cpaths(X ; Y ; mcount ⟨(C ; Z )⟩)
cpaths(X; Y; C) → countpaths(X ; Y ; max ⟨(C )⟩)
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Requirements</title>
      <p>In this section, we analyze the requirements of temporal reasoning and
aggregation in Datalog. We rst consider general requirements that are desirable in a
declarative AI solution in this context and then instantiate them into speci c
requirements for temporal aggregation:
{ RQ1: Declarative. Managing temporal properties calls for complex interval
checking logic and arithmetic. This easily leads to an error-prone and
laborintensive procedural approach. Declarative temporal operators that specify
what should be done instead of how it should be done avoid such pitfalls.
Example. A baseline implementation for UC2 would rst extend a time-point
interval to an hour-long interval, then split the intervals into independent
\counting" intervals|at each start or end of some interval in the data the count
changes, i.e., either a trade is added or removed from the data|and nally
count the number of entries per interval.</p>
      <p>{ RQ2: Implicit Time. While explicit time in rules provides the user the
possibility to access and modify temporal properties, they require semantic
restrictions for the chosen operations to block arbitrary rule behavior.
Implicit time handling, such as in temporal logics or DatalogMTL, does not
have such issues, allowing for a fully declarative, composable solution.
Example. For example, if time is explicit part of tuples and manipulated using
arithmetic, a user can add arbitrary arithmetic operations to the start and end
points of a rule (P (S; E) → R(S + 5; E + 5) or P (S; E) → R(S ∗ S; E + 5)).
{ RQ3: Optimizability. Temporal data often has the property of staying
the same for a longer interval. Yet, many temporal operations are de ned by
time point. Declarative operators and implicit time already lead to a certain
degree of optimizability, but any operators de ned should take into account
optimizable evaluation on intervals.</p>
      <p>We also need support for fundamental types of temporal aggregation, as e.g.,
required by the archetypal use cases discussed in the introduction.
{ RQ4: Windowed Temporal Aggregation. The ability to aggregate over
a xed time window, e.g., aggregate all values across the last hour. This
should at least support aggregates min, max, count, sum.
{ RQ5: Fixed-interval Temporal Aggregation. The ability to aggregate
over a xed interval, e.g., aggregate all values from the month of April 2021.</p>
      <p>This should at least support aggregates min, max, count, sum.
{ RQ6: Time Axis Aggregation. The ability to aggregate over intervals
of arbitrary length, e.g., nding periods of time where values are
monotonically increasing (a temporal trend in the data). This should at least support
monotonic increases and decreases.</p>
      <p>Looking at our archetypal use cases, UC2 is possible in a system that meets
RQ4, UC1 in RQ5, and UC4 in RQ6. UC3 is already possible using recursion
provided by Datalog, as well as, e.g., a system meeting RQ4 (it, however, would
be hard to meet in a system that does not support recursion).
4</p>
    </sec>
    <sec id="sec-4">
      <title>Time Point Aggregation</title>
      <p>
        In this section, we take on the requirements for windowed and xed-interval
temporal aggregation we have laid out and introduce our core approach based
on declarative operators with implicit representation of time. In particular, we
focus on windowed and xed-interval aggregation and show e cient algorithms
extending the application of the standard non-temporal aggregations [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to the
temporal context. Time axis aggregation (RQ6) is dealt with in the next section.
4.1
      </p>
      <sec id="sec-4-1">
        <title>Moving Windows</title>
        <p>
          The rst type of aggregation we discuss are moving window aggregations (e.g.,
covering the last hour; RQ4). This form of aggregation aggregates per time point
t all the facts that have been valid between t−w and t, where w is some arbitrary
window size. For this, we build upon DatalogMTLFP. Our basic approach is using
the x operator to extend the time validity of facts to size w and then applying
the non-temporal aggregation operation [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] for each time point.
Syntax. Let us start by de ning the syntax of DatalogMTLFP with time point
aggregation. For this, we extend the DatalogMTLFP normal form (1-3; cf.
preliminaries) with additional time point aggregation rules of the following form:
x = aggr(P1( 0; 1; a)) →P0( 0; x)
(4)
where aggr is the aggregation type (e.g., count, sum), P1 and P0 are predicates,
0 are the group-by attributes, 1 are the contributor terms for sum and count,
and a the aggregate term (for count, where a is not part of the predicate, it
should be assumed as 1 in the following algorithm). Note that P1 has arity of
size S 0S + S 1S, in case aggr is of type count, else S 0S + S 1S + 1 and P2 has arity
S 0S + 1.
        </p>
        <p>
          Semantics. Let us continue by de ning the semantics of the newly introduced
rules. For this, we de ne the semantics on top of regular aggregation in
Datalog [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and apply it per time point:
        </p>
        <p>M; t ⊧ x = aggr (P ( 0; 1; a))</p>
        <p>if
= {P ( 0; 1; a) → AggrResult ( 0; aggr⟨a; 1⟩)} and
S = {P ( ) S M; t ⊧
(x) ∈ {u S AggrResult( 0; u) ∈ R}</p>
        <p>P ( )} and R = Eval (S;
) and
where stands for the sequence of terms ( 0; 1; a), AggrResult is a predicate
storing the aggregation value, and Eval returns the sets of facts resulting from
the evaluation of on S, using the regular aggregation in Datalog.
Example. Let us show the bene ts of using native temporal operators by
expressing UC2 in DatalogMTL extended by monotonic aggregation. Rule 1
extends the interval of Trade facts to one hour (assuming a time granularity of
seconds) and Rule 2 applies the count operation, using the trader account u as
the group-by term and a unique identi er id as the contributing term, to derive
the number of trades for each time point.</p>
        <p>x[0;3600)Trade(u; id) → TradeInterval(u; id)
m = mcount (TradeInterval(u; id)) → NumberOfTrades(u; m)
(1)
This example immediately highlights the visual and usability advantages of
using temporal operators. Note that we have provided a formulation using
nontemporal Datalog in the Appendix, underpinning the highly increased readability
of this version.</p>
        <p>
          Algorithm. The application of the rules follows the standard chase procedure
used with Datalog [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. More speci cally, we build upon the temporal work
of Walega et al. [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] who apply derivation rules for each rule in normal form
plus additional merging rules for adjacent intervals exhaustively. Details on this
algorithm are provided in extended form in the Appendix.
        </p>
        <p>Our algorithm for time point aggregation extends existing temporal Datalog
derivation rules as follows. Let x = aggr (A( 0; 1; a)) → B( 0; x) be an instance
of a rule of form 4. We apply Algorithm 1 to derive a set B of facts in the form
@ . The algorithm takes as input the type T of the aggregation (e.g., min,
max, etc.), the aggregation predicate A, and a number of group-by terms g; it
returns as output a set of facts B with arity g + 1, where the rst g terms are the
group-by terms followed by the aggregated value. Line 1 de nes a map, whose
key is the group-by clause and the value is an ordered set (per time-interval)
of the current aggregation values, which, by construction of the algorithm, can
contain only non-overlapping time intervals. Line 2 de nes a similar map,
storing the intermediate results of speci c contributor terms. Line 3 iterates over
all currently derived facts ltered by the matching predicate name. Lines 4-6
extract the relevant properties of the fact and retrieve the current sets for the
provided keys. In particular, the function getData maps the rst g terms to the
Algorithm 2: Update List for Temporal Aggregation
groupByKey , the last term to a and the remaining terms to the cKey
(contributor Key). Then the algorithm branches depending on the aggregation type. For
instance, we continue with Line 7 for monotonic minimum, Line 9 for monotonic
maximum, or with Line 11. In case of min or max (Line 7-10) we can directly
update the nal result whereas for count and sum, we rst calculate the highest
value per contributor (Line 11). Since such value may increase over time, in
order to avoid full recomputation, we just consider the di erence with respect to
the previous contributor value for a certain interval (Lines 13-14). At the end
(Lines 14-16) we iterate over the lists and call mergeAdjacentInterval, which as
the name says merge non-overlapping adjacent intervals with the same value to
reduce the list size and write the resulting intervals back to the storage. Line
17 returns all output facts of the algorithm (that is, it removes the required
grouping for the calculation).</p>
        <p>The UpdateList function is detailed in Algorithm 2. It iterates over a list of
intervals and updates it with the new values. In case the interval list is empty,
it just adds the interval (Lines 3-6), otherwise it searches for the rst interval
starting after the interval to be inserted (Line 10). It then checks whether there
is some interval to be inserted before the current interval (Lines 14-17), modi es
the boundaries of the current entry in case the interval starts or ends in this
entry (Lines 18-23), and updates the value of the current entry (Lines 24-27).
Having reached the end of the list or an interval that is after the insertion range
(Lines 10-12), it adds the remaining interval to the list (Lines 28-30).</p>
        <p>Note that storing the aggregation result enables an e cient incremental
approach in the algorithm, so that in further applications of the derivation rules,
only partial \delta-updates" are computed. For this reason, we skip the
initialization of the global variables and keep the current values. In addition, in Line 3
we do not check over all facts, but only over the newly derived ones. We use
the function symbols prec and succ to reference the preceding (resp. succeeding)
interval points and use them to harmonise the interval endpoints.
Theorem 1. Algorithm 1 has worst-case runtime O(n2) and worst-case
memory consumption O(n), where n is the number of facts contributing to the
aggregation. The output has maximum size O(n).</p>
        <p>Proof Sketch. We now show the basic idea behind the complexity of the
algorithm. The interested reader can nd the full proof in the Appendix. The O(n)
space requirement depends only on UpdateList. In particular, we argue that at
most 2n intervals are created in the process of adding n intervals to the list.</p>
        <p>This follows from a rather technical argument, but can be intuitively grasped
as follows: Let denote an inserted interval. Our interval can intersect at
its left endpoint ( −) one interval from the current list (and, importantly, at
most one such interval, as maintaining non-overlapping intervals is central to the
algorithm). In this case we need to split , creating a total of two new intervals:
we replace by the interval − to −, create a new interval from − to +, and
another new interval one from + to +. Multiple other variants are possible. The
interesting part is when our newly inserted interval intersects more than one
interval, in which, intuitively speaking it uses \budget" stemming from earlier
inserts that necessarily have not used their full \budget" of two inserts. As
mentioned before, the full, technical argument can be found in the Appendix.</p>
        <p>The runtime of O(n2) descends from the fact we iterate over all facts, and
for each fact the UpdateList is performed, taking O(n) time. Since both changes
and aggrVals are sorted lists, they can be scanned together within a one-pass
iteration, hence keeping linear complexity (intuitively, like in the usual merging
of sorted lists).</p>
        <p>We now show soundness and completeness of Algorithm 1, i.e., that every
interval produced by the semantics is also derived by the algorithm, and the
other way around, every interval derived by the algorithm is also produced by
the semantics. Note that this is in the context of the facts that are input of the
algorithm. As discussed earlier we consider recursive derivation in the Appendix.</p>
        <sec id="sec-4-1-1">
          <title>Theorem 2. Algorithm 1 is sound and complete.</title>
          <p>Proof Sketch. We now show the basic idea behind the soundness and
completeness of the algorithm. The interested reader can nd the full proof in the
Appendix. Soundness requires to retrieve the best (e.g., min, max, highest) value for
derived intervals. This is trivially achieved by Line 24 of UpdateList.
Completeness requires that all intervals are derived. Since possible overlapping intervals
at the start and end are split (Line 18-23), missing intervals before an entry are
inserted (Line 14-17) and missing intervals after the last entry are added (Line
28-30), all possible intervals are created.
4.2</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Fixed Intervals</title>
        <p>
          The next type of aggregation we discuss are xed interval aggregations (e.g.,
per month). In comparison to moving windows, we cannot rely on temporal
operators provided by DatalogMTL due to di erent lengths, e.g., months vary
between 28 and 31 days. Therefore, we introduce an additional operator that
extends intervals to their complete unit of interest, e.g., an interval from the
15th of March to 17th of April to the 1st of March to the 30th of April. Such
kinds of operators have been successfully applied in the context of temporal
databases (cf., Bettini et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]).
        </p>
        <p>Syntax. Let us start by formally de ning the new operator. For this, we extend
the DatalogMTLFP normal form with an additional rule of the following form:
(5)
(1)
(2)
△unit P1( 1) →P0( 0)
where △ is the new time extension operator and unit is a time unit, e.g., day,
month, or year.</p>
        <p>Semantics. The semantics of the operator △ is de ned as follows:
M; t ⊧ △unit A</p>
        <p>A for some s with conv(t; unit) = conv(s; unit)
where conv converts the time point to the provided time unit. For example, the
date 31.12.2020 with unit year, would be converted to 2020.</p>
        <p>Example. Let us demonstrate the xed interval aggregation by expressing UC1.
Rule 1 extends the interval of the sales to its corresponding year and Rule 2
applies the sum operation, using the id (and price) as contributing terms, to
derive the revenue of the year.</p>
        <p>△yearSale(id; price) → YearSale(id; price)
m = msum(YearSale(id; price)) → Revenue(m)
Derivation Rule. In order to handle such rules in the chase procedure, we use
the following derivation rule: If △u → is a ground instance of a rule, and
@ 1 ∈ F , then add @ to F where = [a; b), where a = conv(conv( 1−; u); u2),
b = conv(conv( 1+; u) + 1 u); u2) and u2 is the initial unit of 1, i.e., the time
unit gets converted back to the precision of the predicate, e.g. 2021-02-16 with
a precision of month gets after the inner conversion 2021-02 and after applying
the outer conversion 2021-02-01.</p>
        <p>Limitations. We brie y want to discuss the limitations of this approach. The
main goal of using the extension of intervals is keeping compatibility with
DatalogMTL and its interval semantics by calculating the aggregates over the
interval boundaries asked in the query. However, using such an approach limits the
possibility to answer queries that require changing time granularity, like in the
following example where we wish to compare the revenue on a weekday basis.
For such requirements, we suggest the introduction of unwrapping operations
that map the timestamp to an appropriate representation in usual Datalog, e.g.,
by introducing rules of the form Sale(id; price)@t → SaleEscaped (id; price;
weekday(t)). A detailed discussion of such rules is out of scope for this paper.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Time Axis Aggregation</title>
      <p>So far, we have focused on aggregations which work per time point. In this
section, we now move to aggregations along the time axis. As introduced in UC4
and RQ6, this means we need to summarize values changing over time
considering adjacent intervals. One such function, which we study in detail in this paper,
is the one detecting whether a trend is monotonically increasing/decreasing over
time. Let us call it minc resp. mdec. This is e ective in many domains, e.g., that
of change of control we have introduced, but also for example population counts.
We rst start with an example and consider the desired formulation.
Example. Assume that we want to detect changes of control described in UC4.
Then, we are interested in nding the time intervals in which the number of
shares has been monotonically increasing as well as the minimum and maximum
values in those intervals. Assume now that the atom Shares(p; c; s) represents
the number of shares s that an investor p owns of a company c. Rule 1 shows
the minc operator in action. It takes as argument the number of shares, groups
them by investor and company and returns as output the lower bound (i.e., the
leftmost value) and the upper bound (i.e., the rightmost value) per monotonically
increasing interval.</p>
      <p>⟨l; u⟩ = minc(Shares(p; c; s)) → SInc(p; c; l; u)
Similar to the previous section, we (i) provide a normal form (syntax) for time
axis aggregations, (ii) specify the semantics of each time axis aggregation, (iii)
provide a derivation rule, and (iv) suggest an algorithm for time axis operation.
Syntax. We start by providing a normal form (actually already used in Rule (1),
generalizing the normal form introduced for time point aggregation:
⟨x⟩ = aggr(P1( 0; 1; a)) → P0( 0; x)
(1)
(6)
Functor aggr is the name of the time axis aggregation, in our case minc or
mdec, and P1 and P0 are predicates. Like in time point aggregation, 0 de nes
the group-by clause, 1 the contribution terms, not directly used in this form
of aggregation but kept for uniformity reasons, and a is the aggregation term of
P1. Over time, there can be multiple values of interest, for example the start and
end value of a monotonically increasing interval. Hence, the function returns a
vector x = x1; : : : ; xn of aggregation values instead of a single value. For minc
and mdec we use x = l; u to denote the lower and upper bound of the monotonic
interval.</p>
      <p>Semantics. We assume that the domain of temporal aggregation is that of
disjoint intervals, and so, for a single group-by key, no ambiguity arises w.r.t. the
aggregation term. Also, this makes time axis aggregation fully orthogonal to
time point aggregation, where, instead, intervals are combined. Also, time point
aggregation can be e ectively used to disambiguate values per time interval (e.g.,
by considering their maximum/minimum resp. the summation) before
proceeding with time axis aggregation. Another e ect of such disjoint intervals is that
the semantics of time axis aggregation can be easily formulated by moving from
time points to time intervals, i.e., M; ⊧ if for all t ∈ it holds that M; t ⊧ .
⊧ ⟨l; u⟩ = minc(P1( 0; a))
if M; ⊧</p>
      <p>M (P1; 0; l; u)
M;
M; ⊧
M; ⊧</p>
      <p>M (P1; 0; a; a)
M (P1; 0; l; u)
if M; ⊧ P1( 0; a)
if M; 1 ⊧</p>
      <p>M; 2 ⊧</p>
      <p>M (P1; 0; l; u1) and</p>
      <p>M (P1; 0; l2; u) and
u1 ≤ l2 and +
1 ≺ 2− and
= 1 ∪ 2
where ≺ is the predecessor relation (i.e., the intervals are adjacent), M a fresh
predicate for deriving minc. In short, the second de nition states that a value
constant in an interval is both lower and upper bound, and the third one merges
two intervals if their lower and upper bounds match. The semantics for mdec is
analogous, with u1 ≤ l2 changed to u1 ≥ l2.</p>
      <p>Derivation rule. Let us now introduce our derivation mechanism. For each
instance ⟨x⟩ = aggr (A( 0; 1; a)) → B( 0; x) of a rule of form 6, where aggr is
minc or mdec and 1 is the empty set, apply Algorithm 3 to derive a set B of
facts of the form @ , where contains the group-by values and the aggregation
values l and u.</p>
      <p>Algorithm. Like for time point aggregation, we provide an e cient algorithm
for minc and mdec which uses the bene ts of native temporal operators.
Algorithm 3 takes as input the aggregation predicate A and a number of group-by
indices g; it returns as output a set of facts B. Line 1 de nes an empty map
for the result, where the keys are the group-by terms and the values are B-trees
with the intervals as key of the entries. Then for each fact, we add the fact to the
appropriate group-by clause (Lines 3-5). We then merge the inserted fact (Line
7-12) with their adjacent intervals, if they exist, so that we derive the largest
possible, monotonically increasing interval. For deriving monotonically
decreasing intervals, the comparison operator in Lines 7 and 11 has to be changed from
≤ to ≥. Line 13 returns all output facts of the algorithm (that is, it removes
the required grouping for the calculation). Note that, similarly to Algorithm 1,
delta-updates can be applied by skipping Line 1 of the algorithm.
Theorem 3. Algorithm 3 has runtime O(nlog(n)) in the worst case and
worstcase memory consumption O(n), where n is the number of facts contributing to
the aggregation. The output has maximum size O(n).</p>
      <p>Proof Sketch. We show here the basic idea behind the complexity of the algorithm
(details in Appendix). The space requirement of O(n) depends on the fact that
at most n entries are inserted in the tree. The runtime of O(nlog(n)) is based
that we iterate over all facts (O(n)), and for each fact we insert to and remove
entries from the tree, which has a complexity of O(log(n)).</p>
      <sec id="sec-5-1">
        <title>Theorem 4. Algorithm 3 is sound and complete.</title>
        <p>Proof Sketch. We show the basic idea behind the soundness and completeness
regarding the maximum derived intervals (details in Appendix). Completeness
follows immediately, since all intervals are inserted in the tree. For soundness
we have to show that maximum intervals are derived with the correct lower and
upper value. This follows by the two merging operations of possible adjacent
intervals, where adjacent intervals are replaced with a new interval and the
lowest value is chosen from the left and the highest value is chosen from the
right interval.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Related Work</title>
      <p>
        First temporal extensions to Datalog were suggested already in the 1980s. Most
approaches can be grouped into two types, one focusing on implementing
temporal constructs via arithmetic operations, e.g., by applying the +1 function to
model di erent discrete temporal units (Datalog1S [
        <xref ref-type="bibr" rid="ref14 ref23 ref7">7, 14, 23</xref>
        ]), the other
including operators from temporal logic such as the always and eventually operator
from LTL or CTL into Datalog (Templog [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], DatalogLite [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). Newest
developments are based on MTL [
        <xref ref-type="bibr" rid="ref11 ref17 ref6">6, 17, 11</xref>
        ], an extension of LTL to enrich the expressive
power of Datalog programs. While most of the work for DatalogMTL so far
purely consisted in complexity results, we nd a proposal for a non-recursive
fragment [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], as well as algorithm for stream reasoning by Walega et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
The latter work clearly di ers from the one we have presented in this paper, in
that our focus is on monotonic aggregation over the temporal domain.
      </p>
      <p>
        Similarly, aggregation in Datalog has been discussed for many years. The
standard Datalog xpoint semantics only works as long as rules de ne monotonic
transformations (w.r.t. set containment). Aggregation breaks this requirement.
Earlier solutions [
        <xref ref-type="bibr" rid="ref15 ref24">15, 24</xref>
        ] use non-deterministic choice constructs, partial-orders
that are more powerful than set containment, or an in nite level of strati
cation. These approaches were of limited generality and also required to resort to
sophisticated compilers to detect monotonic programs [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. DatalogFS [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] is the
rst approach that uses continuous aggregate functions to support monotonic
counts, sums over positive values, and extrema aggregates (min,max). Shkapsky
et al.[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] provided rst practical algorithms for monotonic aggregation by
introducing contributor and group-by terms, which we used as foundations for our
approach. While their work focused on optimizing aggregation for a single time
point, our approach aims at building monotonic aggregates for all time points
e ciently, which requires speci c handling of fact validity intervals, and dealing
with time axis aggregation. Recent work studied the de nition of min and max
over limit Datalog [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], but only considers restricted use of sum and count by
means of a sorted list of facts. Finally, Wang et al.[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] studied techniques to
convert non-monotonic aggregates to monotonic ones.
      </p>
      <p>
        Apart from Datalog-speci c related work, LARS [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a temporal stream
reasoning framework focusing on nite streams by extending propositional logic and
can be seen as an extension for ASP. It supports the usage of a window operator
that returns a sub-stream containing only n time points. In comparison with
DatalogMTL it o ers a model-centric perspecitve (instead of a query-centric),
and only considers nite data. With a recent extension called weighted LARS [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
a formula can be evaluated as an algebraic expression over a semi-ring, allowing
to compute all aggregates bounded by the semi-ring along the time-axis, e.g.,
they support to count the number of time points at which an expression holds.
In detail, the diamond-operator interprets the formula (i.e., all values of the
sub-stream) by using the sum-operator of the semi-ring and the box-operator
by using the product-operator of the semi-ring, while true values are mapped to
one, false values are mapped to zero, and is mapped to times and or is mapped
to plus. In comparison, we focus on combining two fundamental directions of
Datalog studies, namely DatalogMTL and monotonic aggregation and discuss
e cient aggregation algorithms for DatalogMTL.
7
      </p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>In this paper, we presented a solution for using monotonic aggregation over
temporal Datalog fragments. This paper is, to the best of our knowledge, the
rst work to include (1) full recursion as in Datalog, (2) aggregation, and (3)
temporal reasoning - a combination required by important use cases as we have
shown. We showed that native temporal operators allow to apply speci c
optimization techniques to reduce performance overhead and memory consumption
as well as increase usability. Our suggested algorithms can be executed in a
nonblocking fashion (i.e., there is no need to wait for all the aggregation contributes
to be available). In future work, we aim at providing a choice of experimental
results showing the bene ts of using native temporal operators for monotonic
aggregation.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The nancial support by the Vienna Science and Technology Fund (WWTF)
grant VRG18-013 is gratefully acknowledged.</p>
    </sec>
    <sec id="sec-9">
      <title>Overview of Appendix</title>
      <p>The appendix is structured as follows: In Section B we provide an extended
DatalogMTL reasoning algorithm, in Section C we provide additional background
information to Section 4, and in Section D we explain details for Section 5.
B</p>
    </sec>
    <sec id="sec-10">
      <title>DatalogMTL Reasoning Algorithm</title>
      <p>
        In this section we present a modi ed version of the DatalogMTL reasoning
algorithm presented in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. This version modi es and extends the derivation rules
with the newly introduced rules and establishes new output possibilities.
Critically, existing approaches are speci c to streaming data and only allow to output
derived facts for pre-chosen time points, requiring to decide before knowing the
reasoning result which time points contain relevant information.
      </p>
      <p>Algorithm 4: A reasoning algorithm for monotonic aggregation over
time by using DatalogMTLFP</p>
      <p>Input: A program , a dataset S, a list of output predicates Q, a set of
output intervals X (optional, if not present X = {D}), a bounded
reasoning interval domain D</p>
      <p>Output: All valid predicates, given in Q, with its intervals in D and X
1 F := S;
2 do
3 Apply rules from Table 1 exhaustively;
4 F := F ∩ D, i.e., map each P ( )@ f ∈ F to P ( )@ , s.t., = f ∩ D;
5 while F has changed ;
6 foreach P ( ) in F do
7 if P in Q then
8 P ′( ) ← P ( ) ∩ X, i.e., map each x ∈ X and P ( )@ f to P ′( )@
s.t. = x ∩ f is not empty;
9 output P ′( );
Algorithm 4 takes as input a program in normal form, a dataset S, a list of
output predicates Q, optionally a set of output intervals X, a bounded reasoning
interval domain D and returns as output a maximal set of facts (i.e., due to the
union rule all intervals are merged, if possible). Only facts valid in the reasoning
domain interval D are returned, and, if X is given, in particular, only those valid
within non-empty intersection with some interval x ∈ X. De ning X as a single
punctual interval allows to output all valid facts at a certain time point. Line 1
initializes the derived facts with the provided dataset. Lines 2-5 apply the rules
of Table 1 to derive new facts and restrict those facts to the chosen interval
domain.</p>
      <p>{ The Horn rule adds all heads, where the intervals of the bounded body
predicates overlap.
(D) If x 1</p>
      <p>where
(B) If ⊟ 1</p>
      <p>where
(H) If ∧in=1 i → a ground instance of a rule and for each i i@ i ∈ F , then add
@ to F with = ∩in=1 i</p>
      <p>a ground instance of a rule, and 2@ 2 ∈ F , then add @ to F
→
= 1 + 2.</p>
      <p>
        a ground instance of a rule and 2@ 2 ∈ F , then add
{ The Diamond rule captures the semantics of the x operator. It expands
(i.e., if 1− &lt; 1+) the interval 2 and shifts (i.e., if 1− &gt; 0) it into the future.
{ The Box rule captures the semantic of the ⊟ operator. It reduces (i.e., if
1− &lt; 1+) the interval 2 and shifts (i.e., if 1− &gt; 0) it into the future.
{ The Triangle rule captures the semantics of the △ operator. It expands the
interval to the range of the chosen unit.
{ The time Point aggregation rule captures the semantics of time point
aggregation. It applies the aggregation per time point over all facts where the
time point is in the interval of the fact.
{ The time Axis aggregation rule captures the semantics of monotonic
increasing and decreasing intervals. It applies the aggregation over the time axis
and merges adjacent intervals following the given criteria.
{ The Union rule derives new intervals based on derived facts by merging
overlapping intervals. In addition, it optimizes the storage size by removing
the old, now merged, intervals. This rule includes the deletion of subsets.
Lines 6-9 output the derived facts that are part of the list of predicates Q and
intersect with the output intervals X. Due to the union rule, each interval of
a fact in the output is not a subset of another interval of the same fact in the
output. Note that this algorithm does not terminate, in case the time point
aggregation derives no xpoint, that is for example when a recursive sum is
calculated, where additional facts are produced per application of the rule. This
is a typical issue for monotonic aggregation, which is also experienced in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
C
      </p>
    </sec>
    <sec id="sec-11">
      <title>Appendix for Section 3</title>
      <p>In this section we provide a non-temporal Datalog encoding of UC2 and the
proofs of Theorem 1 and Theorem 2.</p>
      <p>Trade(u; id; t) → TradeInterval (u; id; t; t + 3600; 1; 0)
TradeInterval (u; id; x; y; xb; yb) → IPoint (x; xb); IPoint (y; yb)</p>
      <p>IPoint (t ; c) → Int ′(0; mmin(t); 1; c)</p>
      <p>Int ′(x; y; 1; 0) → Int (x; y; 1; 1)</p>
      <p>Int ′(x; y; 1; 1) → Int (x; y; 1; 0)
Int ( ; tp; ; 1); IPoint (t ; c); t &gt; tp → Int ′(tp; mmin(t); 0; c)
Int ( ; tp; ; 0); IPoint (t ; c); t &gt; tp → Int ′(tp; mmin(t); 1; c)</p>
      <p>Int ′(x; y; 0; 0) → Int (x; y; 0; 1)</p>
      <p>Int ′(x; y; 0; 1) → Int (x; y; 0; 0)</p>
      <p>Note that Rules 3b and 3c complete 4c and 4d
Int (x; y; xc; yc); TradeInterval (u; id; st; et; sc; ec);</p>
      <p>(x &lt; et); (y &gt; st); → V (u; id; x; y; xc; yc)
Int (x; y; xc; yc); TradeInterval (u; id; st; et; sc; ec);
UC2 in non-temporal Datalog. Figure 1 highlights the required rules for
writing such an aggregation in non-temporal Datalog. In a rst step, for all four
aggregation types, we calculate all interval points at which the query answer may
change. We provide the calculation of such points in Rules 1-4. Rule 1 extends
the Trade facts to one-hours intervals TradeInterval (i.e., the desired time of the
query) and de nes that the intervals are left-closed and right-open (represented
by 1 resp. 0). Rule 2 extracts all relevant interval points IPoint . Rules 3 and
4 create intervals Int between all extracted points and assign open and closed
properties. Having established the relevant intervals, we continue in Rule 5 with
detecting overlaps between TradeInterval and Int and assigning the values of
TradeInterval to the overlapping intervals Int . In particular, this rule covers
overlapping checks of all combinations of open and close intervals. Finally, Rule
6 calculates the number of trades per interval and user (so we are grouping per
T radeInterval
1
2
user and time-interval). In order to support other aggregation types, we have to
adapt Rule 6, where the aggregation has to be changed accordingly.</p>
      <p>Consider Figure 2 for an example of ve Trade events already extended to one
hour intervals. The ve top-most intervals denote all TradeInterval predicates.
Vertical, dashed lines show all IPoint . The intervals at the bottom of Figure 2
refer to the nal result, that is, the NumberOfTrades predicates.
Theorem 1 Algorithm 1 has worst-case runtime O(n2) and worst-case memory
consumption O(n), where n is the number of facts contributing to the
aggregation. The output has maximum size O(n).</p>
      <p>Proof. We rst show the space requirements. For this, we consider UpdateList,
which is the only part where new data is added, where we show a space
requirement of O(n). Let us start with the list which in total contains at most
2n entries. There are two di erent options: (1) the list is empty, then the fact
is added (Line 3-6), which creates at most one entry for a single fact, or (2) the
set is not empty, then the number of added facts depend on existing facts in the
list . It is clear that Line 14-17 (resp. 18-20), only re at most once, that is when
the new interval to be inserted starts (resp. ends) inside an existing interval.
The same applies when the interval starts or ends not in an existing interval,
where Line 14-17 add the starting interval before the current entry and Line
28-30 append the ending interval after the current entry. This in total creates
at most 2 new entries. It remains to show the additional rings of Line 14-17.
If n entries exist in the set, there can be a maximum of n − 1 gaps. It is clear
that a gap is only created when the inserted interval ts between two
existing intervals, requiring only 1 new entry. So, in worst case, Line 14-17 only ll
up the remaining budget of 2n entries. Let us now consider changes. In worst
case, when the new interval spans over the complete list , then each entry in the
list requires one changes entry, that is in total at most 2n entries. In total, this
yields O(2n + 2n) = O(4n) = O(n) as the space bound. For runtime, we can build
upon the space requirements. It is clearly visible that the runtime for mmin and
mmax equals to the rst part of msum and mcount, i.e., the update operation
of line 11 and 9 are equal for mmax or similar for mmin. Hence, we only have
to consider the else part inside the outer foreach loop. For each fact (O(n)), we
update the contributor set (Line 13) and then apply updates to the aggregation
value (Line 15-16). We start by showing the runtime of the update for the
contributor set. It is clear that UpdateList iterates once over the list, which require
O(2n) times in worst case. We now consider the update process and show that
we will not exceed the runtime of the updating process. We have shown that
the space bound of aggrValues as well as changes is O(2n). Since those sets are
ordered, we can scan both sets together within a one-pass iteration by checking
the boundaries of the following intervals. Hence, we do not exceed the runtime of
O(2n). The same applies to the merging operations, which iterates over the list
once. This concludes the argument that the algorithm takes O(2n) = O(n) time
in worst case for a single added value. Now we consider that n facts contribute to
2 .
the aggregation, hence the algorithm is executed n times and thus run in O(n )
The nal attening step loops over the aggrResult of size O(2n) and maps the
result to the appropriate format of the output. Hence, the output size is also at
most O(n).</p>
      <p>Theorem 2 Algorithm 1 is sound and complete.</p>
      <p>Proof. Let us start with the soundness of the algorithm, which is trivial. We
have to show that the returned results are the best result per interval, where
best mean the minimal one for mmin, the maximum one for mmax, the highest
sum for msum or count for mcount. This means, for each interval we have in the
output list, we detected the best value. This is given by the chosen aggr-function
and Line 24 of UpdateList that calculates the best value for each interval. Let
us now consider the completeness of the algorithm to show that all interval
rules are considered. For this, we show that we do not miss any interval in the
update process, when we consider a new fact. It is clear by Line 14-17 that an
interval that does not exist before a list entry, is added and by Line 28-30 that
an interval that is after the last list entry or together with Line 11-13 ends after
an existing list entry is added. It remains to show that also existing intervals are
modi ed correctly. This is handled by Lines 18-20 if the interval starts inside
another interval and Lines 21-23 if the interval ends inside another interval. The
remaining part of time point algorithm does not modify the intervals, except of
the merging functions, which only combine existing intervals to maximum ones,
having no impact on the covering of the intervals. Hence, the algorithm is sound
and complete.</p>
      <p>D</p>
    </sec>
    <sec id="sec-12">
      <title>Appendix for Section 4</title>
      <p>In this section, we show the proofs of Theorem 3 and Theorem 4.
Theorem 3 Algorithm 3 has runtime O(nlog(n)) in the worst case and
worstcase memory consumption O(n), where n is the number of facts contributing to
the aggregation. The output has maximum size O(n).</p>
      <p>Proof. First, we show the space requirement of O(n). Each fact can be added
to the tree at most once (Line 5). This state is reached, if the facts are strictly
monotonically decreasing when monotonically increasing is asked. The other
lines remove more facts than those added, hence the size of the list only reduces.
This ends the argument of space requirement of at most O(n). This also shows
the output size of O(n) of the algorithm.</p>
      <p>Now, we show the runtime complexity of O(nlog(n)). Inserting and removing
facts in a B-tree has a complexity of O(log(n)). We then consider that we have
to execute this for each fact contributing to the aggregation, i.e., O(n) times.
This yields a runtime complexity of O(nlog(n)).</p>
      <p>Theorem 4 Algorithm 3 is sound and complete.</p>
      <p>Proof. Let us start by completeness, which follows immediately. Since all values
(with their intervals) are inserted in the tree, the algorithm does not miss any
case. We now show the soundness of the algorithm. We show this by induction.
In the base case, we add the rst interval to a group-by key, which is simply the
value of the fact. We now assume that we have maximal monotonic increasing
intervals and that we add another fact. Since the considered facts must have a
disjunct interval by de nition, there are three cases, we have to distinguish.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manna</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Temporal logic programming</article-title>
          .
          <source>J. Symb. Comput</source>
          .
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <volume>277</volume>
          {
          <fpage>295</fpage>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Beck</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dao-Tran</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fink</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>LARS: A logic-based framework for analyzing reasoning over streams</article-title>
          .
          <source>In: AAAI</source>
          . pp.
          <volume>1431</volume>
          {
          <fpage>1438</fpage>
          . AAAI Press (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bellomarini</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magnanimi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissl</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sallinger</surname>
          </string-name>
          , E.:
          <article-title>Neither in the programs nor in the data: Mining the hidden nancial knowledge with knowledge graphs and reasoning</article-title>
          .
          <source>In: MIDAS@PKDD/ECML. Lecture Notes in Computer Science</source>
          , vol.
          <volume>12591</volume>
          , pp.
          <volume>119</volume>
          {
          <fpage>134</fpage>
          . Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Benson</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grulich</surname>
            ,
            <given-names>P.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeuch</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markl</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rabl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Disco: E cient distributed window aggregation</article-title>
          .
          <source>In: EDBT</source>
          . pp.
          <volume>423</volume>
          {
          <fpage>426</fpage>
          . OpenProceedings.org (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bettini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jajodia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.S.:</given-names>
          </string-name>
          <article-title>Time granularities in databases, data mining, and temporal reasoning</article-title>
          . Springer (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Brandt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalayci</surname>
            ,
            <given-names>E.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ryzhikov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Ontology-based data access with a Horn fragment of metric temporal logic</article-title>
          .
          <source>In: AAAI</source>
          . pp.
          <volume>1070</volume>
          {
          <fpage>1076</fpage>
          . AAAI Press (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Chomicki</surname>
          </string-name>
          , J.:
          <article-title>Polynomial time query processing in temporal deductive databases</article-title>
          .
          <source>In: PODS</source>
          . pp.
          <volume>379</volume>
          {
          <fpage>391</fpage>
          . ACM Press (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Eiter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiesel</surname>
          </string-name>
          , R.:
          <article-title>Weighted LARS for quantitative stream reasoning</article-title>
          .
          <source>In: ECAI. Frontiers in Arti cial Intelligence and Applications</source>
          , vol.
          <volume>325</volume>
          , pp.
          <volume>729</volume>
          {
          <fpage>736</fpage>
          . IOS Press (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <article-title>Gradel</article-title>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Veith</surname>
          </string-name>
          , H.:
          <article-title>Datalog LITE: a deductive query language with linear time model checking</article-title>
          .
          <source>ACM Trans. Comput. Log</source>
          .
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <volume>42</volume>
          {
          <fpage>79</fpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kaminski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kostylev</surname>
            ,
            <given-names>E.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Complexity and expressive power of disjunction and negation in limit datalog</article-title>
          .
          <source>In: AAAI</source>
          . pp.
          <volume>2862</volume>
          {
          <issue>2869</issue>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kikot</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ryzhikov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walega</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>On the data complexity of ontology-mediated queries with MTL operators over timed words</article-title>
          .
          <source>In: Description Logics. CEUR Workshop Proceedings</source>
          , vol.
          <volume>2211</volume>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Maier</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendelzon</surname>
            ,
            <given-names>A.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagiv</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Testing implications of data dependencies</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .
          <volume>4</volume>
          (
          <issue>4</issue>
          ),
          <volume>455</volume>
          {
          <fpage>469</fpage>
          (
          <year>1979</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mazuran</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serra</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaniolo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Extending the power of datalog recursion</article-title>
          .
          <source>VLDB J</source>
          .
          <volume>22</volume>
          (
          <issue>4</issue>
          ),
          <volume>471</volume>
          {
          <fpage>493</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Ronca</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaminski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Stream reasoning in temporal datalog</article-title>
          .
          <source>In: AAAI</source>
          . pp.
          <year>1941</year>
          {
          <year>1948</year>
          . AAAI Press (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ross</surname>
            ,
            <given-names>K.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagiv</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Monotonic aggregation in deductive databases</article-title>
          .
          <source>In: PODS</source>
          . pp.
          <volume>114</volume>
          {
          <fpage>126</fpage>
          . ACM Press (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Shkapsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaniolo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Optimizing recursive queries with monotonic aggregates in deals</article-title>
          .
          <source>In: ICDE</source>
          . pp.
          <volume>867</volume>
          {
          <fpage>878</fpage>
          . IEEE Computer Society (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Walega</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaminski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kostylev</surname>
            ,
            <given-names>E.V.</given-names>
          </string-name>
          :
          <article-title>Datalogmtl: Computational complexity and expressive power</article-title>
          .
          <source>In: IJCAI</source>
          . pp.
          <year>1886</year>
          {
          <year>1892</year>
          . ijcai.
          <source>org</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Walega</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaminski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kostylev</surname>
            ,
            <given-names>E.V.</given-names>
          </string-name>
          :
          <article-title>Tractable fragments of datalog with metric temporal operators</article-title>
          .
          <source>In: IJCAI</source>
          . pp.
          <year>1919</year>
          {
          <year>1925</year>
          . ijcai.
          <source>org</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Walega</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaminski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          :
          <article-title>Reasoning over streaming data in metric temporal datalog</article-title>
          .
          <source>In: AAAI</source>
          . pp.
          <volume>3092</volume>
          {
          <fpage>3099</fpage>
          . AAAI Press (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          , Zhang,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          , Zhang,
          <string-name>
            <given-names>X.</given-names>
            ,
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          :
          <article-title>Automating incremental and asynchronous evaluation for recursive aggregate data processing</article-title>
          .
          <source>In: SIGMOD Conference</source>
          . pp.
          <volume>2439</volume>
          {
          <fpage>2454</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Liu,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            ,
            <surname>Yongchareon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <surname>Z.</surname>
          </string-name>
          :
          <article-title>Resource management for business process scheduling in the presence of availability constraints</article-title>
          .
          <source>ACM Trans. Manag. Inf. Syst</source>
          .
          <volume>7</volume>
          (
          <issue>3</issue>
          ), 9:
          <issue>1</issue>
          {9:
          <issue>26</issue>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Yen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oprea</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Onarlioglu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leetham</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>W.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Juels</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirda</surname>
          </string-name>
          , E.:
          <article-title>Beehive: large-scale log analysis for detecting suspicious activity in enterprise networks</article-title>
          .
          <source>In: ACSAC</source>
          . pp.
          <volume>199</volume>
          {
          <fpage>208</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Zaniolo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Logical foundations of continuous query languages for data streams</article-title>
          .
          <source>In: Datalog. Lecture Notes in Computer Science</source>
          , vol.
          <volume>7494</volume>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Zaniolo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arni</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ong</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Negation and aggregates in recursive rules: the LDL++ approach</article-title>
          . In: DOOD. LNCS, vol.
          <volume>760</volume>
          , pp.
          <volume>204</volume>
          {
          <fpage>221</fpage>
          . Springer (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>