Monotonic Aggregation for Temporal Datalog

Luigi Bellomarini1, Markus Nissl2, and Emanuel Sallinger2,3
1 Banca d’Italia, Italy
2 TU Wien, Austria
3 University of Oxford, UK

Abstract. Understanding time-based effects has become an important aspect of the analysis of Knowledge Graphs (KGs). We have seen this in different areas such as IoT or economics. Scalable solutions for using Datalog-based KGs with time are in their infancy, and their use together with aggregation has not been considered so far. Yet, one needs both aggregation and time-based analysis when analysing KGs such as those of economic phenomena. In this paper, we analyze monotonic aggregation over DatalogMTL, establishing the first work that covers full recursion as in Datalog, aggregation, and temporal reasoning.

Keywords: Knowledge Graphs · Datalog · Temporal Reasoning.

1 Introduction

Understanding time-based effects has become a central aspect of reasoning on Knowledge Graphs (KGs), particularly in specific but prominent application settings and business domains. They include: IoT, where context awareness requires aggregating temporal data from continuous (and heterogeneous) sources [4]; declarative business process management, where activities and tasks need careful scheduling and prioritization [21]; and state-of-the-art security information and event management (SIEM) systems with time-based alert and log events [22]. More traditionally, a standard application of time-based reasoning lies in the analysis of economic phenomena, where time is a natural dimension of analysis (e.g., for time series). It is our experience that reasoning on KGs is being increasingly applied to economic settings, often characterized by a complex network of intertwined entities.
It has been shown that a Datalog-based KG with monotonic aggregation (i.e., aggregation that is only increasing/decreasing with new values) provides sufficient expressive power and scalability in many applications of the financial realm, including company ownership, fraud detection, or prevention of potential takeovers [3], from which a number of paradigmatic use cases of temporal reasoning and aggregation emerge:

– UC1: Revenue Calculation. Shareholders are interested in the revenue of a company per week/month/year/over the complete lifespan.
– UC2: Number of Trades. Financial analysts are interested in the number of trades in the last hour/day.
– UC3: Company Ownership/Shares. Detecting hidden company ownership is important for studying and, where necessary, preventing company takeovers. Having this information not only for a single snapshot provides deeper insights on the takeover determinants and improves accuracy in prediction.
– UC4: Change of Control. Analysts wish to analyse ownership structures over time, e.g., monotonically increasing or decreasing shareholding.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

These four use cases highlight different aggregation scenarios. While UC1 requires fixed intervals (which may vary in size, e.g., a month has 28 to 31 days), UC2 has a moving window of fixed size (e.g., an hour). UC3 requires aggregating temporal information recursively over structures (e.g., over paths of arbitrary length). We call the kind of aggregation used in the previous use cases time-point aggregation, as aggregation is applied to single time points, or to time points in an interval. Differently from the previous cases, UC4 requires aggregating potentially along the entire time axis and is not bounded by any pre-determined interval; thus, we talk about time-axis aggregation.
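To make the distinction concrete, the following Python sketch (toy data; all names are ours and not part of any formalism introduced later) contrasts a time-point aggregation in the style of UC2 with a time-axis aggregation in the style of UC4 on a small share-holding series:

```python
# Facts: (start, end, value) over integer time, with disjoint, sorted intervals.
shares = [(0, 4, 10), (5, 9, 15), (10, 14, 12), (15, 19, 20)]

def count_at(facts, t):
    """Time-point aggregation: how many facts are valid at time point t."""
    return sum(1 for s, e, _ in facts if s <= t <= e)

def increasing_runs(facts):
    """Time-axis aggregation: maximal runs of adjacent intervals with
    non-decreasing values, returned as (start, end, low, high)."""
    runs = []
    for s, e, v in facts:
        if runs and runs[-1][1] + 1 == s and runs[-1][3] <= v:
            ps, _, lo, _ = runs.pop()          # extend the current run
            runs.append((ps, e, lo, v))
        else:
            runs.append((s, e, v, v))          # start a new run
    return runs

print(count_at(shares, 7))        # 1
print(increasing_runs(shares))    # [(0, 9, 10, 15), (10, 19, 12, 20)]
```

The first function is bounded to one point of the time axis; the second produces intervals of arbitrary length, which is exactly what makes UC4 a different kind of aggregation.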
In order to support such use cases in Datalog-based reasoning, a temporal extension of Datalog for aggregation over time is required. MTL (Metric Temporal Logic) for Datalog (short DatalogMTL) [6] has been proposed as a suitable extension to reason over the temporal domain. DatalogMTL extends Datalog with rules such as ⊟[x,y] A → B or ⧫[x,y] A → B, with x, y being positive rational numbers, where B holds at time t if A holds at each (resp. some) time in the interval [t − y, t − x]. To support the mentioned use cases, temporal reasoning must be enriched with the possibility to compute aggregations over time. Monotonic aggregation has received a great deal of attention in the literature [13, 16] and has been shown to be a suitable extension of Datalog for aggregation. For example, Shkapsky et al. [16] introduced rules of the form A(g, x) → B(g, msum(x)) to express the monotonic sum of x values in A grouped by g. However, to the best of our knowledge, monotonic aggregation (or any other kind of temporal aggregation) in a temporal extension of Datalog has not been discussed so far.

Challenges. Monotonic aggregation over the temporal domain raises a number of technical challenges. Monotonic aggregation as defined for non-temporal reasoning is inefficient in memory and runtime performance over the temporal domain. In addition, forms of temporal-specific aggregation along the time axis are required, for example to retrieve the intervals where a value is monotonically increasing. Such operations should be non-blocking (i.e., return intermediate results without waiting for the entire aggregation to be complete) to sustain the reactivity of the system.

Our contribution. In this paper, we investigate aggregation for temporal Datalog languages.
Our main contributions are:

– We provide the first reasoning language that covers at the same time (1) full recursion (i.e., it comprises Datalog); (2) aggregation capabilities; and (3) temporal reasoning (i.e., it comprises MTL as in DatalogMTL).
– We provide a principled integration of state-of-the-art semantics of (1-3), overcoming critical semantic differences while sustaining efficient evaluation.
– We propose tractable, non-blocking algorithms for calculating monotonic time-point aggregation in Datalog. To this end, we augment specific fragments of DatalogMTL with monotonic aggregation, by using advanced interval tracking and enabling several time-based optimisations.
– We introduce specific, novel monotonic aggregation over the time axis without any pre-determined bounds, which requires special care as time is the aggregation dimension. In particular, we introduce, as an example, an operator to analyse monotonic trends.
– We provide real-world examples and show the practical relevance of our extensions within settings of industrial relevance, such as those in the financial settings experienced in our work with the Central Bank of Italy.

Organization. The remainder of this paper is organized as follows: In Section 2 we introduce some preliminaries for DatalogMTL and non-temporal monotonic aggregation. In Section 3 we discuss the requirements for temporal aggregation in Datalog. In Section 4 we present time-aware algorithms for time-point aggregation, and we discuss aggregation along the time axis in Section 5. We provide related work in Section 6 and conclude the paper in Section 7. We provide proofs and additional explanations in the appendix.

2 Preliminaries

In this section, we introduce DatalogMTLFP under continuous semantics [17, 18] as well as non-temporal aggregation as defined by Shkapsky et al. [16].
FP stands for forward propagating and refers to the fragment that does not allow reasoning backwards in time, as this is not required for most applications and has a significant negative impact on complexity [19].

Syntax of DatalogMTLFP. Let C and V be disjoint sets of constants and variables, respectively. A term is either a constant or a variable. An atom is an expression of the form P(τ), where P is a predicate of arity n ≥ 0 and τ is an n-tuple of terms. A rule is an expression of the form A1 ∧ ⋯ ∧ Ak → B for k ≥ 0, where all Ai and B are literals that follow the grammar:

A ∶= P(τ) ∣ ⊟σ A ∣ ⧫σ A
B ∶= P(τ)

The conjunction of the Ai is the rule body, whereas B is the rule head. Instead of ∧ we sometimes use “,” to denote conjunction. A predicate p ∈ P is intensional (IDB) if p occurs in the head of a rule whose body is not empty; otherwise p is extensional (EDB). A rule is ground if it contains no variables, and safe if each variable in the head is also contained in the body. A program Π is a finite set of safe rules and is in normal form if it contains only rules of the form:

P1(τ1) ∧ ⋯ ∧ Pn(τn) → P0(τ0)   (n ≥ 0)   (1)
⧫σ P1(τ1) → P0(τ0)   (2)
⊟σ P1(τ1) → P0(τ0)   (3)

An interval σ = ⟨x, y⟩ is defined over the domain Q≥0 ∪ {+∞}, where ⟨x, y⟩ = {t ∈ Q≥0 ∣ x ≤ t ≤ y, with t ≠ x if ⟨ is ‘(’ and t ≠ y if ⟩ is ‘)’}. We use ⟨ (resp. ⟩) depending on the context for unspecified or value-specific brackets, ( (resp. )) for open, and [ (resp. ]) for closed intervals. An interval σ is bounded if x, y ∈ Q≥0, and punctual if it is of the form [t, t], i.e., has a length of 0, in which case it is denoted by t. The left endpoint is denoted by σ− and the right endpoint by σ+, where σ+ = ∞ if the interval is unbounded to the right. The length is defined as ∣σ∣ = ∞ if σ+ = ∞, otherwise ∣σ∣ = σ+ − σ−. For two non-empty intervals σ1 and σ2, we define the union σ1 ∪ σ2 and intersection σ1 ∩ σ2 as usual.
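For concreteness, the union and intersection just mentioned can be modelled in a few lines of Python (a sketch of ours: intervals are represented as pairs and restricted to the closed case, whereas the formalism also allows open and half-open endpoints):

```python
# Closed intervals [lo, hi] over Q>=0, modelled as pairs; None = empty set.
def intersect(a, b):
    """Intersection of two closed intervals: empty or exactly one interval."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def union(a, b):
    """Union of two closed intervals: one interval if they overlap,
    otherwise the two original intervals."""
    if intersect(a, b) is not None:
        return [(min(a[0], b[0]), max(a[1], b[1]))]
    return sorted([a, b])

print(intersect((0, 5), (3, 8)))  # (3, 5)
print(union((0, 2), (5, 8)))      # [(0, 2), (5, 8)]
```

Exact rational endpoints could be obtained by using `fractions.Fraction` for the pair components.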
Note that a union operation creates one or two intervals, whereas an intersection of two intervals is either the empty set or exactly one interval. We define σ1 + σ2 as ⟨σ1− + σ2−, σ1+ + σ2+⟩, where ⟨ (resp. ⟩) is closed if σ1 and σ2 are left-closed (resp. right-closed), as well as σ1 ⊕ σ2 as ⟨σ1− + σ2+, σ1+ + σ2−⟩, where ⟨ (resp. ⟩) is closed if σ1 is left-closed and σ2 is right-open (resp. if σ1 is right-closed and σ2 is left-open). We define an interval as the empty set if σ− > σ+. A fact is of the form α@t, where α is a ground atom and t a punctual interval. A dataset D is a set of facts.

Semantics of DatalogMTLFP. Let M be an interpretation based on a domain ∆ ≠ ∅ for the variables and constants that specifies, for each ground atom α and each time point t ∈ Q≥0, whether α is true at t. In this case, we write M, t ⊧ α. Let ν be an assignment of elements of ∆ to terms such that ν(c) = c for each constant c. We then define inductively:

M, t ⊧ν P(τ) if M, t ⊧ P(ν(τ))
M, t ⊧ν ⊟σ A if M, s ⊧ν A for all s with t − s ∈ σ
M, t ⊧ν ⧫σ A if M, s ⊧ν A for some s with t − s ∈ σ

We say that M satisfies a rule under an assignment ν if M, t ⊧ν B whenever M, t ⊧ν Ai for each 1 ≤ i ≤ k, and satisfies a program if it satisfies all its rules. The interpretation M is a model of Π and D if M satisfies Π for all possible assignments (i.e., is a model of Π), and if M, t ⊧ν α for all facts α@t ∈ D (i.e., is a model of D). Finally, a program Π and a dataset D entail a fact α@t, written (Π, D) ⊧ α@t, if M, t ⊧ α for each model M of Π and D.

In the following, we provide as an example the rules for detecting spamming traders to illustrate the usage of DatalogMTL. We define spamming traders as traders that send a trading signal at least every 5 seconds for a duration of one hour (assuming seconds as granularity), where X is the trader identifier.

⊟[0,3600) ⧫[0,5) tradingSignal(X) → spammingTrader(X)

Monotonic Aggregation.
Monotonic aggregation has so far only been introduced for non-temporal Datalog. Here we summarize the aggregation operators mcount, msum, mmax, mmin, as introduced by Shkapsky et al. [16]. The authors define head atoms of the form P(k1, . . . , km, aggr⟨x, cX⟩), where the ki are zero or more group-by terms, aggr is one of the four mentioned aggregation operations, x is the aggregate term, and cX, defined only for sum and count, is a sequence of contributor terms for the aggregation. The semantics of aggregates is operationally defined as the application of a mapping function from an input set or multiset G, which represents the values for a single group-by key, to an output set D. Given G, for mmin (resp. mmax) each element g ∈ G is put into D if g is smaller (greater) than the previously computed value. For mcount and msum, g is of the form (NJ, J), where NJ indicates the partial count/sum contributed by J, that is, NJ maps to x and J maps to cX. The aggregate functions then map the input set or multiset G to D by computing the sum over the maximum partial counts NJ for all J and putting the resulting value into D, in case the value is higher than the previously computed value. That is, the output set D consists of numbers (the (intermediary) results of the aggregation) that were inserted in a monotonically increasing order.

In the following, we provide as an example the rules for path-counting in a DAG [16] to illustrate the usage of monotonic aggregation. The first rule counts each edge between two nodes as one path. The second rule counts the number of distinct paths between two nodes X and Y through every Z. The last rule derives the maximum value for each pair of nodes. For details of the execution of the rules, we refer to Shkapsky et al. [16].
edge(X, Y) → cpaths(X, Y, mcount⟨(1, X)⟩)
edge(Z, Y), cpaths(X, Z, C) → cpaths(X, Y, mcount⟨(C, Z)⟩)
cpaths(X, Y, C) → countpaths(X, Y, max⟨(C)⟩)

3 Requirements

In this section, we analyze the requirements of temporal reasoning and aggregation in Datalog. We first consider general requirements that are desirable in a declarative AI solution in this context and then instantiate them into specific requirements for temporal aggregation:

– RQ1: Declarative. Managing temporal properties calls for complex interval checking logic and arithmetic. This easily leads to an error-prone and labor-intensive procedural approach. Declarative temporal operators that specify what should be done instead of how it should be done avoid such pitfalls.

Example. A baseline implementation for UC2 would first extend a time-point interval to an hour-long interval, then split the intervals into independent “counting” intervals (at each start or end of some interval in the data the count changes, i.e., a trade is either added to or removed from the data), and finally count the number of entries per interval.

– RQ2: Implicit Time. While explicit time in rules provides the user with the possibility to access and modify temporal properties, it requires semantic restrictions on the chosen operations to block arbitrary rule behavior. Implicit time handling, such as in temporal logics or DatalogMTL, does not have such issues, allowing for a fully declarative, composable solution.

Example. If time is an explicit part of tuples and manipulated using arithmetic, a user can add arbitrary arithmetic operations to the start and end points of a rule (P(S, E) → R(S + 5, E + 5) or P(S, E) → R(S ∗ S, E + 5)).

– RQ3: Optimizability. Temporal data often has the property of staying the same for a longer interval. Yet, many temporal operations are defined per time point.
Declarative operators and implicit time already lead to a certain degree of optimizability, but any operators defined should take into account optimizable evaluation on intervals.

We also need support for fundamental types of temporal aggregation, e.g., as required by the archetypal use cases discussed in the introduction.

– RQ4: Windowed Temporal Aggregation. The ability to aggregate over a fixed time window, e.g., aggregate all values across the last hour. This should at least support the aggregates min, max, count, sum.
– RQ5: Fixed-interval Temporal Aggregation. The ability to aggregate over a fixed interval, e.g., aggregate all values from the month of April 2021. This should at least support the aggregates min, max, count, sum.
– RQ6: Time Axis Aggregation. The ability to aggregate over intervals of arbitrary length, e.g., finding periods of time where values are monotonically increasing (a temporal trend in the data). This should at least support monotonic increases and decreases.

Looking at our archetypal use cases, UC2 is possible in a system that meets RQ4, UC1 in one that meets RQ5, and UC4 in one that meets RQ6. UC3 is already possible using the recursion provided by Datalog together with, e.g., a system meeting RQ4 (it would, however, be hard to support in a system that does not provide recursion).

4 Time Point Aggregation

In this section, we take on the requirements for windowed and fixed-interval temporal aggregation we have laid out and introduce our core approach based on declarative operators with an implicit representation of time. In particular, we focus on windowed and fixed-interval aggregation and show efficient algorithms extending the application of the standard non-temporal aggregations [16] to the temporal context. Time axis aggregation (RQ6) is dealt with in the next section.

4.1 Moving Windows

The first type of aggregation we discuss are moving window aggregations (e.g., covering the last hour; RQ4).
This form of aggregation aggregates, per time point t, all the facts that have been valid between t − w and t, where w is some arbitrary window size. For this, we build upon DatalogMTLFP. Our basic approach is to use the ⧫ operator to extend the time validity of facts to size w and then apply the non-temporal aggregation operation [16] for each time point.

Syntax. Let us start by defining the syntax of DatalogMTLFP with time point aggregation. For this, we extend the DatalogMTLFP normal form (1-3; cf. preliminaries) with additional time point aggregation rules of the following form:

x = aggr(P1(τ0, τ1, a)) → P0(τ0, x)   (4)

where aggr is the aggregation type (e.g., count, sum), P1 and P0 are predicates, τ0 are the group-by attributes, τ1 are the contributor terms for sum and count, and a is the aggregate term (for count, where a is not part of the predicate, it should be assumed as 1 in the following algorithm). Note that P1 has arity ∣τ0∣ + ∣τ1∣ in case aggr is of type count, else ∣τ0∣ + ∣τ1∣ + 1, and P0 has arity ∣τ0∣ + 1.

Semantics. Let us continue by defining the semantics of the newly introduced rules. For this, we define the semantics on top of regular aggregation in Datalog [16] and apply it per time point:

M, t ⊧ν x = aggr(P(τ0, τ1, a)) if Π = {P(τ0, τ1, a) → AggrResult(τ0, aggr⟨a, τ1⟩)}
  and S = {P(τ) ∣ M, t ⊧ν P(τ)} and R = Eval(S, Π)
  and ν(x) ∈ {u ∣ AggrResult(τ0, u) ∈ R}

where τ stands for the sequence of terms (τ0, τ1, a), AggrResult is a predicate storing the aggregation value, and Eval returns the set of facts resulting from the evaluation of Π on S, using the regular aggregation in Datalog.

Example. Let us show the benefits of using native temporal operators by expressing UC2 in DatalogMTL extended by monotonic aggregation.
Rule 1 extends the interval of Trade facts to one hour (assuming a time granularity of seconds) and Rule 2 applies the count operation, using the trader account u as the group-by term and a unique identifier id as the contributing term, to derive the number of trades for each time point.

⧫[0,3600) Trade(u, id) → TradeInterval(u, id)   (1)
m = mcount(TradeInterval(u, id)) → NumberOfTrades(u, m)   (2)

This example immediately highlights the visual and usability advantages of using temporal operators. Note that we have provided a formulation using non-temporal Datalog in the Appendix, underlining the greatly improved readability of this version.

Algorithm 1: Calculation of time point aggregation
Input: an aggregation type T, a number of group-by terms g, the aggregation predicate A, and a set of Facts
Output: a set of aggregated facts B
1  aggrResult := ∅;
2  cStorage := ∅;
3  foreach α@σ in Facts matching the predicate of A do
4    a, groupByKey, cKey = getData(α, A, T, g);
5    aggrVals = aggrResult[groupByKey];
6    cVals = cStorage[groupByKey][cKey];
7    if T = mmin then
8      UpdateList(aggrVals, a, σ−, σ+, (a, b) => min(a, b));
9    else if T = mmax then
10     UpdateList(aggrVals, a, σ−, σ+, (a, b) => max(a, b));
11   else
12     changes := UpdateList(cVals, a, σ−, σ+, (a, b) => max(a, b));
13     foreach {e, σ2−, σ2+} in changes do
14       UpdateList(aggrVals, e, σ2−, σ2+, (a, b) => a + b);
15   cStorage[groupByKey][cKey] = mergeAdjacentInterval(cVals);
16   aggrResult[groupByKey] = mergeAdjacentInterval(aggrVals);
17 return aggrResult.values()

Algorithm. The application of the rules follows the standard chase procedure used with Datalog [12]. More specifically, we build upon the temporal work of Walega et al. [19], who exhaustively apply derivation rules for each rule in normal form plus additional merging rules for adjacent intervals. Details on this algorithm are provided in extended form in the Appendix.
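The evaluation idea behind the two rules of the UC2 example, extending each trade by the window and then counting per time point, can be sketched in Python with a sweep over interval endpoints (our own simplification over integer seconds, not the chase-based procedure described here):

```python
def moving_window_count(points, w):
    """Extend each time point p to [p, p+w) (the window-extension step) and
    return maximal intervals (start, end, count) with a constant count
    (the per-time-point counting step), using an endpoint sweep."""
    events = []
    for p in points:
        events.append((p, 1))        # an extended interval opens
        events.append((p + w, -1))   # ... and closes w later
    events.sort()
    out, count, prev = [], 0, None
    for t, delta in events:
        # emit the stepwise-constant segment ending at this endpoint
        if prev is not None and t != prev and count > 0:
            out.append((prev, t, count))
        count += delta
        prev = t
    return out

# trades at seconds 0, 10, and 3700 with a one-hour window
print(moving_window_count([0, 10, 3700], 3600))
```

Because the count only changes at interval endpoints, the output is a compact, interval-based representation rather than one value per time point, which is precisely the optimization opportunity noted in RQ3.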
Our algorithm for time point aggregation extends the existing temporal Datalog derivation rules as follows. Let x = aggr(A(τ0, τ1, a)) → B(τ0, x) be an instance of a rule of form (4). We apply Algorithm 1 to derive a set B of facts of the form β@σ. The algorithm takes as input the type T of the aggregation (e.g., min, max, etc.), the aggregation predicate A, and a number of group-by terms g; it returns as output a set of facts B with arity g + 1, where the first g terms are the group-by terms, followed by the aggregated value. Line 1 defines a map whose key is the group-by clause and whose value is an ordered set (per time interval) of the current aggregation values, which, by construction of the algorithm, can contain only non-overlapping time intervals. Line 2 defines a similar map, storing the intermediate results of specific contributor terms. Line 3 iterates over all currently derived facts, filtered by the matching predicate name. Lines 4-6 extract the relevant properties of the fact and retrieve the current sets for the provided keys.
In particular, the function getData maps the first g terms to the groupByKey, the last term to a, and the remaining terms to the cKey (contributor key). Then the algorithm branches depending on the aggregation type. For instance, we continue with Line 7 for monotonic minimum, with Line 9 for monotonic maximum, or with Line 11 otherwise. In case of min or max (Lines 7-10) we can directly update the final result, whereas for count and sum, we first calculate the highest value per contributor (Line 11). Since such a value may increase over time, in order to avoid full recomputation, we just consider the difference with respect to the previous contributor value for a certain interval (Lines 13-14).

Algorithm 2: Update List for Temporal Aggregation
1  Function UpdateList(list, value, intStart, intEnd, aggr):
2    changes := emptyList;
3    if list.isEmpty() then
4      list.append({value, intStart, intEnd});
5      changes.append({value, intStart, intEnd});
6      return changes;
7    it := list.iterator();
8    while list.hasNext() do
9      el := list.next();
10     if intStart > el.intEnd then continue;
11     if intEnd < el.intStart then
12       it.prev();
13       break
14     if intStart < el.intStart then
15       it.prepend({value, intStart, prec(el.intStart)});
16       changes.append({value, intStart, prec(el.intStart)});
17       intStart = el.intStart
18     if intStart > el.intStart then
19       it.prepend({el.value, el.intStart, prec(intStart)});
20       el.intStart = intStart;
21     if intEnd < el.intEnd then
22       it.append({el.value, succ(intEnd), el.intEnd});
23       el.intEnd = intEnd;
24     newValue := aggr(el.value, value);
25     changes.append({newValue - el.value, el.intStart, el.intEnd});
26     el.value = newValue;
27     intStart = succ(el.intEnd);
28   if intStart ≤ intEnd then
29     it.append({value, intStart, intEnd});
30     changes.append({value, intStart, intEnd});
31   return changes
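As a cross-check of the listing, here is a simplified, self-contained Python rendering of UpdateList (our own sketch, not the authors' implementation): it works over closed integer intervals, so prec/succ become -1/+1, and it returns per-interval deltas as used in the sum/count branch of Algorithm 1.

```python
def update_list(ivals, value, lo, hi, aggr):
    """Insert value over the closed integer interval [lo, hi] into ivals,
    a sorted list of disjoint (value, start, end) entries. Overlaps are
    combined with aggr; returns the changes as (delta, start, end)."""
    out, changes, i = [], [], 0
    while i < len(ivals) and ivals[i][2] < lo:       # entries fully before [lo, hi]
        out.append(ivals[i]); i += 1
    cur = lo
    while i < len(ivals) and ivals[i][1] <= hi:      # entries overlapping [lo, hi]
        v, s, e = ivals[i]; i += 1
        if cur < s:                                  # gap before this entry
            out.append((value, cur, s - 1)); changes.append((value, cur, s - 1))
        if s < cur:                                  # keep the untouched prefix
            out.append((v, s, cur - 1)); s = cur
        tail = (v, hi + 1, e) if e > hi else None    # keep the untouched suffix
        e = min(e, hi)
        nv = aggr(v, value)                          # combine on the overlap
        out.append((nv, s, e)); changes.append((nv - v, s, e))
        if tail:
            out.append(tail)
        cur = e + 1
    if cur <= hi:                                    # remainder after the last entry
        out.append((value, cur, hi)); changes.append((value, cur, hi))
    out.extend(ivals[i:])
    ivals[:] = out
    return changes

# inserting 2 over [5, 15] into a list holding 3 over [0, 10] (aggr = sum)
ivals = [(3, 0, 10)]
print(update_list(ivals, 2, 5, 15, lambda a, b: a + b))  # [(2, 5, 10), (2, 11, 15)]
print(ivals)  # [(3, 0, 4), (5, 5, 10), (2, 11, 15)]
```

Note that the delta in changes is only meaningful for numeric aggregates (sum, count, and the max used for contributor tracking); for the min/max branches of Algorithm 1 the changes list is not consumed.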
At the end (Lines 15-16) we iterate over the lists and call mergeAdjacentInterval, which, as the name suggests, merges non-overlapping adjacent intervals with the same value to reduce the list size, and we write the resulting intervals back to the storage. Line 17 returns all output facts of the algorithm (that is, it removes the grouping required for the calculation).

The UpdateList function is detailed in Algorithm 2. It iterates over a list of intervals and updates it with the new values. In case the interval list is empty, it just adds the interval (Lines 3-6); otherwise it searches for the first interval starting after the interval to be inserted (Line 10). It then checks whether there is some interval to be inserted before the current interval (Lines 14-17), modifies the boundaries of the current entry in case the inserted interval starts or ends within this entry (Lines 18-23), and updates the value of the current entry (Lines 24-27). Having reached the end of the list or an interval that lies after the insertion range (Lines 10-12), it adds the remaining interval to the list (Lines 28-30).

Note that storing the aggregation result enables an efficient incremental approach in the algorithm, so that in further applications of the derivation rules, only partial “delta-updates” are computed. For this reason, we skip the initialization of the global variables and keep the current values. In addition, in Line 3 we do not check all facts, but only the newly derived ones. We use the function symbols prec and succ to reference the preceding (resp. succeeding) interval points and use them to harmonise the interval endpoints.

Theorem 1. Algorithm 1 has worst-case runtime O(n²) and worst-case memory consumption O(n), where n is the number of facts contributing to the aggregation. The output has maximum size O(n).

Proof Sketch. We now show the basic idea behind the complexity of the algorithm. The interested reader can find the full proof in the Appendix.
The O(n) space requirement depends only on UpdateList. In particular, we argue that at most 2n intervals are created in the process of adding n intervals to the list. This follows from a rather technical argument, but can be intuitively grasped as follows. Let σ denote an inserted interval. Our interval σ can intersect at its left endpoint (σ−) one interval λ from the current list (and, importantly, at most one such interval, as maintaining non-overlapping intervals is central to the algorithm). In this case we need to split λ, creating a total of two new intervals: we replace λ by the interval from λ− to σ−, create a new interval from σ− to λ+, and another from λ+ to σ+. Multiple other variants are possible. The interesting part is when the newly inserted interval σ intersects more than one interval, in which case, intuitively speaking, it uses “budget” stemming from earlier inserts that necessarily have not used their full “budget” of two inserts. As mentioned before, the full technical argument can be found in the Appendix.

The runtime of O(n²) follows from the fact that we iterate over all facts, and for each fact UpdateList is performed, taking O(n) time. Since both changes and aggrVals are sorted lists, they can be scanned together within a one-pass iteration, hence keeping linear complexity (intuitively, like in the usual merging of sorted lists).

We now show soundness and completeness of Algorithm 1, i.e., that every interval produced by the semantics is also derived by the algorithm and, the other way around, every interval derived by the algorithm is also produced by the semantics. Note that this is in the context of the facts that are input to the algorithm. As discussed earlier, we consider recursive derivation in the Appendix.

Theorem 2. Algorithm 1 is sound and complete.

Proof Sketch. We now show the basic idea behind the soundness and completeness of the algorithm. The interested reader can find the full proof in the Appendix.
Soundness requires retrieving the best (e.g., min, max, highest) value for derived intervals. This is trivially achieved by Line 24 of UpdateList. Completeness requires that all intervals are derived. Since possibly overlapping intervals at the start and end are split (Lines 18-23), missing intervals before an entry are inserted (Lines 14-17), and missing intervals after the last entry are added (Lines 28-30), all possible intervals are created.

4.2 Fixed Intervals

The next type of aggregation we discuss are fixed interval aggregations (e.g., per month). In comparison to moving windows, we cannot rely on the temporal operators provided by DatalogMTL due to the differing lengths, e.g., months vary between 28 and 31 days. Therefore, we introduce an additional operator that extends intervals to their complete unit of interest, e.g., an interval from the 15th of March to the 17th of April is extended to the 1st of March to the 30th of April. Such kinds of operators have been successfully applied in the context of temporal databases (cf. Bettini et al. [5]).

Syntax. Let us start by formally defining the new operator. For this, we extend the DatalogMTLFP normal form with an additional rule of the following form:

△unit P1(τ1) → P0(τ0)   (5)

where △ is the new time extension operator and unit is a time unit, e.g., day, month, or year.

Semantics. The semantics of the operator △ is defined as follows:

M, t ⊧ν △unit A if M, s ⊧ν A for some s with conv(t, unit) = conv(s, unit)

where conv converts the time point to the provided time unit. For example, the date 31.12.2020 with unit year would be converted to 2020.

Example. Let us demonstrate fixed interval aggregation by expressing UC1. Rule 1 extends the interval of the sales to its corresponding year and Rule 2 applies the sum operation, using the id (and price) as contributing terms, to derive the revenue of the year.

△year Sale(id, price) → YearSale(id, price)   (1)
m = msum(YearSale(id, price)) → Revenue(m)   (2)

Derivation Rule.
In order to handle such rules in the chase procedure, we use the following derivation rule: If △u α → β is a ground instance of a rule and α@σ1 ∈ F, then add β@σ to F, where σ = [a, b) with a = conv(conv(σ1−, u), u2) and b = conv(conv(σ1+, u) + 1u, u2), and where u2 is the initial unit of σ1; i.e., the time unit gets converted back to the precision of the predicate. For example, 2021-02-16 with a precision of month becomes 2021-02 after the inner conversion and 2021-02-01 after applying the outer conversion.

Limitations. We briefly want to discuss the limitations of this approach. The main goal of using the extension of intervals is keeping compatibility with DatalogMTL and its interval semantics by calculating the aggregates over the interval boundaries asked in the query. However, such an approach limits the possibility to answer queries that require changing the time granularity, as in the following example where we wish to compare the revenue on a weekday basis. For such requirements, we suggest the introduction of unwrapping operations that map the timestamp to an appropriate representation in usual Datalog, e.g., by introducing rules of the form Sale(id, price)@t → SaleEscaped(id, price, weekday(t)). A detailed discussion of such rules is out of scope for this paper.

5 Time Axis Aggregation

So far, we have focused on aggregations which work per time point. In this section, we move to aggregations along the time axis. As introduced in UC4 and RQ6, this means we need to summarize values changing over time, considering adjacent intervals. One such function, which we study in detail in this paper, is the one detecting whether a trend is monotonically increasing/decreasing over time. Let us call it minc resp. mdec. This is useful in many domains, e.g., that of change of control we have introduced, but also, for example, for population counts. We first start with an example and consider the desired formulation.

Example.
Assume that we want to detect the changes of control described in UC4. Then, we are interested in finding the time intervals in which the number of shares has been monotonically increasing, as well as the minimum and maximum values in those intervals. Assume now that the atom Shares(p, c, s) represents the number of shares s that an investor p owns of a company c. Rule 1 shows the minc operator in action. It takes as argument the number of shares, groups them by investor and company, and returns as output the lower bound (i.e., the leftmost value) and the upper bound (i.e., the rightmost value) per monotonically increasing interval.

⟨l, u⟩ = minc(Shares(p, c, s)) → SInc(p, c, l, u)   (1)

Similar to the previous section, we (i) provide a normal form (syntax) for time axis aggregations, (ii) specify the semantics of each time axis aggregation, (iii) provide a derivation rule, and (iv) suggest an algorithm for the time axis operation.

Syntax. We start by providing a normal form (actually already used in Rule (1)), generalizing the normal form introduced for time point aggregation:

⟨x⟩ = aggr(P1(τ0, τ1, a)) → P0(τ0, x)   (6)

The functor aggr is the name of the time axis aggregation, in our case minc or mdec, and P1 and P0 are predicates. As in time point aggregation, τ0 defines the group-by clause, τ1 the contributor terms, not directly used in this form of aggregation but kept for uniformity reasons, and a is the aggregation term of P1. Over time, there can be multiple values of interest, for example the start and end value of a monotonically increasing interval. Hence, the function returns a vector x = x1, . . . , xn of aggregation values instead of a single value. For minc and mdec we use x = l, u to denote the lower and upper bound of the monotonic interval.

Semantics. We assume that the domain of temporal aggregation is that of disjoint intervals, and so, for a single group-by key, no ambiguity arises w.r.t. the aggregation term.
This also makes time axis aggregation fully orthogonal to time point aggregation, where, instead, intervals are combined. Moreover, time point aggregation can be effectively used to disambiguate values per time interval (e.g., by taking their maximum/minimum resp. their sum) before proceeding with time axis aggregation. Another effect of such disjoint intervals is that the semantics of time axis aggregation can be easily formulated by moving from time points to time intervals, i.e., M, σ ⊧ν φ if for all t ∈ σ it holds that M, t ⊧ν φ.

M, σ ⊧ν ⟨l, u⟩ = minc(P1(τ0, a)) if M, σ ⊧ν M(P1, τ0, l, u)
M, σ ⊧ν M(P1, τ0, a, a) if M, σ ⊧ν P1(τ0, a)
M, σ ⊧ν M(P1, τ0, l, u) if M, σ1 ⊧ν M(P1, τ0, l, u1) and M, σ2 ⊧ν M(P1, τ0, l2, u) and u1 ≤ l2 and σ1+ ≺ σ2− and σ = σ1 ∪ σ2

where ≺ is the predecessor relation (i.e., the intervals are adjacent) and M is a fresh predicate for deriving minc. In short, the second definition states that a value constant over an interval is both lower and upper bound, and the third one merges two adjacent intervals if the value does not decrease across their boundary (u1 ≤ l2). The semantics for mdec is analogous, with u1 ≤ l2 changed to u1 ≥ l2.

Derivation rule. Let us now introduce our derivation mechanism. For each instance ⟨x⟩ = aggr(A(τ0, τ1, a)) → B(τ0, x) of a rule of form (6), where aggr is minc or mdec and τ1 is the empty set, apply Algorithm 3 to derive a set B of facts of the form β@σ, where β contains the group-by values and the aggregation values l and u.

Algorithm. As for time point aggregation, we provide an efficient algorithm for minc and mdec which exploits native temporal operators. Algorithm 3 takes as input the aggregation predicate A and a number of group-by indices g; it returns as output a set of facts B. Line 1 defines an empty map for the result, where the keys are the group-by terms and the values are B-trees with the intervals as keys of the entries.
Then, for each fact, we add the fact to the appropriate group-by clause (Lines 3-5). We then merge the inserted fact (Lines 7-12) with its adjacent intervals, if they exist, so that we derive the largest possible monotonically increasing interval. For deriving monotonically decreasing intervals, the comparison operator in Lines 7 and 11 has to be changed from ≤ to ≥.

Algorithm 3: Calculation of monotonically increasing intervals
Input: the number of group-by terms g in α, the aggregation predicate A, a set of Facts
Output: A set of aggregated facts B
1  X := ∅;
2  foreach α@σ in Facts matching the predicate of A do
3      a, groupByKey := getData(α, A, g);
4      aggrGroup := X[groupByKey];
5      nNode := aggrGroup.insert((a, a)#σ);
6      lNode := nNode.previous;
7      if σ− = lNode.σ+ ∧ lNode.max ≤ a then
8          Remove lNode, nNode from aggrGroup;
9          nNode := Insert (lNode.min, nNode.max)#⟨lNode.σ−, σ+⟩ to aggrGroup;
10     rNode := nNode.next;
11     if σ+ = rNode.σ− ∧ a ≤ rNode.min then
12         Apply merging for rNode similar to lNode;
13 return X.values()

Line 13 returns all output facts of the algorithm (that is, it removes the grouping required for the calculation). Note that, similarly to Algorithm 1, delta-updates can be applied by skipping Line 1 of the algorithm.

Theorem 3. Algorithm 3 has runtime O(n log n) in the worst case and worst-case memory consumption O(n), where n is the number of facts contributing to the aggregation. The output has maximum size O(n).

Proof Sketch. We show here the basic idea behind the complexity of the algorithm (details in Appendix). The space requirement of O(n) follows from the fact that at most n entries are inserted in the tree. The runtime of O(n log n) follows because we iterate over all facts (O(n)), and for each fact we insert entries into and remove entries from the tree, which has a complexity of O(log n).

Theorem 4. Algorithm 3 is sound and complete.

Proof Sketch.
We show the basic idea behind the soundness and completeness regarding the maximum derived intervals (details in Appendix). Completeness follows immediately, since all intervals are inserted in the tree. For soundness, we have to show that maximal intervals are derived with the correct lower and upper value. This follows from the two merging operations on possibly adjacent intervals, where adjacent intervals are replaced with a new interval whose lowest value is taken from the left and whose highest value is taken from the right interval.

6 Related Work

The first temporal extensions to Datalog were suggested already in the 1980s. Most approaches can be grouped into two types: one focuses on implementing temporal constructs via arithmetic operations, e.g., by applying the +1 function to model different discrete temporal units (Datalog1S [7, 14, 23]); the other includes operators from temporal logic, such as the always and eventually operators from LTL or CTL, in Datalog (Templog [1], DatalogLite [9]). The newest developments are based on MTL [6, 17, 11], an extension of LTL, to enrich the expressive power of Datalog programs. While most of the work on DatalogMTL so far has consisted in complexity results, we find a proposal for a non-recursive fragment [6], as well as an algorithm for stream reasoning by Walega et al. [19]. The latter work clearly differs from the one we have presented in this paper, in that our focus is on monotonic aggregation over the temporal domain.

Similarly, aggregation in Datalog has been discussed for many years. The standard Datalog fixpoint semantics only works as long as rules define monotonic transformations (w.r.t. set containment); aggregation breaks this requirement. Earlier solutions [15, 24] use non-deterministic choice constructs, partial orders that are more powerful than set containment, or an infinite level of stratification. These approaches were of limited generality and also required resorting to sophisticated compilers to detect monotonic programs [13]. DatalogFS [13] is the first approach that uses continuous aggregate functions to support monotonic counts, sums over positive values, and extrema aggregates (min, max). Shkapsky et al. [16] provided the first practical algorithms for monotonic aggregation by introducing contributor and group-by terms, which we used as the foundation for our approach. While their work focused on optimizing aggregation for a single time point, our approach aims at building monotonic aggregates for all time points efficiently, which requires specific handling of fact validity intervals and dealing with time axis aggregation. Recent work studied the definition of min and max over limit Datalog [10], but only considers restricted use of sum and count by means of a sorted list of facts. Finally, Wang et al. [20] studied techniques to convert non-monotonic aggregates to monotonic ones.

Apart from Datalog-specific related work, LARS [2] is a temporal stream reasoning framework focusing on finite streams by extending propositional logic; it can be seen as an extension of ASP. It supports a window operator that returns a sub-stream containing only n time points. In comparison with DatalogMTL, it offers a model-centric perspective (instead of a query-centric one) and only considers finite data. With a recent extension called weighted LARS [8], a formula can be evaluated as an algebraic expression over a semiring, allowing the computation of all aggregates bounded by the semiring along the time axis; e.g., it supports counting the number of time points at which an expression holds.
In detail, the diamond operator interprets the formula (i.e., all values of the sub-stream) by using the sum operator of the semiring and the box operator by using the product operator of the semiring, while true values are mapped to one, false values are mapped to zero, conjunction is mapped to the product, and disjunction is mapped to the sum. In comparison, we focus on combining two fundamental directions of Datalog studies, namely DatalogMTL and monotonic aggregation, and discuss efficient aggregation algorithms for DatalogMTL.

7 Conclusion

In this paper, we presented a solution for using monotonic aggregation over temporal Datalog fragments. This paper is, to the best of our knowledge, the first work to include (1) full recursion as in Datalog, (2) aggregation, and (3) temporal reasoning, a combination required by important use cases, as we have shown. We showed that native temporal operators allow applying specific optimization techniques to reduce performance overhead and memory consumption as well as to increase usability. Our suggested algorithms can be executed in a non-blocking fashion (i.e., there is no need to wait for all the aggregation contributions to be available). In future work, we aim at providing a set of experimental results showing the benefits of using native temporal operators for monotonic aggregation.

Acknowledgements

The financial support by the Vienna Science and Technology Fund (WWTF) grant VRG18-013 is gratefully acknowledged.

References

1. Abadi, M., Manna, Z.: Temporal logic programming. J. Symb. Comput. 8(3), 277–295 (1989)
2. Beck, H., Dao-Tran, M., Eiter, T., Fink, M.: LARS: A logic-based framework for analyzing reasoning over streams. In: AAAI. pp. 1431–1438. AAAI Press (2015)
3. Bellomarini, L., Magnanimi, D., Nissl, M., Sallinger, E.: Neither in the programs nor in the data: Mining the hidden financial knowledge with knowledge graphs and reasoning. In: MIDAS@PKDD/ECML. Lecture Notes in Computer Science, vol. 12591, pp. 119–134. Springer (2020)
4.
Benson, L., Grulich, P.M., Zeuch, S., Markl, V., Rabl, T.: Disco: Efficient distributed window aggregation. In: EDBT. pp. 423–426. OpenProceedings.org (2020)
5. Bettini, C., Jajodia, S., Wang, X.S.: Time granularities in databases, data mining, and temporal reasoning. Springer (2000)
6. Brandt, S., Kalayci, E.G., Kontchakov, R., Ryzhikov, V., Xiao, G., Zakharyaschev, M.: Ontology-based data access with a Horn fragment of metric temporal logic. In: AAAI. pp. 1070–1076. AAAI Press (2017)
7. Chomicki, J.: Polynomial time query processing in temporal deductive databases. In: PODS. pp. 379–391. ACM Press (1990)
8. Eiter, T., Kiesel, R.: Weighted LARS for quantitative stream reasoning. In: ECAI. Frontiers in Artificial Intelligence and Applications, vol. 325, pp. 729–736. IOS Press (2020)
9. Gottlob, G., Grädel, E., Veith, H.: Datalog LITE: a deductive query language with linear time model checking. ACM Trans. Comput. Log. 3(1), 42–79 (2002)
10. Kaminski, M., Grau, B.C., Kostylev, E.V., Horrocks, I.: Complexity and expressive power of disjunction and negation in limit datalog. In: AAAI. pp. 2862–2869 (2020)
11. Kikot, S., Ryzhikov, V., Walega, P.A., Zakharyaschev, M.: On the data complexity of ontology-mediated queries with MTL operators over timed words. In: Description Logics. CEUR Workshop Proceedings, vol. 2211. CEUR-WS.org (2018)
12. Maier, D., Mendelzon, A.O., Sagiv, Y.: Testing implications of data dependencies. ACM Trans. Database Syst. 4(4), 455–469 (1979)
13. Mazuran, M., Serra, E., Zaniolo, C.: Extending the power of datalog recursion. VLDB J. 22(4), 471–493 (2013)
14. Ronca, A., Kaminski, M., Grau, B.C., Motik, B., Horrocks, I.: Stream reasoning in temporal datalog. In: AAAI. pp. 1941–1948. AAAI Press (2018)
15. Ross, K.A., Sagiv, Y.: Monotonic aggregation in deductive databases. In: PODS. pp. 114–126. ACM Press (1992)
16. Shkapsky, A., Yang, M., Zaniolo, C.: Optimizing recursive queries with monotonic aggregates in DeALS. In: ICDE. pp. 867–878.
IEEE Computer Society (2015)
17. Walega, P.A., Grau, B.C., Kaminski, M., Kostylev, E.V.: DatalogMTL: Computational complexity and expressive power. In: IJCAI. pp. 1886–1892. ijcai.org (2019)
18. Walega, P.A., Grau, B.C., Kaminski, M., Kostylev, E.V.: Tractable fragments of datalog with metric temporal operators. In: IJCAI. pp. 1919–1925. ijcai.org (2020)
19. Walega, P.A., Kaminski, M., Grau, B.C.: Reasoning over streaming data in metric temporal datalog. In: AAAI. pp. 3092–3099. AAAI Press (2019)
20. Wang, Q., Zhang, Y., Wang, H., Geng, L., Lee, R., Zhang, X., Yu, G.: Automating incremental and asynchronous evaluation for recursive aggregate data processing. In: SIGMOD Conference. pp. 2439–2454. ACM (2020)
21. Xu, J., Liu, C., Zhao, X., Yongchareon, S., Ding, Z.: Resource management for business process scheduling in the presence of availability constraints. ACM Trans. Manag. Inf. Syst. 7(3), 9:1–9:26 (2016)
22. Yen, T., Oprea, A., Onarlioglu, K., Leetham, T., Robertson, W.K., Juels, A., Kirda, E.: Beehive: large-scale log analysis for detecting suspicious activity in enterprise networks. In: ACSAC. pp. 199–208. ACM (2013)
23. Zaniolo, C.: Logical foundations of continuous query languages for data streams. In: Datalog. Lecture Notes in Computer Science, vol. 7494. Springer (2012)
24. Zaniolo, C., Arni, N., Ong, K.: Negation and aggregates in recursive rules: the LDL++ approach. In: DOOD. LNCS, vol. 760, pp. 204–221. Springer (1993)

A Overview of Appendix

The appendix is structured as follows: in Section B we provide an extended DatalogMTL reasoning algorithm, in Section C we provide additional background information for Section 4, and in Section D we explain details for Section 5.

B DatalogMTL Reasoning Algorithm

In this section we present a modified version of the DatalogMTL reasoning algorithm presented in [19]. This version modifies and extends the derivation rules with the newly introduced rules and establishes new output possibilities.
Critically, existing approaches are specific to streaming data and only allow outputting derived facts for pre-chosen time points, requiring one to decide, before knowing the reasoning result, which time points contain relevant information.

Algorithm 4: A reasoning algorithm for monotonic aggregation over time using DatalogMTL_FP
Input: A program Π, a dataset S, a list of output predicates Q, a set of output intervals X (optional; if not present, X = {D}), a bounded reasoning interval domain D
Output: All valid facts over the predicates in Q, with their intervals in D and X
1 F := S;
2 do
3     Apply rules from Table 1 exhaustively;
4     F := F ∩ D, i.e., map each P(τ)@σf ∈ F to P(τ)@σ, s.t. σ = σf ∩ D;
5 while F has changed;
6 foreach P(τ) in F do
7     if P in Q then
8         P′(τ) ← P(τ) ∩ X, i.e., map each σx ∈ X and P(τ)@σf to P′(τ)@σ s.t. σ = σx ∩ σf is not empty;
9         output P′(τ);

Algorithm 4 takes as input a program Π in normal form, a dataset S, a list of output predicates Q, optionally a set of output intervals X, and a bounded reasoning interval domain D; it returns as output a maximal set of facts (i.e., due to the union rule, all intervals are merged if possible). Only facts valid in the reasoning domain interval D are returned and, if X is given, in particular only those valid within a non-empty intersection with some interval σx ∈ X. Defining X as a single punctual interval allows outputting all valid facts at a certain time point. Line 1 initializes the derived facts with the provided dataset. Lines 2-5 apply the rules of Table 1 to derive new facts and restrict those facts to the chosen interval domain.

– The Horn rule adds all heads where the intervals of the bounded body predicates overlap.

(H) If α1 ∧ · · · ∧ αn → β is a ground instance of a rule and, for each i, αi@σi ∈ F, then add β@σ to F with σ = σ1 ∩ · · · ∩ σn.
(D) If ✧σ1 α → β is a ground instance of a rule and α@σ2 ∈ F, then add β@σ to F, where σ = σ1 + σ2.
(B) If ⊟σ1 α → β is a ground instance of a rule and α@σ2 ∈ F, then add β@σ to F, where σ = σ1 ⊕ σ2.
(T) If △unit α → β is a ground instance of a rule and α@σ1 ∈ F, then add β@σ to F, where σ = [a, b) with a = conv(conv(σ1−, u), u2), b = conv(conv(σ1+, u) + 1u, u2), and u2 is the original unit of σ1.
(P) If x = aggr(A(τ0, τ1, a)) → B(τ0, x) is an instance of a rule, then apply Algorithm 1 to derive a set X of facts of the form β@σ, which are added to F.
(A) If ⟨x⟩ = aggrT(A(τ0, a)) → B(τ0, x) is an instance of a rule, then apply Algorithm 3 to derive a set X of facts of the form β@σ, which are added to F.
(U) If α@σ1, α@σ2 ∈ F and σ1 ∪ σ2 coincides with one single interval σ, then remove α@σ1, α@σ2 from F and add α@σ to F.

Table 1. Derivation Rules

– The Diamond rule captures the semantics of the ✧ operator. It expands (i.e., if σ1− < σ1+) the interval σ2 and shifts (i.e., if σ1− > 0) it into the future.
– The Box rule captures the semantics of the ⊟ operator. It reduces (i.e., if σ1− < σ1+) the interval σ2 and shifts (i.e., if σ1− > 0) it into the future.
– The Triangle rule captures the semantics of the △ operator. It expands the interval to the range of the chosen unit.
– The time Point aggregation rule captures the semantics of time point aggregation. It applies the aggregation per time point over all facts whose interval contains the time point.
– The time Axis aggregation rule captures the semantics of monotonically increasing and decreasing intervals. It applies the aggregation over the time axis and merges adjacent intervals following the given criteria.
– The Union rule derives new intervals from derived facts by merging overlapping intervals. In addition, it optimizes the storage size by removing the old, now merged, intervals. This rule includes the deletion of subsets.

Lines 6-9 output the derived facts that are part of the list of predicates Q and intersect with the output intervals X.
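Read operationally, this output step amounts to a per-fact interval intersection. The following Python sketch is a simplified rendering under our own assumptions (half-open intervals as integer pairs, facts as (predicate, arguments, interval) triples; all names are ours, not the paper's implementation):

```python
def intersect(a, b):
    """Intersection of half-open intervals [a0, a1) and [b0, b1), or None."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None

def output_facts(facts, Q, X):
    """Lines 6-9 of Algorithm 4: keep facts over predicates in Q,
    clipped to the output intervals in X.

    facts: list of (pred, args, (start, end)); Q: set of predicate names;
    X: list of output intervals (a punctual time point t would be [t, t+1)).
    """
    out = []
    for pred, args, sigma in facts:
        if pred not in Q:
            continue  # Line 7: predicate not requested
        for sx in X:
            clipped = intersect(sigma, sx)
            if clipped is not None:  # Line 8: non-empty intersection only
                out.append((pred, args, clipped))
    return out

derived = [("SInc", ("p1", "c1", 10, 20), (0, 9)),
           ("Aux", ("p1",), (0, 9))]
print(output_facts(derived, {"SInc"}, [(5, 7)]))
```

Note how adjacent half-open intervals such as [0, 2) and [2, 5) correctly yield an empty intersection, matching the "not empty" condition of Line 8.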
Due to the union rule, each interval of a fact in the output is not a subset of another interval of the same fact in the output. Note that this algorithm does not terminate in case the time point aggregation reaches no fixpoint, for example when a recursive sum is calculated and additional facts are produced with each application of the rule. This is a typical issue for monotonic aggregation, which is also experienced in [16].

C Appendix for Section 3

In this section we provide a non-temporal Datalog encoding of UC2 and the proofs of Theorem 1 and Theorem 2.

Trade(u, id, t) → TradeInterval(u, id, t, t + 3600, 1, 0) (1)
TradeInterval(u, id, x, y, xb, yb) → IPoint(x, xb), IPoint(y, yb) (2)
IPoint(t, c) → Int′(0, mmin(t), 1, c) (3a)
Int′(x, y, 1, 0) → Int(x, y, 1, 1) (3b)
Int′(x, y, 1, 1) → Int(x, y, 1, 0) (3c)
Int(_, tp, _, 1), IPoint(t, c), t > tp → Int′(tp, mmin(t), 0, c) (4a)
Int(_, tp, _, 0), IPoint(t, c), t > tp → Int′(tp, mmin(t), 1, c) (4b)
Int′(x, y, 0, 0) → Int(x, y, 0, 1) (4c)
Int′(x, y, 0, 1) → Int(x, y, 0, 0) (4d)

Note that Rules 3b and 3c complete 4c and 4d.

Int(x, y, xc, yc), TradeInterval(u, id, st, et, sc, ec), (x < et), (y > st) → V(u, id, x, y, xc, yc) (5a)
Int(x, y, xc, yc), TradeInterval(u, id, st, et, sc, ec), (x = et ∧ xc = 1 ∧ ec = 1), (y > st) → V(u, id, x, y, xc, yc) (5b)
Int(x, y, xc, yc), TradeInterval(u, id, st, et, sc, ec), (x < et), (y = st ∧ yc = 1 ∧ sc = 1) → V(u, id, x, y, xc, yc) (5c)
Int(x, y, xc, yc), TradeInterval(u, id, st, et, sc, ec), (x = et ∧ xc = 1 ∧ ec = 1), (y = st ∧ yc = 1 ∧ sc = 1) → V(u, id, x, y, xc, yc) (5d)
V(u, id, x, y, xc, yc) → NumberOfTrades(u, mcount(id), x, y, xc, yc) (6)

Fig. 1. UC2 in non-temporal Datalog

UC2 in non-temporal Datalog. Figure 1 highlights the rules required for writing such an aggregation in non-temporal Datalog. In a first step, for all four aggregation types, we calculate all interval points at which the query answer may change.
We provide the calculation of such points in Rules 1-4. Rule 1 extends the Trade facts to one-hour intervals TradeInterval (i.e., the desired time window of the query) and defines that the intervals are left-closed and right-open (represented by 1 resp. 0). Rule 2 extracts all relevant interval points IPoint. Rules 3 and 4 create intervals Int between all extracted points and assign the open and closed properties. Having established the relevant intervals, we continue in Rule 5 with detecting overlaps between TradeInterval and Int and assigning the values of TradeInterval to the overlapping intervals Int. In particular, this rule covers the overlap checks for all combinations of open and closed intervals. Finally, Rule 6 calculates the number of trades per interval and user (so we are grouping per user and time interval).

Fig. 2. Example for the number of trades per hour

In order to support other aggregation types, we have to adapt Rule 6, where the aggregation has to be changed accordingly. Consider Figure 2 for an example of five Trade events already extended to one-hour intervals. The five top-most intervals denote all TradeInterval predicates. Vertical dashed lines show all IPoint. The intervals at the bottom of Figure 2 refer to the final result, that is, the NumberOfTrades predicates.

Theorem 1 Algorithm 1 has worst-case runtime O(n²) and worst-case memory consumption O(n), where n is the number of facts contributing to the aggregation. The output has maximum size O(n).

Proof. We first show the space requirements. For this, we consider UpdateList, which is the only part where new data is added, and show a space requirement of O(n). Let us start with the list, which in total contains at most 2n entries.
There are two different options: (1) the list is empty, then the fact is added (Lines 3-6), which creates at most one entry for a single fact; or (2) the list is not empty, then the number of added entries depends on the existing facts in the list. It is clear that Lines 14-17 (resp. 18-20) fire at most once, namely when the new interval to be inserted starts (resp. ends) inside an existing interval. The same applies when the interval starts or ends outside any existing interval, where Lines 14-17 add the starting interval before the current entry and Lines 28-30 append the ending interval after the current entry. This creates at most two new entries in total. It remains to bound the additional firings of Lines 14-17. If n entries exist in the list, there can be a maximum of n − 1 gaps. It is clear that a gap is only filled when the inserted interval fits between two existing intervals, requiring only one new entry. So, in the worst case, Lines 14-17 only fill up the remaining budget of 2n entries. Let us now consider changes. In the worst case, when the new interval spans the complete list, each entry in the list requires one entry in changes, that is, in total at most 2n entries. In total, this yields O(2n + 2n) = O(4n) = O(n) as the space bound. For the runtime, we can build upon the space requirements. It is clearly visible that the runtime for mmin and mmax equals that of the first part of msum and mcount, i.e., the update operations of Lines 9 and 11 are equal for mmax and analogous for mmin. Hence, we only have to consider the else part inside the outer foreach loop. For each fact (O(n)), we update the contributor set (Line 13) and then apply updates to the aggregation value (Lines 15-16). We start by showing the runtime of the update of the contributor set. It is clear that UpdateList iterates once over the list, which requires O(2n) time in the worst case. We now consider the update process and show that it does not exceed this runtime.
We have shown that the space bound of aggrValues as well as of changes is O(2n). Since those sets are ordered, we can scan both sets together in a single pass by checking the boundaries of the following intervals. Hence, we do not exceed the runtime of O(2n). The same applies to the merging operations, which iterate over the list once. This concludes the argument that the algorithm takes O(2n) = O(n) time in the worst case for a single added value. Considering now that n facts contribute to the aggregation, the algorithm is executed n times and thus runs in O(n²). The final flattening step loops over the aggrResult of size O(2n) and maps the result to the appropriate output format. Hence, the output size is also at most O(n).

Theorem 2 Algorithm 1 is sound and complete.

Proof. Let us start with the soundness of the algorithm, which is straightforward. We have to show that the returned results are the best result per interval, where best means the minimal value for mmin, the maximal value for mmax, and the highest sum resp. count for msum resp. mcount. That is, for each interval in the output list, we have detected the best value. This is guaranteed by the chosen aggr-function and by Line 24 of UpdateList, which calculates the best value for each interval. Let us now consider the completeness of the algorithm, i.e., show that all intervals are covered. For this, we show that we do not miss any interval in the update process when we consider a new fact. It is clear by Lines 14-17 that an interval part lying before a list entry is added, and by Lines 28-30 that an interval part lying after the last list entry, or, together with Lines 11-13, ending after an existing list entry, is added. It remains to show that existing intervals are also modified correctly. This is handled by Lines 18-20 if the interval starts inside another interval and by Lines 21-23 if the interval ends inside another interval.
The remaining part of the time point algorithm does not modify the intervals, except for the merging functions, which only combine existing intervals into maximal ones and thus have no impact on the covering of the intervals. Hence, the algorithm is sound and complete.

D Appendix for Section 4

In this section, we show the proofs of Theorem 3 and Theorem 4.

Theorem 3 Algorithm 3 has runtime O(n log n) in the worst case and worst-case memory consumption O(n), where n is the number of facts contributing to the aggregation. The output has maximum size O(n).

Proof. First, we show the space requirement of O(n). Each fact can be added to the tree at most once (Line 5). This worst case is reached if the facts are strictly monotonically decreasing while monotonically increasing intervals are asked for. The other lines remove more facts than they add, hence the size of the tree only decreases. This concludes the argument for the space requirement of at most O(n) and also shows the output size of O(n) of the algorithm. Now, we show the runtime complexity of O(n log n). Inserting and removing facts in a B-tree has a complexity of O(log n). We have to execute this for each fact contributing to the aggregation, i.e., O(n) times, which yields a runtime complexity of O(n log n).

Theorem 4 Algorithm 3 is sound and complete.

Proof. Let us start with completeness, which follows immediately: since all values (with their intervals) are inserted in the tree, the algorithm does not miss any case. We now show the soundness of the algorithm by induction. In the base case, we add the first interval to a group-by key, which simply carries the value of the fact. We now assume that we have maximal monotonically increasing intervals and that we add another fact. Since the considered facts must have disjoint intervals by definition, there are three cases we have to distinguish.

– The added interval is not adjacent to any neighbour. Then we are done, and we have a maximal monotonically increasing interval.
– The interval is adjacent to the left neighbour. If the two monotonic intervals are increasing, then by Lines 7-9 we merge the two neighbours, creating a new, larger maximal interval spanning the currently maximal interval of the left neighbour and the added interval, and we update the bounds.
– The interval is adjacent to the right neighbour. There are two cases: either the inserted interval is not adjacent to the left neighbour, or it is also adjacent to the left neighbour. In both cases, we may assume that the current interval is already maximal regarding the left boundary (i.e., it has already been merged with the left neighbour in case the intervals are adjacent and monotonic, see Line 9). By Line 11, we merge the current maximal interval with the right interval in case they are monotonic, creating the largest monotonic interval.

Hence, the derived interval is always the maximal interval, and the algorithm is sound and complete.
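The symmetry between minc and mdec (swapping ≤ for ≥ in Lines 7 and 11 of Algorithm 3) can be made concrete by parameterising the merge on the comparison. The sketch below is our own simplification, not the paper's implementation: a single left-to-right pass over an already-sorted list of runs stands in for the B-tree neighbour merges, which suffices here because sorted insertion order makes the left-neighbour check subsume the right-neighbour one.

```python
import operator

def merge_runs(entries, cmp=operator.le):
    """Merge adjacent monotonic runs in one left-to-right pass.

    entries: sorted list of (lo_val, hi_val, start, end) with disjoint
    half-open intervals (a simplified stand-in for the B-tree nodes of
    Algorithm 3). cmp=operator.le gives minc, cmp=operator.ge gives mdec.
    """
    merged = []
    for lo, hi, b, e in entries:
        if merged:
            plo, phi, pb, pe = merged[-1]
            if pe == b and cmp(phi, lo):       # adjacent and monotonic
                merged[-1] = (plo, hi, pb, e)  # lowest from left, highest from right
                continue
        merged.append((lo, hi, b, e))
    return merged

vals = [(10, 10, 0, 2), (15, 15, 2, 5), (12, 12, 5, 7), (8, 8, 7, 9)]
print(merge_runs(vals))                   # minc: one increasing run, two singletons
print(merge_runs(vals, cmp=operator.ge))  # mdec: one decreasing run from 15 down to 8
```

The same input thus yields complementary decompositions: minc keeps the rise 10→15 together, while mdec keeps the fall 15→12→8 together, mirroring the ≤/≥ swap described above.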