=Paper=
{{Paper
|id=Vol-2785/paper1
|storemode=property
|title=Symbolic Learning with Interval Temporal Logic: the Case of Regression
|pdfUrl=https://ceur-ws.org/Vol-2785/paper1.pdf
|volume=Vol-2785
|authors=Estrella Lucena-Sanchez,Guido Sciavicco,Ionel Eduard Stan
|dblpUrl=https://dblp.org/rec/conf/overlay/Lucena-SanchezS20
}}
==Symbolic Learning with Interval Temporal Logic: the Case of Regression ==
Proceedings of the 2nd Workshop on Artificial Intelligence and Formal Verification, Logics, Automata and Synthesis (OVERLAY), September 25, 2020

Estrella Lucena-Sanchez (University of Ferrara and University of Modena and Reggio Emilia, Italy; estrella.lucenasanchez@unife.it), Guido Sciavicco (University of Ferrara, Italy; guido.sciavicco@unife.it), and Ionel Eduard Stan (University of Ferrara and University of Parma, Italy; ioneleduard.stan@unife.it)

The authors acknowledge the partial support of the Italian INDAM GNCS project Strategic Reasoning and Automated Synthesis of Multi-Agent Systems. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Regression analysis is the statistical process used to estimate the relationship between a dependent variable and one or more independent variables. In machine learning, typical statistical approaches to regression, such as linear regression, are often replaced with symbolic learning, such as decision tree regression, to capture non-linear behaviour while keeping the results interpretable. For time series, regression is sometimes enhanced by using historical values of the independent variables. In this paper, we show how temporal regression can be handled by a symbolic learner based on interval temporal logic decision trees.

1 Introduction

A multivariate time series [2] is a set of variables that change over time. Each variable of a multivariate time series is an ordered collection of N real values and, usually, in a multivariate time series one identifies a dependent variable B, whose values we are interested in predicting, and the independent variables A1, . . . , An, whose values are used for the prediction. Multivariate time series emerge in many application contexts. The temporal history of a hospitalized patient can be described by the time series of the values of his/her temperature, blood pressure, and oxygenation; the pronunciation of a word in sign language can be described by the time series of the relative and absolute positions of the ten fingers w.r.t. some reference point; different sport activities can be distinguished by the time series of some relevant physical quantities. Having identified the dependent variable, the most relevant and interesting problem defined on a time series is forecasting, that is, the problem of predicting its future values, and it is typically solved by regression. Regression is the statistical process used to capture the parameters that influence the relationship between independent and dependent variables. In the context of temporal regression with machine learning, there are three main categories of approaches (Fig. 1). Autoregression is a set of statistical techniques developed mainly to model the idea that past values of the independent variable may influence, in several different ways, its future values. These are techniques originally developed for univariate series, and later extended to the multivariate case.

Figure 1: Approaches to temporal regression: autoregression (e.g., ARIMA), functional regression (e.g., linear regression), and symbolic regression (e.g., regression trees), the latter two via a lagged transformation.

The use of lagged variables is, so to speak, the machine learning generalization of the latter idea, and it consists of transforming the original data set by adding virtual variables that represent the past values. For example, for an independent variable Ai(t) (in which we make the temporal parameter explicit), one adds the lagged variables Ai(t − 1), Ai(t − 2), . . ., which correspond, respectively, to the value of Ai one, two, . . . units of time before.
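To make the transformation concrete, here is a minimal sketch in Python (not from the paper; pandas and the column names are illustrative assumptions) of how a multivariate time series can be turned into a static data set with lagged copies of the independent variables.

```python
import pandas as pd

def add_lags(df, independent, lag):
    """Add columns Ai(t-1), ..., Ai(t-lag) for each independent variable Ai.
    Rows whose lagged values would fall before the start of the series are
    dropped, so every remaining row is a self-contained static instance."""
    out = df.copy()
    for col in independent:
        for j in range(1, lag + 1):
            out[f"{col}(t-{j})"] = out[col].shift(j)
    return out.dropna().reset_index(drop=True)

# Toy multivariate series: independent variables A1, A2 and dependent variable B.
series = pd.DataFrame({
    "A1": [10, 10, 15, 25, 30, 30, 20],
    "A2": [110, 105, 90, 90, 100, 105, 110],
    "B":  [36, 36, 36, 38, 40, 40, 39],
})
print(add_lags(series, independent=["A1", "A2"], lag=2))
```

Any static learner (linear regression, a regression tree, and so on) can then be trained on the resulting rows to predict B(t).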
In this way, a static learner (where each instance of the data set is independent of any other) can be used on the transformed data set, giving rise to the possibility of using functional regression algorithms, such as linear regression and neural networks, among others, and symbolic regression algorithms, such as regression tree learning algorithms. While certain classical regression algorithms, such as linear regression or decision trees, are interpretable by nature (that is, they return an explicit model), the possibility of understanding the underlying process may be hampered by the presence of lagged variables.

Regression trees were introduced in the CART (classification and regression trees) system in [3], but their practical introduction can be dated back to [7, 8]. The problem of extracting the optimal decision tree from a data set is NP-hard [6], which justifies the use of sub-optimal approaches. The ID3 algorithm [7] is a greedy approach to the extraction of a decision tree, later extended and generalized in the algorithm C4.5 [8].

In this paper we sketch the theoretical basis of temporal regression tree learning, taking inspiration from [4], in which the idea of temporal decision tree learning was first introduced. The underlying idea is to generalize the concept of lagged data. Instead of transforming the original time series into a static data set by simply flattening the past values of the independent variables into atemporal instances, we produce an intermediate data set in which each instance is, itself, a multivariate time series of a fixed length l (which is the amount of the lag). In this way, to predict a certain value B(t), we use the multivariate time series given by the values of the variables A1, . . . , An at the times t − l, t − (l − 1), . . . , t − 1, t, and we label it with the value B(t), mimicking, in a way, a moving window approach. From [10, 4], we know that an interval temporal logic such as HS [5] is a very convenient language to describe time series. So, we use a greedy approach such as the one presented in [4], suitably adapted to the regression case, to learn a locally optimal regression tree.

2 Time Series and Interval Temporal Logic

Let [N] be an initial subset of ℕ of length N. An interval over [N] is an ordered pair [x, y], where x, y ∈ [N] and x < y, and we denote by I([N]) the set of all intervals over [N]. If we exclude the identity relation, there are 12 different Allen relations between two intervals in a linear order [1]: the six relations R_A (adjacent to), R_L (later than), R_B (begins), R_E (ends), R_D (during), and R_O (overlaps), depicted in Fig. 2, and their inverses, that is, R_X̄ = (R_X)⁻¹, for each X ∈ 𝒳, where 𝒳 = {A, L, B, E, D, O}.

Figure 2: Allen's interval relations and HS modalities (graphical representation omitted):
⟨A⟩: [x, y] R_A [x′, y′] ⇔ y = x′
⟨L⟩: [x, y] R_L [x′, y′] ⇔ y < x′
⟨B⟩: [x, y] R_B [x′, y′] ⇔ x = x′, y′ < y
⟨E⟩: [x, y] R_E [x′, y′] ⇔ y = y′, x < x′
⟨D⟩: [x, y] R_D [x′, y′] ⇔ x < x′, y′ < y
⟨O⟩: [x, y] R_O [x′, y′] ⇔ x < x′ < y < y′
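For illustration, here is a minimal Python sketch (not part of the paper; the function names are hypothetical) of the six basic relations of Fig. 2, with intervals encoded as pairs (x, y), x < y; each inverse relation is obtained by swapping the two arguments.

```python
# Intervals are pairs (x, y) with x < y over the domain [N].
def R_A(i, j):  # adjacent to (meets): y = x'
    return i[1] == j[0]

def R_L(i, j):  # later than: y < x'
    return i[1] < j[0]

def R_B(i, j):  # begins: x = x' and y' < y
    return i[0] == j[0] and j[1] < i[1]

def R_E(i, j):  # ends: y = y' and x < x'
    return i[1] == j[1] and i[0] < j[0]

def R_D(i, j):  # during: x < x' and y' < y
    return i[0] < j[0] and j[1] < i[1]

def R_O(i, j):  # overlaps: x < x' < y < y'
    return i[0] < j[0] < i[1] < j[1]

# Example: [3, 4] R_A [4, 7] holds; the inverse relation is obtained by swapping arguments.
assert R_A((3, 4), (4, 7)) and not R_A((4, 7), (3, 4))
```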
Halpern and Shoham's modal logic of temporal intervals (HS) is defined from a set of propositional letters AP, by associating a universal modality [X] and an existential one ⟨X⟩ to each Allen relation R_X. Formulas of HS are obtained by the grammar:

φ ::= p | ¬φ | φ ∨ φ | ⟨X⟩φ | ⟨X̄⟩φ,   (1)

where p ∈ AP and X ∈ 𝒳. The other Boolean connectives and the logical constants, e.g., → and ⊤, as well as the universal modalities [X], can be defined in the standard way, e.g., [X]p ≡ ¬⟨X⟩¬p. For each X ∈ 𝒳, the modality ⟨X̄⟩ (corresponding to the inverse relation R_X̄ of R_X) is said to be the transpose of the modality ⟨X⟩, and vice versa. The semantics of HS formulas is given in terms of timelines T = ⟨I([N]), V⟩, where V : AP → 2^I([N]) is a valuation function which assigns to each atomic proposition p ∈ AP the set of intervals V(p) on which p holds. The truth of a formula φ on a given interval [x, y] in an interval model T is defined by structural induction on formulas as follows:

T, [x, y] ⊨ p iff [x, y] ∈ V(p), for p ∈ AP;
T, [x, y] ⊨ ¬ψ iff T, [x, y] ⊭ ψ;
T, [x, y] ⊨ ψ ∨ ξ iff T, [x, y] ⊨ ψ or T, [x, y] ⊨ ξ;
T, [x, y] ⊨ ⟨X⟩ψ iff there is [w, z] s.t. [x, y] R_X [w, z] and T, [w, z] ⊨ ψ;
T, [x, y] ⊨ ⟨X̄⟩ψ iff there is [w, z] s.t. [x, y] R_X̄ [w, z] and T, [w, z] ⊨ ψ.   (2)

A time series can be described by formulas of HS, in which propositional letters represent decisions of the type Ai ⋈ k (⋈ ∈ {≤, =, >}) for some value k. We can arbitrarily impose that a decision Ai ⋈ k is true on an interval [x, y] if and only if the ratio between the number of time points between x and y whose value satisfies ⋈ k and the quantity y − x is at least α, for a fixed α. From there, we can lift the evaluation to any formula. For example, in Fig. 3, the series T is such that T, [3, 4] ⊨ [A](A1 > 10), with α = 1.
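As a sketch of how such decisions can be checked in practice (hypothetical Python helpers, not the authors' code), the following evaluates a decision Ai ⋈ k on an interval with threshold α and lifts it to the existential modality ⟨A⟩, which is the shape of the atomic split formulas used in the next section. We read "the time points between x and y" as x, . . . , y − 1, one possible convention that matches the denominator y − x.

```python
import operator

# Comparison operators allowed in decisions Ai op k.
OPS = {"<=": operator.le, "=": operator.eq, ">": operator.gt}

def holds_on(values, x, y, op, k, alpha):
    """Decision 'Ai op k' on the interval [x, y] (x < y): true when the fraction
    of sampled time points whose value satisfies the comparison is at least alpha.
    Points x, ..., y-1 are sampled (an assumption; one reading of "between x and y")."""
    cmp = OPS[op]
    good = sum(1 for t in range(x, y) if cmp(values[t], k))
    return good / (y - x) >= alpha

def exists_A(values, x, y, op, k, alpha):
    """Atomic HS formula <A>(Ai op k): some interval [y, z] met by [x, y]
    (relation R_A forces the new interval to start exactly at y) satisfies
    the decision."""
    last = len(values) - 1                      # time points are 0, ..., last
    return any(holds_on(values, y, z, op, k, alpha) for z in range(y + 1, last + 1))

# Example: with alpha = 1, <A>(A1 > 10) asks for an interval starting at y on
# which every sampled value of A1 exceeds 10.
A1 = [10, 10, 15, 25, 30, 30, 20]
print(exists_A(A1, 1, 2, ">", 10, alpha=1.0))   # True: e.g., A1 > 10 holds on [2, 6]
```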
3 Interval Temporal Logic Regression Trees

Static regression learning is exemplified in Fig. 3 (middle, left). If we want to construct a static regression model for the values of B, say, from time point 3 to time point 8, we build a data set with the values of A1, A2 at each of these points, associated with the corresponding value of B. Static lagged regression improves upon static learning by associating also the values of A1 and A2 at t − 3, t − 2, and so on; a multivariate linear regression algorithm, or a regression tree, may benefit from this historical information to build a model for B.

Figure 3: A multivariate time series with three variables (top). Static regression (middle, left). Static lagged regression (middle, right). Multivariate time series regression (bottom).

Since extracting an optimal regression tree is an NP-hard problem, the most popular solution is to use a sub-optimal greedy strategy based on the following principles [7, 8]: given a data set D = {I1, . . . , Im}, one defines the notion of split, by means of which D is partitioned into two sets D1, D2 on the basis of a decision (propositional formula) of the type Ai ⋈ k, for some attribute Ai and value k; in other words, for each instance Ij we check Ij ⊨ Ai ⋈ k to decide whether Ij belongs to D1 or to D2. Then, D, D1, and D2 are compared against each other to establish the amount of information conveyed by that split; in regression problems, the information can simply be the variance of the predicted attribute in each data set. So, a simple greedy strategy can be devised: (i) at each step, if the stopping condition is not reached, find the attribute Ai and the value k that maximize the information conveyed by a split on that decision; (ii) split the current data set on the basis of the decision taken at the previous step; (iii) perform two recursive calls on the obtained data sets. At the end of this process, the entire model is described by a tree whose edges are labeled by decisions, and each branch can be seen as a propositional formula.

Here, we propose to design an algorithm capable of extracting a regression tree after a transformation as in Fig. 3 (bottom). For a fixed lag l, to each value B(t) we associate the finite time series described by Ai(t − l), Ai(t − l + 1), . . . for each i, each represented as a string. We define a split based on an HS formula of the type ⟨X⟩(Ai ⋈ k), which we call an atomic HS formula, where decisions have taken the place of propositional letters. Thus, having fixed a lag l, the original time series T with N points is replaced by a data set 𝒯 = {T1, . . . , TN−l+1} in which each instance is, itself, a multivariate time series labelled by a value of B and encompassing l points. Each of these series is initially associated with a reference interval generically denoted [0, 1], and the main split operation now consists of checking Tj, [x, y] ⊨ ⟨X⟩(Ai ⋈ k). By applying the same learning algorithm, one then obtains a tree whose edges are labeled with atomic HS formulas instead of propositional formulas, and whose branches, then, can be read as formulas of HS. In this way, the obtained regression tree keeps the original non-linear (step-type) behaviour, but takes into account the recent history of a value to perform the prediction, and it does so natively. Initial experiments on real data sets seem to offer encouraging results.
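To fix ideas, here is a minimal Python sketch (hypothetical names, not the authors' implementation) of one greedy step: each candidate test encodes an atomic HS formula ⟨X⟩(Ai ⋈ k) as a predicate over a windowed instance and its reference interval, and the chosen split is the one that most reduces the variance of B, one natural reading of "the information can simply be the variance of the predicted attribute".

```python
from statistics import pvariance

def variance_gain(parent, left, right):
    """Information conveyed by a split on the labels B: variance of the parent
    minus the size-weighted variance of the two children (an assumption on how
    the per-subset variances are combined)."""
    n = len(parent)
    weighted = sum(len(part) / n * pvariance(part) for part in (left, right))
    return pvariance(parent) - weighted

def best_split(instances, candidate_tests):
    """One greedy step: instances are (windowed multivariate series, B) pairs,
    and each candidate test is a predicate implementing a check of the form
    Tj, [x, y] |= <X>(Ai op k). Returns the test with maximal variance gain."""
    labels = [b for _, b in instances]
    best = None
    for test in candidate_tests:
        left = [b for series, b in instances if test(series)]
        right = [b for series, b in instances if not test(series)]
        if not left or not right:          # the split must be non-trivial
            continue
        gain = variance_gain(labels, left, right)
        if best is None or gain > best[0]:
            best = (gain, test, left, right)
    return best
```

A candidate test could, for instance, wrap the exists_A helper sketched above for fixed Ai, ⋈, k, and α; recursing on the two resulting subsets yields the tree described above.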
4 Conclusions

We have sketched the theoretical bases for a temporal regression tree learning algorithm, based on the interval temporal logic HS, which allows us to natively take into account the past values of the independent variables and their mutual relationships.

References

[1] J. F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832–843, 1983.
[2] G. Box, G. Jenkins, and G. Reinsel. Time Series Analysis. Wiley, 1970.
[3] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen. Classification and Regression Trees. Taylor & Francis, 1984.
[4] A. Brunello, G. Sciavicco, and I. E. Stan. Interval temporal logic decision tree learning. In Proc. of the 16th European Conference on Logics in Artificial Intelligence (JELIA), volume 11468 of Lecture Notes in Computer Science, pages 778–793. Springer, 2019.
[5] J. Y. Halpern and Y. Shoham. A propositional modal logic of time intervals. Journal of the ACM, 38(4):935–962, 1991.
[6] L. Hyafil and R. L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15–17, 1976.
[7] J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81–106, 1986.
[8] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[9] J. R. Quinlan. Simplifying decision trees. International Journal of Human-Computer Studies, 51(2):497–510, 1999.
[10] G. Sciavicco, I. E. Stan, and A. Vaccari. Towards a general method for logical rule extraction from time series. In Proc. of the 8th International Work-Conference on the Interplay Between Natural and Artificial Computation (IWINAC), volume 11487 of Lecture Notes in Computer Science, pages 3–12. Springer, 2019.