           Proceedings of the 2nd Workshop on Artificial Intelligence and Formal Verification,
                  Logics, Automata and Synthesis (OVERLAY), September 25, 2020




Symbolic Learning with Interval Temporal Logic: the
                Case of Regression


               Estrella Lucena-Sanchez¹, Guido Sciavicco², and Ionel Eduard Stan³
          ¹ University of Ferrara and University of Modena and Reggio Emilia, Italy
                                 ² University of Ferrara, Italy
                  ³ University of Ferrara and University of Parma, Italy
                              ¹ estrella.lucenasanchez@unife.it
                                 ² guido.sciavicco@unife.it
                                ³ ioneleduard.stan@unife.it



                                                     Abstract


                   Regression analysis is the statistical process used to estimate the relationship
               between a dependent variable and one or more independent variables. In machine
               learning, typical statistical approaches to regression such as linear regression are
               often replaced with symbolic learning, such as decision tree regression, to capture
               non-linear behaviour while keeping the interpretability of the results. For time
               series, regression is sometimes enhanced by using historical values of the independent
               variables. In this paper, we show how temporal regression can be handled by a
               symbolic learner based on interval temporal logic decision trees.


1     Introduction
A multivariate time series [2] is a set of variables that change over time. Each variable of a multivariate
time series is an ordered collection of N real values, and, usually, in a multivariate time series one identifies
one dependent variable B, whose values we are interested in predicting, and the independent variables
A1 , . . . , An , whose values are used for the prediction. Multivariate time series emerge in many application
contexts. The temporal history of some hospitalized patient can be described by the time series of the
values of his/her temperature, blood pressure, and oxygenation; the pronunciation of a word in sign
language can be described by the time series of the relative and absolute positions of the ten fingers w.r.t.
some reference point; different sport activities can be distinguished by the time series of some relevant
physical quantities. Having identified the dependent variable, the most relevant and interesting problem
defined on a time series is forecasting, that is, the problem of predicting its future values, and it is typically
solved by regression.
    Regression is the statistical process used to capture the parameters that influence the relationship
between independent and dependent variables. In the context of temporal regression with machine learning,
   The authors acknowledge the partial support of the Italian INdAM-GNCS project Strategic Reasoning and Automated
Synthesis of Multi-Agent Systems.
                   Copyright © 2020 for this paper by its authors.
                   Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


      [Diagram: temporal regression divides into autoregression (e.g., ARIMA), functional
      regression (e.g., linear regression), and symbolic regression (e.g., regression trees);
      the latter two are obtained via the lagged transformation.]

                                  Figure 1: Approaches to temporal regression.


there are three main categories of approach. Autoregression is a set of statistical techniques developed
mainly to model the idea that past values of the dependent variable may influence, in several different
ways, future values. These are techniques originally developed for univariate series, and later extended to
the multivariate case. The use of lagged variables is, so to say, the machine learning generalization of
this idea: it consists of transforming the original data set by adding virtual variables that represent
past values. For example, for an independent variable Ai (t) (in which we make the temporal parameter
explicit), one adds the lagged variables Ai (t − 1), Ai (t − 2), . . ., which correspond, respectively, to
the value of Ai one, two, . . . units of time before. In this way, a static learner (where each instance
of the data set is independent from any other) can be used on the transformed data set, giving rise to
the possibility of using functional regression algorithms, such as linear regression and neural networks,
among others, and symbolic regression algorithms, such as regression tree learning algorithms. While
certain classical regression algorithms, such as linear regression or decision trees, are interpretable by
nature (that is, they return an explicit model), the possibility of understanding the underlying process
may be hampered by the presence of lagged variables.
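
For concreteness, the lagged transformation can be realized in a few lines of Python; the following is a
minimal sketch (the function and column names are ours, and the tabular layout of the data is an
assumption, not part of any cited system):

    import pandas as pd

    def add_lagged_variables(df: pd.DataFrame, columns, lag: int) -> pd.DataFrame:
        """Add virtual columns A_i(t-1), ..., A_i(t-lag) for each column A_i."""
        out = df.copy()
        for col in columns:
            for j in range(1, lag + 1):
                # shift(j) moves values down j rows, so row t holds A_i(t-j)
                out[f"{col}(t-{j})"] = out[col].shift(j)
        # the first `lag` rows have undefined past values and are dropped
        return out.dropna().reset_index(drop=True)

    # illustrative values: one row per time point
    series = pd.DataFrame({"A1": [10, 5, 5, 15, 20, 25],
                           "A2": [110, 105, 110, 90, 100, 110],
                           "B":  [36, 36, 36, 38, 40, 40]})
    lagged = add_lagged_variables(series, ["A1", "A2"], lag=2)

A static learner can then be run on the resulting data set, each row of which is treated as an
independent instance.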
    Regression trees were introduced in the CART (classification and regression trees) system in [3], but
their practical introduction can be dated back to [7, 8]. The problem of extracting the optimal decision tree
from a data set is NP-hard [6], which justifies the use of sub-optimal approaches. The ID3 algorithm [7] is
a greedy approach to the extraction of a decision tree, later extended and generalized in the algorithm
C4.5 [8]. In this paper, we sketch the theoretical basis of temporal regression tree learning, taking inspiration
from [4], in which the idea of temporal decision tree learning was first introduced. The underlying idea is
to generalize the concept of lagged data. Instead of transforming the original time series into a static data
set by simply flattening the past values of the independent variables into atemporal instances, we produce
an intermediate data set in which each instance is, itself, a multivariate time series of a fixed length l
(which is the amount of the lag). In this way, to predict a certain value B(t), we use the multivariate time
series given by the values of the variables A1 , . . . , An at the times t − l, t − (l − 1), . . . , t − 1, t, and we
label it with the value B(t), mimicking, in a way, a moving window approach. From [10, 4], we know that
an interval temporal logic such as HS [5] is a very convenient language to describe time series. So, we use
a greedy approach such as the one presented in [4], suitably adapted to the regression case, to learn a
locally optimal regression tree.


2     Time Series and Interval Temporal Logic
Let [N ] be an initial subset of N of length N . An interval over [N ] is an ordered pair [x, y], where x, y ∈ [N ]
and x < y, and we denote by I([N ]) the set of all intervals over [N ]. If we exclude the identity relation,
there are 12 different Allen’s relations between two intervals in a linear order [1]: the six relations RA
(adjacent to), RL (later than), RB (begins), RE (ends), RD (during), and RO (overlaps), depicted in
Fig. 2, and their inverses, that is, RX̄ = (RX )−1 , for each X ∈ X , where X = {A, L, B, E, D, O}. Halpern
and Shoham’s modal logic of temporal intervals (HS) is defined from a set of propositional letters AP, and
by associating a universal modality [X] and an existential one ⟨X⟩ with each Allen relation RX . Formulas
of HS are obtained by:


                 HS                 Allen's relation

                 ⟨A⟩             [x, y] RA [x′ , y ′ ]  ⇔  y = x′
                 ⟨L⟩             [x, y] RL [x′ , y ′ ]  ⇔  y < x′
                 ⟨B⟩             [x, y] RB [x′ , y ′ ]  ⇔  x = x′ , y ′ < y
                 ⟨E⟩             [x, y] RE [x′ , y ′ ]  ⇔  y = y ′ , x < x′
                 ⟨D⟩             [x, y] RD [x′ , y ′ ]  ⇔  x < x′ , y ′ < y
                 ⟨O⟩             [x, y] RO [x′ , y ′ ]  ⇔  x < x′ < y < y ′


        Figure 2: Allen's interval relations and HS modalities (graphical representation omitted).



                                           ϕ ::= p | ¬ϕ | ϕ ∨ ϕ | ⟨X⟩ϕ | ⟨X̄⟩ϕ,                                            (1)
where p ∈ AP and X ∈ X . The other Boolean connectives and the logical constants, e.g., → and ⊤, as
well as the universal modalities [X], can be defined in the standard way, i.e., [X]p ≡ ¬⟨X⟩¬p. For each
X ∈ X , the modality ⟨X̄⟩ (corresponding to the inverse relation RX̄ of RX ) is said to be the transpose
of the modality ⟨X⟩, and vice versa. The semantics of HS formulas is given in terms of timelines
T = ⟨I([N ]), V ⟩, where V : AP → 2^I([N ]) is a valuation function which assigns to each atomic proposition
p ∈ AP the set of intervals V (p) on which p holds. The truth of a formula ϕ on a given interval [x, y] in
an interval model T is defined by structural induction on formulas as follows:

                  T, [x, y] ⊩ p          if   [x, y] ∈ V (p), for p ∈ AP;
                  T, [x, y] ⊩ ¬ψ         if   T, [x, y] ⊮ ψ;
                  T, [x, y] ⊩ ψ ∨ ξ      if   T, [x, y] ⊩ ψ or T, [x, y] ⊩ ξ;                                              (2)
                  T, [x, y] ⊩ ⟨X⟩ψ       if   there is [w, z] s.t. [x, y] RX [w, z] and T, [w, z] ⊩ ψ;
                  T, [x, y] ⊩ ⟨X̄⟩ψ       if   there is [w, z] s.t. [x, y] RX̄ [w, z] and T, [w, z] ⊩ ψ.
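
The relational clauses of Fig. 2, used in the two modal clauses above, translate directly into executable
predicates; the following is a minimal Python sketch (the naming is ours):

    # Allen's six basic relations of Fig. 2 as predicates over intervals
    # [x, y] with x < y; the inverses are obtained by swapping argument pairs.
    ALLEN = {
        "A": lambda x, y, w, z: y == w,              # adjacent to
        "L": lambda x, y, w, z: y < w,               # later than
        "B": lambda x, y, w, z: x == w and z < y,    # begins
        "E": lambda x, y, w, z: y == z and x < w,    # ends
        "D": lambda x, y, w, z: x < w and z < y,     # during
        "O": lambda x, y, w, z: x < w < y < z,       # overlaps
    }

    # e.g., [1, 3] is adjacent to [3, 5], and [2, 4] is during [1, 5]
    assert ALLEN["A"](1, 3, 3, 5) and ALLEN["D"](1, 5, 2, 4)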
    A time series can be described by formulas of HS, in which propositional letters represent decisions of
the type Ai ⋈ k (⋈ ∈ {≤, =, >}) for some value k. We can arbitrarily impose that a decision Ai ⋈ k is
true on an interval [x, y] if and only if the ratio between the number of time points between x and y whose
value satisfies ⋈ k and the quantity y − x is at least α, for a fixed α. From there, we can lift the
evaluation to any formula. For example, in Fig. 3, the series T is such that T, [3, 4] ⊩ [A](A1 > 10), with
α = 1.
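
Under this reading, the α-semantics of decisions, and its lifting to formulas of the type ⟨X⟩(Ai ⋈ k) used
in the next section, can be sketched as follows (a minimal sketch under our point-counting reading of the
definition above; it assumes the ALLEN table from the previous sketch is in scope):

    import operator

    # comparison operators allowed in decisions A_i ⋈ k
    OPS = {"<=": operator.le, "=": operator.eq, ">": operator.gt}

    def decision_holds(series, x, y, op, k, alpha=1.0):
        """A_i ⋈ k holds on [x, y] iff the ratio between the number of time
        points in x..y whose value satisfies ⋈ k and the length y - x is at
        least alpha (series holds A_i's values, 1-indexed as in the paper;
        the exact point-counting convention is our assumption)."""
        hits = sum(OPS[op](v, k) for v in series[x - 1:y])
        return hits / (y - x) >= alpha

    def atomic_hs_holds(series, x, y, X, op, k, alpha=1.0):
        """<X>(A_i ⋈ k) holds on [x, y] iff the decision holds on some
        interval [w, z] with [x, y] R_X [w, z] (ALLEN as defined above)."""
        N = len(series)
        return any(decision_holds(series, w, z, op, k, alpha)
                   for w in range(1, N) for z in range(w + 1, N + 1)
                   if ALLEN[X](x, y, w, z))

The universal modalities are obtained by duality, e.g., [A]ϕ as the negation of ⟨A⟩¬ϕ.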


3     Interval Temporal Logic Regression Trees
Static regression learning is exemplified in Fig. 3, middle, left. If we want to construct a static regression
model for the values of B, say, from the time point 3 to the time point 8, we build a data set with
the values of A1 , A2 at each such point, associated with the corresponding value of B. Static lagged
regression improves upon static learning by also associating the values of A1 and A2 at t − 3, t − 2, and
so on; a multivariate linear regression algorithm, or a regression tree, may benefit from this historical
information to build a model for B. Since extracting an optimal static regression tree is an NP-hard
problem, the most popular solution is to use a sub-optimal greedy strategy based on the following
principles [7, 8]: given a data set D = {I1 , . . . , Im }, one defines the notion of a split, by means of which D
is partitioned into two sets D1 , D2 on the basis of a decision (propositional formula) of the type Ai ⋈ k,
for some attribute Ai and value k; in other words, for each instance Ij we check:

                                                           Ij ⊩ Ai ⋈ k
to decide if Ij belongs to D1 or D2 . Then, D, D1 , and D2 are compared against each other to establish the
amount of information conveyed by that split; in regression problems, the information can simply be the


              [Plot: the values of A1 , A2 , and B over the time points 1, . . . , 8.]

                        A1     A2      B               . . .    A1 (t − 1)   A2 (t − 1)   A1 (t)   A2 (t)   B(t)
                        10    110     36               . . .       10           105         10      110      36
                        15     90     36               . . .       10           110          5      100      36
                        25     90     38               . . .        5           100          5      110      38
                        30    100     40               . . .        5           110         15      115      40
                        30    105     40               . . .       15           115         20      115      40
                        20    110     39               . . .       20           115         25      105      39

                                     A1 [(t − 2); (t − 1); t]     A2 [(t − 2); (t − 1); t]       B
                                            5;10;10                    110;105;110               36
                                            10;10;15                    105;110;90               36
                                             25;5;5                     95;100;110               38
                                             5;5;15                    100;110;115               40
                                            5;15;20                    110;115;115               40
                                            15;20;25                   115;115;105               39


Figure 3: A multivariate time series with three variables (top). Static regression (middle, left). Static
lagged regression (middle, right). Multivariate time series regression (bottom).


variance of the predicted attribute in each data set. So, a simple greedy strategy can be devised: (i) at
each step, if the stopping condition is not reached, find the attribute Ai and the value k that maximize
the information conveyed by a split on that decision; (ii) split the current data set on the basis of the
decision taken at the previous step; (iii) perform two recursive calls on the obtained data sets. At the end
of this process, the entire model is described by a tree whose edges are labeled by decisions, and each
branch can be seen as a propositional formula.
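
As an illustration, the split-search step of this greedy strategy, with variance as the information measure,
can be sketched as follows (the function name and data layout are ours; candidate thresholds are simply
taken among the observed values):

    from statistics import pvariance

    def best_split(instances, attributes):
        """instances: list of pairs (attribute -> value dict, value of B).
        Returns (gain, attribute, threshold) for the decision A_i <= k that
        maximally reduces the variance of B, or None if no split exists."""
        base = pvariance([b for _, b in instances])
        best = None
        for a in attributes:
            for k in sorted({inst[a] for inst, _ in instances}):
                left = [b for inst, b in instances if inst[a] <= k]
                right = [b for inst, b in instances if inst[a] > k]
                if not left or not right:
                    continue
                # variance of B in the children, weighted by their sizes
                child = (len(left) * pvariance(left)
                         + len(right) * pvariance(right)) / len(instances)
                if best is None or base - child > best[0]:
                    best = (base - child, a, k)
        return best

The recursive tree construction then consists of splitting on the returned decision and recurring on the
two resulting halves until the stopping condition is met.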
    Here, we propose to design an algorithm capable of extracting a regression tree after a transformation as
in Fig. 3, bottom. For a fixed lag l, with each value B(t) we associate the finite time series described by
Ai (t − l), Ai (t − l + 1), . . ., for each i, each represented by a string. We define a split based on an HS
formula of the type ⟨X⟩(Ai ⋈ k), which we call an atomic HS formula, in which decisions take the place
of propositional letters. Thus, for a fixed lag l, the time series T with N points is initially replaced
by a data set T = {T1 , . . . , TN −l+1 } in which each instance is, itself, a multivariate time series labelled by
a value of B and encompassing l points. Each such series is initially associated with a reference interval,
generically denoted [0, 1], and the main split operation now consists of checking:

                                              Tj , [x, y] ⊩ ⟨X⟩(Ai ⋈ k).
    By applying the same learning algorithm, one then obtains a tree whose edges are labeled with atomic
HS formulas instead of propositional formulas, and whose branches can thus be read as formulas of HS.
In this way, the obtained regression tree keeps the original non-linear (step-type) behaviour, but takes into
account the recent history of a value to perform the prediction, and it does so natively. Initial experiments
on real data sets seem to offer encouraging results.
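
For concreteness, the windowing transformation that replaces the lagged flattening can be sketched as
follows (a minimal sketch, with our naming and 0-indexed time, on illustrative values):

    def windowed_dataset(A, B, l):
        """A: dict mapping variable name -> list of N values; B: list of N
        values. Each value B(t) is labeled by the multivariate sub-series of
        the independent variables at times t-l, ..., t (a moving window)."""
        return [({name: vals[t - l:t + 1] for name, vals in A.items()}, B[t])
                for t in range(l, len(B))]

    # each resulting instance is, itself, a small multivariate time series
    T = windowed_dataset({"A1": [10, 5, 5, 15, 20, 25],
                          "A2": [110, 105, 110, 90, 100, 110]},
                         [36, 36, 36, 38, 40, 40], l=2)

The split search is then as in the static case, except that each test is of the form Tj , [x, y] ⊩ ⟨X⟩(Ai ⋈ k),
e.g., via atomic_hs_holds from the sketch of Section 2.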


4     Conclusions
We have sketched the theoretical bases for a temporal regression tree learning algorithm, based on the
interval temporal logic HS, which allows us to natively take into account the past values of the independent
variables and their mutual relationships.


References
 [1] J. F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM,
     26(11):832–843, 1983.
 [2] G. Box, G. Jenkins, and G. Reinsel. Time Series Analysis. Wiley, 1970.

 [3] L. Breiman, J. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees.
     Taylor & Francis, 1984.
 [4] A. Brunello, G. Sciavicco, and I. E. Stan. Interval temporal logic decision tree learning. In Proc. of
     the 16th European Conference on Logics in Artificial Intelligence (JELIA), volume 11468 of Lecture
     Notes in Computer Science, pages 778–793. Springer, 2019.

 [5] J. Y. Halpern and Y. Shoham. A propositional modal logic of time intervals. Journal of the ACM,
     38(4):935–962, 1991.
 [6] L. Hyafil and R. L. Rivest. Constructing optimal binary decision trees is NP-complete. Information
     Processing Letters, 5(1):15–17, 1976.

 [7] J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81–106, 1986.
 [8] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
 [9] J. R. Quinlan. Simplifying decision trees. International Journal of Human-Computer Studies,
     51(2):497–510, 1999.

[10] G. Sciavicco, I. E. Stan, and A. Vaccari. Towards a general method for logical rule extraction from
     time series. In Proc. of the 8th International Work-Conference on the Interplay Between Natural and
     Artificial Computation (IWINAC), volume 11487 of Lecture Notes in Computer Science, pages 3–12.
     Springer, 2019.