Proceedings of the 2nd Workshop on Artificial Intelligence and Formal Verification, Logics, Automata and Synthesis (OVERLAY), September 25, 2020

A kernel function for Signal Temporal Logic formulae∗

Luca Bortolussi1,2, Giuseppe Maria Gallo1, and Laura Nenzi1,3

1 Department of Mathematics and Geoscience, University of Trieste, Italy
2 Modelling and Simulation Group, Saarland University, Germany
3 TU Wien, Austria

∗ This research has been partially supported by the Austrian FWF project ZK-35 and by the Italian PRIN project "SEDUCE" n. 2017TWRCNB.

Abstract

We discuss how to define a kernel for Signal Temporal Logic (STL) formulae. Such a kernel allows us to embed the space of formulae into a Hilbert space, and opens up the use of kernel-based machine learning algorithms in the context of STL. We show an application of this idea to a regression problem in formula space for probabilistic models.

1 Introduction

Signal Temporal Logic (STL) [14] is gaining momentum as a requirement specification language for complex systems and, in particular, Cyber-Physical Systems [4]. STL has been applied in several flavours, from runtime monitoring [4] to control synthesis [11] and falsification problems [10], and recently also within learning algorithms, trying to find a maximally discriminating formula between sets of trajectories [5, 3, 16]. In these applications, a central role is played by the real-valued quantitative semantics [9], measuring the robustness of satisfaction. Most applications of STL concern deterministic (hybrid) systems, with less emphasis on non-deterministic or stochastic ones [2]. Another area in which formal methods provide interesting tools is that of logic-based distances between models, like bisimulation metrics for Markov models [1], which are typically based on a branching logic. In fact, extending these ideas to linear-time logic is hard [8], and typically requires statistical approximations. Finally, another relevant problem is how to measure the distance between two logic formulae, thus giving a metric structure to the formula space, a task relevant for learning which has received little attention for STL, with the notable exception of [13].

In this work, we tackle the metric, learning, and model distance problems from a different perspective than the classical one, which is based on some form of comparison of the languages of formulae. The starting point is to consider an STL formula as a function mapping a real-valued trajectory (signal) into a number or into another trajectory. As signals are functions, STL formulae should properly be considered as functionals, in the sense of Functional Analysis (FA) [7]. This point of view gives us a large bag of FA tools to manipulate formulae. What we explore here is the definition of a suitable inner product in the form of a kernel [19] between STL formulae, capable of capturing the notion of semantic similarity of two formulae. This endows the space of formulae with the structure of a Hilbert space, defining a metric from the inner product. Moreover, having a kernel opens up the use of kernel-based machine learning techniques [18]. A crucial aspect is that kernels for functionals are typically defined by integrating over the support space, with respect to a given measure.
However, in trajectory space there is no canonical measure (unless one discretizes time and maps signals to Rn), which introduces a degree of freedom in the choice of measure. We decide to work with probability measures on trajectories, i.e. stochastic processes, and we build one that favours "simple" trajectories, with a small total variation. This encodes the idea that two formulae differing on simple signals should have a larger distance than two formulae differing only on complex trajectories. As we will see in the experiments, this choice allows the effective use of this kernel to perform regression on the formula space, approximating the satisfaction probability and the expected robustness of several stochastic processes different from the one used to build the kernel.

1.1 Background

Signal Temporal Logic. STL [14] is a linear-time temporal logic suitable for monitoring properties of continuous trajectories. A trajectory is a function ξ : I → D, with I a time domain in R≥0 and D ⊆ Rn the state space. We define the trajectory space T as the set of all possible continuous functions ξ : I → D. The syntax of STL is

ϕ := tt | µ | ¬ϕ | ϕ1 ∧ ϕ2 | ϕ1 U[a,b] ϕ2

where tt is the Boolean true constant, µ is an atomic predicate, negation ¬ and conjunction ∧ are the standard Boolean connectives, and U[a,b] is the until operator, with a, b ∈ R and a < b. As customary, we can derive the disjunction operator ∨ and the eventually F[t1,t2] and always G[t1,t2] operators from the until temporal modality.

The logic has two semantics: a Boolean semantics, ξ |= ϕ, with the meaning that the trajectory ξ satisfies the formula ϕ, and a quantitative semantics, ρ(ϕ, ξ, t), that measures the quantitative level of satisfaction of a formula for a given trajectory. The function ρ is also called the robustness function. The robustness is compatible with the Boolean semantics since it satisfies the soundness property: if ρ(ϕ, ξ, t) > 0 then (ξ, t) |= ϕ; if ρ(ϕ, ξ, t) < 0 then (ξ, t) ⊭ ϕ. Furthermore, it also satisfies the correctness property, which shows that ρ measures how robust the satisfaction of a trajectory is with respect to perturbations. We refer the reader to [9] for more details.

Given a stochastic process M = (T, A, µ), where T is a trajectory space and µ is a probability measure on a σ-algebra A of T, we define the expected robustness as

⟨ρ(ϕ, t)⟩ := E[ρ(ϕ, ξ, t)] = ∫_{ξ∈T} ρ(ϕ, ξ, t) dµ(ξ).

The qualitative counterpart of the expected robustness is the satisfaction probability S(ϕ, t), i.e. the probability that a trajectory generated by the stochastic process M satisfies the formula ϕ at time t:

S(ϕ, t) := E[s(ϕ, ξ, t)] = ∫_{ξ∈T} s(ϕ, ξ, t) dµ(ξ),

where s(ϕ, ξ, t) = 1 if (ξ, t) |= ϕ and 0 otherwise.
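To make these two quantities concrete, the following minimal Python sketch estimates both by plain Monte Carlo for the formula F(x ≥ c), whose robustness at time 0 on a discretized signal is max_t (x(t) − c). The random-walk sampler is a toy stand-in for a generic process µ; both it and the hard-coded robustness function are assumptions made purely for illustration.

```python
# Monte Carlo estimation of expected robustness and satisfaction probability
# for the fixed formula F(x >= c) on a toy stochastic process (random walk).
import numpy as np

def sample_trajectory(n_steps=100, dt=0.1, rng=None):
    """Toy stand-in for sampling from a process mu: a Gaussian random walk."""
    rng = rng or np.random.default_rng()
    return np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))

def robustness_eventually(xi, c):
    """rho(F(x >= c), xi, 0) = max_t (xi(t) - c) on a discretized signal."""
    return np.max(xi - c)

def estimate(n_samples=10_000, c=1.0, seed=0):
    rng = np.random.default_rng(seed)
    rho = np.array([robustness_eventually(sample_trajectory(rng=rng), c)
                    for _ in range(n_samples)])
    expected_robustness = rho.mean()      # <rho(phi, 0)> = E[rho(phi, xi, 0)]
    satisfaction_prob = (rho > 0).mean()  # S(phi, 0): rho > 0 implies satisfaction
    return expected_robustness, satisfaction_prob

print(estimate())
```

By soundness, the fraction of samples with positive robustness estimates S(ϕ, 0), up to the boundary case ρ = 0.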
Kernel Functions. A kernel k(x, z), x, z ∈ X, defines an integral linear operator on functions X → R, which intuitively can be thought of as a scalar product on a possibly infinite feature space F: k(x, z) = ⟨φ(x), φ(z)⟩, with φ : X → F built from the eigenfunctions of the linear operator, spanning a Hilbert space, see [18]. Knowledge of the kernel allows us to perform approximation and learning tasks over F without explicitly constructing it. One application is kernel regression, with the goal of estimating the function f(x) = E[y | x], x ∈ X, from a finite set of observations {x1, ..., xN} ⊂ X, where each observation xi has an associated response yi ∈ R and {(x1, y1), ..., (xN, yN)} is the training set. Several methods address this problem, exploiting the kernel function k : X × X → R as a similarity measure between a generic x ∈ X and the observations {x1, ..., xN} of the training set. In the experiments, we compare different regression models used to compute the expected robustness and the satisfaction probability.

2 A kernel for Signal Temporal Logic

Endowing an arbitrary space with a kernel function allows us to apply different kinds of regression methods. Even for a space without a natural metric, such as the space of STL formulae, a kernel enables operations that would otherwise be very expensive, such as estimating the satisfaction probability and the expected robustness of any formula ϕ under a stochastic model, without running additional simulations. The idea behind our definition is to exploit the robustness to project an STL formula into a Hilbert space, and then to compute the scalar product in that space: the more similar the two projections are, the higher the scalar product. Moreover, the function we define is a kernel by construction.

STL kernel. Let P denote the space of STL formulae and fix a formula ϕ ∈ P. Consider the robustness ρ(ϕ, ·, ·) : T × I → R, where I ⊂ R is a bounded interval and T is the trajectory space of continuous functions. There is a map h : P → C(T × I) defined by h(ϕ)(ξ, t) = ρ(ϕ, ξ, t), and it can be proved that h(P) ⊆ L2(T × I) [6]. Then, we can use the dot product in L2 as a kernel for P. In this way, we obtain a kernel that returns a high positive value for formulae that agree on high-probability trajectories, and a high negative value for formulae that, on average, disagree. Fixing a probability measure µ0 on T (see below), we define the STL-kernel as

k′(ϕ, ψ) = ⟨h(ϕ), h(ψ)⟩_{L2(T×I)} = ∫_{ξ∈T} ∫_{t∈I} h(ϕ)(ξ, t) h(ψ)(ξ, t) dt dµ0 = ∫_{ξ∈T} ∫_{t∈I} ρ(ϕ, ξ, t) ρ(ψ, ξ, t) dt dµ0.

The function k′ can be shown to satisfy the finitely positive semi-definite property, from which it can be proved that it is a kernel [6]. One desirable property of our kernel is that k(ϕ, ϕ) ≥ k(ϕ, ψ) for all ϕ, ψ ∈ P: given a formula ϕ, no formula should be more similar to ϕ than ϕ itself. This property can be enforced by normalizing the kernel as follows:

k(ϕ, ψ) = k′(ϕ, ψ) / √(k′(ϕ, ϕ) k′(ψ, ψ)).

The base measure µ0. In order to make our kernel meaningful and not too expensive to compute, we endow the trajectory space T with a probability distribution such that more complex trajectories are less probable. We use the total variation [17] of a trajectory and the number of changes in its monotonicity as indicators of its "complexity". We define the probability measure µ0 by providing an algorithm that samples from piecewise-linear functions, a dense subset of T, which we use for the Monte Carlo approximation of k (the algorithm is reported in [6]). In the next section, we show that using this simple measure still allows us to make predictions with remarkable accuracy for other stochastic processes on T.
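A minimal sketch of this Monte Carlo approximation of k′ and of its normalisation is given below. The piecewise-linear sampler is a simplified stand-in for µ0 (the actual algorithm, which penalises total variation and monotonicity changes, is reported in [6]), and formulae are represented as functions mapping a discretized trajectory to its robustness signal; both representations are assumptions made for this sketch.

```python
# Monte Carlo approximation of the normalised STL kernel on a time grid.
import numpy as np

def sample_mu0(n_traj, n_points, rng):
    """Simplified mu_0: piecewise-linear trajectories through random knots."""
    knot_times = np.linspace(0.0, 1.0, 5)
    grid = np.linspace(0.0, 1.0, n_points)
    knots = rng.normal(0.0, 1.0, size=(n_traj, 5))
    return np.array([np.interp(grid, knot_times, k) for k in knots])

def stl_gram(formulae, trajectories, dt):
    """k'(phi, psi) ~ (1/N) sum_i integral_t rho(phi, xi_i, t) rho(psi, xi_i, t) dt,
    normalised so that k(phi, phi) = 1."""
    # rho[j, i, :] = robustness signal of formula j on trajectory i
    rho = np.array([[phi(xi) for xi in trajectories] for phi in formulae])
    k = np.einsum('jit,lit->jl', rho, rho) * dt / len(trajectories)
    d = np.sqrt(np.diag(k))
    return k / np.outer(d, d)

# Usage with two hand-coded robustness signals: suffix minima / maxima give
# crude discretizations of rho for G(x >= 0) and F(x >= 0), respectively.
rng = np.random.default_rng(0)
n_points = 100
trajs = sample_mu0(n_traj=1000, n_points=n_points, rng=rng)
always_pos = lambda xi: np.minimum.accumulate(xi[::-1])[::-1]
eventually_pos = lambda xi: np.maximum.accumulate(xi[::-1])[::-1]
print(stl_gram([always_pos, eventually_pos], trajs, dt=1.0 / (n_points - 1)))
```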
3 Experimental Results

Kernel Regression on µ0. To assess the quality of our kernel, we use it to predict the expected robustness and the satisfaction probability of STL formulae w.r.t. the stochastic process µ0 defined on T. We use a training set composed of 400 formulae sampled randomly according to a syntax-tree random growing scheme [6], and we approximate the expected robustness and satisfaction probability using a set of 100,000 trajectories sampled according to µ0. We compare the following regression models: the Nadaraya-Watson estimator, K-Nearest Neighbors regression, Support Vector Regression (SVR) and Kernel Ridge Regression (KRR) [15]. We obtain the lowest Mean Squared Error (MSE) on the expected robustness, equal to 0.29, using an SVR with a Gaussian kernel and σ = 0.5. On the other hand, the best performance in predicting the satisfaction probability is given by KRR, with an MSE equal to 0.00036.

Kernel Regression on other stochastic processes. The last aspect that we investigate is whether our kernel, defined w.r.t. the fixed measure µ0, can be used to predict the average robustness also for other stochastic processes, i.e., while taking expectations w.r.t. other probability measures µ on T. We compare this with a kernel defined w.r.t. µ itself. We used three different stochastic models: Immigration, Isomerization and Polymerase, simulated using the Python library StochPy [12].

Figure 1: From left to right, trajectories generated by the Immigration model, the Isomerization model and the Polymerase model.

As can be seen from Figure 2 (left), our base kernel is the best performing one. This can be explained by the fact that the measure µ0 is broad in terms of coverage of the trajectory space, meaning that different kinds of behaviours tend to be covered. This allows us to better distinguish among STL formulae, compared to models that concentrate the probability mass on narrower regions of T, such as the Isomerization model (which has the worst performance). Also in this case, we obtained the best results using SVR and KRR; however, given the sparseness of its solution, SVR is more convenient, since fewer kernel evaluations are needed to perform the regression. Interestingly, the minimum MSE is obtained using the Gaussian kernel with exactly the same σ parameter as for the regression task on µ0, hinting at an intrinsic robustness to the choice of hyperparameters that has to be investigated in greater detail. In Figure 2 (right) we show the predictions of the expected robustness over the three stochastic models that we took as examples, using the best regression model found so far, i.e. SVR with a Gaussian kernel having σ = 0.22.

Figure 2: (left) MSE as a function of the bandwidth σ of the Gaussian kernel, for the prediction of the expected robustness. We compare the performance on different stochastic models, using both the kernel evaluated according to the base measure µ0 (base kernel) and a custom kernel computed using the trajectories generated by the stochastic model itself. Both axes are on a logarithmic scale. (center, right) Predictions of the expected robustness for the formulae ψ1 = F x ≥ 1.5 and ψ2 = (¬(((G x ≥ −1.4) ∨ (G x ≥ 2.7)) ∨ x ≤ 0.7)) ∨ x ≤ 0, over three different trajectory spaces. The predictions are made using SVR with a Gaussian kernel with the best performing bandwidth, σ = 0.22.

Note that, to compute the kernel by Monte Carlo approximation, we have to sample the required trajectories for µ0 only once. We also need to estimate the expected robustness (or satisfaction probability) of the formulae comprising the training set. However, kernel regression permits us to avoid further simulations of the model µ for novel formulae ϕ.
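As a sketch of how such a regression can be set up on top of a precomputed STL Gram matrix, the snippet below implements kernel ridge regression with a Gaussian kernel defined on the kernel-induced distance d²(ϕ, ψ) = k(ϕ, ϕ) + k(ψ, ψ) − 2k(ϕ, ψ). This composition, the toy Gram matrix and the ridge parameter alpha are our assumptions for illustration, not the exact pipeline of the paper.

```python
# Kernel ridge regression over formula space from a precomputed Gram matrix.
import numpy as np

def gaussian_from_gram(K, sigma):
    """Gaussian kernel exp(-d^2 / (2 sigma^2)) on the kernel-induced distance."""
    d2 = np.diag(K)[:, None] + np.diag(K)[None, :] - 2.0 * K
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_fit(K_train, y, alpha=1e-3):
    """Solve (K + alpha I) c = y; predictions at new points are K_new @ c."""
    return np.linalg.solve(K_train + alpha * np.eye(len(y)), y)

# Toy usage: 5 training and 2 test formulae; the STL Gram matrix is faked
# from random robustness features, and y stands in for the estimated
# expected robustness of the training formulae.
rng = np.random.default_rng(1)
R = rng.normal(size=(7, 50))
G = gaussian_from_gram((R @ R.T) / 50, sigma=0.5)
y_train = rng.normal(size=5)
coef = krr_fit(G[:5, :5], y_train)
print(G[5:, :5] @ coef)  # predictions for the two held-out formulae
```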
4 Conclusions

We defined a kernel for STL by fixing a base measure over trajectories, and we showed that exactly the same kernel can be used across different stochastic models to compute a very precise approximation of the expected robustness of new formulae, with only the knowledge of the expected robustness of a fixed set of training formulae. Our STL-kernel, however, can also be used for other tasks: for instance, computing STL-based distances among stochastic models, resorting to a dual kernel construction, or building non-linear embeddings of formulae into finite-dimensional real spaces with kernel-PCA techniques. Another direction for future work is to refine the quantitative semantics in such a way that equivalent formulae have the same robustness, e.g. using ideas like those in [13].

References

[1] G. Bacci, G. Bacci, K. G. Larsen, and R. Mardare. Complete axiomatization for the bisimilarity distance on Markov chains. In Proc. of CONCUR, pages 21:1–21:14, 2016.

[2] E. Bartocci, L. Bortolussi, L. Nenzi, and G. Sanguinetti. System design of stochastic models using robustness of temporal properties. Theor. Comput. Sci., 587:3–25, 2015.

[3] E. Bartocci, L. Bortolussi, and G. Sanguinetti. Data-driven statistical learning of temporal logic properties. In Proc. of FORMATS, pages 23–37, 2014.

[4] E. Bartocci, J. Deshmukh, A. Donzé, G. Fainekos, O. Maler, D. Ničković, and S. Sankaranarayanan. Specification-based monitoring of cyber-physical systems: a survey on theory, tools and applications. In Lectures on Runtime Verification, pages 135–175. Springer, 2018.

[5] G. Bombara, C.-I. Vasile, F. Penedo, H. Yasuoka, and C. Belta. A decision tree approach to data classification using signal temporal logic. In Proc. of HSCC, pages 1–10, 2016.

[6] L. Bortolussi, G. M. Gallo, and L. Nenzi. A behavioral kernel function for signal temporal logic, extended version. arXiv preprint arXiv:3364984, 2020.

[7] H. Brezis. Functional analysis, Sobolev spaces and partial differential equations. Springer Science & Business Media, 2010.

[8] P. Daca, T. A. Henzinger, J. Kretínský, and T. Petrov. Linear distances between Markov chains. In Proc. of CONCUR, volume 59 of LIPIcs, pages 20:1–20:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.

[9] A. Donzé, T. Ferrère, and O. Maler. Efficient robust monitoring for STL. In Proc. of CAV, pages 264–279. Springer, 2013.

[10] G. Fainekos, B. Hoxha, and S. Sankaranarayanan. Robustness of specifications and its applications to falsification, parameter mining, and runtime monitoring with S-TaLiRo. In Proc. of RV, volume 11757 of Lecture Notes in Computer Science, pages 27–47. Springer, 2019.

[11] I. Haghighi, N. Mehdipour, E. Bartocci, and C. Belta. Control from signal temporal logic specifications with smooth cumulative quantitative semantics. In Proc. of CDC, pages 4361–4366. IEEE, 2019.

[12] T. R. Maarleveld, B. G. Olivier, and F. J. Bruggeman. StochPy: a comprehensive, user-friendly tool for simulating stochastic biological processes. PLoS ONE, 8(11):e79345, 2013.

[13] C. Madsen, P. Vaidyanathan, S. Sadraddini, C.-I. Vasile, N. A. DeLateur, R. Weiss, D. Densmore, and C. Belta. Metrics for signal temporal logic formulae. In Proc. of CDC, pages 1542–1547. IEEE, 2018.
[14] O. Maler and D. Nickovic. Monitoring temporal properties of continuous signals. In Proc. of FORMATS, 2004.

[15] K. P. Murphy. Machine learning: a probabilistic perspective. MIT Press, 2012.

[16] L. Nenzi, S. Silvetti, E. Bartocci, and L. Bortolussi. A robust genetic algorithm for learning temporal specifications from data. In Proc. of QEST, volume 11024 of Lecture Notes in Computer Science, pages 323–338. Springer, 2018.

[17] L. Ambrosio, N. Fusco, and D. Pallara. Functions of bounded variation and free discontinuity problems. Oxford University Press, Oxford, 2000.

[18] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[19] J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge University Press, 2004.