                    Learning with safety requirements:
                    state of the art and open questions

                Francesco Leofante1 , Luca Pulina2 , and Armando Tacchella1
      1
DIBRIS — Università di Genova — Via all'Opera Pia, 13; 16145 Genova; Italy
                francesco.leofante@edu.unige.it – armando.tacchella@unige.it
      2
           POLCOMING — Università di Sassari — Viale Mancini 5; 07100 Sassari; Italy
                                         lpulina@uniss.it




          Abstract. A standing challenge for the adoption of machine learning (ML) techniques in safety-related applications is that misbehaviors can hardly be ruled out with traditional analytical or probabilistic techniques. In previous contributions of ours, we have shown how to deal with safety requirements considering both Multi-Layer Perceptrons (MLPs) and Support Vector Machines (SVMs), two of the most effective and well-known ML techniques. The key to providing safety guarantees for MLPs and SVMs is to combine Satisfiability Modulo Theory (SMT) solvers with suitable abstraction techniques. The purpose of this paper is to provide an overview of problems and related solution techniques when it comes to guaranteeing that the input-output function of trained computational models will behave according to specifications. We also summarize experimental results showing what can be effectively assessed in practice using state-of-the-art SMT solvers. Our empirical results are the starting point to introduce some open questions that we believe should be considered in future research about learning with safety requirements.


1   Introduction
Machine learning (ML) techniques are adopted in a wide range of research and en-
gineering domains. For instance, artificial agents that act in physical domains can be
equipped with learning capabilities in order to acquire the dynamics of interest, thus
saving developers from the burden of devising explicit models — see [1–3] for exam-
ples. However, in spite of some exceptions, ML techniques are confined to systems which comply with the lowest safety integrity levels, achievable with standard industrial best practices [4]. The main reason is the absence of effective safety assurance methods for systems using ML techniques, because traditional analytical and probabilistic methods can be ineffective in safety-related applications — see, e.g., [5]. In practice, guaranteeing safety amounts to controlling undesirable behaviors when the agent acts based on a model of the environment. In some cases of interest, a further challenge is related to the cost of sampling, i.e., acquiring new data for learning, so that the challenge of ensuring safety must be coupled with the paucity of samples.
    In previous contributions of ours we have shown effective approaches to learning
with safety requirements. In particular, in [6] we tackled the problem of ensuring that
a trained Multi-Layer Perceptron (MLP) emits outputs which are guaranteed to stay within a range of values considered to be safe. We called this global safety in [7], where we introduced two additional problems, namely local safety and stability. In the former case, we wish to guarantee that an input close to a training input sample will elicit an output close to the corresponding training output sample. In the latter case, we wish to ensure that if the input is contained within some bounded region, the output will also be
  be contained within a corresponding bounded set. Since effective learning with MLPs
  requires a large number of samples and their empirical tuning is often fraught with
difficulties, in [8] we considered safety requirements in conjunction with kernel-based
  methods. In order to reduce the number of samples we considered Active Learning
  (AL) — see, e.g., [9] — and we focused on Support Vector Regression (SVR) — see,
  e.g., [10] for a tutorial introduction.

    In all the contributions above, the problem of providing safety guarantees is encoded into formulas that can be solved automatically by Satisfiability Modulo Theory (SMT) solvers [11]. Intuitively, verification of MLPs/SVRs via SMT involves abstraction of the learned model into a set of constraints expressed in linear arithmetic over real numbers. The abstraction process is guaranteed to be conservative, i.e., it provides an over-approximation of the concrete MLP/SVR. In this way, when the SMT solver proves that the response of the abstract MLP/SVR cannot exceed a stated bound as long as the input values satisfy given preconditions, we can certify that the concrete MLP/SVR has the same property. On the other hand, if a counterexample is found, then it is either an artefact of the abstraction, or a true counterexample proving that the MLP/SVR is not satisfactory in terms of safety. The former case calls for a refinement of the abstraction, whereas the latter may entail further training of the MLP/SVR.

      The experimental assessment that we present considers SMT encodings obtained
  from different verification problems. In the case of MLPs, the encodings are related to
control subsystems in the humanoids James [12] and iCub [13]. In the case of James, measuring external forces using a single force/torque sensor placed along the kinematic chain of the arm requires compensation of the manipulator dynamics, i.e., the contribution of internal forces must be subtracted from sensor readings [14]. MLP-based
  estimation of internal forces relies on angular positions and velocities of two shoulder
  and two elbow joints, and takes into account components from gravity, Coriolis forces,
  and manipulator inertia. In the case of iCub, we consider MLP-based extrapolation of
  forward kinematics of its arms. In particular, we consider angular positions of three
  shoulder, one elbow and three wrist joints as inputs, and end-effector positions as out-
puts. Finally, in the case of SVRs, we considered an artificial physical domain simulated with V-REP [15], in which we simulate the task of applying the right amount of force to send a ball into a goal in the presence of obstacles. We further assume that acquiring input samples is expensive, so that active learning rather than batch learning is required.

      The remainder of this paper is organized as follows. Section 2 gives background
  notions on MLPs, SVRs, and SMT solvers. Section 3 summarizes the abstractions in-
  troduced in [6–8] and introduces global, local and stability requirements. The state of
  the art is presented in Section 4. Concluding remarks and open questions are presented
  in Section 5.



  2     Preliminaries
  2.1   Neural networks for regression
  A Multi-Layer Perceptron (MLP) is a function ν : Rn → Rm obtained by feeding
  n inputs through a network of neurons, i.e., computational units arranged in multiple
  layers ending with m outputs, where each neuron is characterized by an activation
  function σ : R → R. We consider networks with only three cascaded layers, namely
  input, hidden, and output layer, so that there are always n neurons in the input layer, and
  m neurons in the output layer. Let x = (x1 , . . . , xn ) be the input signal to the network
  ν. We assume that input neurons compute the activation function σ(r) = r, i.e., their
  task is simply to pass forward the components of the input signal. Let h denote the
  number of hidden neurons, then the input of the j-th hidden neuron is
    $r_j = \sum_{i=1}^{n} a_{ji}\, x_i + b_j, \qquad j \in \{1, \ldots, h\}$                    (1)

  where aji is the weight of the connection from the i-th neuron in the input layer (1 ≤
  i ≤ n) to the j-th neuron in the hidden layer, and the constant bj is the bias of the j-th
neuron. We denote with $A = \{a_{11}, \ldots, a_{1n}, \ldots, a_{h1}, \ldots, a_{hn}\}$ and with $B = \{b_1, \ldots, b_h\}$ the corresponding sets of weights. The activation function σh computed
  by hidden neurons is a non-constant, bounded and continuous function of its input.
According to [16] this guarantees that, in principle, ν will be capable of approximating any continuous mapping f : Rn → Rm arbitrarily closely. The input received by an
  output neuron is
    $s_k = \sum_{j=1}^{h} c_{kj}\, \sigma_h(r_j) + d_k, \qquad k \in \{1, \ldots, m\}$                    (2)

  where ckj denotes the weight of the connection from the j-th neuron in the hidden layer
  to the k-th neuron in the output layer, while dk represents the bias of the k-th output
  neuron – as before, C and D denote the corresponding sets of weights. The output
  signal of the MLP is a vector ν(x) = (σo (s1 ), . . . , σo (sm )). The activation function of
  output neurons can be chosen as σo = σh , so that each output of ν is constrained to the
  range of σh , or as σo (r) = r, so that ν(x) = (s1 , . . . , sm ).
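    As a minimal illustration of eqs. (1)–(2), the following sketch computes the input-output function ν of a three-layer MLP with logistic hidden units and identity output units; the random weights and the toy dimensions are placeholders, not parameters from the case studies discussed later.

   import numpy as np

   def mlp_forward(x, A, B, C, D, sigma_h=lambda r: 1.0 / (1.0 + np.exp(-r))):
       # Forward pass of a three-layer MLP, i.e., nu(x) per eqs. (1)-(2).
       # A: (h, n) input-to-hidden weights a_ji;  B: (h,) hidden biases b_j
       # C: (m, h) hidden-to-output weights c_kj; D: (m,) output biases d_k
       r = A @ x + B           # eq. (1): inputs of the hidden neurons
       s = C @ sigma_h(r) + D  # eq. (2): inputs of the output neurons
       return s                # identity output: nu(x) = (s_1, ..., s_m)

   # Toy usage with random weights standing in for trained parameters
   rng = np.random.default_rng(0)
   n, h, m = 4, 8, 2
   A, B = rng.normal(size=(h, n)), rng.normal(size=h)
   C, D = rng.normal(size=(m, h)), rng.normal(size=m)
   print(mlp_forward(rng.normal(size=n), A, B, C, D))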
      The key aspect of MLPs is that the sets of parameters A, B, C, D are not manually
  determined but are computed using an iterative optimization procedure known as train-
ing. Let R be a training set of l pairs, R = {(x1 , y1 ), . . . , (xl , yl )}, where for
  each 1 ≤ i ≤ l, the vector xi is some input signal (pattern) and yi is the correspond-
  ing output signal (label). If we assume that R is generated by some unknown function
  f : Rn → Rm , then we can see training as a way of extrapolating the unknown function
  f from the signals in the training set. The goal of training is to determine the parameters
  A, B, C, D which maximize similarity between ν(x) and y for each pair (x, y) ∈ R.

  2.2   Kernel-based methods for regression
  Support Vector Regression (SVR) is a supervised learning paradigm, whose mathemat-
ical foundations are derived from Support Vector Machines (SVMs) [17–19]. Suppose we are given a set R = {(x1 , y1 ), . . . , (xl , yl )} ⊂ X × R, where X is the input space
with X = Rd . In the context of ε-SVR [20] for the linear case, our aim is to compute a function ν(x) of the form
                               ν(x) = w · x + b with w ∈ X , b ∈ R                              (3)
such that ν(x) deviates at most ε from the targets yi , with ‖w‖ as small as possible.
  A solution to this problem is presented in [19], where slack variables ξi , ξi∗ are intro-
  duced to handle otherwise unfeasible constraints. The formulation of the optimization
  problem stated above then becomes:
    $\begin{array}{ll} \text{minimize} & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \\[2pt] \text{subject to} & y_i - w \cdot x_i - b \le \varepsilon + \xi_i \\ & w \cdot x_i + b - y_i \le \varepsilon + \xi_i^* \\ & \xi_i, \xi_i^* \ge 0 \end{array}$                    (4)
The constant C > 0 determines the trade-off between the "flatness" of ν, i.e., how small ‖w‖ is, and how much we want to tolerate deviations greater than ε.
    In order to solve the above optimization problem one can use the standard dualization method based on Lagrange multipliers. The Lagrangian reads

    $L := \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) - \sum_{i=1}^{l} (\eta_i \xi_i + \eta_i^* \xi_i^*) - \sum_{i=1}^{l} \alpha_i (\varepsilon + \xi_i - y_i + w \cdot x_i + b) - \sum_{i=1}^{l} \alpha_i^* (\varepsilon + \xi_i^* + y_i - w \cdot x_i - b)$                    (5)
where ηi , ηi∗ , αi , αi∗ are Lagrange multipliers. Solving the dual problem allows us to rewrite (3) as

    $\nu(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, x_i \cdot x + b$                    (6)
The algorithm can be modified to handle non-linear cases. This could be done by mapping the input samples xi into some feature space F by means of a map Φ : X → F. However, this approach can easily become computationally unfeasible when the dimensionality of the input data is high. A better solution can be obtained by applying the so-called kernel trick. Looking at eq. (6), it is possible to see that the SVR algorithm only depends on dot products between patterns xi . Hence it suffices to know the function K(x, x′ ) = Φ(x) · Φ(x′ ) rather than Φ explicitly. By introducing this idea the opti-
  mization problem can be rewritten, leading to the non-linear version of (6):
    $\nu(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, K(x_i, x) + b$                    (7)

where K(·, ·) is called a kernel function as long as it fulfills some additional conditions — see, e.g., [10] for a full characterization of K.
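    To make eq. (7) concrete, the sketch below evaluates a trained ε-SVR with an RBF kernel directly from the coefficients exposed by scikit-learn (the library used in Section 4.2); the one-dimensional dataset is a synthetic placeholder.

   import numpy as np
   from sklearn.svm import SVR

   # Synthetic 1-D regression data, a placeholder for a real training set
   rng = np.random.default_rng(0)
   X = rng.uniform(0.0, 1.0, size=(50, 1))
   y = np.sin(4.0 * X[:, 0]) + 0.05 * rng.normal(size=50)

   gamma = 0.5
   svr = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=gamma).fit(X, y)

   def nu(x):
       # Eq. (7): sum_i (alpha_i - alpha_i^*) K(x_i, x) + b, RBF kernel as in (10)
       diffs = svr.support_vectors_ - x
       K = np.exp(-gamma * np.sum(diffs ** 2, axis=1))
       return float(svr.dual_coef_[0] @ K + svr.intercept_[0])

   x_test = np.array([0.3])
   assert np.isclose(nu(x_test), svr.predict([x_test])[0])  # matches the library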



  2.3   Satisfiability Modulo Theory
  A Constraint Satisfaction Problem (CSP) [21] is defined over a constraint network, i.e.,
  a finite set of variables, each ranging over a given domain, and a finite set of constraints.
  Intuitively, a constraint represents a combination of values that a certain subset of vari-
  ables is allowed to take. In this paper we are concerned with CSPs involving linear
  arithmetic constraints over real-valued variables. From a syntactical point of view, the
  constraint sets that we consider are built according to the following grammar:

               set → { constraint }∗ constraint
        constraint → ( { atom }∗ atom )
              atom → bound | equation
             bound → value relop value
             relop → < | ≤ | = | ≥ | >
             value → var | - var | const
          equation → var = value - value | var = value + value | var = const · value
  where const is a real value, and var is the name of a (real-valued) variable. Formally, this
  grammar defines a fragment of linear arithmetic over real numbers known as Quantifier-
  Free Linear Arithmetic over Reals (QF LRA) [11].
      From a semantical point of view, let X be a set of real-valued variables, and µ be
  an assignment of values to the variables in X, i.e., a partial function µ : X → R. For
  all µ, we assume that µ(c) = c for every c ∈ R, and that relopµ , +µ , −µ , ·µ denote the
  standard interpretations of relational and arithmetic operators over real numbers. Linear
  constraints are thus interpreted as follows:
    – A constraint (a1 . . . an ) is true exactly when at least one of the atoms ai with i ∈
      {1, . . . , n} is true.
    – Given x, y ∈ X ∪ R, an atom x relop y is true exactly when µ(x) relopµ µ(y)
      holds.
  – Given x ∈ X, y, z ∈ X ∪ R, and c ∈ R,
     • x = y ◦ z with ◦ ∈ {+, −} is true exactly when µ(x) =µ µ(y) ◦µ µ(z),
     • x = c · y is true exactly when µ(x) =µ µ(c) ·µ µ(y).
Given an assignment, a constraint is thus a function that returns a Boolean value. A constraint is satisfied if it is true under a given assignment, and it is violated otherwise. A constraint network is satisfiable if there exists an assignment that satisfies all the constraints in the network, and unsatisfiable otherwise.
      While different approaches are available to solve CSPs, this work is confined to
  Satisfiability Modulo Theories (SMT) [11]. SMT is the problem of deciding the satis-
  fiability of a first-order formula with respect to some decidable theory T . In particular,
  SMT generalizes the Boolean satisfiability problem (SAT) by adding background theo-
  ries such as the theory of real numbers, the theory of integers, and the theories of data
structures (e.g., lists, arrays and bit vectors). The idea behind SMT is that the satisfiability of a constraint network can be decided in two steps: first, each arithmetic constraint is abstracted into a Boolean one and a satisfying Boolean assignment is sought; second, the consistency of that assignment is checked in the corresponding background theory. Checking that the Boolean assignment is feasible in the underlying mathematical theory can be performed by a specialized reasoning procedure (the core of an SMT solver) – in the case of QF LRA, any procedure that checks the feasibility of a set of linear inequalities. If the consis-
  tency check fails, then a new Boolean assignment is requested and the SMT solver goes
  on until either an arithmetically consistent Boolean assignment is found, or no more
  Boolean assignments can be found.
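    As a small example of the kind of QF LRA queries involved, the following sketch uses the Python API of Z3 — one of the solvers considered in Section 4 — to check a constraint network fitting the grammar above; the specific constraints are made up for illustration.

   from z3 import Real, Or, Solver, sat

   x, y = Real("x"), Real("y")
   s = Solver()
   s.add(x >= 0, x <= 1)          # bounds on the variable x
   s.add(y == 2 * x)              # an equation: var = const * value
   s.add(Or(y <= 0.5, x >= 0.9))  # a constraint, i.e., a disjunction of atoms

   if s.check() == sat:
       print(s.model())           # a satisfying assignment mu
   else:
       print("unsatisfiable")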


  3     SMT-based abstraction and verification techniques

  3.1    Abstraction for MLPs and SVRs

Research on learning with safety requirements has the objective of verifying trained MLPs/SVRs, i.e., guaranteeing that, given some input preconditions, the mapping ν respects stated output postconditions. In principle, this check can be automated in a suitable formal logic L by proving

                                  |=L ∀x (π(x) → ϕ(ν(x)))                                 (8)

  where π and ϕ are formulas in L expressing preconditions on input signals and post-
  conditions on output signals, respectively. Most often, activation functions for MLPs
  are non-linear and transcendental and so are kernels for SVRs, which makes (8) unde-
  cidable if L is chosen to be expressive enough to model such functions – see, e.g., [22].
In [6, 8] this problem is tackled by introducing an abstraction framework. Given ν and properties π, ϕ, a corresponding abstract model ν̃ and abstract properties π̃, ϕ̃ are defined so that

                                 |=L′ ∀x′ (π̃(x′ ) → ϕ̃(ν̃(x′ )))                        (9)

is decidable, and (8) holds whenever (9) holds. If this is not the case, then there is some abstract input x′ such that π̃(x′ ) is satisfied, but the abstract output ν̃(x′ ) violates ϕ̃.
  There are two cases:

    – If a concrete input x can be extracted from x′ such that ν(x) also violates ϕ, then
      x′ is a realizable counterexample, and ν fails to uphold (8).
    – On the other hand, if no concrete input x can be extracted such that ν(x) violates
      ϕ, then x′ is a spurious counterexample.

  Spurious counterexamples arise because an abstract MLP/SVR ν̃ is “more permissive”
  than its concrete counterpart ν. In this case, a refinement is needed, i.e., a new ab-
  straction “tighter” than ν̃ must be computed and checked. We call this abstraction-
  refinement loop Counter-Example Triggered Abstraction Refinement (CETAR) in [6].
  Briefly stated, the main difference between CETAR and classical Counter-Example
  Guided Abstraction Refinement (CEGAR) is that, in the latter counterexamples are used
  as a basis to refine the abstraction, whereas in the former counterexamples are merely
  triggering the refinement, without being involved in computing the refinement. While
  there is no theoretical guarantee that the CETAR loop terminates, in practice it is ex-
  pected that a network is declared safe or a realizable counterexample is found within a
  small number of refinement steps.
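    The overall loop can be summarized by the following schematic sketch; abstract(), smt_check() and realizable() are hypothetical placeholders for the abstraction of Section 3.1, the SMT query (9), and the counterexample-realizability test, respectively.

   def cetar(nu, pi, phi, p0, shrink=0.5, max_iters=20):
       # Counter-Example Triggered Abstraction Refinement (schematic sketch);
       # the three helpers below are placeholders, not a fixed API.
       p = p0
       for _ in range(max_iters):
           nu_abs = abstract(nu, p)                    # over-approximation of nu
           verified, cex = smt_check(nu_abs, pi, phi)  # decide query (9)
           if verified:
               return "safe"                           # (9) holds, hence (8) holds
           if realizable(nu, cex, phi):
               return ("unsafe", cex)                  # true counterexample to (8)
           p = p * shrink  # spurious counterexample: refine and retry
       return "unknown"    # no termination guarantee, as noted above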
    Details about the definition of ν̃ for MLPs in such a way that L′ = QF LRA, as well as
  details of preconditions and postconditions to be checked, with related abstractions, are





                             Fig. 1. Interval abstraction of a Gaussian kernel.



  provided in [6, 7]. In the following, for the sake of explanation, we briefly summarize
  how to encode SVR verification in a corresponding QF LRA problem as shown in [8].
  In particular, our experimental analysis focuses on radial basis function (RBF) kernels

    $K(x_i, x) = e^{-\gamma \|x - x_i\|^2} \quad \text{with} \quad \gamma = \frac{1}{2\sigma^2}$                    (10)
Given an SVR, we consider its input domain to be defined as I = D1 × . . . × Dn , where
  Di = [ai , bi ] with ai , bi ∈ R, and the output domain to be defined as O = [c, d] with
  c, d ∈ R. A concrete domain D = [a, b] is abstracted to [D] = {[x, y] | a ≤ x ≤ y ≤
  b}, i.e., the set of intervals inside D, where [x] is a generic element.
     To abstract the RBF kernel K, we observe that (10) corresponds to a Gaussian distribution G(µ, σ) with mean µ = xi — the i-th support vector of the SVR ς under analysis — and variance σ² = 1/(2γ). Given a real-valued abstraction parameter p, let us consider the interval [x, x + p]. It is possible to show that the maximum and minimum of G(µ, σ) restricted to [x, x + p] lie at the endpoints of such interval, unless µ ∈ [x, x + p], in which case the maximum of G lies at x = µ. The abstraction K̃p is complete once we consider two points x0 , x1 with x0 < µ < x1 , defined as the points in which G(xi ) ≪ G(µ) for i ∈ {0, 1}, where by "a ≪ b" we mean that a is at least one order of magnitude smaller than b 3 . We define the abstract kernel function for the i-th support vector as

    $\tilde{K}_p([x, x+p]) = \begin{cases} [\min(G(x), G(x+p)),\; G(\mu)] & \text{if } \mu \in [x, x+p] \subset [x_0, x_1] \\ [\min(G(x), G(x+p)),\; \max(G(x), G(x+p))] & \text{if } \mu \notin [x, x+p] \subset [x_0, x_1] \\ [0,\; G(x_0)] & \text{otherwise} \end{cases}$                    (11)
According to eq. (11), p controls to what extent K̃p over-approximates K, since large values of p correspond to coarse-grained abstractions and small values of p to fine-grained ones. A pictorial representation of eq. (11) is given in Figure 1. The abstract
    3
       Given a Gaussian distribution, 99.7% of its samples lie within ±3σ. In this case, however, our RBF formulation does not include any normalization factor, and stopping at ±3σ would not be sufficient to include all relevant samples. The heuristic described in the text was therefore adopted to solve the problem.




SVR ς̃p is defined by rewriting equation (7) as

    $\tilde{\varsigma}_p(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, \tilde{K}_p(x_i, x) + b$                    (12)

  where we abuse notation and use conventional arithmetic symbols to denote their inter-
  val arithmetic counterparts.
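    For illustration, a direct reading of eq. (11) in Python might look as follows for one support vector in a one-dimensional input space; the cut-off points x0, x1 are placed a fixed number of standard deviations away from µ, a stand-in for the order-of-magnitude heuristic of the footnote.

   import math

   def abstract_kernel(x, p, mu, gamma, cutoff=4.0):
       # Interval abstraction K~_p([x, x+p]) of the RBF kernel, per eq. (11).
       # Returns (lo, hi) bounding G on [x, x+p], G(t) = exp(-gamma (t - mu)^2).
       G = lambda t: math.exp(-gamma * (t - mu) ** 2)
       sigma = math.sqrt(1.0 / (2.0 * gamma))        # sigma^2 = 1 / (2 gamma)
       x0, x1 = mu - cutoff * sigma, mu + cutoff * sigma
       if x0 <= x and x + p <= x1:                   # [x, x+p] inside [x0, x1]
           if x <= mu <= x + p:                      # the peak is inside
               return (min(G(x), G(x + p)), G(mu))
           return (min(G(x), G(x + p)), max(G(x), G(x + p)))
       return (0.0, G(x0))                           # negligible tail region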
       It is easy to see that every abstract SVR ς̃p can be modeled in QF LRA. Let us consider a concrete SVR ς : [0, 1] → (0, 1), i.e., an SVR trained on R = {(x1 , y1 ), . . . , (xl , yl )} with xi ∈ [0, 1] ⊂ R and yi ∈ (0, 1) ⊂ R. The abstract input domain is defined by:

                                       x ≥ 0        x ≤ 1                                   (13)

  The coefficients and the intercept seen in equation (12) carry over to ς˜p and are defined
  by constraints of the form

                              αi − αi∗ = ⟨αi − αi∗ ⟩         b = ⟨b⟩                         (14)

where the notation ⟨·⟩ represents the actual values obtained by training ς. To complete
  the encoding, we must provide constraints that correspond to (11) for all the support
vectors of ς. Assuming p = 0.5, the encoding for the i-th kernel reads

                     if (x ≤ x0 )
                            then (Ki ≥ 0) (Ki ≤ G(x0 ))
                     ...
                     if (0 < x) (x ≤ 0.5)
                            then (Ki ≥ min(G(0), G(0.5))) (Ki ≤ max(G(0), G(0.5)))
                     ...
                     if (x1 < x)
                            then (Ki ≥ 0) (Ki ≤ G(x0 ))
      The expression “if t1 . . . tm then a1 . . . an ” is an abbreviation for the set of con-
  straints
                                       (¬t1 . . . ¬tm a1 )
                                               ...                                        (15)
                                       (¬t1 . . . ¬tm an )
where t1 , . . . , tm and a1 , . . . , an are atoms expressing bounds on variables, and ¬ is Boolean negation (e.g., ¬(x < y) is (x ≥ y)). Finally, the output of the SVR is just

             f (x) = ⟨α1 − α1∗ ⟩K1 (x1 , x) + . . . + ⟨αn − αn∗ ⟩Kn (xn , x) + ⟨b⟩            (16)

with x1 , . . . , xn being the support vectors of ς.
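    The "if ... then ..." clauses map directly to implications in QF LRA; a sketch of the encoding of one kernel variable Ki with Z3's Python API might read as follows, where the numeric bounds are hypothetical stand-ins for G(x0) and for the minima/maxima of G on each slice.

   from z3 import Real, Implies, And, Solver

   x, Ki = Real("x"), Real("K_i")
   s = Solver()
   g_x0, g_lo, g_hi = 0.001, 0.30, 0.95   # placeholder values of G
   x0, x1 = -2.0, 2.0                     # placeholder cut-off points
   s.add(Implies(x <= x0, And(Ki >= 0, Ki <= g_x0)))
   s.add(Implies(And(x > 0, x <= 0.5), And(Ki >= g_lo, Ki <= g_hi)))
   # ... one implication per slice of width p = 0.5, as in the text ...
   s.add(Implies(x > x1, And(Ki >= 0, Ki <= g_x0)))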

  3.2   Problems in safety related learning
As mentioned previously, verification of MLPs and SVRs amounts to answering a logical query. In this subsection, we summarize the three different verification problems –
  namely, global safety, local safety, and stability – as introduced in [7], as well as the
  different ways of defining preconditions π and postconditions ϕ.



  Global safety This property was considered in [6] for the first time, and requires that a
  trained model ν : I → O with n inputs and m outputs fulfills the condition
                           ∀x ∈ I, ∀k ∈ {1, . . . , m} : νk (x) ∈ [lk , hk ]                (17)

where lk , hk ∈ R are constants chosen to define an interval wherein the k-th compo-
  nent of ν(x) is to range – we call them safety thresholds after [6]. It is easy to see how
  (17) can be expressed using the notation of (8) by defining
    – π(x) as a predicate which is true exactly when x ∈ I, and
    – ϕ(y) as a predicate which is true exactly when y ∈ B where B = [l1 , h1 ] × . . . ×
      [lm , hm ].
  We say that an MLP/SVR ν is globally safe if condition (8) holds when π and ϕ are
  defined as above. This notion of (global) safety is representative of all the cases in
  which an out-of-range response is unacceptable, such as, e.g., minimum and maximum
  reach of an industrial manipulator, lowest and highest percentage of a component in
  a mixture, and minimum and maximum dose of a drug that can be administered to a
  patient.
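    Operationally, checking global safety of an abstract model amounts to asserting the precondition π̃ together with the negation of the postcondition ϕ̃ and testing for unsatisfiability. A schematic sketch with Z3, where a toy linear constraint stands in for the QF LRA encoding of ν̃:

   from z3 import Real, Or, Solver, unsat

   x, y = Real("x"), Real("y")     # one input, one output, for illustration
   l, h = 0.0, 1.0                 # safety thresholds l_1, h_1
   s = Solver()
   s.add(x >= 0, x <= 1)           # precondition: x in I
   s.add(y == 0.5 * x + 0.2)       # stand-in for the abstract model encoding
   s.add(Or(y < l, y > h))         # negated postcondition
   if s.check() == unsat:
       print("globally safe")      # (9) holds, hence (8) holds
   else:
       print("abstract counterexample:", s.model())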

  Local safety The need to check for global safety in MLPs can be mitigated using
  bounded activation functions in the output neurons to “squash” their response within
  an acceptable range, modulo rescaling. In principle, the same trick could be applied to
  SVRs as well, even if it is much less commonly adopted. A more stringent, yet neces-
  sary, requirement is thus represented by “local” safety. Informally speaking, we can say
that an MLP/SVR ν trained on a dataset R of t patterns is "locally safe" whenever, given an input pattern x∗ ≠ x for all (x, y) ∈ R, it turns out that ν(x∗ ) is "close" to yj as long as x∗ is "close" to xj and (xj , yj ) ∈ R for some j ∈ {1, . . . , t}. Local safety cannot be
  guaranteed by design, because the range of acceptable values varies from point to point.
Moreover, it is relevant in all those applications where the error of an MLP/SVR must never exceed a given bound on yet-to-be-seen inputs, and where the response of an
  MLP must be stable with respect to small perturbations in its input in the neighborhood
  of “known” patterns.

Stability While local safety allows us to analyze the response of a network even in the
  presence of bounded-output MLPs/SVRs, it suffers from two important limitations.
    – It does not take into account the whole input domain, i.e., if a pattern falls outside
      the union of the input polytopes, then nothing can be said about the quality of the
      response.
    – It is tied to a specific training set, i.e., changing the set of patterns used to train the
      network can change the response of the safety check.
  These two limitations are important in practice, in particular when the availability of
  training data is scarce so the chance of hitting “unseen” patterns is high, or when the
  training set is very sparse so the chance of having large regions of uncovered input
  space is high. One way to overcome these limitations is to keep the locality of the
  check, i.e., small input variations must correspond to small output variations, but drop



                Family          Overall           Time (s)         Hardness
                                N       #                     EA     ME     MH
                ICUB LOGI      720     504        68706.83   132    330     42
                ICUB RAMP       36      35         3980.89    15     20      –
                ICUB TANH      720     581        45618.63   257    296     28
                JAMES LOGI     720     489        84699.24    70    378     41
                JAMES RAMP      36      34         3101.20    10     23      1
                JAMES TANH     720     559        86560.52   131    371     57

Table 1. Encoding-centric view of the results. The table consists of seven columns, where we report for each family its name in alphabetical order (column "Family"), the number of encodings included in the family and the number of encodings solved (group "Overall", columns "N" and "#", respectively), the CPU time taken to solve the encodings (column "Time"), and the number of easy, medium and medium-hard encodings (group "Hardness", columns "EA", "ME" and "MH").



the dependency from the training set, i.e., check the whole input space. Formally, given a trained model ν : I → O we define stability as follows:

    $\forall x_1, x_2 \in I : \|x_1 - x_2\| \le \delta \;\rightarrow\; \|\nu(x_1) - \nu(x_2)\| \le \epsilon$                    (18)

where δ, ϵ ∈ R+ are two arbitrary constants which define the amount of variation that we allow in the input and the corresponding perturbation of the output. We observe that condition (18) requires non-linear arithmetic to express the scalar product of the difference vectors. In other words, once x1 is fixed, x2 must be contained in a hypersphere of radius δ centered in x1 – and similarly for ν(x1 ), ν(x2 ) and ϵ. An over-approximation of such a hypersphere, and thus a consistent abstraction of (18), is any convex polyhedron in which the hypersphere is contained. The degree of abstraction can be controlled by setting the number of facets of the polyhedron, and such an abstract condition can be expressed in QF LRA.
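    For instance, in two dimensions the δ-ball around x1 can be over-approximated by an octagon, which QF LRA can express; a sketch of the corresponding constraints on the difference vector, again with Z3:

   import math
   from z3 import Real, Solver

   dx, dy = Real("dx"), Real("dy")   # difference vector x1 - x2
   delta = 0.1
   s = Solver()
   # Octagonal over-approximation of ||(dx, dy)|| <= delta: for each unit
   # direction u, assert u . d <= delta (every point of the ball satisfies it)
   r = 1.0 / math.sqrt(2.0)
   for ux, uy in [(1, 0), (-1, 0), (0, 1), (0, -1),
                  (r, r), (r, -r), (-r, r), (-r, -r)]:
       s.add(ux * dx + uy * dy <= delta)
   # analogous constraints with epsilon bound the output difference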
    As reported in [7], stability turns out to be the hardest condition to check for SMT solvers, so in the following we focus on it.


  4     State of the art
  4.1   Experiments with batch learning and MLPs
  Considering the experimental results reported in [7], stability turned out to be the hard-
  est condition to check for SMT solvers, and in Table 1 we report a summary of that
  experimental analysis. We consider both James and iCub case studies in the case of
  MLPs with different activation functions, namely logistic (logi), bounded ramp (ramp),
  and hyperbolic tangent (tanh) functions. In order to avoid confining the results to a sin-
  gle SMT solver, the number of encodings solved and the cumulative time taken for each
  family is computed considering the “SOTA solver”, i.e., the ideal SMT solver that al-
ways achieves the best time among all the solvers considered. An encoding is thus solved if at least one of the solvers solves it, and the time taken is the best among the CPU times






  Fig. 2. Experimental setup in V-REP. In a field delimited by fences, a green ball is to be thrown
  from one side into the goal (brown door) on the opposite side. An obstacle (grey wall) may be
  placed inside the field in such a position that all or some straight shots will always be ineffective.


of the solvers that solved the encoding. The solvers involved in the experimentation were the participants in the QF LRA track of the SMT Competition 2011 [23], namely CVC4 [24] (version 1.0rc.2026), MathSAT [25] (version 5), SMTInterpol [26] (version 2.0pre), Yices [27] (v. 1.0.29, with smtlib2 parser), and Z3 [28] (v. 3.0). In Table 1,
  the encodings are classified according to their hardness with the following criteria: easy
  encodings are those solved by all the solvers, medium encodings are those non easy
  encodings that could still be solved by at least two solvers, medium-hard encodings are
  those solved by one reasoner only, and hard encodings are those that remained unsolved.
  Looking at the results, we observe that 2202 encodings out of 2952 were solved. With
  “only” 75% of the encodings solved, we confirm that this suite contains the most diffi-
  cult formulas of the whole test set. In particular, 615 encodings are easy, 1418 medium,
  while 169 are medium-hard. For more details about the results, we refer the reader
  to [7].


  4.2   Experiments with active learning and SVRs

  Experimental Setup A pictorial representation of the case study we consider is shown
  in Figure 2. In this domain setup, a ball has to be thrown in such a way that it goes
  through a goal, possibly avoiding obstacles present in the environment. The learning
  agent interacting with the domain is allowed to control the force f = (fx , fy ) applied
to the ball when it is thrown. Given that the coordinate origin is placed at the center of the wall opposing the goal — see Figure 2 — we chose fx ∈ [−20, 20] N and fy ∈ [10, 20] N, respectively. The range for fx is chosen in such a way that the whole field, including side walls, can be targeted, and the range for fy is chosen so that the ball can reach the goal, overcoming the pavement's friction, with both straight and kick shots. Collisions between the ball and the side walls are perfectly elastic. In principle, the obstacle can be placed anywhere in the field, but we consider only three configurations, namely (a) no obstacle is present, (b) the obstacle occludes straight-line trajectories — the case depicted in Figure 2 — and (c) the obstacle partially occludes the goal, so that not all straight shots may go through it. The domain of Figure 2 is implemented



    Left:
    Algorithm    Mean     STD       Median    IQR
             Straight and Kick shots
    Random       97.6     13.68      100.0      –
    QBC          88.27    24.28      100.0      –
    DB           95.53    19.49      100.0      –
             Straight shots
    Random       40.06    11.7364     40.0     5.0
    QBC          15.05     3.7275     15.0     3.0
    DB           14.75     0.8275     15.0     1.0
             Kick shots
    Random       16.3     18.29       12.0     8.25
    QBC          15.9     25.12        3.0    19.25
    DB            9.28    13.51        7.0     5.0

    Right:
    Algorithm    Mean     STD       Median    IQR
             Case a
    Random       97.6     13.68      100.0      –
    QBC          88.27    24.28      100.0      –
    DB           95.53    19.49      100.0      –
             Case b
    Random       64.07    36.18       69.0    71.0
    QBC          67.22    43.44      100.0      –
    DB           63.63    39.49      100.0      –
             Case c
    Random       91.94    23.26      100.0      –
    QBC          85.46    24.67      100.0      –
    DB           93.13    23.35      100.0      –


  Table 2. Evaluation of the active learning algorithms. Left: statistics on the number of AL runs
  needed when no obstacle is present (case a), using different subsets of the feasible trajectories.
  “Mean” and “STD” are the sample mean and standard deviation, respectively; “Median” is the
  50%-percentile, and IQR is the difference between 75%-percentile and 25%-percentile. When
  the median coincides with the maximum number of allowable runs, the IQR value cannot be
computed (denoted with "–"). Right: statistics on the number of AL runs needed for cases a, b, and c.



using the V-REP simulator [15]. Concrete SVRs are trained using the Python library scikit-learn, which in turn is based on LIBSVM [29]. Verification of abstract SVRs is carried out using MathSAT [30]. The Python library GPy [31] is used to train Gaussian process regression models.

Results We train SVRs on a dataset of n samples built as T = {(f1 , y1 ), . . . , (fn , yn )}, where fi = (fxi , fyi ) is the force vector applied to the ball to throw it, and yi is the observed distance from the center of the goal, which is also the target of the regression.
  Obviously, the regression problem could be extended, e.g., by including the initial posi-
  tion of the ball, but our focus here is on getting the simplest example that could lead to
  evaluation of different active learning paradigms and verification thereof. Preliminary
  tests were performed in an obstacle-free environment, using specific subsets of all the
  feasible trajectories. Three algorithms were compared, namely a random strategy for
  adding new samples to the learning set, the query by committee algorithm and the one
  proposed by Demir and Bruzzone [32] — listed in Table 2 as “Random”, “QBC” and
  “DB”, respectively. Each test episode consists in adding samples to the model, and it
  ends when either the Root Mean Squared Error (RMSE) is smaller than a threshold
  (here 0.2), or when at most 100 additional samples are asked for. Table 2 suggests that
  when the learning problem involves one single class of shots (e.g., only straight shots),
  the algorithms provide better results. Indeed, when both straight and kick shots are pos-
  sible, all three algorithms require more than 100 additional samples in at least 50% of
  the runs in order to reach the required RMSE. We conducted further experiments, this
  time considering the three cases (a), (b), and (c) described before, and the full set of
  feasible trajectories in each case. The results obtained, shown in Table 2 (right), con-
  firm the observation made on Table 2 (left). The best results are indeed obtained for
  case (b), wherein the obstacle is placed in such a way that only kick shots are feasible,
  thus turning the learning problem into a single-concept one.
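    The episode protocol just described can be sketched as follows for the "Random" strategy; throw_ball is a hypothetical stand-in for the V-REP simulation, and the test set used for the RMSE stopping criterion is synthetic.

   import numpy as np
   from sklearn.svm import SVR
   from sklearn.metrics import mean_squared_error

   def throw_ball(f):
       # Hypothetical placeholder for the simulator: force -> distance from goal
       return abs(0.5 * f[0] - 0.02 * f[1] ** 2)

   rng = np.random.default_rng(0)
   sample = lambda: np.array([rng.uniform(-20, 20), rng.uniform(10, 20)])

   # Initial training set and a fixed test set for the stopping criterion
   X = np.array([sample() for _ in range(10)])
   y = np.array([throw_ball(f) for f in X])
   X_test = np.array([sample() for _ in range(200)])
   y_test = np.array([throw_ball(f) for f in X_test])

   for _ in range(100):                      # at most 100 additional samples
       svr = SVR(kernel="rbf", C=10.0).fit(X, y)
       rmse = mean_squared_error(y_test, svr.predict(X_test)) ** 0.5
       if rmse < 0.2:
           break                             # required accuracy reached
       f_new = sample()                      # "Random" query strategy
       X = np.vstack([X, f_new])
       y = np.append(y, throw_ball(f_new))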



  5     Open questions

In this work we have summarized the state of the art on learning with safety requirements. The promise of this research is to make data-driven models fit enough to be deployed in safety-related contexts, e.g., adaptive agents operating in physical domains. However, several questions should be answered before the techniques surveyed in this paper can be pushed into practical applications with confidence in their effectiveness. Currently, we consider the following issues:

    – SMT solvers could deal with the majority of encodings in a reasonable amount of
      time; still, there is room for improvement, and scalability with respect to the size
      of the learning problems must be assessed.
      – Safety requirements have been considered for batch learning with MLPs and active
        learning with SVRs; it would be worth investigating whether active learning with
        MLPs and batch learning with SVRs can give improved results in terms of learning
        and verification.
      – There are learning techniques which are both very effective and natively provide
        (statistical) bounds on their prediction; Gaussian process regression (GPR) [33] is
        one such technique, with a fairly considerable record of success stories; it would
        be interesting to compare the level of confidence in the prediction given by GPR
        on one side, and the strict guarantees provided by MLPs/SVRs checked with SMT
        solvers on the other.
    – Applications in which learning techniques ought to satisfy safety requirements are
      diverse, and include learning through reinforcement with continuous rather than
      discrete state-action-space representations — see [34] for a discussion. One im-
      portant issue is thus the feasibility of the techniques that we surveyed in such fields,
      wherein learning is part of a larger system devoted to programming adaptive agents.

Our future work will focus on addressing the above open issues and on assessing whether the proposed methodologies can scale to real-world problems, possibly posing the verification of learned models as a challenge problem for the SMT research community.


  References
   1. Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from
      demonstration. Robotics and autonomous systems 57(5) (2009) 469–483
   2. Nguyen-Tuong, D., Peters, J.: Model learning for robot control: a survey. Cognitive process-
      ing 12(4) (2011) 319–340
   3. Kober, J., Peters, J.: Reinforcement learning in robotics: A survey. In: Reinforcement Learn-
      ing. Springer (2012) 579–610
 4. Smith, D., Simpson, K.: Functional Safety – A Straightforward Guide to Applying IEC 61508
    and Related Standards (2nd Edition). Elsevier (2004)
   5. Kurd, Z., Kelly, T., Austin, J.: Developing artificial neural networks for safety critical sys-
      tems. Neural Computing & Applications 16(1) (2007) 11–19
   6. Pulina, L., Tacchella, A.: An Abstraction-Refinement Approach to Verification of Artificial
      Neural Networks. In: 22nd International Conference on Computer Aided Verification (CAV
      2010). Volume 6174 of Lecture Notes in Computer Science, Springer (2010) 243–257




 7. Pulina, L., Tacchella, A.: Challenging SMT solvers to verify neural networks. AI Communications 25(2) (2012) 117–135
   8. Leofante, F., Tacchella, A.: Learning in Physical Domains: Mating Safety Requirements and
      Costly Sampling. In: AI*IA 2016, Advances in Artificial Intelligence - 15th International
      Conference of the Italian Association for Artificial Intelligence, Genova, Italy, November 29
      - December 1, 2016, Proceedings. (2016) To appear.
   9. Settles, B.: Active learning literature survey. University of Wisconsin, Madison 52(55-66)
      (2010) 11
  10. Smola, A.J., Schölkopf, B.: A tutorial on support vector regression. Statistics and computing
      14(3) (2004) 199–222
  11. Barrett, C.W., Sebastiani, R., Seshia, S.A., Tinelli, C.: Satisfiability modulo theories. Hand-
      book of satisfiability 185 (2009) 825–885
  12. Jamone, L., Metta, G., Nori, F., Sandini, G.: James: A humanoid robot acting over an un-
      structured world. In: Humanoid Robots, 2006 6th IEEE-RAS International Conference on,
      IEEE (2007) 143–150
13. Metta, G., Natale, L., Nori, F., Sandini, G., Vernon, D., Fadiga, L., Von Hofsten, C., Rosander, K., Lopes, M., Santos-Victor, J., et al.: The iCub humanoid robot: An open-systems platform for research in cognitive development. Neural Networks 23(8) (2010) 1125–1134
  14. Fumagalli, M., Gijsberts, A., Ivaldi, S., Jamone, L., Metta, G., Natale, L., Nori, F., Sandini,
      G.: Learning to Exploit Proximal Force Sensing: a Comparison Approach. From Motor
      Learning to Interaction Learning in Robots (2010) 149–167
15. Rohmer, E., Singh, S.P., Freese, M.: V-REP: A versatile and scalable robot simulation framework. In: Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on, IEEE (2013) 1321–1326
  16. Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Net-
      works 4(2) (1991) 251–257
  17. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers.
      In: Proceedings of the fifth annual workshop on Computational learning theory, ACM (1992)
      144–152
  18. Guyon, I., Boser, B., Vapnik, V.: Automatic capacity tuning of very large vc-dimension
      classifiers. Advances in neural information processing systems (1993) 147–147
  19. Cortes, C., Vapnik, V.: Support-vector networks. Machine learning 20(3) (1995) 273–297
  20. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc.,
      New York, NY, USA (1995)
  21. Mackworth, A.K.: Consistency in networks of relations. Artificial intelligence 8(1) (1977)
      99–118
22. Fränzle, M., Herde, C., Teige, T., Ratschan, S., Schubert, T.: Efficient solving of large non-linear arithmetic constraint systems with complex Boolean structure. Journal on Satisfiability, Boolean Modeling and Computation 1 (2007) 209–236
23. Bruttomesso, R., Deters, M., Griggio, A.: The 2011 SMT Competition (2011)
24. Barrett, C., Tinelli, C.: CVC3. In: International Conference on Computer Aided Verification, Springer (2007) 298–302
25. Cimatti, A., Griggio, A., Schaafsma, B.J., Sebastiani, R.: The MathSAT5 SMT solver. In: Tools and Algorithms for the Construction and Analysis of Systems. Springer (2013) 93–107
26. Christ, J., Hoenicke, J., Nutz, A.: SMTInterpol: An interpolating SMT solver. In: International SPIN Workshop on Model Checking of Software, Springer (2012) 248–254
27. Dutertre, B., De Moura, L.: A fast linear-arithmetic solver for DPLL(T). In: International Conference on Computer Aided Verification, Springer (2006) 81–94
28. De Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems, Springer (2008) 337–340




29. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2(3) (2011) 27
  30. Cimatti, A., Griggio, A., Schaafsma, B., Sebastiani, R.: The MathSAT5 SMT Solver. In
      Piterman, N., Smolka, S., eds.: Proceedings of TACAS. Volume 7795 of LNCS, Springer
      (2013)
31. GPy: GPy: A Gaussian process framework in Python. http://github.com/SheffieldML/GPy (since 2012)
  32. Demir, B., Bruzzone, L.: A multiple criteria active learning method for support vector re-
      gression. Pattern Recognition 47(7) (2014) 2558–2567
  33. Williams, C.K.I., Rasmussen, C.E.: Gaussian processes for regression. In: Advances in Neu-
      ral Information Processing Systems 8, NIPS, Denver, CO, November 27-30, 1995. (1995)
      514–520
34. Pathak, S., Pulina, L., Metta, G., Tacchella, A.: How to abstract intelligence? (If verification is in order). In: Proc. 2013 AAAI Fall Symp. How Should Intelligence Be Abstracted in AI Research. (2013)



