An Adversarial Attacker for Neural Networks in Regression Problems

Kavya Gupta^{1,2,*}, Beatrice Pesquet-Popescu^2, Fateh Kaakai^2, Jean-Christophe Pesquet^1 and Fragkiskos D. Malliaros^1
^1 Université Paris-Saclay, CentraleSupélec, Inria, Centre de Vision Numérique, Gif-sur-Yvette, France
^2 Air Mobility Solutions BL, Thales LAS France
kavya.gupta100@gmail.com, beatrice.pesquet@thalesgroup.com, fateh.kaakai.e@thalesdigital.io, {jean-christophe.pesquet, fragkiskos.malliaros}@centralesupelec.fr

Abstract

Adversarial attacks against neural networks and their defenses have been mostly investigated in classification scenarios. However, adversarial attacks in a regression setting remain understudied, although they play a critical role in a large portion of safety-critical applications. In this work, we present an adversarial attacker for regression tasks, derived from the algebraic properties of the Jacobian of the network. We show that our attacker successfully fools the neural network, and we measure its effectiveness in reducing the estimation performance. We present a white-box adversarial attacker to support engineers in designing safety-critical regression machine learning models. We present our results on various open-source and real industrial tabular datasets. In particular, the proposed adversarial attacker outperforms attackers based on random perturbations of the inputs. Our analysis relies on the quantification of the fooling error as well as various error metrics. A noteworthy feature of our attacker is that it allows us to optimally attack a subset of inputs, which may be helpful to analyse the sensitivity of some specific inputs.

* Contact Author
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

 #   Variable                Type
Input
 0   Speed                   continuous
 1   Flight Distance         continuous
 2   Departure Delay         continuous
 3   Initial ETE             continuous
 4   Latitude Origin         continuous
 5   Longitude Origin        continuous
 6   Altitude Origin         continuous
 7   Latitude Destination    continuous
 8   Longitude Destination   continuous
 9   Altitude Destination    continuous
 10  Arrival Time Slot       7 slots (categorical)
 11  Departure Time Slot     7 slots (categorical)
 12  Aircraft Category       6 classes (categorical)
 13  Airline Company         19 classes (categorical)
Output
     Refinement ETE          continuous

Table 1: Input and output variables description for the Industrial dataset – a safety-critical application.

1 Introduction

Adversarial machine learning has received increased attention in the past decade. For all machine learning models, defense against adversarial attacks is important in terms of safety. Adversarial attacks in classification constitute malicious attempts to trick a model classifier. They play a critical role in real-world application domains such as spam/malware detection, autonomous systems [Huang and Wang, 2018], [Eykholt et al., 2018], [Ren et al., 2019], medical systems [Finlayson et al., 2018], etc. Adversarial attacks cause vulnerability in model deployment and especially need to be taken into account in the deployment of security-critical AI applications. Despite the newfound interest of the research community in trustworthy and explainable AI, there are only few works investigating adversaries in the case of regression tasks.

Current advances in the adversarial machine learning field revolve around designing attacks and defenses with a focus on the use of neural networks in image analysis and computer vision [Goodfellow et al., 2014], [Kurakin et al., 2016]. Much less work concerns tabular data. However, most machine learning tasks in industry rely on tabular data, e.g., fraud detection, product failure prediction, anti-money laundering, recommendation systems, click-through rate prediction, or flight arrival time prediction.

In this paper, we focus on generating adversarial attacks for neural networks in the specific scenario where i) a regression task is performed and ii) tabular data are employed.
Our contributions are the following:

• We propose a simple, novel and flexible method for generating adversarial attacks for regression tasks (a white-box attack).
• We show that the proposed attacker allows us to optimally attack any given subset of input features.
• We explore various error metrics which are useful for analysing these adversarial attacks.
• Our proposed adversarial attacker is generalised to any ℓp norm on input and output perturbations.
• We evaluate our results on open-source regression datasets and an industrial dataset (output and input features described in Table 1) which lies in the domain of safety-critical applications.

In Section 2, we give a brief overview of existing works. In Section 3, we formulate the problem and present our method for generating adversarial examples in regression tasks. In Section 4, we perform numerical experiments on four datasets to demonstrate the effectiveness of the proposed attacker. Some concluding remarks are given in Section 5.

2 Related Work

In [Szegedy et al., 2013], the concept of adversarial attacks was first proposed to fool DNNs: adding a subtle perturbation to the input of a neural network produces an incorrect output, while human eyes cannot recognize the difference in the modified input data. Even though different models have different architectures and might use different training data, the same kind of adversarial attack strategies can be used to attack related models. These attacks pose a huge threat to the performance of DNNs.
The same paper [Szegedy et al., 2013] proposed L-BFGS to construct adversarial attacks, and since then there has been a plethora of works introducing various adversarial attacks and their defenses for DNNs.

[Goodfellow et al., 2014] proposed a simpler and faster method to construct adversarial attacks, the Fast Gradient Sign Method (FGSM). The cost function is linearized and a perturbation is added in the gradient direction, causing the generated images to be misclassified. This is a non-iterative attack, hence it has a lower computation cost than the previous method. FGSM is an ℓ∞-bounded attack and is often prone to label leaking. It may also be difficult for FGSM to control the perturbation level when constructing attacks. [Kurakin et al., 2016] proposed an optimized FGSM, termed the Iterative Gradient Sign Method (IGSM), which adds perturbations in multiple smaller steps and clips the result after each iteration, ensuring that the perturbations are restricted to the neighborhood of the example. [Dong et al., 2018] added momentum to IGSM attacks. [Papernot et al., 2016] proposed the Jacobian-based Saliency Map Attack (JSMA), which is based on the ℓ0 sparsity measure. The basic idea is to construct a saliency map from the gradients, modeling the impact of each pixel of the image.

[Moosavi-Dezfooli et al., 2016] proposed a non-targeted attack method based on the ℓ2 norm, called DeepFool. It finds the decision boundary closest to the sample in the image space, and then uses this boundary to fool the classifier. FGSM, JSMA, and DeepFool are designed to generate an adversarial attack for a single image in order to fool the trained classifier model. [Moosavi-Dezfooli et al., 2017] proposed a universal image-agnostic perturbation attack which fools the classifier by adding a single perturbation to all images in the dataset. [Carlini and Wagner, 2017] proposed a powerful attack based on L-BFGS; the attack can be generated according to the ℓ1, ℓ2, or ℓ∞ norm and can be targeted or non-targeted. [Liu et al., 2016] proposed an ensemble attack method combining multiple models to construct adversarial attacks. [Rony et al., 2020] proposed a method to generate minimally perturbed adversarial examples based on an Augmented Lagrangian approach for various distance metrics. In [Balda et al., 2018], the authors propose a general framework for the generation of adversarial examples in both classification and regression tasks for applications in the image domain; similar to our proposed approach, their technique is based on the Jacobian of the neural network. Most of the methods in the literature on adversarial example generation belong to the class of white-box attackers, i.e., the attacker has access to the information related to the trained neural network model, including the model architecture and its parameters. A black-box attacker is introduced in [Su et al., 2019]; such attackers do not know the model but can interact with it. A byproduct of black-box attacks is grey-box attacks, where attackers have limited information regarding the model. To the best of our knowledge, the only work dealing with adversarial attacks in a white-box setting for tabular data has been proposed in [Ballet et al., 2019], and this work handles only classification tasks.

In regression tasks, there are no natural margins as in classification tasks, and adversarial learning in the regression setting is hindered by the difficulty of defining the adversarial attack, its success, and the evaluation metrics. Despite the number of works on adversarial attack generation, few articles deal with regression tasks. [Tong et al., 2018] looked at adversarial attacks in the setting of an ensemble of multiple learners, investigating the interactions between these linear learners and an attacker in a regression setting, modeled as a Multi-Learner Stackelberg Game (MLSG). However, the investigated linear case is not able to capture the larger class of non-linear models. A focus on specific applications of regression is also common. [Ghafouri et al., 2018] examined an important problem: selecting an optimal threshold for each sensor against an adversary for regression tasks in cyber-physical systems. [Deng et al., 2020] introduced the concept of an adversarial threshold, which is related to the deviation between the original prediction and the prediction for an adversarial example, i.e., an acceptable error range in driving models. In a regression context, [Nguyen and Raff, 2018] introduced a defense that is generically useful to reduce the effectiveness of adversarial attacks; they consider adversarial attacks as a potential symptom of numerical instability in the learned function. In the next section, we propose a general white-box adversarial attacker based on the Jacobian of the learned function for regression tasks in the tabular data domain.

3 Proposed Method

3.1 Objective

The problem of adversarial attacks is closely related to the robustness issue for a neural network, i.e., its sensitivity to perturbations. Let T : R^{N_0} → R^{N_m} be the considered neural network, having N_0 scalar inputs and N_m scalar outputs. If x ∈ R^{N_0} is a given vector of inputs for some data for which y is the associated target output, the network has been trained to produce an output T(x) close to y. If the input is now perturbed by an additive vector e ∈ R^{N_0}, the perturbed output is T(x + e). Attacking the network then amounts to finding a perturbation e of preset magnitude which makes the output of the network maximally deviate from a reference output. This reference output may be the model output T(x) or the ground truth output y. Since our purpose is to develop an approach which remains efficient even if the accuracy of the network is not very high, we choose y as the reference output when available. In this context, the measures of deviation and of perturbation magnitude play an important role in the mathematical formulation of the problem. As a standard choice, the measure of perturbation magnitude will be here an ℓp norm where p ∈ [1, +∞]. For measuring the output deviation, we will similarly consider an ℓq norm where q ∈ [1, +∞]. It must be emphasized that this choice makes sense when dealing with regression problems. In this context, the ℓ2 or ℓ1 norms are indeed frequently used as loss functions for training. On the other hand, the ℓ∞ norm is also a popular measure when dealing with reliability issues.

3.2 Optimization formulation

In the described setting, the design of the attacker can be formulated as the problem of finding the "worst perturbation" ê such that

    ê ∈ Argmax_{e ∈ C_{p,δ}} ‖T(x + e) − y‖_q,    (1)

where C_{p,δ} is the closed and convex set defined as

    C_{p,δ} = { e ∈ R^{N_0} | ‖Σ^{−1/2} e‖_p ≤ δ }.    (2)

Here, Σ ∈ R^{N_0×N_0} is a symmetric positive definite weighting matrix, typically corresponding to the covariance matrix of the inputs, and δ is a parameter which controls the maximum allowed perturbation. For instance, if Σ is a diagonal matrix, it simply introduces a normalization of the perturbation components with respect to the standard deviations of the associated inputs.

For standard choices of activation functions, T is a continuous function. By virtue of the Weierstrass theorem, the existence of a solution (not necessarily unique) to Problem (1) is then ensured. Although C_{p,δ} is a relatively simple convex set, this problem appears to be a difficult non-convex one, because i) T is a complex nonlinear operator, and ii) we maximize an ℓq measure which, in addition, leads to a nonsmooth cost function when q = 1 or q = +∞. A further difficulty is that we usually need to attack a large dataset to evaluate the robustness of a network, and the optimization algorithm should therefore be fast.
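To make the constraint set (2) concrete, the sketch below shows how a candidate perturbation can be rescaled to satisfy ‖Σ^{−1/2} e‖_p ≤ δ. This is our own minimal illustration, not code from the paper: the helper name and the assumption of a diagonal Σ are ours, and radial rescaling simply produces a feasible point of C_{p,δ}, which is sufficient here since the attacks of interest saturate the budget.

```python
import numpy as np

def rescale_to_budget(e, sigma_diag, delta, p=2):
    """Rescale a perturbation e so that ||Sigma^(-1/2) e||_p <= delta.

    Illustrative helper assuming a diagonal Sigma whose diagonal
    (the per-feature variances) is given by `sigma_diag`.
    """
    e = np.asarray(e, dtype=float)
    whitened = e / np.sqrt(sigma_diag)      # Sigma^(-1/2) e for diagonal Sigma
    norm = np.linalg.norm(whitened, ord=p)  # also valid with p = np.inf
    return e if norm <= delta else e * (delta / norm)

# Example: an l2 budget of 0.1 with unit variances.
e = rescale_to_budget([0.3, -0.4, 0.5], sigma_diag=np.ones(3), delta=0.1)
```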
3.3 Algorithm

We propose to implement a two-step approach.

• Step 1. We first perform a linearization based on the following first-order Taylor expansion:

    T(x + e) ≈ T(x) + J(x)e,    (3)

where J(x) ∈ R^{N_m×N_0} is the Jacobian of the network at x (we assume that J(x) is well defined at x; see [Bolte and Pauwels, 2020] for a justification of this assumption in the nonsmooth case). Note that J(x) can be computed by classical backpropagation techniques. We make a second approximation, namely y ≈ T(x). Based on these two approximations and after the variable change e′ = δ^{−1}Σ^{−1/2}e, Problem (1) simplifies to

    maximize_{e′ ∈ B_p} ‖J(x)Σ^{1/2} e′‖_q,    (4)

where B_p is the closed ℓp ball centered at 0 and with unit radius. Note that the optimal cost value in (4) is the subordinate norm of the matrix J(x)Σ^{1/2} when the input space is equipped with the ℓp norm and the output space with the ℓq one. We recall that this subordinate norm is defined, for every matrix M ∈ R^{N_m×N_0}, as

    ‖M‖_{p,q} = sup_{z ∈ R^{N_0}, z ≠ 0} ‖Mz‖_q / ‖z‖_p.    (5)

Problem (4) is thus equivalent to finding a vector ê′ for which the value of the cost function is equal to ‖J(x)Σ^{1/2}‖_{p,q}. For the values of (p, q) listed below, such a vector has an explicit form (a code sketch is given at the end of this section).

  – If p = q = 2, ê′ is any unit ℓ2-norm eigenvector of Σ^{1/2}J(x)⊤J(x)Σ^{1/2} associated with the maximum eigenvalue of this matrix. This vector can be computed by performing a singular value decomposition of J(x)Σ^{1/2}.
  – If p = 2 and q = +∞, ê′ is any unit ℓ2-norm vector collinear with a row of J(x)Σ^{1/2} having maximum ℓ2 norm.
  – If p = +∞ and q = +∞, ê′ is a unit-norm vector whose elements (ε_i)_{1≤i≤N_0} are such that, for every i ∈ {1, ..., N_0}, ε_i ∈ {−1, 0, 1} is the sign of the i-th element of a row of J(x)Σ^{1/2} with maximum ℓ1 norm.
  – If p = 1 and q = 1, ê′ is a vector with only one nonzero component, equal to ±1; the index of this component corresponds to the column of J(x)Σ^{1/2} with maximum ℓ1 norm.
  – If p = 1 and q = 2, ê′ is a vector with only one nonzero component, equal to ±1; the index of this component corresponds to a column of J(x)Σ^{1/2} with maximum ℓ2 norm.
  – If p = 1 and q = +∞, ê′ is again a vector with only one nonzero component, equal to ±1; the index of this component corresponds to a column of J(x)Σ^{1/2} where an element of maximum absolute value is located.

• Step 2. In the previous optimization step, the optimal solution is not unique. Indeed, if ê = δΣ^{1/2}ê′ is a solution to Problem (4), then −ê is also one. In addition, there may exist other reasons for the multiplicity of solutions; for example, there may be several maximum-norm rows in the matrix J(x)Σ^{1/2}. Among all the possible choices, we propose to select the solution ê leading to the maximum deviation w.r.t. the ground truth, that is, such that ‖T(x + ê) − y‖_q is maximum. This requires performing a search over a small number of possible candidates. Note that no approximation error is involved in this step. If the ground truth for the output is not available, it can be replaced by the model output.

• Post-optimization. If 1 < q < +∞ and T is assumed to be differentiable, e ↦ ‖T(x + e) − y‖_q^q is a differentiable function. A further refinement consists of maximizing this function over C_{p,δ} by using a projected gradient algorithm with Armijo search for the stepsize. The previous estimate of ê can then be used to initialize the algorithm. According to our numerical tests, implementing this strategy when q = 2 only brings a marginal improvement. Moreover, this approach cannot be used when q = 1 or q = +∞.
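To make Steps 1 and 2 concrete, here is a minimal sketch (our own illustration, not the authors' code; the helper name `attack_direction` is ours and a diagonal Σ is assumed). It computes the Jacobian by automatic differentiation, as mentioned in Step 1, and returns the closed-form direction ê′ for a few of the (p, q) pairs listed above.

```python
import numpy as np
import torch

def attack_direction(model, x, sigma_diag, p=2, q=2):
    """Closed-form maximizer e' of ||J(x) Sigma^(1/2) e'||_q over the unit l_p ball.

    Sketch for a diagonal Sigma, covering the (p, q) pairs (2, 2),
    (2, inf), (inf, inf) and (1, 2) from the case analysis above.
    """
    J = torch.autograd.functional.jacobian(model, x)    # shape (N_m, N_0)
    A = (J * torch.sqrt(sigma_diag)).detach().numpy()   # J(x) Sigma^(1/2)
    if (p, q) == (2, 2):
        # leading right singular vector of A, i.e. unit l2 eigenvector of A^T A
        return np.linalg.svd(A)[2][0]
    if (p, q) == (2, np.inf):
        # unit l2 vector collinear with the row of A of maximum l2 norm
        row = A[np.argmax(np.linalg.norm(A, axis=1))]
        return row / np.linalg.norm(row)
    if (p, q) == (np.inf, np.inf):
        # signs of the elements of the row of A with maximum l1 norm
        return np.sign(A[np.argmax(np.abs(A).sum(axis=1))])
    if (p, q) == (1, 2):
        # +/-1 at the index of the column of A with maximum l2 norm
        e = np.zeros(A.shape[1])
        e[np.argmax(np.linalg.norm(A, axis=0))] = 1.0
        return e
    raise NotImplementedError((p, q))
```

Following Step 2, one would then form ê = δΣ^{1/2}ê′ (for a diagonal Σ, a componentwise multiplication by δ times the standard deviations), evaluate both ±ê through the network, and keep the candidate maximizing ‖T(x + ê) − y‖_q.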
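A minimal sketch of such a partial attack, reusing the hypothetical `attack_direction` helper from the sketch in Section 3.3 and again assuming a diagonal Σ: zeroing the masked standard deviations reproduces the effect of replacing Σ^{1/2} by DΣ^{1/2}D.

```python
import torch

def masked_attack_direction(model, x, sigma_diag, attacked_idx, p=2, q=2):
    """Optimal direction when only the features in attacked_idx are attacked.

    For a diagonal Sigma, D Sigma^(1/2) D amounts to zeroing the standard
    deviations of the non-attacked inputs, so the closed-form solutions of
    Section 3.3 apply unchanged to the masked matrix J(x) D Sigma^(1/2) D.
    """
    mask = torch.zeros_like(sigma_diag)
    mask[attacked_idx] = 1.0
    e_prime = attack_direction(model, x, sigma_diag * mask, p=p, q=q)
    return e_prime * mask.numpy()  # non-attacked components are exactly 0
```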
4 Numerical Results

4.1 Dataset and architecture description

Open-Source Datasets
We run our experiments on three open-source regression datasets. The Combined Cycle Power Plant dataset [Tüfekci, 2014] has 4 features and 9,568 instances; the task is to predict the net hourly electrical energy output using hourly average ambient variables. The Red Wine Quality dataset [Cortez et al., 2009] contains 1,599 samples in total, each with 11 features. The features are physicochemical and sensory measurements for wine, and the output variable is a quality score ranging from 0 to 10, where 10 represents the best quality and 0 the lowest. For the Abalone dataset, the task is to model an abalone's age based purely on its physical measurements, which would allow age estimation without cutting the shell. There are in total 4,177 instances with 8 input variables, including one categorical variable. The datasets are divided with a 4:1 ratio between training and testing data. The categorical attributes are dealt with by one-hot encoding based on the number of categories. The input attributes are normalised by removing their mean and scaling to unit variance.

We train fully connected networks for the estimation of the variables from these datasets. The network architectures are given below; the values represent the numbers of hidden neurons in the successive layers. The activation function at each layer is ReLU, except for the last layer.

• Combined Cycle Power Plant dataset – (10, 6, 1)
• Red Wine Quality dataset – (100, 100, 100, 10, 1)
• Abalone dataset – (256, 256, 256, 256, 1)

Industrial Dataset – Safety-Critical Application
An industrial application dataset is also considered, with 2,219,097 training, 739,639 validation, and 739,891 test samples. The description of the input/output variables of the dataset is given in Table 1. The variable to be predicted is the Estimation of Arrival time (ETE) of a flight, given variables including the distance and speed, as well as an initial estimate of the ETE. The dataset is related to flight control, an activity area where safety is critical. The input attributes are normalized by removing their mean and scaling to unit variance. For the models, we build fully connected networks with a ReLU activation function on all the hidden layers except the last one. The network architecture is shown in Figure 1.

[Figure 1: Network Architecture.]

Mean Accuracy Error:
    MAE = (1/K) ∑_{k=1}^{K} ‖T(x_k + e_k) − y_k‖_q
Fooling Error:
    E = (1/K) ∑_{k=1}^{K} ‖T(x_k + e_k) − T(x_k)‖_q
Symmetric Mean Accuracy Percentage Error:
    SMAPE = (2/K_+) ∑_{k=1}^{K_+} ( ‖T(x_k + e_k) − y_k‖_q − ‖T(x_k) − y_k‖_q ) / ( ‖T(x_k + e_k) − y_k‖_q + ‖T(x_k) − y_k‖_q )

Table 2: Error metrics used for evaluation. The mean value computed for SMAPE is limited to the K_+ positive values of the elements in the summation; e_k is the perturbation generated by the adversarial attacker on the k-th sample in the dataset of length K.
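These metrics translate directly into code. The sketch below is our own transcription (the function name and array conventions are ours), assuming the clean predictions T(x_k), the attacked predictions T(x_k + e_k), and the targets y_k are stacked as arrays of shape (K, N_m):

```python
import numpy as np

def evaluation_metrics(preds_adv, preds_clean, targets, q=2):
    """MAE, fooling error E and SMAPE of Table 2 over a batch of K samples."""
    row_norm = lambda z: np.linalg.norm(z, ord=q, axis=1)
    d_adv = row_norm(preds_adv - targets)      # ||T(x_k + e_k) - y_k||_q
    d_clean = row_norm(preds_clean - targets)  # ||T(x_k) - y_k||_q
    mae = d_adv.mean()
    fooling = row_norm(preds_adv - preds_clean).mean()
    ratio = (d_adv - d_clean) / (d_adv + d_clean)
    positive = ratio > 0                       # keep the K_+ positive terms only
    smape = 2.0 * ratio[positive].mean() if positive.any() else 0.0
    return mae, fooling, smape
```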
4.2 Experimental setup

We first train our networks without any constraints, using the network architectures presented in the previous section, with the aim of reducing the prediction/performance loss on the training dataset. This will be referred to as the standard training procedure.

To understand and analyze the performance of the proposed adversarial attacker, we compute the three error metrics described in Table 2. We compare the proposed adversarial attacker with random noise attackers generated by i.i.d. perturbations, using three additive noise distributions for comparison: Gaussian, uniform, and binary. The output of these attackers has been normalized so as to meet the desired bound on the norm of the perturbation. The metrics are computed on the test samples, K being the total number of samples in the test set. The results on the 4 datasets for varying noise levels are shown in Table 3. We also show the histograms of (‖T(x_k + e_k) − y_k‖_q − ‖T(x_k) − y_k‖_q)_{1≤k≤K} in Figures 2, 3, 4, and 5, where (e_k)_{1≤k≤K} have been generated by the various noise distributions and by the proposed adversarial attacker.

Noise | MAE_std | MAE_gauss | MAE_uni | MAE_bin | MAE_adv | E_gauss | E_uni | E_bin | E_adv | SMAPE_gauss | SMAPE_uni | SMAPE_bin | SMAPE_adv
Combined Cycle Power Plant Dataset
1×10⁻¹ | 6.4×10⁻³ | 6.5×10⁻³ | 6.5×10⁻³ | 6.5×10⁻³ | 10.3×10⁻³ | 1.3×10⁻³ | 1.3×10⁻³ | 1.4×10⁻³ | 4.0×10⁻³ | 0.33 | 0.34 | 0.36 | 0.62
2×10⁻¹ | 6.4×10⁻³ | 6.8×10⁻³ | 6.8×10⁻³ | 6.9×10⁻³ | 14.2×10⁻³ | 2.5×10⁻³ | 2.5×10⁻³ | 2.7×10⁻³ | 8.0×10⁻³ | 0.50 | 0.52 | 0.56 | 0.87
Red Wine Quality Dataset
1×10⁻¹ | 0.47 | 0.46 | 0.47 | 0.47 | 0.58 | 0.04 | 0.05 | 0.043 | 0.12 | 0.34 | 0.33 | 0.29 | 0.41
2×10⁻¹ | 0.47 | 0.47 | 0.48 | 0.48 | 0.66 | 0.09 | 0.09 | 0.09 | 0.21 | 0.49 | 0.51 | 0.48 | 0.56
Abalone Age Dataset
5×10⁻² | 1.68 | 1.68 | 1.68 | 1.68 | 2.04 | 0.02 | 0.02 | 0.03 | 0.36 | 0.05 | 0.04 | 0.05 | 0.38
1×10⁻¹ | 1.68 | 1.68 | 1.68 | 1.68 | 2.40 | 0.05 | 0.05 | 0.05 | 0.72 | 0.09 | 0.08 | 0.09 | 0.58
Industrial Dataset
1×10⁻¹ | 9.2×10⁻³ | 9.6×10⁻³ | 9.6×10⁻³ | 9.6×10⁻³ | 20.9×10⁻³ | 2.6×10⁻³ | 2.6×10⁻³ | 2.7×10⁻³ | 11.8×10⁻³ | 0.45 | 0.46 | 0.47 | 0.96
2×10⁻¹ | 9.2×10⁻³ | 10.7×10⁻³ | 10.7×10⁻³ | 10.7×10⁻³ | 32.5×10⁻³ | 5.1×10⁻³ | 5.2×10⁻³ | 5.4×10⁻³ | 24.0×10⁻³ | 0.65 | 0.66 | 0.67 | 1.24

Table 3: Comparison of evaluation metrics: random attackers vs. proposed adversarial attacker with variation in perturbation level (ℓ2 attack).
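For reference, the random baselines of Table 3 can be generated as in the following sketch (our own illustration for the ℓ2 case with a diagonal Σ; the function name and the `kind` labels are ours): i.i.d. noise is drawn and then rescaled so that it exhausts the same perturbation budget δ as the adversarial attacker.

```python
import numpy as np

def random_attack(n_features, sigma_diag, delta, kind="gaussian", rng=None):
    """I.i.d. noise attacker rescaled so that ||Sigma^(-1/2) e||_2 = delta."""
    rng = rng or np.random.default_rng()
    if kind == "gaussian":
        e = rng.standard_normal(n_features)
    elif kind == "uniform":
        e = rng.uniform(-1.0, 1.0, n_features)
    elif kind == "binary":
        e = rng.choice([-1.0, 1.0], size=n_features)
    else:
        raise ValueError(kind)
    return e * delta / np.linalg.norm(e / np.sqrt(sigma_diag))
```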
For safety-critical tasks, Lipschitz and performance targets can be specified as engineering requirements prior to network training. Such a design approach has proven to make the network more stable and robust to adversarial attacks. Imposing a Lipschitz target can be done either by controlling the Lipschitz constant of each layer or that of the whole network, depending on the application at hand. One such method for controlling the Lipschitz constant has been presented in [Serrurier et al., 2020], using hinge regularization. In our experiments, we train our networks with a spectral normalisation technique [Miyato et al., 2018], which has proven very effective in controlling Lipschitz properties in GANs. Given an m-layer fully connected architecture and a Lipschitz target L, we can constrain the spectral norm of each layer to be less than L^{1/m}, the m-th root of L. This ensures that the upper bound on the global Lipschitz constant is less than L. We keep the network architectures exactly the same for both training procedures. The performance of the adversarial attacker on standard and spectrally normalized trained models, in terms of Fooling Error (E) and Symmetric Mean Accuracy Percentage Error (SMAPE), for the various datasets and perturbation magnitudes, is given in Table 4; a code sketch of the layerwise constraint follows the table.

Noise | E_adv | E_spec | SMAPE_adv | SMAPE_spec
Combined Cycle Power Plant Dataset
1×10⁻¹ | 4.0×10⁻³ | 3.9×10⁻³ | 0.62 | 0.60
2×10⁻¹ | 8.0×10⁻³ | 7.7×10⁻³ | 0.87 | 0.84
Red Wine Quality Dataset
1×10⁻¹ | 0.12 | 0.017 | 0.41 | 0.12
2×10⁻¹ | 0.21 | 0.03 | 0.56 | 0.17
Abalone Age Dataset
5×10⁻² | 0.36 | 0.11 | 0.38 | 0.12
1×10⁻¹ | 0.72 | 0.23 | 0.58 | 0.21
Industrial Dataset
1×10⁻¹ | 11.8×10⁻³ | 9.1×10⁻³ | 0.91 | 0.49
2×10⁻¹ | 24.0×10⁻³ | 17.5×10⁻³ | 1.24 | 0.72

Table 4: Results of the proposed adversarial attacker: standard training vs. spectral normalisation training, ℓ2 attacks.
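As an illustration of the layerwise constraint described above, the sketch below (our own, not the authors' code) wraps each linear layer with PyTorch's spectral normalization, which rescales the weight to unit spectral norm, and multiplies the output by L^{1/m} so that each of the m layers is at most L^{1/m}-Lipschitz; since ReLU is 1-Lipschitz, the composed network is then at most L-Lipschitz.

```python
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

class ScaledSNLinear(nn.Module):
    """Linear layer whose spectral norm is constrained to L**(1/m)."""

    def __init__(self, in_features, out_features, lipschitz_target, num_layers):
        super().__init__()
        self.scale = lipschitz_target ** (1.0 / num_layers)
        self.linear = spectral_norm(nn.Linear(in_features, out_features))

    def forward(self, x):
        return self.scale * self.linear(x)

# Example: the (10, 6, 1) Combined Cycle Power Plant architecture, target L = 1.
sizes, L = [4, 10, 6, 1], 1.0
m = len(sizes) - 1
blocks = []
for i, (a, b) in enumerate(zip(sizes[:-1], sizes[1:])):
    blocks.append(ScaledSNLinear(a, b, L, m))
    if i < m - 1:  # ReLU on all hidden layers, none on the output layer
        blocks.append(nn.ReLU())
net = nn.Sequential(*blocks)
```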
All the previous results have been obtained with attack and noise addition on all the input features present in the datasets. As pointed out in Section 3.4, the introduced adversarial attacker is capable of attacking a group of inputs. While generating an adversarial attack, we avoid attacking the categorical input variables [Ballet et al., 2019]; hence, for the Abalone and industrial datasets, we attack only the continuous variables. For the Combined Cycle Power Plant dataset, which does not contain any categorical variables, we attack 3 out of its 4 continuous variables. Similarly, for the Red Wine dataset, we attack 8 out of 11 continuous variables. The performance of the adversarial attacker when attacking only a few inputs is shown in Table 5.

Noise | E_adv | E_inp | SMAPE_adv | SMAPE_inp
Combined Cycle Power Plant Dataset
1×10⁻¹ | 4.0×10⁻³ | 3.4×10⁻³ | 0.62 | 0.58
2×10⁻¹ | 8.0×10⁻³ | 7.0×10⁻³ | 0.87 | 0.82
Red Wine Quality Dataset
1×10⁻¹ | 0.12 | 0.13 | 0.41 | 0.44
2×10⁻¹ | 0.21 | 0.22 | 0.56 | 0.60
Abalone Age Dataset
5×10⁻² | 0.36 | 0.36 | 0.38 | 0.38
1×10⁻¹ | 0.72 | 0.71 | 0.58 | 0.59
Industrial Dataset
1×10⁻¹ | 12.0×10⁻³ | 12.0×10⁻³ | 0.91 | 0.96
2×10⁻¹ | 24.0×10⁻³ | 24.0×10⁻³ | 1.24 | 1.24

Table 5: Results of the proposed adversarial attacker: standard training attacking all inputs vs. standard training attacking a subset of inputs, ℓ2 attacks.

As emphasized in Section 3.3, our adversarial attacker is applicable to various measures of the input perturbation and output deviation. The previous results have been obtained for p = q = 2, termed ℓ2 attacks here. We further show results for p = q = 1, termed ℓ1 attacks, and for p = q = +∞, termed ℓ∞ attacks, in Table 6.

Noise | MAE_std | MAE_gauss | MAE_uni | MAE_bin | MAE_adv | E_gauss | E_uni | E_bin | E_adv | SMAPE_gauss | SMAPE_uni | SMAPE_bin | SMAPE_adv
ℓ2 attacks
1×10⁻¹ | 9.2×10⁻³ | 9.6×10⁻³ | 9.6×10⁻³ | 9.6×10⁻³ | 20.9×10⁻³ | 2.6×10⁻³ | 2.6×10⁻³ | 2.7×10⁻³ | 11.8×10⁻³ | 0.45 | 0.46 | 0.47 | 0.96
2×10⁻¹ | 9.2×10⁻³ | 10.7×10⁻³ | 10.7×10⁻³ | 10.7×10⁻³ | 32.5×10⁻³ | 5.1×10⁻³ | 5.2×10⁻³ | 5.4×10⁻³ | 24.0×10⁻³ | 0.65 | 0.66 | 0.67 | 1.24
ℓ1 attacks
1×10⁻¹ | 9.2×10⁻³ | 9.2×10⁻³ | 9.2×10⁻³ | 9.2×10⁻³ | 18.5×10⁻³ | 8.4×10⁻⁴ | 8.1×10⁻⁴ | 7.2×10⁻⁴ | 9.5×10⁻³ | 0.22 | 0.21 | 0.20 | 0.87
2×10⁻¹ | 9.2×10⁻³ | 9.4×10⁻³ | 9.3×10⁻³ | 9.3×10⁻³ | 28.1×10⁻³ | 1.7×10⁻³ | 1.6×10⁻³ | 1.4×10⁻³ | 20.0×10⁻³ | 0.35 | 0.34 | 0.32 | 1.15
ℓ∞ attacks
1×10⁻¹ | 9.2×10⁻³ | 10.5×10⁻³ | 11.1×10⁻³ | 13.0×10⁻³ | 31.1×10⁻³ | 4.7×10⁻³ | 6.0×10⁻³ | 9.9×10⁻³ | 22.0×10⁻³ | 0.63 | 0.71 | 0.87 | 1.22
2×10⁻¹ | 9.2×10⁻³ | 13.5×10⁻³ | 15.3×10⁻³ | 22.0×10⁻³ | 52.5×10⁻³ | 9.4×10⁻³ | 12.0×10⁻³ | 19.0×10⁻³ | 45.0×10⁻³ | 0.84 | 0.92 | 1.08 | 1.47

Table 6: Comparison on the industrial dataset for ℓ2, ℓ1 and ℓ∞ attacks with variation in perturbation levels.

4.3 Result analysis

Some general conclusions can be drawn from the experiments.

• We observe that the proposed adversarial attacker performs better than all three random noise attackers for the three quantitative measures we have defined. In addition, the histograms in Figures 2, 3, 4, and 5 show that the error may be increased or reduced by random attackers, whereas this shortcoming does not occur with our adversarial attacker. This observation is verified for the ℓ2, ℓ1 and ℓ∞ norms in Table 6.
• Spectral normalisation has proven to robustify the trained models. As shown in Table 4, the Fooling Error (E) and SMAPE are reduced in all cases compared to the standard trained model.
• In the considered examples, we observe that categorical data have little effect when attacking the trained model, as shown in Table 5. The E and SMAPE measures do not show major differences.

5 Conclusion

In this article, we have introduced a novel, easily implementable Jacobian-based adversarial attacker for estimation problems. Such regression tasks cover a major portion of safety-critical applications; yet, in contrast with classification tasks, there is a lack of works studying and analysing adversarial attacks in this context. The present study contributes to filling this gap. We have presented error metrics which help in analysing the effectiveness of the attacker. Our attacker is versatile in the sense that it can handle any measure (ℓ1, ℓ2, ℓ∞) of the input or output perturbations, according to the target application. Our attacker is also successful in handling attacks focused on subsets of inputs. This feature may be useful when handling specific tabular datasets, and may also be insightful when information is available on the sensitivity or the ability to control some inputs. Our tests concentrated on fully connected networks, but it is worth pointing out that the proposed approach can be applied to any network architecture.
[Figure 2: Error distribution of random attacks and the proposed adversarial attack on the Combined Cycle Power Plant dataset for a perturbation level of 2×10⁻¹ (ℓ2 attack). Four histogram panels (Gaussian, Uniform, Binary, Adversarial); x-axis: Error, y-axis: Count.]

[Figure 3: Error distribution of random attacks and the proposed adversarial attack on the Red Wine dataset for a perturbation level of 2×10⁻¹ (ℓ2 attack). Same panel layout as Figure 2.]

[Figure 4: Error distribution of random attacks and the proposed adversarial attack on the Abalone dataset for a perturbation level of 1×10⁻¹ (ℓ2 attack). Same panel layout as Figure 2.]

[Figure 5: Error distribution of random attacks and the proposed adversarial attack on the industrial dataset for a perturbation level of 2×10⁻¹ (ℓ2 attack). Same panel layout as Figure 2.]

References

[Balda et al., 2018] Emilio Rafael Balda, Arash Behboodi, and Rudolf Mathar. Perturbation analysis of learning algorithms: A unifying perspective on generation of adversarial examples. arXiv preprint arXiv:1812.07385, 2018.

[Ballet et al., 2019] Vincent Ballet, Xavier Renard, Jonathan Aigrain, Thibault Laugel, Pascal Frossard, and Marcin Detyniecki. Imperceptible adversarial attacks on tabular data. arXiv preprint arXiv:1911.03274, 2019.

[Bolte and Pauwels, 2020] Jérôme Bolte and Edouard Pauwels. Conservative set valued fields, automatic differentiation, stochastic gradient methods and deep learning. Mathematical Programming, pages 1–33, 2020.

[Carlini and Wagner, 2017] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.

[Cortez et al., 2009] Paulo Cortez, António Cerdeira, Fernando Almeida, Telmo Matos, and José Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4):547–553, 2009.

[Deng et al., 2020] Yao Deng, Xi Zheng, Tianyi Zhang, Chen Chen, Guannan Lou, and Miryung Kim. An analysis of adversarial attacks and defenses on autonomous driving models. In 2020 IEEE International Conference on Pervasive Computing and Communications (PerCom), pages 1–10. IEEE, 2020.

[Dong et al., 2018] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9185–9193, 2018.

[Eykholt et al., 2018] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1625–1634, 2018.

[Finlayson et al., 2018] Samuel G Finlayson, Hyung Won Chung, Isaac S Kohane, and Andrew L Beam. Adversarial attacks against medical deep learning systems. arXiv preprint arXiv:1804.05296, 2018.

[Ghafouri et al., 2018] Amin Ghafouri, Yevgeniy Vorobeychik, and Xenofon Koutsoukos. Adversarial regression for detecting attacks in cyber-physical systems. arXiv preprint arXiv:1804.11022, 2018.

[Goodfellow et al., 2014] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[Huang and Wang, 2018] Yonghong Huang and Shih-han Wang. Adversarial manipulation of reinforcement learning policies in autonomous agents. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2018.

[Kurakin et al., 2016] Alexey Kurakin, Ian Goodfellow, Samy Bengio, et al. Adversarial examples in the physical world, 2016.

[Liu et al., 2016] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.

[Miyato et al., 2018] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.

[Moosavi-Dezfooli et al., 2016] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.

[Moosavi-Dezfooli et al., 2017] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1765–1773, 2017.

[Nguyen and Raff, 2018] Andre T Nguyen and Edward Raff. Adversarial attacks, regression, and numerical stability regularization. arXiv preprint arXiv:1812.02885, 2018.

[Papernot et al., 2016] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.

[Ren et al., 2019] Kui Ren, Qian Wang, Cong Wang, Zhan Qin, and Xiaodong Lin. The security of autonomous driving: Threats, defenses, and future directions. Proceedings of the IEEE, 108(2):357–372, 2019.
[Rony et al., 2020] Jérôme Rony, Eric Granger, Marco Pedersoli, and Ismail Ben Ayed. Augmented Lagrangian adversarial attacks. arXiv preprint arXiv:2011.11857, 2020.

[Serrurier et al., 2020] Mathieu Serrurier, Franck Mamalet, Alberto González-Sanz, Thibaut Boissin, Jean-Michel Loubes, and Eustasio del Barrio. Achieving robustness in classification using optimal transport with hinge regularization, 2020.

[Su et al., 2019] Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5):828–841, 2019.

[Szegedy et al., 2013] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

[Tong et al., 2018] Liang Tong, Sixie Yu, Scott Alfeld, et al. Adversarial regression with multiple learners. In International Conference on Machine Learning, pages 4946–4954. PMLR, 2018.

[Tüfekci, 2014] Pınar Tüfekci. Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods. International Journal of Electrical Power and Energy Systems, 60:126–140, 2014.