<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">An Adversarial Attacker for Neural Networks in Regression Problems</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Kavya</forename><surname>Gupta</surname></persName>
							<email>kavya.gupta100@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">CentraleSupélec</orgName>
								<orgName type="institution" key="instit1">Université Paris-Saclay</orgName>
								<orgName type="institution" key="instit2">Inria Centre de Vision Numérique</orgName>
								<address>
									<settlement>Gif-sur-Yvette</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="laboratory">Air Mobility Solutions BL</orgName>
								<orgName type="institution">Thales LAS France</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Beatrice</forename><surname>Pesquet-Popescu</surname></persName>
							<email>beatrice.pesquet@thalesgroup.com</email>
							<affiliation key="aff1">
								<orgName type="laboratory">Air Mobility Solutions BL</orgName>
								<orgName type="institution">Thales LAS France</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Fateh</forename><surname>Kaakai</surname></persName>
							<email>fateh.kaakai.e@thalesdigital.io</email>
							<affiliation key="aff1">
								<orgName type="laboratory">Air Mobility Solutions BL</orgName>
								<orgName type="institution">Thales LAS France</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jean-Christophe</forename><surname>Pesquet</surname></persName>
							<email>jean-christophe.pesquet@centralesupelec.fr</email>
							<affiliation key="aff0">
								<orgName type="department">CentraleSupélec</orgName>
								<orgName type="institution" key="instit1">Université Paris-Saclay</orgName>
								<orgName type="institution" key="instit2">Inria Centre de Vision Numérique</orgName>
								<address>
									<settlement>Gif-sur-Yvette</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Fragkiskos</forename><forename type="middle">D</forename><surname>Malliaros</surname></persName>
							<email>fragkiskos.malliaros@centralesupelec.fr</email>
							<affiliation key="aff0">
								<orgName type="department">CentraleSupélec</orgName>
								<orgName type="institution" key="instit1">Université Paris-Saclay</orgName>
								<orgName type="institution" key="instit2">Inria Centre de Vision Numérique</orgName>
								<address>
									<settlement>Gif-sur-Yvette</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">An Adversarial Attacker for Neural Networks in Regression Problems</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">C6EBAF79FE1C608F3888A4D2457EE333</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T23:16+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Adversarial attacks against neural networks and their defenses have been mostly investigated in classification scenarios. However, adversarial attacks in a regression setting remain understudied, although they play a critical role in a large portion of safety-critical applications. In this work, we present an adversarial attacker for regression tasks, derived from the algebraic properties of the Jacobian of the network. We show that our attacker successfully fools the neural network, and we measure its effectiveness in reducing the estimation performance. We present a white-box adversarial attacker to support engineers in designing safety-critical regression machine learning models. We present our results on various open-source and real industrial tabular datasets. In particular, the proposed adversarial attacker outperforms attackers based on random perturbations of the inputs. Our analysis relies on the quantification of the fooling error as well as various error metrics. A noteworthy feature of our attacker is that it allows us to optimally attack a subset of inputs, which may be helpful to analyse the sensitivity of some specific inputs.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Adversarial machine learning has received increased attention in the past decade. For all machine learning models, defense against adversarial attacks is important in terms of safety. Adversarial attacks in classification constitute malicious attempts to trick a classifier. They play a critical role in real-world application domains such as spam/malware detection, autonomous systems <ref type="bibr" target="#b2">[Huang and Wang, 2018]</ref>, <ref type="bibr" target="#b2">[Eykholt et al., 2018]</ref>, <ref type="bibr" target="#b4">[Ren et al., 2019]</ref>, medical systems <ref type="bibr" target="#b2">[Finlayson et al., 2018]</ref>, etc. Adversarial attacks create vulnerabilities in deployed models and must especially be taken into account when deploying security-critical AI applications. Despite the newfound interest of the research community in trustworthy and explainable AI, only a few works investigate adversaries in the case of regression tasks. Current advances in the adversarial machine learning field revolve around designing attacks and defenses, with a focus on the use of neural networks in image analysis and computer vision <ref type="bibr" target="#b2">[Goodfellow et al., 2014]</ref>, <ref type="bibr">[Kurakin et al., 2016]</ref>. Far fewer works concern tabular data. However, many machine learning tasks in industry rely on tabular data, e.g., fraud detection, product failure prediction, anti-money laundering, recommendation systems, click-through rate prediction, or flight arrival time prediction.</p><p>In this paper, we focus on generating adversarial attacks for neural networks in the specific scenario where i) a regression task is performed and ii) tabular data are employed. Our contributions are the following:</p><p>• We propose a simple, novel and flexible method for generating adversarial attacks for regression tasks (a white-box attack). 
• We show that the proposed attacker allows us to optimally attack any given subset of input features. • We explore various error metrics which are useful for analysing these adversarial attacks. • Our proposed adversarial attacker is generalised to an arbitrary ℓp norm on input and output perturbations.</p><p>• We evaluate our results on open-source regression datasets and an industrial dataset (output and input features described in Table <ref type="table" target="#tab_0">1</ref>) which lies in the domain of safety-critical applications.</p><p>In Section 2, we give a brief overview of existing works. In Section 3, we formulate the problem and present our method for generating adversarial examples in regression tasks. In Section 4, we perform numerical experiments on four datasets to demonstrate the effectiveness of the proposed attacker. Some concluding remarks are given in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>The concept of adversarial attacks designed to fool DNNs was first proposed in <ref type="bibr" target="#b6">[Szegedy et al., 2013]</ref>. Adding a subtle perturbation to the input of a neural network produces an incorrect output, while the human eye cannot recognize the modification of the input data. Even though different models have different architectures and might use different training data, the same kind of adversarial attack strategies can be used to attack related models. These attacks pose a huge threat to the performance of DNNs. <ref type="bibr" target="#b6">[Szegedy et al., 2013]</ref> proposed L-BFGS to construct adversarial attacks, and since then there has been a plethora of works introducing various adversarial attacks and their defenses for DNNs.</p><p>[ <ref type="bibr" target="#b2">Goodfellow et al., 2014]</ref> proposed a simpler and faster method to construct adversarial attacks (FGSM). The generated images are misclassified by adding perturbations obtained by linearizing the cost function in the gradient direction. This is a non-iterative attack, hence it has a lower computational cost than the previous method. The Fast Gradient Sign Method (FGSM) is an ℓ∞-bounded attack and is often prone to label leaking.</p><p>It may be difficult for FGSM to control the perturbation level when constructing attacks. <ref type="bibr">[Kurakin et al., 2016]</ref> proposed an optimized FGSM, termed Iterative Gradient Sign Method (IGSM), which adds perturbations in multiple smaller steps and clips the results after each iteration, ensuring that the perturbations are restricted to the neighborhood of the example. <ref type="bibr" target="#b2">[Dong et al., 2018]</ref> added momentum to IGSM attacks. <ref type="bibr" target="#b4">[Papernot et al., 2016]</ref> proposed the Jacobian-based Saliency Map Attack (JSMA), which is based on the ℓ0 sparsity measure. 
The basic idea is to construct a saliency map with the gradients and model the gradients based on the impact of each pixel of the image.</p><p>[ <ref type="bibr">Moosavi-Dezfooli et al., 2016]</ref> proposed a non-targeted attack method based on the ℓ2-norm, called DeepFool. It tries to find the decision boundary that is the closest to the sample in the image space, and then uses this boundary to fool the classifier. FGSM, JSMA, and DeepFool are designed to generate adversarial attacks corresponding to a single image to fool the trained classifier model. <ref type="bibr" target="#b4">[Moosavi-Dezfooli et al., 2017]</ref> proposed a universal image-agnostic perturbation attack method which fools the classifier by adding a single perturbation to all images in the dataset. <ref type="bibr" target="#b0">[Carlini and Wagner, 2017]</ref> proposed a powerful attack based on L-BFGS. The attack can be generated according to the ℓ1, ℓ2, and ℓ∞ norms and can be targeted or non-targeted. <ref type="bibr" target="#b2">[Liu et al., 2016]</ref> proposed an ensemble attack method combining multiple models to construct adversarial attacks. <ref type="bibr" target="#b5">[Rony et al., 2020]</ref> proposed a method to generate minimally perturbed adversarial examples based on an Augmented Lagrangian approach for various distance metrics. In <ref type="bibr" target="#b0">[Balda et al., 2018]</ref>, the authors propose a general framework for the generation of adversarial examples in both classification and regression tasks for applications in the image domain. Similar to our proposed approach, the technique is based on the Jacobian of the neural network. Most of the methods in the literature about adversarial example generation belong to the class of white-box attackers, i.e., the attacker has access to the information related to the trained neural network model, including the model architecture and its parameters. A black-box attacker is introduced in <ref type="bibr" target="#b6">[Su et al., 2019]</ref>. 
Such attackers do not know the model but can interact with it. A variant of the black-box attack is the grey-box attack, where the attacker has only limited information about the model. To the best of our knowledge, the only work dealing with adversarial attacks in a white-box setting for tabular data has been proposed in <ref type="bibr" target="#b0">[Ballet et al., 2019]</ref>, and this work handles only classification tasks.</p><p>In regression tasks there are no natural margins as in the case of classification tasks, and adversarial learning in a regression setting is hindered by the difficulty of defining the adversarial attack, its success, and the evaluation metrics. Despite the number of works on adversarial attack generation, there are few articles dealing with regression tasks. <ref type="bibr" target="#b7">[Tong et al., 2018]</ref> looked at adversarial attacks in the setting of an ensemble of multiple learners, investigating the interactions between these linear learners and an attacker in a regression setting, modeled as a Multi-Learner Stackelberg Game (MLSG). However, the investigated linear case is not able to capture the larger class of non-linear models. Focusing only on specific regression applications is common. <ref type="bibr" target="#b2">[Ghafouri et al., 2018]</ref> examined an important problem: selecting an optimal threshold for each sensor against an adversary for regression tasks in cyber-physical systems. <ref type="bibr" target="#b1">[Deng et al., 2020]</ref> introduced the concept of an adversarial threshold, which is related to the deviation between the original prediction and the prediction for an adversarial example, i.e., an acceptable error range in driving models. In a regression context, <ref type="bibr" target="#b4">[Nguyen and Raff, 2018]</ref> introduced a defense that is generically useful to reduce the effectiveness of adversarial attacks. 
They consider adversarial attacks as a potential symptom of numerical instability in the learned function. In the next section, we propose a general white-box adversarial attacker based on the Jacobian of the learned function for regression tasks in the tabular data domain.</p><p>3 Proposed Method</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Objective</head><p>The problem of adversarial attacks is closely related to the robustness issue for a neural network, i.e., its sensitivity to perturbations. Let T : R N0 → R Nm be the considered neural network having N 0 scalar inputs and N m scalar outputs. If x ∈ R N0 is a given vector of inputs for some data for which y is the associated target output, the network has been trained to produce an output T (x) close to y. If the input is now perturbed by an additive vector e ∈ R N0 , the perturbed output is T (x + e). Attacking the network then amounts to finding a perturbation e of preset magnitude which makes the output of the network deviate maximally from a reference output. This reference output may be the model output T (x) or the ground-truth output y. Since our purpose is to develop an approach which remains efficient even if the accuracy of the network is not very high, we choose y as the reference output when available. In this context, the measures of deviation and of magnitude of the perturbation play an important role in the mathematical formulation of the problem. As a standard choice, the measure of perturbation magnitude will here be an ℓp-norm where p ∈ [1, +∞]. For measuring the output deviation, we will similarly consider an ℓq-norm where q ∈ [1, +∞]. It must be emphasized that this choice makes sense when dealing with regression problems. In this context, the ℓ2 or ℓ1 norms are indeed frequently used as loss functions for training. On the other hand, the ℓ∞ norm is also a popular measure when dealing with reliability issues.</p></div>
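As a small numerical illustration of the objects just introduced, the sketch below builds a toy two-layer ReLU network T : R^3 → R^2 with random placeholder weights (an assumption for illustration, not one of the paper's trained models) and computes its Jacobian J(x) analytically, checking the first-order behaviour of T around x:

```python
import numpy as np

# A toy two-layer ReLU network T : R^3 -> R^2 with random placeholder
# weights (NOT one of the paper's trained models), and its Jacobian J(x)
# obtained analytically via the chain rule.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((5, 3)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((2, 5)), rng.standard_normal(2)

def T(x):
    """Forward pass of the network."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def jacobian(x):
    """J(x) = W2 diag(1{W1 x + b1 > 0}) W1 (ReLU derivative in the middle)."""
    active = (W1 @ x + b1 > 0.0).astype(float)
    return W2 @ (active[:, None] * W1)

x = rng.standard_normal(3)
J = jacobian(x)

# First-order behaviour: for a small perturbation e that flips no ReLU,
# the deviation of the output is exactly J(x) e.
e = 1e-6 * rng.standard_normal(3)
deviation = T(x + e) - T(x)
```

In a practical implementation the Jacobian would be obtained by automatic differentiation (back-propagation), as noted in Section 3.3.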
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Optimization formulation</head><p>In the described setting, the design of the attacker can be formulated as the problem of finding the "worst perturbation" e such that e ∈ Argmax e∈C p,δ</p><formula xml:id="formula_0">‖T (x + e) − y‖ q ,<label>(1)</label></formula><p>where C p,δ is the closed and convex set defined as</p><formula xml:id="formula_1">C p,δ = {e ∈ R N0 | ‖Σ −1/2 e‖ p ≤ δ}.<label>(2)</label></formula><p>Here, Σ ∈ R N0×N0 is a symmetric positive definite matrix, δ is a parameter which controls the maximum allowed perturbation, and Σ is a weighting matrix typically corresponding to the covariance matrix of the inputs. For instance, if we assume that it is a diagonal matrix, it simply introduces a normalization of the perturbation components with respect to the standard deviations of the associated inputs.</p><p>For standard choices of activation functions, T is a continuous function. By virtue of the Weierstrass theorem, the existence of a solution (not necessarily unique) to Problem (1) is then ensured. Although C p,δ is a relatively simple convex set, this problem appears as a difficult non-convex problem due to the fact that i) T is a complex nonlinear operator, and ii) we maximize an ℓq measure which, in addition, leads to a nonsmooth cost function when q = 1 or q = +∞. A further difficulty is that we usually need to attack a large dataset to evaluate the robustness of a network, and the provided optimization algorithm should therefore be fast.</p></div>
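The constraint set (2) can be illustrated numerically. In this sketch Σ is taken diagonal with hypothetical per-feature variances (chosen purely for illustration), so the change of variables e = δ Σ^(1/2) ẽ maps any unit-norm ẽ onto the boundary of C_{p,δ}:

```python
import numpy as np

# Membership in C_{p,delta} of Eq. (2), with a hypothetical diagonal Sigma
# (per-feature variances chosen for illustration only).
delta = 0.1
variances = np.array([4.0, 1.0, 0.25])
Sigma_sqrt = np.diag(np.sqrt(variances))
Sigma_inv_sqrt = np.diag(1.0 / np.sqrt(variances))

def in_C(e, p=2):
    """Test whether ||Sigma^{-1/2} e||_p <= delta."""
    return np.linalg.norm(Sigma_inv_sqrt @ e, ord=p) <= delta + 1e-12

# The change of variables e = delta * Sigma^{1/2} e_tilde maps a unit-norm
# e_tilde to a point on the boundary of C_{p,delta}.
e_tilde = np.array([0.6, -0.8, 0.0])          # unit l2 norm
e = delta * Sigma_sqrt @ e_tilde
weighted_norm = np.linalg.norm(Sigma_inv_sqrt @ e)
```

This change of variables is the one used in Section 3.3 to reduce the attack design to a problem over the unit ℓp ball.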
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Algorithm</head><p>We propose to implement a two-step approach.</p><p>• Step 1. We first perform a linearization based on the following first-order Taylor expansion:</p><formula xml:id="formula_2">T (x + e) ≈ T (x) + J(x)e,<label>(3)</label></formula><p>where J(x) ∈ R Nm×N0 is the Jacobian of the network at x<ref type="foot" target="#foot_0">1</ref> . Note that J(x) can be computed by classical back-propagation techniques. We will make a second approximation, that is, y ≈ T (x). Based on these two approximations and after the variable change ẽ = δ −1 Σ −1/2 e, Problem (1) simplifies to maximizing over ẽ ∈ B p the cost ‖J(x)Σ 1/2 ẽ‖ q , (4)</p><p>where B p is the closed ℓp ball centered at 0 and with unit radius. Note that the optimal cost value in (4) is the subordinate norm of matrix J(x)Σ 1/2 when the input space is equipped with the ℓp norm and the output space with the ℓq one. We recall that this subordinate norm is defined, for every matrix M ∈ R Nm×N0 , as</p><formula xml:id="formula_3">‖M‖ p,q = sup z∈R N0 \{0} ‖M z‖ q / ‖z‖ p .<label>(5)</label></formula><p>Problem ( <ref type="formula">4</ref>) is thus equivalent to finding a vector ẽ for which the value of the cost function is equal to ‖J(x)Σ 1/2 ‖ p,q . For the values of (p, q) listed below, such a vector has an explicit form.</p><p>-If p = q = 2, ẽ is any unit ℓ2-norm eigenvector of Σ 1/2 J(x) ⊤ J(x)Σ 1/2 associated with the maximum eigenvalue of this matrix. This vector can be computed by performing a singular value decomposition of J(x)Σ 1/2 . -If p = 2 and q = +∞, ẽ is any unit ℓ2-norm vector collinear with a row of J(x)Σ 1/2 having maximum ℓ2 norm. -If p = +∞ and q = +∞, ẽ is a unit-norm vector whose elements are equal to (ε (i) ) 1≤i≤N0 where, for every i ∈ {1, . . 
, N 0 }, ε (i) ∈ {−1, 0, 1} is the sign of the i-th element of a row of J(x)Σ 1/2 with maximum ℓ1 norm.</p><p>-If p = 1 and q = 1, ẽ is a vector which has only one nonzero component, equal to ±1; the index of this component corresponds to the column of J(x)Σ 1/2 with maximum ℓ1 norm. -If p = 1 and q = 2, ẽ is a vector with only one nonzero component, equal to ±1. The index of this component corresponds to a column of J(x)Σ 1/2 with maximum ℓ2 norm. -If p = 1 and q = +∞, ẽ is again a vector with only one nonzero component, equal to ±1. The index of this component corresponds to the column of J(x)Σ 1/2 where an element of maximum absolute value is located.</p><p>•</p><p>Step 2. In the previous optimization step, the optimal solution is not unique. Indeed, if ẽ is a solution to Problem (4), then −ẽ is also a solution. In addition, there may exist other reasons for the multiplicity of the solutions. For example, there may be several maximum-norm rows in matrix J(x)Σ 1/2 . Among all the possible choices, we propose to choose the solution e = δΣ 1/2 ẽ leading to the maximum deviation w.r.t. the ground truth, that is, such that ‖T (x + e) − y‖ q is maximum. This requires performing a search over a small number of possible candidates. Note that no approximation error is involved in this step. If the ground truth for the output is not available, it can be replaced by the model output.</p><p>• Post-optimization. If 1 &lt; q &lt; +∞ and T is assumed to be differentiable, e → ‖T (x + e) − y‖ q q is a differentiable function. A further refinement consists of maximizing this function over C p,δ by using a projected gradient algorithm with an Armijo search for the stepsize. The previous estimates of e can then be used to initialize the algorithm. According to our numerical tests, implementing this strategy when q = 2 only brings a marginal improvement. Moreover, this approach cannot be used when q = 1 or q = +∞.</p></div>
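The closed-form maximizers above can be checked numerically. The sketch below uses an assumed toy Jacobian (random, not one of the paper's models) and verifies the p = q = 2 and p = 1, q = 2 cases, followed by the sign selection of Step 2 with a linear stand-in for the network:

```python
import numpy as np

# Step 1 closed forms on an assumed toy Jacobian (NOT one of the paper's
# models). M = J(x) Sigma^{1/2}; the optimal value of (4) is ||M||_{p,q}.
rng = np.random.default_rng(1)
J = rng.standard_normal((2, 4))                     # hypothetical Jacobian at x
Sigma_sqrt = np.diag([1.0, 2.0, 0.5, 1.5])
M = J @ Sigma_sqrt

# p = q = 2: the top right singular vector of M attains the spectral norm.
U, s, Vt = np.linalg.svd(M)
e2 = Vt[0]
attained_22 = np.linalg.norm(M @ e2)                # equals s[0]

# p = 1, q = 2: a single +-1 at the column of M with maximum l2 norm.
col_norms = np.linalg.norm(M, axis=0)
e1 = np.zeros(4)
e1[np.argmax(col_norms)] = 1.0
attained_12 = np.linalg.norm(M @ e1)                # equals max column norm

# Step 2: the sign of the solution is free; keep the candidate deviating
# most from the reference output y.
delta, x, y = 0.1, rng.standard_normal(4), rng.standard_normal(2)
T = lambda v: J @ v                                 # linear stand-in for the network
candidates = [delta * Sigma_sqrt @ e2, -delta * Sigma_sqrt @ e2]
e_best = max(candidates, key=lambda c: np.linalg.norm(T(x + c) - y))
```

For a linear map, choosing the better of ±e always deviates at least as much from y as the unattacked output, which mirrors the role of Step 2.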
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Attacking a group of inputs</head><p>It can also be interesting to attack only a selected subset of inputs. This may help in identifying the most sensitive inputs of the network. Also, for some inputs, like unsorted categorical ones, attacks are often meaningless since they introduce a major change in the informative content of the dataset, which can be easily detected. Our proposed approach can be adapted to generate such partial attacks. In Problem (4), it is indeed sufficient to replace matrix Σ 1/2 by DΣ 1/2 D, where D is a diagonal masking matrix whose diagonal elements are equal to 1 when the input is attacked and 0 otherwise. The optimal solutions ẽ and e = δDΣ 1/2 Dẽ = δDΣ 1/2 ẽ then have their components equal to 0 for the non-attacked inputs. Note that the naive approach, which would consist of solving (4) and setting to zero the resulting perturbation components for the non-attacked inputs, would be suboptimal.</p></div>
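A minimal sketch of this partial attack (p = q = 2, Σ = I, and a random assumed Jacobian, all chosen for illustration) shows both properties: the optimal masked direction is supported on the attacked inputs only, and it is at least as effective as the naive mask-then-renormalize approach:

```python
import numpy as np

# Partial attack sketch (p = q = 2, Sigma = I, toy random Jacobian assumed
# for illustration): input 2 is excluded from the attack via the diagonal
# masking matrix D.
rng = np.random.default_rng(2)
J = rng.standard_normal((2, 4))
mask = np.array([1.0, 1.0, 0.0, 1.0])
D = np.diag(mask)

M_full = J                                   # J Sigma^{1/2} with Sigma = I
M_masked = J @ D @ D                         # J (D Sigma^{1/2} D) with Sigma = I

e_masked = np.linalg.svd(M_masked)[2][0]     # optimal masked direction
e_full = np.linalg.svd(M_full)[2][0]

# Naive alternative: solve the full problem, zero the non-attacked
# component, and renormalize back to the perturbation budget.
naive = e_full * mask
naive /= np.linalg.norm(naive)

gain_opt = np.linalg.norm(M_masked @ e_masked)
gain_naive = np.linalg.norm(M_masked @ naive)
```

Since the naive candidate is feasible for the masked problem, the optimal masked solution can only do better, which is exactly the suboptimality remark above.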
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Numerical Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Dataset and architecture description</head><p>Open Source Datasets We run our experiments on three open-source regression datasets. The Combined Cycle Power Plant <ref type="bibr" target="#b8">[Tüfekci, 2014]</ref> dataset has 4 features with 9,568 instances. The task is to predict the net hourly electrical energy output using hourly average ambient variables. The Red Wine Quality dataset <ref type="bibr" target="#b1">[Cortez et al., 2009]</ref> contains 1,599 samples in total, and each instance has 11 features. The features are physicochemical and sensory measurements for wine. The output variable is a quality score ranging from 0 to 10, where 10 represents the best quality and 0 the lowest. For the Abalone dataset, the task is to model an abalone's age based purely on its physical measurements, which would allow estimating an abalone's age without cutting its shell. There are in total 4,177 instances with 8 input variables, including one categorical variable. The datasets are divided with a ratio of 4:1 between training and testing data. The categorical attributes are dealt with by using one-hot encoding based on the number of categories. The input attributes are normalised by removing their mean and scaling to unit variance.</p><p>We train fully connected networks for the estimation of variables from the datasets. The network architectures for the datasets are given below. The values represent the number of hidden neurons in the layers. 
The activation function at each layer is ReLU, except for the last layer.</p><p>• Combined Cycle Power Plant dataset -(10, 6, 1)</p><p>Industrial Dataset The description of the input and output variables of the industrial dataset is given in Table <ref type="table" target="#tab_0">1</ref>. The variable to be predicted is the Estimation of Arrival time (ETE) of a flight, given variables including the distance and speed, and also an initial estimate of the ETE. The dataset is related to flight control, an activity area where safety is critical. The input attributes are normalized by removing their mean and scaling to unit variance. For the models, we build fully connected networks with a ReLU activation function on all the hidden layers except the last one. The network architecture is shown in Figure <ref type="figure" target="#fig_0">1</ref>.</p><formula xml:id="formula_4">MAE = 1/K ∑ K k=1 ‖T (x k + e k ) − y k ‖ q , Fooling Error E = 1/K ∑ K k=1 ‖T (x k + e k ) − T (x k )‖ q , Symmetric Mean Accuracy Percentage Error SMAPE = 2/K + ∑ K+ k=1 (‖T (x k + e k ) − y k ‖ q − ‖T (x k ) − y k ‖ q ) / (‖T (x k + e k ) − y k ‖ q + ‖T (x k ) − y k ‖ q )</formula></div>
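The three error metrics above (those of Table 2) can be sketched as follows, taking q = 2; the arrays below are illustrative placeholders, not the paper's datasets:

```python
import numpy as np

# MAE, fooling error E, and SMAPE of Table 2 (q = 2 shown).
def metrics(T_adv, T_clean, y, q=2):
    """Return (MAE, fooling error E, SMAPE) over a batch of K samples."""
    d_adv = np.linalg.norm(T_adv - y, ord=q, axis=1)       # ||T(x_k+e_k) - y_k||_q
    d_clean = np.linalg.norm(T_clean - y, ord=q, axis=1)   # ||T(x_k) - y_k||_q
    mae = d_adv.mean()
    fooling = np.linalg.norm(T_adv - T_clean, ord=q, axis=1).mean()
    ratio = (d_adv - d_clean) / (d_adv + d_clean)
    smape = 2.0 * ratio[ratio > 0].mean()                  # K+ positive terms only
    return mae, fooling, smape

# Placeholder outputs: clean predictions near y, adversarial ones further away.
rng = np.random.default_rng(3)
y = rng.standard_normal((100, 2))
T_clean = y + 0.1 * rng.standard_normal((100, 2))
T_adv = T_clean + 0.3 * rng.standard_normal((100, 2))
mae, fooling, smape = metrics(T_adv, T_clean, y)
```

As in Table 2, the SMAPE mean is restricted to the K+ samples where the attack increases the error, so its value lies in (0, 2].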
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Experimental setup</head><p>We first train our networks without any constraints using the network architecture presented in the previous section, with the aim of reducing the prediction/performance loss on the training dataset. This will be referred to as the standard training procedure.</p><p>To understand and analyze the performance of the proposed adversarial attacker, we calculate the three error metrics described in Table <ref type="table" target="#tab_2">2</ref>. We compare the proposed adversarial attacker with random noise attackers generated by i.i.d. perturbations. We use three additive noise distributions (Gaussian, uniform, and binary) for comparison. The outputs of these attackers have been normalized so as to meet the desired bound on the norm of the perturbation. The metrics are computed on the test samples, where K is the total number of samples in the test set. The results on the 4 datasets for varying noise levels are shown in Table <ref type="table" target="#tab_4">3</ref>. We also show the histograms of (‖T (x k +e k )−y k ‖ q − ‖T (x k )−y k ‖ q ) 1≤k≤K in Figures <ref type="figure">2, 3, 4, and 5</ref>, where (e k ) 1≤k≤K have been generated from various noise distributions and the proposed adversarial attacker.</p><p>For safety-critical tasks, Lipschitz and performance targets can be specified as engineering requirements prior to network training. Such a design approach has proven to make the network more stable and robust to adversarial attacks. Imposing a Lipschitz target can be done either by controlling the Lipschitz constant for each layer or for the whole network, depending on the application at hand. 
One such method for controlling the Lipschitz constant has been presented in <ref type="bibr" target="#b5">[Serrurier et al., 2020]</ref> using Hinge regularization. In the experiments, we train our networks while using a spectral normalisation technique <ref type="bibr" target="#b2">[Miyato et al., 2018]</ref> which has been proven to be very effective in controlling Lipschitz properties in GANs.</p><p>Table <ref type="table">6</ref>: Comparison on industrial dataset for ℓ2, ℓ1 and ℓ∞ attacks with variation in perturbation levels.</p><p>Given an m-layer fully connected architecture and a Lipschitz target L, we can constrain the spectral norm of each layer to be less than L^(1/m), the m-th root of L. This ensures that the upper bound on the global Lipschitz constant is less than L. We keep the network architectures exactly the same for both training procedures. 
The performance of the adversarial attacker on the standard and spectrally normalized trained models, in terms of Fooling Error (E) and Symmetric Mean Accuracy Percentage Error (SMAPE), for various datasets and varying perturbation magnitudes, is given in Table <ref type="table" target="#tab_5">4</ref>.</p><p>All the previous results have been obtained with attack and noise addition on all the input features present in the datasets. As pointed out in Section 3.4, the introduced adversarial attacker is capable of attacking a group of inputs. While generating an adversarial attack, we avoid attacking the categorical input variables <ref type="bibr" target="#b0">[Ballet et al., 2019]</ref>; hence, in the Abalone and industrial datasets, we attack only the continuous variables. For the Combined Cycle Power Plant dataset, which does not contain any categorical variables, we attack 3 out of 4 continuous variables. Similarly, for the Red Wine dataset we attack 8 continuous variables out of 11. The performance of the adversarial attacker, when attacking only a few inputs, is shown in Table <ref type="table" target="#tab_6">5</ref>.</p><p>As emphasized in Section 3.3, our adversarial attacker is applicable to various measures of the input perturbation and output deviation. The previous results have been obtained for the value p = q = 2, termed ℓ2 attacks here. We further show results for p = q = 1, termed ℓ1 attacks, and for p = q = +∞, termed ℓ∞ attacks, in Table <ref type="table">6</ref>.</p></div>
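The spectral-norm constraint described in this section can be sketched as follows. This is a post-hoc rescaling of assumed random weight matrices, shown only to illustrate the bound; the paper enforces the constraint during training via spectral normalisation:

```python
import numpy as np

# With an m-layer network and Lipschitz target L, capping each layer's
# spectral norm at L^(1/m) bounds the product of layer norms (an upper
# bound on the network's global Lipschitz constant) by L. The weights
# below are random placeholders, not a trained model.
def cap_spectral_norms(weights, L):
    m = len(weights)
    cap = L ** (1.0 / m)                      # m-th root of L
    capped = []
    for W in weights:
        s = np.linalg.norm(W, ord=2)          # largest singular value
        capped.append(W if s <= cap else W * (cap / s))
    return capped

rng = np.random.default_rng(4)
Ws = [rng.standard_normal((8, 4)), rng.standard_normal((8, 8)),
      rng.standard_normal((1, 8))]
Ws_capped = cap_spectral_norms(Ws, L=2.0)
lipschitz_bound = np.prod([np.linalg.norm(W, ord=2) for W in Ws_capped])
```

Since 1-Lipschitz activations such as ReLU do not increase the bound, the product of the capped layer norms upper-bounds the Lipschitz constant of the full network.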
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Result analysis</head><p>Some general conclusions can be drawn from the experiments.</p><p>• We observe that the proposed adversarial attacker performs better than all three random noise attackers for the three quantitative measures we have defined. In addition, the histograms in Figures 2, 3, 4, and 5 show that the error may be increased or reduced by random attackers, while this shortcoming does not happen with our adversarial attacker. This observation is verified for the ℓ2, ℓ1 and ℓ∞ norms in Table <ref type="table">6</ref>. • Spectral normalisation has been proven to robustify the trained models. As shown in Table <ref type="table" target="#tab_5">4</ref>, we see that the Fooling Error (E) and SMAPE are reduced in all the cases when compared to the standard trained model.</p><p>• In the considered examples, we observe that categorical data have little effect when attacking the trained model, as shown in Table <ref type="table" target="#tab_6">5</ref>. The E and SMAPE measures do not show major differences.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>In this article, we have introduced a novel, easily implementable, Jacobian-based adversarial attacker for estimation problems. These regression tasks cover a major portion of safety-critical applications. Yet there is a lack of works studying and analysing adversarial attacks in this context, as opposed to classification tasks. The present study contributes to filling this gap. We have presented error metrics which help in analysing the effectiveness of the attacker. Our attacker is versatile in the sense that it can handle any measure (ℓ1, ℓ2, ℓ∞) on input or output perturbations, according to the target application. Our attacker is also successful in handling attacks focused on subsets of inputs. This feature may be useful when handling specific tabular datasets, and may also be insightful when information is available related to the sensitivity or controllability of some inputs. Our tests concentrated on fully connected networks, but it is worth pointing out that the proposed approach can be applied to any network architecture.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Network Architecture.</figDesc><graphic coords="4,99.24,67.53,420.94,185.54" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Input and output variables description for Industrial dataset -A safety critical application.</figDesc><table><row><cell></cell><cell>0</cell><cell>Speed</cell><cell></cell></row><row><cell></cell><cell>1</cell><cell>Flight Distance</cell><cell></cell></row><row><cell></cell><cell>2</cell><cell>Departure Delay</cell><cell></cell></row><row><cell></cell><cell>3</cell><cell>Initial ETE</cell><cell></cell></row><row><cell></cell><cell>4</cell><cell>Latitude Origin</cell><cell></cell></row><row><cell></cell><cell>5</cell><cell>Longitude Origin</cell><cell>continuous</cell></row><row><cell>Input</cell><cell>6 7</cell><cell>Altitude Origin Latitude Destination</cell><cell></cell></row><row><cell></cell><cell>8</cell><cell>Longitude Destination</cell><cell></cell></row><row><cell></cell><cell>9</cell><cell>Altitude Destination</cell><cell></cell></row><row><cell></cell><cell>10</cell><cell>Arrival Time Slot</cell><cell>7 slots (categorical)</cell></row><row><cell></cell><cell>11</cell><cell>Departure Time Slot</cell><cell>7 slots (categorical)</cell></row><row><cell></cell><cell>12</cell><cell>Aircraft Category</cell><cell>6 classes (categorical)</cell></row><row><cell></cell><cell>13</cell><cell>Airline Company</cell><cell>19 classes (categorical)</cell></row><row><cell>Output</cell><cell>3</cell><cell>Refinement ETE</cell><cell>continuous</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 :</head><label>2</label><figDesc>Error metrics used for evaluation. The mean value computed for SMAPE is restricted to the K+ elements of the summation that are positive. e_k is the error generated by the adversarial attacker on the k-th sample in a dataset of K samples.</figDesc><table><row><cell>Noise</cell><cell>MAEstd</cell><cell>MAEgauss</cell><cell>MAEuni</cell><cell>MAEbin</cell><cell>MAE adv</cell><cell>Egauss</cell><cell>Euni</cell><cell>Ebin</cell><cell>Eadv</cell><cell>SMAPEgauss SMAPEuni SMAPE bin SMAPEadv</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="3">Combined Cycle Power Plant Dataset</cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="2">1 × 10 −1 6.4 × 10 −3</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3 :</head><label>3</label><figDesc>Comparison of evaluation metrics: random attacker vs. proposed adversarial attacker, with varying perturbation level (ℓ2 attack).</figDesc><table><row><cell>Noise</cell><cell>E adv</cell><cell>Espec</cell><cell>SMAPE adv</cell><cell>SMAPEspec</cell></row><row><cell></cell><cell cols="3">Combined Cycle Power Plant Dataset</cell><cell></cell></row><row><cell>1 × 10 −1</cell><cell>4.0 × 10 −3</cell><cell>3.9 × 10 −3</cell><cell>0.62</cell><cell>0.60</cell></row><row><cell>2 × 10 −1</cell><cell>8.0 × 10 −3</cell><cell>7.7 × 10 −3</cell><cell>0.87</cell><cell>0.84</cell></row><row><cell></cell><cell cols="2">Red Wine Quality Dataset</cell><cell></cell><cell></cell></row><row><cell>1 × 10 −1</cell><cell>0.12</cell><cell>0.017</cell><cell>0.41</cell><cell>0.12</cell></row><row><cell>2 × 10 −1</cell><cell>0.21</cell><cell>0.03</cell><cell>0.56</cell><cell>0.17</cell></row><row><cell></cell><cell></cell><cell>Abalone age dataset</cell><cell></cell><cell></cell></row><row><cell>5 × 10 −2</cell><cell>0.36</cell><cell>0.11</cell><cell>0.38</cell><cell>0.12</cell></row><row><cell>1 × 10 −1</cell><cell>0.72</cell><cell>0.23</cell><cell>0.58</cell><cell>0.21</cell></row><row><cell></cell><cell></cell><cell>Industrial Dataset</cell><cell></cell><cell></cell></row><row><cell>1 × 10 −1</cell><cell>11.8 × 10 −3</cell><cell>9.1 × 10 −3</cell><cell>0.91</cell><cell>0.49</cell></row><row><cell>2 × 10 −1</cell><cell>24.0 × 10 −3</cell><cell>17.5 × 10 −3</cell><cell>1.24</cell><cell>0.72</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 4 :</head><label>4</label><figDesc>Results of the proposed adversarial techniques: standard training vs. spectral normalisation training, under ℓ2 attacks.</figDesc><table><row><cell>Noise</cell><cell>E adv</cell><cell>E inp</cell><cell>SMAPE adv</cell><cell>SMAPE inp</cell></row><row><cell></cell><cell cols="3">Combined Cycle Power Plant Dataset</cell><cell></cell></row><row><cell>1 × 10 −1</cell><cell>4.0 × 10 −3</cell><cell>3.4 × 10 −3</cell><cell>0.62</cell><cell>0.58</cell></row><row><cell>2 × 10 −1</cell><cell>8.0 × 10 −3</cell><cell>7.0 × 10 −3</cell><cell>0.87</cell><cell>0.82</cell></row><row><cell></cell><cell cols="2">Red Wine Quality Dataset</cell><cell></cell><cell></cell></row><row><cell>1 × 10 −1</cell><cell>0.12</cell><cell>0.13</cell><cell>0.41</cell><cell>0.44</cell></row><row><cell>2 × 10 −1</cell><cell>0.21</cell><cell>0.22</cell><cell>0.56</cell><cell>0.60</cell></row><row><cell></cell><cell></cell><cell>Abalone age dataset</cell><cell></cell><cell></cell></row><row><cell>5 × 10 −2</cell><cell>0.36</cell><cell>0.36</cell><cell>0.38</cell><cell>0.38</cell></row><row><cell>1 × 10 −1</cell><cell>0.72</cell><cell>0.71</cell><cell>0.58</cell><cell>0.59</cell></row><row><cell></cell><cell></cell><cell>Industrial Dataset</cell><cell></cell><cell></cell></row><row><cell>1 × 10 −1</cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 5 :</head><label>5</label><figDesc>Results of the proposed adversarial techniques: standard training attacking all inputs vs. standard training attacking a subset of inputs, under ℓ2 attacks.</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">We assume that J(x) is defined at x; see [Bolte and Pauwels, 2020] for a justification of this assumption in the nonsmooth case.</note>
		</body>
		<back>
		</back>
	</text>
</TEI>
