The Dilemma Between Data Transformations and Adversarial Robustness for Time Series Application Systems

Sheila Alemany, Niki Pissinou
School of Computing and Information Sciences, Florida International University
salem010@fiu.edu, pissinou@fiu.edu

Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Adversarial examples, or nearly indistinguishable inputs created by an attacker, significantly reduce machine learning accuracy. Theoretical evidence has shown that the high intrinsic dimensionality of datasets facilitates an adversary's ability to develop effective adversarial examples in classification models. Adjacently, the presentation of data to a learning model impacts its performance. For example, we have seen this through dimensionality reduction techniques used to aid with the generalization of features in machine learning applications. Thus, data transformation techniques go hand-in-hand with state-of-the-art learning models in decision-making applications such as intelligent medical or military systems. With this work, we explore how data transformation techniques such as feature selection, dimensionality reduction, or trend extraction may impact an adversary's ability to create effective adversarial samples on a recurrent neural network. Specifically, we analyze this from the perspective of the data manifold and the presentation of its intrinsic features. Our evaluation empirically shows that feature selection and trend extraction techniques may increase the RNN's vulnerability. A data transformation technique reduces the vulnerability to adversarial examples only if it approximates the dataset's intrinsic dimension, minimizes codimension, and maintains higher manifold coverage.

1 Introduction

As the application of ML grows in industries that require explainable and reliable ML models, there is significant concern about the immense fragility of neural networks when given a varying-size set of imperceptibly perturbed inputs, adversarial examples (Biggio and Roli 2018; Su, Vargas, and Sakurai 2019; Elsayed et al. 2018). To address this issue, many pioneering works have focused on solutions that increase the models' robustness to maintain high accuracy assuming the existence of these adversarial examples (Biggio and Roli 2018; Ilyas et al. 2019; Goodfellow, McDaniel, and Papernot 2018; Hendrycks et al. 2021). The solutions proposed in these works have observed adversarial examples from the perspective of the abstractions created by the machine learning models. But, since these datasets are incomplete and instantaneous representations of information, trained machine learning models contain many areas of low confidence. These low-confidence areas of knowledge can be mapped similarly to how a human can be less sure of a correct answer for unfamiliar contexts. Adversaries exploit these low-confidence areas and create the smallest input change possible to skew the model's recommendations or decisions to be wrong or inaccurate. Despite these observations (Ilyas et al. 2019; Goodfellow, McDaniel, and Papernot 2018), the existence of adversarial examples remains an open problem (Shafahi et al. 2019; Hendrycks et al. 2021). However, the proposed theories continuously approach similar conclusions: the vulnerability of ML models is highly correlated with how the data is represented.

In practice, data is repeatedly being transformed with a growing list of pre-processing techniques to optimize ML models (Aleman et al. 2018; Naranjo and Santos 2019; Huang and Zhou 2019), and these techniques transform the way data is presented to an intelligent system. Thus, based on existing work, we hypothesize that data transformations may directly impact the adversary's ability to create adversarial samples due to manipulations in representing the intrinsic features of data. Motivated by the direct impact that this may have on currently deployed systems, we explore how five widely-applied data transformation techniques affect the robustness [1] of recurrent neural networks.

[1] In this work, robustness refers to the adversary's decreased capacity to attack more efficiently or induce inaccurate results using "harder-to-detect" perturbations.
We consider techniques that span three data transformation categories: dimensionality reduction (principal component analysis (Shlens 2014)), feature selection (random forest (Golay and Kanevski 2017) and low variance (Bramer and Devedic 2004)), and trend extraction (candlestick charting (Chmielewski et al. 2015) and exponential moving average (Klinker 2011)). Our empirical evaluation aims to identify whether data transformation techniques in the three categories can impact the efficiency of an adversarial attack. To better understand this, we design our experiments to explore the following questions:

1. Could data transformations contribute to an adversary's ability to more easily construct adversarial examples (i.e., make the ML model more vulnerable to attacks)?

2. Is the dimensionality reduction technique, PCA, consistent as a strategy to increase robustness, as seen in Bhagoji et al. (2018), when given a time series dataset, a recurrent neural network, and varying numbers of selected principal components?

3. What representations of data contribute to ML models that are least susceptible to adversarial examples, and how can we use them to ensure best practices when manipulating data?

Overall, in this work, we expand the empirical understanding of how data transformation techniques may impact the robustness of a recurrent neural network given the Carlini & Wagner (Carlini and Wagner 2017b) evasion attack on a multi-variate time series dataset (Banos et al. 2015). This benefits ML practitioners as they can use the presented results to move towards better data practices when manipulating data increasingly used in deployed intelligent systems. To the best of our knowledge, this is the first work exploring whether certain data transformations (outside of dimensionality reduction) may impact robustness in time series ML models.
2 Related Work

Many pioneering works have established a foundation for the seemingly inherent vulnerability to adversarial examples. Szegedy et al. (2014) argued the existence of low-probability adversarial "pockets" that an adversary can take advantage of. Feinman et al. (2017) established that adversarial samples lie furthest away from the data manifold [2] and are restricted in the direction normal to the data manifold such that the adversarial examples cross the decision axis (the optimal boundary between the data manifolds captured during model training time) and result in an incorrect output (Khoury and Hadfield-Menell 2018).

Shafahi et al. (2019) and Ilyas et al. (2019) proposed that the vulnerabilities to adversarial examples stem from the foundational assumption in ML that the training data accurately and adequately represents the underlying, abstracted phenomena through the learning process. Such high-dimensional abstractions [3] allow adversaries to exploit minor and specific details that a trained ML model can overlook. Similarly, Amsaleg et al. (2020) showed that the intrinsic dimensionality of datasets and an adversary's ability to develop effective adversarial examples are directly proportional in classification models, since a higher intrinsic dimensionality results in higher model complexity. In all cases, the quality of the abstractions is limited by how the data is presented to the model (i.e., does the data have bias? Is it missing values? Does it contain noise?). This is because ML learning/generalization and adversarial example creation remain a classic optimization problem.

Data dimensionality has been referred to as a "curse" due to the substantial computational complexity and the difficulties it yields when abstracting properties in data that do not occur in lower-dimensional data (Van Der Maaten, Postma, and Van den Herik 2009; Ilyas et al. 2019; Bhagoji et al. 2018). As a result, data transformation techniques are often used in learning systems to improve upon these burdens (Cheng and Lu 2018). Naturally, data transformations have influenced the field of adversarial ML due to the connection between adversarial vulnerability in deep learning and the high dimensionality of data. These techniques increase robustness by modifying the input such that the impact of gradient-based attacks is reduced, either through adversarial pre-training (Hendrycks, Lee, and Mazeika 2019), feature squeezing (Xu, Evans, and Qi 2018), dimensionality reduction with PCA (Bhagoji et al. 2018), or identifying and removing the least "robust features" which contribute the most to a model's vulnerability (Ilyas et al. 2019). Thus, they are defenses that focus on executing certain transformations at the beginning of the ML pipeline, such that even when the adversary gains perfect knowledge of the trained model, it is more difficult for the adversary to optimize its attack.

Carlini and Wagner (2017a) showed how certain previously described techniques, including (Bhagoji et al. 2018), were not a consistent defense. For example, they were able to show how using PCA on the training data did not increase the robustness of a convolutional neural network, only the fully-connected network. Other works had inconsistencies in their presented results when tested on other datasets. Observing these inconsistencies and how the representation of data highly influences abstractions, we hypothesize that different data transformations may individually impact the representation of the intrinsic features and hence uniquely impact an adversary's ability to attack the model.

[2] The data manifold is defined as the geometry of the data, which contains a topological space that locally resembles the Euclidean space near each data value.
[3] The high dimensionality of a model is not only correlated to the model architecture/parameters but also the dataset being used (Su, Vargas, and Sakurai 2019).
3 Data Transformation Techniques

Our comparative review includes data transformation techniques applied during the pre-processing stage of the ML pipeline. It is not exhaustive. We have strictly focused on linear data transformation techniques that have been commonly used in a variety of applications (Aleman et al. 2018; Bhagoji et al. 2018; Carlini and Wagner 2017a). For brevity, we assume the reader understands each technique. Future work can focus on non-linear dimensionality reduction techniques; we keep the two directions separate as non-linear transformations may impact the complexity of data manifolds differently than linear ones.

Dimensionality Reduction
Dimensionality reduction is the transformation of high-dimensional data into a significant representation of low dimensionality (Cheng and Lu 2018). Principal component analysis (PCA) is by far one of the more popular unsupervised tools due to its simple, non-parametric method for extracting relevant information from overwhelming datasets (Shlens 2014). For this work, we consider using 27%, 50%, and 81% of the principal components to approximate the feature counts around the 25th, 50th, and 75th percentiles. We explore in Section 6 how the selected principal components in varying extremes can significantly change the data manifold in ways which impact robustness.
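As a concrete reference point, the sketch below shows how such a projection might be implemented with scikit-learn (an assumption; the paper does not name its PCA implementation). The function and array names are placeholders, and the inputs are assumed to be standardized samples over the 22 MHealth features.

```python
# Minimal sketch (not the authors' code): project the input features onto a fixed
# fraction of principal components with scikit-learn. X_train/X_test are assumed to
# be standardized arrays of shape (n_samples, n_features).
import numpy as np
from sklearn.decomposition import PCA

def pca_transform(X_train, X_test, fraction=0.50):
    """Keep round(fraction * n_features) principal components (e.g., 0.27, 0.50, 0.81)."""
    n_components = max(1, round(fraction * X_train.shape[1]))
    pca = PCA(n_components=n_components)
    X_train_low = pca.fit_transform(X_train)   # fit the projection on training data only
    X_test_low = pca.transform(X_test)         # reuse the same projection at test time
    print(f"{n_components} components explain "
          f"{pca.explained_variance_ratio_.sum():.2%} of the total variance")
    return X_train_low, X_test_low
```

With 22 features, fractions of 0.27, 0.50, and 0.81 retain roughly 6, 11, and 18 components, which matches the feature counts reported in Table 1.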
Feature Selection
Feature selection is a data transformation technique that has been used for decades to represent particular relationships in data by eliminating features that may be irrelevant or redundant (Dash and Liu 1997) based on a varying set of heuristics. These techniques differ from dimensionality reduction methods in that they do not map the data onto a lower-dimensional space. For this work, we have selected random forest selection (Golay and Kanevski 2017) and low variance selection (Bramer and Devedic 2004) due to their widespread use and low computational requirements. For random forest selection, we set the feature importance threshold to the mean of all importance values, as is standard in practice (Golay and Kanevski 2017). For low variance selection, the selected features contributed 91.1% of the total variance in the data, as this is said to be the best heuristic to approximate the most significant information of a dataset (Van Der Maaten, Postma, and Van den Herik 2009). Although random forest selection considers the relationship of features with the target variable and low variance selection does not, both techniques chose 9 overlapping features. Thus, we expect their impact on data manifolds to be similar even with their varying heuristics for feature selection.
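A minimal sketch of the two heuristics as described above, assuming a scikit-learn random forest and a variance-coverage rule; the helper names and estimator settings are illustrative, not the authors' code.

```python
# Minimal sketch (assumed scikit-learn implementation) of the two selection heuristics:
# (1) random forest importance with the mean importance as threshold, and
# (2) low-variance selection keeping the highest-variance features that jointly
#     account for ~91% of the total variance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def random_forest_select(X, y):
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    importances = forest.feature_importances_
    return np.where(importances >= importances.mean())[0]   # keep above-mean features

def low_variance_select(X, coverage=0.911):
    variances = X.var(axis=0)
    order = np.argsort(variances)[::-1]                      # highest variance first
    cumulative = np.cumsum(variances[order]) / variances.sum()
    k = int(np.searchsorted(cumulative, coverage)) + 1       # smallest set reaching coverage
    return np.sort(order[:k])
```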
Trend Extraction
Recent works concerning robustness have focused on image recognition tasks, but time series data is also highly used in ML applications. As a result, we have analyzed the impact of data transformation techniques meant to extract trends in time series data, such as candlestick charting (Chmielewski et al. 2015) and the exponential moving average (EMA) (Klinker 2011).

These techniques were selected as they are used in prediction tasks in areas such as financial markets (Naranjo and Santos 2019), IoT (Aleman et al. 2018), and object tracking (Huang and Zhou 2019). They affect the data manifold by smoothing the trends in time series data, similarly to feature squeezing for image recognition (Xu, Evans, and Qi 2018), by artificially reducing the distance between temporally adjacent points, which provides a better estimation of their distance along the manifold. For this work, to ensure we compare both trend extraction techniques fairly, both were assigned the same value for the time window. The time window value of 20 was a selected hyperparameter that would not reduce the dimensionality of the dataset enough to hinder the model accuracy for the candlestick charting technique but would cause a significant enough change to the feature trends given the EMA technique.
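A minimal sketch, assuming pandas and an integer-positional index, of how the two transforms might be applied with a shared window of 20. The function names are illustrative, and the candlestick aggregation (first/max/min/last per window and channel) is our reading of the four-tuple described above.

```python
# Minimal sketch (not the authors' code) of the two trend extraction transforms with a
# shared time window of 20: an exponential moving average applied per sensor channel,
# and candlestick charting that collapses each window of 20 readings into an
# (open, high, low, close) four-tuple per channel.
import pandas as pd

WINDOW = 20  # same window for both techniques, as in the evaluation above

def ema_smooth(df: pd.DataFrame) -> pd.DataFrame:
    """Smooth every channel with an EMA parameterized by the shared window length."""
    return df.ewm(span=WINDOW, adjust=False).mean()

def candlestick(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate consecutive windows of 20 timestamps into OHLC tuples per channel.

    Assumes df has a simple integer RangeIndex over timestamps.
    """
    grouped = df.groupby(df.index // WINDOW)
    ohlc = pd.concat(
        {"open": grouped.first(), "high": grouped.max(),
         "low": grouped.min(), "close": grouped.last()},
        axis=1,
    )
    return ohlc  # columns become (statistic, channel) pairs
```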
4 Threat Model

As per Carlini et al. (2019), we define the adversary's knowledge, capabilities, and goals to ensure analysis for worst-case robustness. We did not implement any additional defenses, as our goal for this work is to explore the impact of these techniques for small perturbation budgets that are difficult to detect using the current state-of-the-art defenses (Tjeng, Xiao, and Tedrake 2019). Considering the attack success rate with incorporated defenses and data transformation techniques is left for future work.

Knowledge
We use a white-box attack where the adversary has full access to the trained neural network model, the defense used, and the data distribution at test time. We consider this attack because white-box attacks are more powerful than black-box attacks, as a white-box attack can reach a 100% success rate. Additionally, we consider evasion attacks where the adversaries can attack only during model deployment, meaning that they tamper with the input data after the deep learning model is trained.

Capabilities
For the attack method, we use the iterative optimization-based method of Carlini and Wagner (2017b). We selected this attack model due to its high success at crafting effective adversarial samples with the lowest distortion (Carlini and Wagner 2017b). Specifically, we have used the Carlini & Wagner l∞ implementation from the Adversarial Robustness Toolbox by IBM Research (Nicolae et al. 2018). The minor hyperparameters modified to create adversarial attacks that reduced the accuracy of our model were the learning rate and confidence, set to 0.01 and 0.5, respectively.

Goal
To create effective adversarial examples, we use the l∞ distortion metric to measure the similarity between the benign and potential adversarial examples, since the l∞-ball around each data point has recently been studied as an optimal, natural notion for adversarial perturbations (Goodfellow, Shlens, and Szegedy 2014; Carlini and Wagner 2017b). For this work, we used the untargeted attack and considered 0 < ϵ ≤ 1 (Tjeng, Xiao, and Tedrake 2019). Although targeted attacks are more powerful concerning the attack success rate, we are considering an untargeted attack since these attacks require a more limited perturbation budget, which allows an adversary to efficiently deploy the attack undetected (Carlini and Wagner 2017b). We can visualize the perturbation under this distance metric by viewing a series of data points. There is a maximum perturbation budget of ϵ: each value is allowed to be changed by at most ϵ, with no limit on the number of modified values. Since the perturbation budget has to remain less than some small ϵ, even if all values are modified, the trends in time series data will appear visually identical.
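The sketch below illustrates how an attack of this kind could be run with the Adversarial Robustness Toolbox. It is not the authors' script: `model` and `x_test` are placeholders, and argument names may differ across ART versions.

```python
# Minimal sketch (assumptions noted above) of the Carlini & Wagner l-infinity attack from
# IBM's Adversarial Robustness Toolbox against a trained Keras model, with the learning
# rate and confidence values reported in the Capabilities paragraph.
import numpy as np
from art.estimators.classification import KerasClassifier
from art.attacks.evasion import CarliniLInfMethod

classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))   # `model` = trained RNN
attack = CarliniLInfMethod(classifier, confidence=0.5, learning_rate=0.01)

x_adv = attack.generate(x=x_test)                                   # untargeted evasion attack

# Per-sample l-infinity distortion between benign and adversarial inputs.
linf = np.max(np.abs(x_adv - x_test), axis=tuple(range(1, x_test.ndim)))
print("mean l_inf distortion:", linf.mean())
```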
5 Experimental Methods

We compare our evaluation results with previous works that have completed similar tests with the computer vision datasets CIFAR-10 (Krizhevsky 2009) and MNIST (LeCun, Cortes, and Burges 2010) to check for overall consistency in the impact of data transformation techniques.

Dataset
The focus of related adversarial evaluation is largely centered around image recognition tasks. However, there are high dimensional time series datasets that have received little attention in the adversarial ML field, and the need for evaluation on other datasets is crucial for the advancement of the area (Carlini and Wagner 2017a). As a result, we have used the MHealth (Mobile Health) Dataset [4], which contains body motion and vital signs recordings of individuals while performing several physical activities (Banos et al. 2015). This highly volatile dataset contains 22 total features which map to one of 12 potential physical activities, and we selected the data corresponding to subject1 with a total of 160,860 timestamps.

[4] Dataset available on the UCI ML Repository at https://archive.ics.uci.edu/ml/datasets/MHEALTH+Dataset

[Figure 1: three T-SNE scatter plots, (a) MHealth Dataset, (b) MNIST Dataset, (c) CIFAR-10 Dataset, each plotting Component 1 against Component 2 with points colored by class.]
Figure 1: Visualization of datasets using T-SNE to observe the relationships between the points in high-dimensional space using 1000 randomly selected points from each dataset. MHealth shows that various clusters can be easily identified, such as the points in classes 1, 2, and 3, similar to MNIST. Yet, there are clusters such as for classes 8 and 12, where the points are more scattered, similar to CIFAR-10.

Figure 1b shows that the MNIST dataset contains the most well-defined classes, meaning points corresponding to the same class are clustered together more frequently. This implies that the points within each class of the MNIST dataset have highly correlated relationships even in this high-dimensional dataset. On the contrary, in Figure 1c, the CIFAR-10 dataset does not have well-defined clusters, resulting in an almost opposite conclusion relative to the MNIST dataset. As a result, CIFAR-10 has been described as a substantially more difficult dataset to work with. Therefore, conclusions made with MNIST may contain properties that do not generalize across tougher datasets such as CIFAR-10 (Carlini and Wagner 2017a). However, the MHealth dataset lies between the MNIST and CIFAR-10 datasets in regards to the relationship between the points in high-dimensional space. Thus, we are testing with a realistic time series dataset that contains manifold properties that may carry over to various other high-dimensional time series datasets. As a result, we believe our evaluation using the MHealth dataset is a valid example that brings to light the observations presented in this work.
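For reference, a Figure 1-style panel can be produced with a sketch like the following (not the authors' plotting code; scikit-learn's t-SNE and matplotlib are assumed, and the function name is a placeholder):

```python
# Minimal sketch: t-SNE on 1000 randomly sampled points, colored by class label, as in
# the Figure 1 panels described above.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def tsne_plot(X, y, n_points=1000, seed=0):
    idx = np.random.default_rng(seed).choice(len(X), size=n_points, replace=False)
    emb = TSNE(n_components=2, random_state=seed).fit_transform(X[idx])
    plt.scatter(emb[:, 0], emb[:, 1], c=y[idx], cmap="tab20", s=5)
    plt.xlabel("Component 2")
    plt.ylabel("Component 1")
    plt.colorbar(label="class")
    plt.show()
```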
Learning Model
Data pre-processing includes processes such as data cleaning, normalization, transformation, feature extraction, and selection, and it is the step done before training in this work. For the learning model, we have implemented a multi-class classification recurrent neural network (RNN) with LSTM layers using Keras (Chollet et al. 2015). Network architecture and hyperparameter tuning were completed to guarantee that all trained models for each data transformation technique received the same hyperparameters while maintaining testing accuracy above 90%, ensuring that the network architecture did not influence robustness results. The network contained only two LSTM units combined with dropout layers, which returned satisfactory training and testing results. We used the hyperbolic tangent function in these hidden vectors as it is a standard activation function among recurrent neural networks (Chollet et al. 2015). The dropout values were set to 0.1, meaning that 10% of each input was ignored to prevent the model from overfitting to the training data. Lastly, we are not concerned about our network's simple structure because simple network architectures have been shown not to hinder the Carlini & Wagner evasion attack (Carlini and Wagner 2017b).
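A minimal Keras sketch consistent with this description; the paper does not report layer widths, so the sizes below are placeholders, and we read "two LSTM units" as two stacked LSTM layers, each followed by dropout.

```python
# Minimal sketch (assumptions noted above) of the kind of Keras RNN described in the
# Learning Model paragraph: tanh activations, dropout of 0.1, and a softmax over the
# 12 MHealth activities.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

def build_rnn(timesteps, n_features, n_classes=12):
    model = Sequential([
        LSTM(64, activation="tanh", return_sequences=True,
             input_shape=(timesteps, n_features)),
        Dropout(0.1),                 # ignore 10% of activations to limit overfitting
        LSTM(32, activation="tanh"),
        Dropout(0.1),
        Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```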
6 Robustness Against Evasion Attacks

Since the data manifold structure heavily influences the existence of adversarial examples and how these adversarial attacks are optimized, we observe the changes in model performance from the perspective of the data manifold. To compare the changes made to the manifold by the data transformations, we observe the codimension, or the difference between the dimension of the data manifold and the dimension of the embedding space [5] (Khoury and Hadfield-Menell 2018). We show only perturbation budgets from 0 to 1 to show the impact of small perturbations, since the concluding results do not change as the attack success continues to increase.

[5] The embedding space is the space in which the data is embedded after dimensionality reduction.

[Figure 2: two line plots of attack success with l∞ (%) and log loss against the perturbation budget ϵ for the baseline, PCA 27%/50%/81%, random forest, low variance, candlestick, and EMA models.]
Figure 2: Attack success and log loss scores given five data transformation techniques against the baseline model without pre-processing. We can see that the best performing technique was PCA using half of the principal components. However, the log loss scores, which correspond to model confidence, show that all PCA techniques returned the lowest confidence when ϵ > 0.57.

[Figure 3: a line plot of precision scores against the perturbation budget ϵ and a scatter plot of precision at ϵ = 0.8 against feature count for the same models.]
Figure 3: Precision scores under-performed for all techniques once the perturbation budget was over ϵ = 0.68. From the scatter plot, we can see that reducing the number of features during training negatively impacted the precision scores given a high enough perturbation budget.

Manifold Impacts on Log Loss & Precision
From Figure 3, we can see that precision is consistently below baseline for both feature selection and trend extraction techniques. The combination of low log loss and low precision indicates that these models are overly confident but erroneous, implying a closer proximity between the submanifolds and the decision axis (Wu et al. 2017). In other words, when the submanifolds are closer to the decision boundary, the distance between two arbitrary points in different classes is relatively lower. Thus, when an ML model is tasked with categorizing a new point, it will often confidently miscategorize it since it is "harder" to differentiate between the two candidate classes. From the perspective of an adversary, they now require a minimal perturbation budget to "convince" the ML model to consistently miscategorize incoming data points with high confidence. However, this is not the case with PCA. With PCA, the precision is improved when ϵ < 0.65 due to relatively better defined submanifolds as a direct result of mapping the input embedding into a lower dimension. The reduced precision for greater values of epsilon is then introduced when the log loss of the model increases, because linear units can lose precision by responding too strongly, with reduced confidence, to samples with larger perturbations that the model does not understand (Goodfellow, Shlens, and Szegedy 2014).

Takeaway 1.1: PCA creates more well-defined submanifolds for each class such that it is more difficult for an adversary to "trick" an ML model with an imperceptible adversarial example. This is not the case for feature selection and trend extraction techniques.

Manifold Impacts on Model Accuracy
From Figure 2, it is clear the attack success rate is only hindered by 24.39% when the PCA technique is used with half of its principal components. Bhagoji et al. (2018) proposed that PCA should consistently increase robustness because projecting onto the highest-variance principal components should eliminate the low-variance features that adversaries can easily take advantage of. However, as Carlini and Wagner (2017a) already showed, this is not consistent for a convolutional neural network, and in our evaluation it is not always consistent for our recurrent neural network either.

Table 1: Summary of results. Columns from left to right present the data transformation technique, the number of features used from the original data, its clean accuracy when the model is not under attack, the perturbation budget (l∞ distance) required for the attack success rate to reach 30%, and the percentage change in robustness at ϵ = 0.80 relative to the baseline model with no data transformation applied to its training data.

Data Transformation | Feature Count | Benign Accuracy | Distance (l∞) | ∆ in Robustness
Baseline            | 22            | 97.93%          | 0.51          | -
PCA 50%             | 11            | 96.71%          | 0.40          | ↑ 24.39%
PCA 81%             | 18            | 98.80%          | 0.76          | ↓ 43.90%
PCA 27%             | 6             | 95.00%          | 0.34          | ↓ 60.98%
Random Forest       | 9             | 96.11%          | 0.13          | ↓ 31.71%
Low Variance        | 11            | 91.32%          | 0.15          | ↓ 65.85%
Candlesticks        | 22            | 92.78%          | 0.11          | ↓ 60.98%
EMA                 | 22            | 96.48%          | 0.51          | ↓ 7.32%

The other PCA techniques, using 27% and 81% of the principal components, did not perform as well once the perturbation budget exceeded ϵ = 0.1. In particular, using only 27% of the principal components results in losing too many dimensions, which can in turn reduce the manifold coverage for the dataset. This lack of coverage makes it much easier for an adversary to find an example far away from the data manifold (Feinman et al. 2017). This can happen easily in practice since high training/testing accuracy does not imply high accuracy/coverage of the data manifold (Khoury and Hadfield-Menell 2018). On the other hand, when using 81% of the principal components, there is high codimension, resulting in relatively more directions normal to the manifold and directly contributing to a more efficient attack. Thus, we can conclude that an optimal codimension exists in datasets such that the presented vulnerabilities are minimized.

Takeaway 2.1: The dimensionality reduction technique, PCA, is not a consistent defense against adversarial examples when the codimension is not optimal.

The feature selection techniques behaved similarly (as expected) given that both techniques selected a majority of the same features. In both cases, since no mapping to a lower dimension occurs and a majority of the features are removed, the model contains high codimension and a lack of manifold coverage relative to the dimensionality reduction. As a result, feature selection aids an efficient adversarial attack through all tested perturbation budgets.

Takeaway 2.2: Feature selection techniques contribute to higher codimension and lack manifold coverage, resulting in an adversary's ability to construct adversarial examples more easily.

The trend extraction techniques, however, do not remove the features used but manage to force the data into a lower dimensional manifold by generalizing the trends that normally contribute to the high dimensionality in trained models (Xu, Evans, and Qi 2018). For the candlestick charting, the transformation into the four-tuple reshaped the features but contributed to fundamental information loss for the dataset. The information loss resulted in codimension on the higher relative end and one of the most efficient creations of adversarial examples, with a 60.98% decrease in robustness at ϵ = 1.0. However, EMA did not seem to smooth the manifold enough for a drastic change from the baseline data. Therefore, with no statistically significant change to the data manifold, it results in a performance on par with the baseline.

Takeaway 2.3: Candlestick charting contributes to the most vulnerable ML models due to information loss, which significantly increases codimension.
Optimal Data Representations
From our experimentation, we were able to see that the data transformation techniques which did not minimize codimension allowed pathways for adversaries to exploit. The difficulty arises because transformations do not always and consistently impact the codimension. This prompted us to ask the following question: how do we know which transformation to execute, and how, to ensure that the codimension is not increased for an arbitrary dataset?

Reaching this ideal data representation can be done by identifying the intrinsic dimension of a dataset. The intrinsic dimension is defined with respect to the codimension of the solution set (Li et al. 2018). In other words, it can be described as the minimum number of parameters necessary to account for the observed properties in the data, achieve optimal ML performance, and reduce codimension.

Takeaway 3.1: ML practitioners can reduce codimension in their models using the intrinsic dimension of their dataset.

Finding and Using Intrinsic Dimension
The geometry of the data manifold, or the dataset's intrinsic dimensionality, is generally twisted and curved with non-uniformly distributed points, making identifying the intrinsic dimensionality a challenging task unique to each dataset (Facco et al. 2017). There are various tools and algorithms to analyze intrinsic characteristics such as the intrinsic dimensionality of data. For example, the most straightforward way is to count the number of features that contribute at least 90% of the total variance (Van Der Maaten, Postma, and Van den Herik 2009). For datasets and ML models that are more complex, Li et al. (2018) proposed to measure the intrinsic dimension of an "objective landscape," or the dimension of the subspace of a parameterized model, such as a dataset or neural network. They do so by training a neural network from a small, randomly oriented subspace and slowly increasing its dimension (through added features or parameters) until they reach a plateau of performance accuracy, and they define that configuration to be the objective landscape's intrinsic dimension.

To measure the intrinsic dimension of the MHealth dataset, we used both of these techniques, (Van Der Maaten, Postma, and Van den Herik 2009) and (Li et al. 2018). Using (Van Der Maaten, Postma, and Van den Herik 2009), 11 features contribute approximately 91% of the total variance. Using (Li et al. 2018), we sorted the features by descending variance, trained the same RNN one feature at a time, and noticed the plateau began with 9 features at approximately 94% performance accuracy. Overall, from these simple tests, we can see that the intrinsic dimension for the MHealth dataset is approximately between [9, 11], likely closer to 9 due to the complexity of the model and the looser bounds presented by the (Van Der Maaten, Postma, and Van den Herik 2009) heuristic. Since the same neural network architecture and parameters are used for all transformation techniques, their contribution to the intrinsic dimensionality is out of scope for this evaluation. However, ML practitioners can incorporate the technique into their pipelines with ease for future parameter configurations.

Takeaway 3.2: Observing the objective landscape of data is one simple, flexible, and accurate way to identify the intrinsic dimension for consideration along with any data transformations.
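A minimal sketch of the two heuristics used above: the variance-coverage count and a plateau search over features sorted by descending variance. `train_and_score` is a hypothetical helper that trains the same RNN on a feature subset and returns test accuracy, and the plateau tolerance is illustrative.

```python
# Minimal sketch (not the authors' code) of two intrinsic-dimension estimates:
# (1) the number of features covering ~90% of the total variance, and
# (2) the feature count at which accuracy plateaus when features are added one at a
#     time in order of descending variance.
import numpy as np

def variance_intrinsic_dim(X, coverage=0.90):
    variances = np.sort(X.var(axis=0))[::-1]
    cumulative = np.cumsum(variances) / variances.sum()
    return int(np.searchsorted(cumulative, coverage)) + 1

def plateau_intrinsic_dim(X, y, train_and_score, tolerance=0.005):
    order = np.argsort(X.var(axis=0))[::-1]          # descending variance
    scores = []
    for k in range(1, X.shape[1] + 1):
        scores.append(train_and_score(X[:, order[:k]], y))
        # declare a plateau once adding a feature stops improving accuracy meaningfully
        if k > 1 and scores[-1] - scores[-2] < tolerance:
            return k - 1
    return X.shape[1]
```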
Intrinsic Dimension on Robustness with MHealth
With the dimensionality reduction technique, PCA, we were able to see that the performance was only consistent in the case when the input embedding dimensionality more closely approached the intrinsic dimension. Given the intrinsic dimensionality reached with PCA 50%, the codimension was relatively minimized, resulting in the most restricted number of directions for the adversary to take advantage of.

On the other hand, for the feature selection techniques, the lack of mapping to a lower dimension prevented them from approximating the intrinsic dimension as accurately as PCA, resulting in poor performance while under attack. However, since random forest selection more closely approximates the intrinsic dimension (with 9 selected features), its attack success rate differs from low variance selection by approximately 10%. Also, for the candlesticks, the transformation into the four-tuple strayed the furthest away from the intrinsic dimensionality by reshaping the features. This transformation contributed to fundamental information loss for the dataset while straying away from the intrinsic dimension, resulting in codimension on the higher relative end and one of the most efficient creations of adversarial examples, with a 60.98% decrease in robustness at ϵ = 1.0.

Takeaway 3.3: To avoid introducing additional vulnerabilities in ML pipelines, one must observe and understand the particular dataset's intrinsic characteristics and ensure any transformation does not stray from the intrinsic dimension.

7 Conclusion

For this work, we have provided an example where linear data transformation techniques can change an adversary's ability to create effective adversarial examples. From the conclusions presented in Amsaleg et al. (2020), one could be led to believe a transformation that has reduced complexity and high training/testing accuracy would be inherently more robust. However, their conclusion holds between datasets of different complexities but does not speak to the potential impacts of data transformations. Positive impacts by dimensionality reduction techniques are only presented where the technique embeds the high-dimensional input space into a lower-dimensional structure that approaches the intrinsic dimension of the data. Specifically, PCA overperformed only when the dimensionality approached the intrinsic dimension. Meanwhile, the trend extraction techniques that refrained from sufficiently reaching the intrinsic dimension negatively impacted the attack success and the precision scores, overall making the ML model more vulnerable to adversarial examples. Although we only considered a recurrent neural network with LSTM layers, the MHealth dataset that we used is a realistic, high-dimensional time series dataset that shows an example of the impacts that data transformation can have on an ML model.

Our results conclude that when the dimension approaches the optimal intrinsic dimension, lower codimension and higher manifold coverage result in a lesser need to generalize features and reduce the inherent vulnerability to adversarial examples. However, it is important to note that reaching the intrinsic dimensionality is not enough to guarantee perfect robustness. The inevitability of adversarial examples has recently been theoretically studied, and it is still not possible to know the exact and consistent properties of real-world datasets or the resulting fundamental limits of adversarial training for specific datasets (Shafahi et al. 2019). In other words, the underlying distributions themselves can be complex enough that there may be no guarantee of perfect robustness against adversarial examples. Nonetheless, our work highlights the value of considering potential vulnerabilities introduced to ML pipelines through data transformations and how ML practitioners may utilize the intrinsic dimension to reduce the overall complexity of models, avoid introducing additional vulnerabilities, and create more reliable pipelines.

Lastly, as a future direction, the analysis of data transformations (linear and non-linear) on adversarial examples may benefit a model under a poisoning attack. Such analysis could provide insight into how certain data transformations can extricate adversarial noise to increase model robustness.
References

Aleman, C. S.; Pissinou, N.; Alemany, S.; and Kamhoua, G. A. 2018. Using Candlestick Charting and Dynamic Time Warping for Data Behavior Modeling and Trend Prediction for MWSN in IoT. In 2018 IEEE International Conference on Big Data (Big Data), 2884–2889. IEEE.

Amsaleg, L.; Bailey, J.; Barbe, A.; Erfani, S. M.; Furon, T.; Houle, M. E.; Radovanović, M.; and Nguyen, X. V. 2020. High Intrinsic Dimensionality Facilitates Adversarial Attack: Theoretical Evidence. IEEE Transactions on Information Forensics and Security, 16: 854–865.

Banos, O.; Moral-Munoz, J. A.; Diaz-Reyes, I.; Arroyo-Morales, M.; Damas, M.; Herrera-Viedma, E.; Hong, C. S.; Lee, S.; Pomares, H.; Rojas, I.; et al. 2015. mDurance: a novel mobile health system to support trunk endurance assessment. Sensors, 15(6): 13159–13183.

Bhagoji, A. N.; Cullina, D.; Sitawarin, C.; and Mittal, P. 2018. Enhancing robustness of machine learning systems via data transformations. In 2018 52nd Annual Conference on Information Sciences and Systems (CISS), 1–5. IEEE.

Biggio, B.; and Roli, F. 2018. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition, 84: 317–331.

Bramer, M.; and Devedic, V. 2004. Artificial Intelligence Applications and Innovations. Springer.

Carlini, N.; Athalye, A.; Papernot, N.; Brendel, W.; Rauber, J.; Tsipras, D.; Goodfellow, I.; Madry, A.; and Kurakin, A. 2019. On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705.

Carlini, N.; and Wagner, D. 2017a. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, 3–14.

Carlini, N.; and Wagner, D. 2017b. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), 39–57. IEEE.

Cheng, Z.; and Lu, Z. 2018. A novel efficient feature dimensionality reduction method and its application in engineering. Complexity, 2018.

Chmielewski, L.; Janowicz, M.; Kaleta, J.; and Orłowski, A. 2015. Pattern recognition in the Japanese candlesticks. In Soft Computing in Computer and Information Science, 227–234. Springer.

Chollet, F.; et al. 2015. Keras. https://github.com/fchollet/keras.

Dash, M.; and Liu, H. 1997. Feature selection for classification. Intelligent Data Analysis, 1(3): 131–156.

Elsayed, G. F.; Shankar, S.; Cheung, B.; Papernot, N.; Kurakin, A.; Goodfellow, I.; and Sohl-Dickstein, J. 2018. Adversarial examples that fool both human and computer vision. arXiv preprint arXiv:1802.08195.

Facco, E.; d'Errico, M.; Rodriguez, A.; and Laio, A. 2017. Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Scientific Reports, 7(1): 1–8.

Feinman, R.; Curtin, R. R.; Shintre, S.; and Gardner, A. B. 2017. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410.

Golay, J.; and Kanevski, M. 2017. Unsupervised feature selection based on the Morisita estimator of intrinsic dimension. Knowledge-Based Systems, 135: 125–134.

Goodfellow, I.; McDaniel, P.; and Papernot, N. 2018. Making machine learning robust against adversarial inputs. Communications of the ACM, 61(7).

Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

Hendrycks, D.; Carlini, N.; Schulman, J.; and Steinhardt, J. 2021. Unsolved problems in ML safety. arXiv preprint arXiv:2109.13916.

Hendrycks, D.; Lee, K.; and Mazeika, M. 2019. Using pre-training can improve model robustness and uncertainty. International Conference on Machine Learning.

Huang, J.; and Zhou, W. 2019. Re2EMA: Regularized and Reinitialized Exponential Moving Average for Target Model Update in Object Tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 8457–8464.

Ilyas, A.; Santurkar, S.; Tsipras, D.; Engstrom, L.; Tran, B.; and Madry, A. 2019. Adversarial examples are not bugs, they are features. Advances in Neural Information Processing Systems 32.

Khoury, M.; and Hadfield-Menell, D. 2018. On the geometry of adversarial examples. arXiv preprint arXiv:1811.00525.

Klinker, F. 2011. Exponential moving average versus moving exponential average. Mathematische Semesterberichte, 58(1): 97–107.

Krizhevsky, A. 2009. Learning multiple layers of features from tiny images. Technical report.

LeCun, Y.; Cortes, C.; and Burges, C. 2010. MNIST handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2.

Li, C.; Farkhoor, H.; Liu, R.; and Yosinski, J. 2018. Measuring the intrinsic dimension of objective landscapes. International Conference on Learning Representations.

Naranjo, R.; and Santos, M. 2019. A fuzzy decision system for money investment in stock markets based on fuzzy candlesticks pattern recognition. Expert Systems with Applications, 133: 34–48.

Nicolae, M.-I.; Sinn, M.; Tran, M. N.; Buesser, B.; Rawat, A.; Wistuba, M.; Zantedeschi, V.; Baracaldo, N.; Chen, B.; Ludwig, H.; Molloy, I.; and Edwards, B. 2018. Adversarial Robustness Toolbox v1.2.0. CoRR, 1807.01069.

Shafahi, A.; Huang, W. R.; Studer, C.; Feizi, S.; and Goldstein, T. 2019. Are adversarial examples inevitable? In International Conference on Learning Representations.

Shlens, J. 2014. A tutorial on principal component analysis. arXiv preprint arXiv:1404.1100.

Su, J.; Vargas, D. V.; and Sakurai, K. 2019. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation.

Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations.

Tjeng, V.; Xiao, K.; and Tedrake, R. 2019. Evaluating robustness of neural networks with mixed integer programming. International Conference on Learning Representations.

Van Der Maaten, L.; Postma, E.; and Van den Herik, J. 2009. Dimensionality reduction: a comparative review. J Mach Learn Res, 10(66-71): 13.

Wu, X.; Jang, U.; Chen, L.; and Jha, S. 2017. Manifold assumption and defenses against adversarial perturbations. arXiv preprint arXiv:1711.08001.

Xu, W.; Evans, D.; and Qi, Y. 2018. Feature squeezing: Detecting adversarial examples in deep neural networks. Network and Distributed Systems Security Symposium.