Introducing Quality Guarantees for Autoencoders

Benedikt Böing¹, Rajarshi Roy², Emmanuel Müller¹ and Daniel Neider²
¹ TU Dortmund, Germany
² Max Planck Institute for Software Systems, Germany

Abstract
Autoencoders are an essential concept in unsupervised learning. Currently, the quality of autoencoders is assessed either internally (e.g., by mean squared error) or externally (e.g., by classification performance). Yet, there is no way to prove that autoencoders generalize beyond the finite training data, and hence they are not reliable for safety-critical applications requiring formal guarantees for unseen data. To address this issue, we propose the first framework to bound the worst-case error of an autoencoder within a safety-critical region of an infinite value domain, as well as a definition of unsupervised adversarial examples that cause such worst-case errors. Technically, our framework reduces the infinite search space for a uniform error bound to checking the satisfiability of logical formulas in Linear Real Arithmetic. This allows us to leverage highly-optimized SMT solvers, a strategy that is very successful in the context of deductive software verification. We demonstrate our ability to find unsupervised adversarial examples as well as formal quality guarantees on a real-world dataset from the medical domain.

1. Introduction
Autoencoders are widely used for many unsupervised learning tasks such as cluster analysis [1], compression [2], and anomaly detection [3], as well as for a variety of preprocessing steps [4, 2, 5] in other machine learning pipelines. The general assumption is that data can be compressed into a lower-dimensional latent space by an encoder function extracting the most relevant features of the data distribution. From this latent representation, the decoder tries to reconstruct the original input. Because the latent representation acts as an information bottleneck, the autoencoder's output deviates from its input. Typically, an autoencoder reconstructs inputs in dense regions (i.e., regions with many training examples) better than inputs in regions with few training examples [3], which gives rise to its application in anomaly detection. Moreover, even the small errors in dense regions are desirable, as they allow the autoencoder to be used, e.g., for denoising. At the same time, it is necessary to control the error for all points in dense regions, because otherwise the result, whether it is the latent representation or the reconstruction, is of little use. To this end, current approaches assess autoencoders either internally, by measuring the mean squared error (MSE) on the unsupervised training data, or externally, by measuring performance on some supervised application such as classification [6, 7, 8].
However, a major shortcoming of these approaches is that they cannot provide a formal guarantee on the maximum deviation between input and output of the autoencoder, as they evaluate it on test data only (i.e., on a finite number of inputs).
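To make this limitation concrete, the following minimal sketch (our illustration, not part of the paper; `f` stands for any trained autoencoder) shows the standard finite evaluation: the MSE below inspects only the given test points and is silent about the infinitely many inputs between and beyond them.

```python
# Illustrative sketch of the standard finite evaluation (our addition):
# `f` is any trained autoencoder, X_test a finite array of test points.
# The resulting MSE says nothing about inputs outside X_test.
import numpy as np

def test_mse(f, X_test):
    """Mean squared reconstruction error over a *finite* test set."""
    return float(np.mean([np.sum((x - f(x)) ** 2) for x in X_test]))
```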
This lack of formal quality guarantees for autoencoders severely limits the applicability of such unsupervised learning schemes in safety-critical applications. For instance, it is particularly important to consider the maximum deviation when working with data containing clusters: the autoencoder must not mix up the clusters, because otherwise its results are meaningless. If the maximum deviations for the respective clusters are small enough, however, the autoencoder provably keeps the clusters separated.
To address this and other shortcomings of unsupervised learning with autoencoders, we provide the first methodology to bound an autoencoder's worst-case error in a safety-critical region. Inspired by the vast literature on supervised adversarial attacks [9, 10, 11], we define the notion of unsupervised adversarial examples as inputs (not necessarily contained in the training data) on which the autoencoder's error exceeds a user-defined threshold. We then define the worst-case error of an autoencoder as the largest error that can possibly manifest. Following a popular approach in the area of software verification, we reduce the problem of finding an unsupervised adversarial example to a satisfiability check of a formula in Linear Real Arithmetic [12, 13]. This allows us to apply highly-optimized, off-the-shelf satisfiability modulo theories (SMT) solvers, which can effectively reason about infinite domains and, hence, can prove the existence or non-existence of unsupervised adversarial examples. Once we have found an unsupervised adversarial example, it serves as a lower bound for the worst-case error. Moreover, a binary search allows us to approximate the worst-case error arbitrarily well. We demonstrate the usefulness of our QUGA (QUality Guarantees for Autoencoders) approach by proving that a given autoencoder keeps two classes of a medical dataset separated. For more experiments and details, please refer to the extended version of this paper [14].

2. QUGA: Problem Statement
In general, an autoencoder tries to reproduce its input; that is, it is trained to compute 𝑓(𝑥) = 𝑥. However, it does so while propagating the input through a latent space which typically has fewer dimensions than the input/output space. This latent space serves as an information bottleneck and therefore introduces errors into the identity function the autoencoder is supposed to learn. As most applications of autoencoders rely on a good approximation of the identity function, we are naturally interested in quantifying this error. More precisely, our goal is to give formal guarantees in terms of the maximum deviation from the identity function.
As a first step towards this goal, we define the notion of adversarial examples of autoencoders. Intuitively, such adversarial examples are inputs on which the "distance" between the input and the output of the autoencoder is larger than a (user-defined) threshold 𝜀 > 0.

Definition 1 (𝜀-adversarial examples). Let 𝑓 : ℝⁿ → ℝⁿ be an autoencoder, dist : ℝⁿ × ℝⁿ → ℝ₊ a distance function, and 𝜀 > 0. An 𝜀-adversarial example is a point 𝑥 ∈ ℝⁿ such that dist(𝑥, 𝑓(𝑥)) > 𝜀 (i.e., a point on which the input and output of 𝑓 deviate by more than 𝜀).

Note that our definition of 𝜀-adversarial examples is not restricted to inputs in the training or test sets but allows any input 𝑥 ∈ ℝⁿ.
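For intuition, Definition 1 amounts to a one-line membership test. The following sketch is our illustration (not part of the paper's framework), assuming dist is instantiated with the 𝐿∞-norm and `f` is any trained autoencoder:

```python
# Illustrative sketch of Definition 1 (our addition): x is an eps-adversarial
# example for the autoencoder f iff dist(x, f(x)) > eps. Here dist is the
# L-infinity norm; any other distance function could be plugged in.
import numpy as np

def is_eps_adversarial(f, x, eps):
    dist = np.max(np.abs(x - f(x)))  # L-infinity distance between input and output
    return dist > eps
```

Evaluating this predicate on finitely many points can only ever exhibit adversarial examples; proving their absence over an infinite region is exactly what the SMT-based framework below provides.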
In the context of safety-critical systems, however, it is not enough to identify individual 𝜀-adversarial examples; it is necessary to know the worst-case (i.e., maximum) error an autoencoder produces. As we cannot expect the error to be bounded on all of ℝⁿ, we restrict the region for which we want to calculate the worst-case error.

Definition 2 (Worst-case error of autoencoders). Let 𝑓 : ℝⁿ → ℝⁿ be an autoencoder, dist : ℝⁿ × ℝⁿ → ℝ₊ a distance function, and 𝐴 ⊆ ℝⁿ an (infinite) safety-critical region of inputs. Then, the worst-case error of 𝑓 in 𝐴 is defined as

wce(𝑓, 𝐴) = sup { dist(𝑥, 𝑓(𝑥)) ∈ ℝ₊ | 𝑥 ∈ 𝐴 }

(i.e., the largest deviation of an input in the region 𝐴 from its output).

Definition 2 serves as our novel quality criterion for autoencoders: it reflects how well the identity function is learned in the specific region of interest. It overcomes the limitation of classical quality metrics, which are defined on finite test data only. In total, this leads us to the main problem statement.

Problem 1 (QUGA: Quality Guarantees for Autoencoders). Given a region 𝐴 ⊆ ℝⁿ, an autoencoder 𝑓 : ℝⁿ → ℝⁿ, and a distance function dist : ℝⁿ × ℝⁿ → ℝ₊, compute wce(𝑓, 𝐴).

In general, computing the worst-case error is a very challenging and computationally intractable problem. Hence, in the next section we introduce a restricted, solvable version of it.

3. Solution Framework
In this section we outline a framework for computing 𝜀-adversarial examples and the worst-case error of autoencoders in a restricted and hence computationally tractable version of Problem 1. The following restrictions are designed in such a way that the solution framework remains applicable to a wide range of autoencoders used in practice:
1. We assume that the neurons of the autoencoder have linear or ReLU (Rectified Linear Unit) activation functions.
2. We assume the distance function to be the 𝐿₁- or 𝐿∞-norm.
3. We assume the safety-critical region 𝐴 to be a finite union of convex compact polytopes.
4. We approximate the worst-case error up to a user-defined accuracy.
Given these assumptions, we can encode the problem of finding an 𝜀-adversarial example in a given region 𝐴 as an instance of the SMT problem [15]. To this end, we define formulas encoding the autoencoder (𝜙ae), the region (𝜙A), and the distance function (𝜙dist,𝜀), and conjoin them to 𝜙ae,𝜀 = 𝜙ae ∧ 𝜙A ∧ 𝜙dist,𝜀 such that the following theorem holds.

Theorem 1. Let 𝑓 be an autoencoder, 𝐴 a region, dist a distance function, 𝜀 > 0, and 𝜙ae,𝜀 as defined above. Then, the following two properties hold:
1. If 𝐴 contains an 𝜀-adversarial example, then 𝜙ae,𝜀 is satisfiable.
2. If 𝜙ae,𝜀 is satisfiable, then any solution of 𝜙ae,𝜀 is an 𝜀-adversarial example in 𝐴.

To approximate the worst-case error in a region, we can thereafter exploit this theorem by performing a binary search over the values of 𝜀, keeping track of lower and upper bounds on the worst-case error.
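To make the reduction concrete, the following sketch runs the encoding and the binary search end to end using the Z3 SMT solver [13] via its Python API. Everything in it, including the two-dimensional toy autoencoder, its weights, and the box-shaped region 𝐴, is an illustrative assumption; it is a minimal sketch of the approach, not the implementation evaluated in this paper.

```python
# Minimal sketch of the QUGA reduction (illustrative toy weights and region).
# phi_ae is expressed by building the network's output terms directly; each
# ReLU becomes a case split, so everything stays in linear real arithmetic.
from z3 import Real, If, And, Or, Solver, sat

def relu(z):
    return If(z >= 0, z, 0)

# Toy autoencoder f: R^2 -> R^2 with a 1-dimensional latent space.
x0, x1 = Real("x0"), Real("x1")
h = relu(0.5 * x0 + 0.5 * x1)   # encoder
y0, y1 = relu(h), relu(h)       # decoder

# Safety-critical region A: the box [0, 1]^2 (phi_A).
phi_A = And(0 <= x0, x0 <= 1, 0 <= x1, x1 <= 1)

def find_adversarial(eps):
    """Return a model of phi_ae,eps if an eps-adversarial example exists in A."""
    s = Solver()
    s.add(phi_A)
    # phi_dist,eps: L-infinity distance between input and output exceeds eps.
    s.add(Or(x0 - y0 > eps, y0 - x0 > eps, x1 - y1 > eps, y1 - x1 > eps))
    return s.model() if s.check() == sat else None

# Binary search: bracket wce(f, A) up to a user-defined accuracy.
lo, hi = 0.0, 2.0  # hi must be a known upper bound on the error within A
while hi - lo > 0.01:
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if find_adversarial(mid) is not None else (lo, mid)
print(f"wce(f, A) lies in [{lo:.3f}, {hi:.3f}]")
```

On this toy instance the search brackets the worst-case error at 0.5 (attained at the corner (1, 0)); for a trained network, 𝜙ae would instead be generated layer by layer from its actual weights.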
4. Empirical Evaluation
We demonstrate our QUGA framework on the ECG5000 dataset (Figure 1) by evaluating the unsupervised training based on two time-series clusters. The goal of a traditional evaluation would be to show that all training objects are clearly separated in the latent space. In contrast, we care about all possible (infinitely many) objects in two safety-critical regions (green and red) that need to be distinguishable in the latent space.

[Figure 1: ECG5000 dataset with two safety-critical regions (red and green), obtained by extracting prototypes for two classes and adding a margin of 0.25.]

In Figure 2 we see the resulting corridors (green and red) into which points from the corresponding safety-critical regions can be mapped. As the two corridors are disjoint at some time steps (e.g., at time step 137), the autoencoder provably keeps the two safety-critical classes apart, allowing the latent-space representations to be used for follow-up tasks such as classification.

[Figure 2: Image spaces of the autoencoder (red and green) into which points from the respective safety-critical regions in Figure 1 can theoretically be mapped by the autoencoder.]

5. Conclusion
QUGA overcomes major shortcomings of unsupervised learning with autoencoders. We provide the first methodology to bound the error of an autoencoder in a safety-critical region. With our solution framework based on SMT solvers, we propose to search for adversarial examples and the worst-case error in the infinite search space of a safety-critical region. Our QUGA approach formulates the autoencoder, the safety-critical region, and the error of the loss function as a logical conjunction of linear constraints. This allows us to prove separation of the input data in the latent space for further downstream applications.

References
[1] S. E. Chazan, S. Gannot, J. Goldberger, Deep clustering based on a mixture of autoencoders, in: 29th IEEE International Workshop on Machine Learning for Signal Processing, 2019.
[2] Q. Meng, D. R. Catchpoole, D. Skillicorn, P. J. Kennedy, Relational autoencoder for feature extraction, in: 2017 International Joint Conference on Neural Networks, 2017.
[3] M. Sakurada, T. Yairi, Anomaly detection using autoencoders with nonlinear dimensionality reduction, in: Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, 2014.
[4] L. Gondara, Medical image denoising using convolutional denoising autoencoders, in: IEEE International Conference on Data Mining Workshops, 2016.
[5] L. Pasa, A. Sperduti, Pre-training of recurrent neural networks via linear autoencoders, in: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014.
[6] Q. V. Le, M. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, A. Y. Ng, Building high-level features using large scale unsupervised learning, in: Proceedings of the 29th International Conference on Machine Learning, 2012.
[7] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. (2010).
[8] M. R. Min, D. A. Stanley, Z. Yuan, A. J. Bonner, Z. Zhang, A deep non-linear feature mapping for large-margin kNN classification, in: ICDM 2009, The Ninth IEEE International Conference on Data Mining, 2009.
[9] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, in: 3rd International Conference on Learning Representations, 2015.
[10] N. N. Dalvi, P. M. Domingos, Mausam, S. K. Sanghai, D. Verma, Adversarial classification, in: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.
[11] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, R. Fergus, Intriguing properties of neural networks, in: 2nd International Conference on Learning Representations, 2014.
[12] G. Katz, C. W. Barrett, D. L. Dill, K. Julian, M. J. Kochenderfer, Reluplex: An efficient SMT solver for verifying deep neural networks, in: Computer Aided Verification, 29th International Conference, 2017.
[13] L. M. de Moura, N. Bjørner, Z3: An efficient SMT solver, in: Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, 2008.
[14] B. Böing, R. Roy, E. Müller, D. Neider, Quality guarantees for autoencoders via unsupervised adversarial attacks, in: Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD, 2020.
[15] L. M. de Moura, N. Bjørner, Satisfiability modulo theories: Introduction and applications, Commun. ACM (2011).