<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Leveraging generative models to characterize the failure conditions of image classifiers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Adrien Le Coz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <email>adrien.le-coz@irt-systemx.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stéphane Herbin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <email>stephane.herbin@onera.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faouzi Adjed</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <email>faouzi.adjed@irt-systemx.fr</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DTIS, ONERA, Université Paris-Saclay</institution>
          ,
          <addr-line>F-91123 Palaiseau</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IRT SystemX</institution>
          ,
          <addr-line>Palaiseau</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>We address in this work the question of identifying the failure conditions of a given image classifier. To do so, we exploit the capacity of producing controllable distributions of high-quality image data made available by recent Generative Adversarial Networks (StyleGAN2): the failure conditions are expressed as directions of strong performance degradation in the generative model latent space. This strategy of analysis is used to discover corner cases that combine multiple sources of corruption, and to compare in more detail the behavior of different classifiers. The directions of degradation can also be rendered visually by generating data for better interpretability. Some degradations, such as image quality, can affect all classes, whereas others, such as shape, are more class-specific. The approach is demonstrated on the MNIST dataset, complemented with two sources of corruption (noise and blur), and shows a promising way to better understand and control the risks of exploiting Artificial Intelligence components for safety-critical applications.</p>
      </abstract>
      <kwd-group>
        <kwd>AI System Characterization</kwd>
        <kwd>Generative Models</kwd>
        <kwd>Explainable AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In spite of all this on-going activity, the available design tools have difficulty mastering, with an acceptable level of trustworthiness, the complexity of AI-ML components for real-world safety-critical applications. The work presented in this paper contributes to a better understanding of the behavior of an AI-ML component, and to identifying the measurable and/or verifiable conditions influencing success or failure. Its long-term motivation is to close the loop between the specification, design and testing steps by providing more refined analytical tools. The target application domain is computer vision, where AI-ML techniques are now ubiquitous.</p>
      <sec id="sec-1-1">
        <title>Characterizing AI components</title>
        <p>Artificial Intelligence (AI) is getting more mature every year, with potential applications to real-world problems, and possibly to safety-critical systems. Machine Learning (ML) is one of the most prominent sets of AI techniques used to design predictive functions, especially for high-dimensional inputs such as images, video, text or sound, and it generally involves Deep Neural Networks (DNNs).</p>
        <p>The exploitation of ML techniques introduces new
issues to ensure safety and trustworthiness when designing
or integrating AI-based components: data quality
assessment, robustness to adversarial perturbations, formal
verification of DNNs, explainability, DNN calibration, etc.</p>
        <p>These research actions are complemented by the production of a large number of position papers and reports produced by academic, industrial and government organizations or working groups (ISO, SAE, NHTSA, EASA, HLEG of EU, DEEL, etc.); one of the main objectives of which is to renew certification standards so that the various phases of an industrial process (specification, design, validation &amp; verification, deployment, integration, operation, versioning, etc.) can accommodate AI.</p>
        <p>Characterizing an AI-ML component with a test dataset has, however, several limitations: (1) the usual performance indicators are global statistics and cannot express in a fine-grained way the algorithm behavior: they can be used to rank competing solutions - this is currently done in academic benchmarks - but do not explain how one solution behaves compared to others. (2) It is difficult to gather all the good and bad operating conditions into one dataset. There have been some attempts to describe rather exhaustively the possible hazards to families of algorithms [1], but what these attempts in fact revealed was the complexity to master. Test dataset replication experiences have also shown that for high-dimensional data, performance measures can have a large variance [2, 3]. (3) Typical causes of performance degradation of AI-ML components, such as distributional shift [4] and instability to small perturbations [5, 6], are difficult to catch with a single test dataset.</p>
        <p>Another approach to characterize a given component, inspired by software engineering practices, is to define a testing strategy “designed to reveal machine learning bugs” [7]. For instance, [8] exploits a concept of neuron coverage, inspired by test coverage in traditional software testing, to detect erroneous inputs.</p>
        <p>In our approach, we propose to combine these two different strategies, data-driven evaluation and testing, in order to characterize the behavior of a given function: we identify the influential causes of performance degradation by evaluating the performance on sets of generated data that sample various data attributes, corruptions or nuisances.</p>
      </sec>
      <sec id="sec-1-2">
        <title>Main contributions</title>
        <p>We show how to exploit generative models to finely analyze the behavior of classifiers with high-dimensional input in order to: • identify influential directions of performance degradation that can be expressed both in the data space and in a latent feature space of a generative model; • discover corner cases by exploring the directions of degradation in the latent feature space; • compare classifier performance on influential data features.</p>
        <p>We focus in this paper on image classification as one of the paradigmatic decision problems of computer vision, along with object detection and semantic segmentation, and illustrate our method on a corrupted version of the MNIST dataset [20].</p>
      </sec>
      <sec id="sec-1-3">
        <title>Generative models to explore data space</title>
        <p>Designing a probabilistic model in a high-dimensional data space such as image, video or sound, able to faithfully account for their diversity and informative features, is a difficult (impossible?) objective. Generative models such as GANs [9] or generative invertible flows [10] are a series of ML techniques that provide means to give access to such a distribution by direct sampling. What is learned is not the parameters of the probability density but the parameters of a sampling process able to generate data that mimic a given random distribution.</p>
        <p>GANs exploit a representational latent space that can be sampled from a known low-dimension distribution, often Gaussian, that is expected to encode enough information to generate complete images. Generation is then produced by a decoding network that is learned from target data samples. Recent approaches [11, 12, 13] are now able to generate high-quality, high-dimension data, with a photo-realistic rendering when applied to images, and with good diversity and fidelity levels. One possible application of generative models for safety objectives is to augment data for testing various operational conditions, as in [14].</p>
        <p>The latent space can also be used as a way to control the generation process, for instance to edit images [15, 16, 17]. When correctly disentangled, the latent space can be interpreted as a representation space where each dimension encodes some interpretable visual attribute [16, 18]. In the case of face image generation, these attributes could be hairstyle, head orientation, eye color, glasses, etc. Navigating in the representational latent space can also be used to identify the attributes that best characterize a given class [19].</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Proposed approach</title>
      <p>The proposed approach is illustrated in Figure 1, where
we explore how the latent space of a generative model
differentiates between data that are well and poorly
classified by a given classifier. In the following, we will briefly
describe the chosen generative model and its latent space
structure (Section 2.1); explain how to find the
dimensions of the latent space that differentiate well-classified
from mis-classified data (Section 2.2); describe how to
manipulate images to visualize the attributes (Section
2.3); and see how we can estimate the accuracy of the
classifier conditionally to the location of the data in the
latent space (Section 2.4).</p>
      <sec id="sec-2-1">
        <title>2.1. Resources</title>
        <p>The current work is mainly based on two resources, that we present in the next two subsections. They represent the theoretical tools necessary to detail our proposed approach.</p>
        <sec id="sec-2-1-1">
          <title>2.1.1. Classifier and data</title>
          <p>The first input to our approach is a learned image classifier to be analyzed. We assume that we have access to its architecture and weights (“white box”). We also know the domain of application (handwritten digits, faces, indoor scenes, etc.), and have a corresponding dataset available, not necessarily the one used for learning the classifier.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. Generative model</title>
          <p>The second ingredient of our approach is a generative model that can be controlled meaningfully. In our work, we used the StyleGAN2 model [21] for a few reasons: the quality of generated data, its scalability to complex datasets, and the various levels of latent spaces. Indeed, three different latent spaces can be considered. The first latent space, 𝒵, is typically normally distributed, like in many GANs, and is the initial input space of the generator. Samples z ∈ 𝒵 are forwarded to the intermediate latent space 𝒲 using fully connected layers, resulting in a more disentangled representation than 𝒵 [22]. Using learned affine transformations, samples w ∈ 𝒲 are specialized into styles that scale the convolution weights for each feature map of each layer of the generator. A generated image is the result of an initial learned constant tensor that is up-sampled and transformed by residual convolution layers that are modulated by the style vector.</p>
          <p>Images are generated from the style vector s by the generator G(s). The space of styles, called StyleSpace, shows a high degree of disentanglement [16]. This latent space 𝒮 encodes distinct visual attributes along its dimensions and is typically used for image editing. To give an idea of the complexity of the generative model, in the original StyleGAN2 version that generates images of size 1024×1024, 𝒵 and 𝒲 have 512 dimensions, 𝒮 has 9088 dimensions, and the initial constant tensor has a size of 4×4 with 512 channels.</p>
          <p>[Figure 3: (a) t-SNE in 𝒵; (b) t-SNE in 𝒲; (c) t-SNE in 𝒮.]</p>
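          <p>To make the relation between these latent spaces concrete, the short sketch below follows the conventions of the public stylegan2-ada-pytorch implementation (G.mapping, G.synthesis, per-layer affine modules); the checkpoint name and the broadcasting of a single w to every layer are illustrative assumptions, not the exact code used in this work.</p>
          <preformat>
# Hedged sketch of the three StyleGAN2 latent spaces: Z -> W -> S -> image.
# Assumes the NVlabs stylegan2-ada-pytorch generator API; the checkpoint
# "stylegan2_mnist_corrupted.pkl" is a hypothetical file name.
import pickle
import torch

with open("stylegan2_mnist_corrupted.pkl", "rb") as fp:
    G = pickle.load(fp)["G_ema"]              # generator as a torch.nn.Module
G.eval()

z = torch.randn(1, G.z_dim)                   # sample in the input latent space Z
c = torch.zeros(1, G.c_dim)                   # class conditioning (empty if unconditional)
w = G.mapping(z, c)                           # intermediate latent space W: [1, num_ws, w_dim]

# StyleSpace S: concatenate the outputs of every per-layer affine module.
styles = []
for name, module in G.synthesis.named_modules():
    if name.endswith("affine"):               # each synthesis layer owns an 'affine' sub-module
        styles.append(module(w[:, 0]))        # broadcast a single w to every layer, for simplicity
s = torch.cat(styles, dim=1)                  # one long style vector (the StyleSpace code)

img = G.synthesis(w)                          # generated image in data space
          </preformat>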
        </sec>
      </sec>
      <sec id="sec-2-2a">
        <title>2.2. Finding influential dimensions in the latent space</title>
        <p>The dimensions of the latent StyleSpace 𝒮 are expected to encode image attributes, such as shape, thickness, orientation and noise, in a rather disentangled way. We exploit this property to define a simple search method able to identify the most influential dimensions regarding the accuracy of a given classifier.</p>
        <p>Gradient-based approach. The proposed strategy ranks the dimensions according to the gradient of the classifier output with respect to the StyleSpace input. The idea is to score each dimension based on its ability to lower the output score of the true class. More precisely, for each sample s in the StyleSpace for which we know the true class, we generate the corresponding image x = G(s), and then classify it according to f(x). Then we compute the gradient, with respect to the dimension i of the style space, of the c-th classification output: ∇<sub>i</sub> f<sub>c</sub>(G(s)), where c is the index of the true class encoded by s. The gradient can be computed exactly by using the autograd algorithmic differentiation provided in standard Deep Learning software environments – both the classifier and the generator being available in such frameworks. We compute the average gradient over a population of data as the score used to rank the dimensions.</p>
        <p>[Figure 4 panels: (a) top dimensions; (b) random dimensions.]</p>
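        <p>A minimal sketch of this ranking procedure is given below; it assumes a differentiable generate_from_styles(s) function (mapping a StyleSpace vector to an image, for instance built from the affine-module hook of Section 2.1.2) and a PyTorch classifier, both hypothetical names rather than the exact code of this work.</p>
        <preformat>
# Hedged sketch: rank StyleSpace dimensions by the average gradient of the
# true-class output with respect to each style coordinate.
# Assumptions: generate_from_styles(s) is differentiable and returns an image
# batch, classifier(x) returns class scores, and (styles, labels) is a
# population of StyleSpace samples with known true classes.
import torch

def rank_style_dimensions(generate_from_styles, classifier, styles, labels):
    grad_sum = torch.zeros(styles.shape[1])
    for s, y in zip(styles, labels):
        s = s.clone().detach().requires_grad_(True)   # one StyleSpace sample
        x = generate_from_styles(s.unsqueeze(0))      # corresponding image G(s)
        score = classifier(x)[0, y]                   # f_c(G(s)), the true-class output
        score.backward()                              # gradient w.r.t. every dimension i
        grad_sum += s.grad.detach()
    avg_grad = grad_sum / len(styles)
    # Sort so that the most damaging dimensions (most negative average
    # gradient) come first; ranking by absolute value is another option.
    ranking = torch.argsort(avg_grad)
    return ranking, avg_grad
        </preformat>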
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Image manipulation and corner cases</title>
        <p>Starting from an image whose latent space representation – the style vector – is known, we can modify this representation to generate a modified image. In fact, once the influential dimensions are computed (see Section 2.2 above), if we change the values of the style vector for those dimensions, then we modify the corresponding visual attributes of the generated image. Generating data that follow a direction of high performance degradation relies on a simple heuristic: (1) we start from a given point s<sub>0</sub> in the StyleSpace, (2) we increment the influential dimension by a given amount, the sign of the increment being given by the sign of the gradient, and (3) we monitor the classifier output. Note that the starting point s<sub>0</sub> for exploration can be any point in the StyleSpace: it can be a “true” style, computed by mapping to 𝒮 a random z sampled in the input latent space 𝒵, or any other point directly sampled in 𝒮, for instance an average of a given population of s data. We will use in the experiments (Section 3) an “average” digit in the StyleSpace, computed as the mean over a class-conditional population.</p>
        <p>This data space exploration along influential dimensions also allows the discovery of corner cases, defined as the smallest degradation that shifts the classifier output from good to bad classification. The experiments in Section 3 will show several examples of corner cases discovered by this approach.</p>
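        <p>The sketch below implements this exploration under the same assumptions as above (a differentiable generate_from_styles and a classifier); the step size and the maximum number of steps are illustrative values, not parameters reported in this work.</p>
        <preformat>
# Hedged sketch: walk along one influential StyleSpace dimension, against the
# gradient of the true-class score, until the prediction flips; the first
# sample whose prediction differs from the true class is kept as a corner case.
import torch

@torch.no_grad()
def find_corner_case(generate_from_styles, classifier, s0, true_class,
                     dim_idx, grad_sign, step=0.5, max_steps=40):
    for k in range(1, max_steps + 1):
        s = s0.clone()
        s[dim_idx] = s0[dim_idx] - grad_sign * step * k   # move so the true-class score decreases
        x = generate_from_styles(s.unsqueeze(0))
        pred = classifier(x).argmax(dim=1).item()
        if pred != true_class:
            return x, s, k            # smallest tested shift that flips the prediction
    return None                       # no corner case found within max_steps
        </preformat>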
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Accuracy in the latent space</title>
        <p>We can also use the latent space to better understand the classifier accuracy. In Section 2.2 we described how to find influential dimensions in the latent space. The data along those dimensions thus potentially correlate with the classifier accuracy. A classifier can be characterized globally by its accuracy decrease when fed with various amounts of corruption produced by moving in the StyleSpace along influential dimensions. The decreasing slope globally characterizes the resilience of a classifier to corruption and can be used for comparison.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Implementation</title>
          <p>We used the MNIST dataset [20] to evaluate our approach. More precisely, we augmented the original data by
introducing corruptions to simulate poor-quality data
acquisition that may have an influence on class prediction. In
particular, we chose Gaussian Noise and Gaussian Blur
from [23] because they have a significant impact on
classification accuracy for a classifier trained on clean data.</p>
          <p>Data are corrupted in the following way: the first half
of the dataset remains clean, and the second half is first
blurred (with a severity level randomly chosen between 1,
2 and 3), and noise is added (with a severity level also
randomly chosen between 1, 2 and 3). It ensures that most
of the samples remain visually recognizable. Random
samples are shown in Figure 2.</p>
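          <p>A sketch of this corruption protocol is given below; the per-severity blur and noise parameters are placeholders inspired by [23], not the exact values used here.</p>
          <preformat>
# Hedged sketch of the corruption protocol: the first half of the dataset stays
# clean; the second half is first blurred, then noised, each with a severity
# drawn uniformly from {1, 2, 3}. Sigma values per severity are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

BLUR_SIGMA = {1: 0.6, 2: 1.0, 3: 1.4}      # assumed Gaussian Blur severities
NOISE_SIGMA = {1: 0.08, 2: 0.12, 3: 0.18}  # assumed Gaussian Noise severities

def corrupt_second_half(images, seed=0):
    """images: float array in [0, 1] of shape (N, 28, 28)."""
    rng = np.random.default_rng(seed)
    out = images.copy()
    half = len(images) // 2                                  # first half remains clean
    for i in range(half, len(images)):
        blur = BLUR_SIGMA[int(rng.integers(1, 4))]           # severity in {1, 2, 3}
        noise = NOISE_SIGMA[int(rng.integers(1, 4))]
        img = gaussian_filter(out[i], sigma=blur)            # Gaussian Blur first
        img = img + rng.normal(0.0, noise, size=img.shape)   # then Gaussian Noise
        out[i] = np.clip(img, 0.0, 1.0)
    return out
          </preformat>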
          <p>The StyleGAN2 generative model contains three different latent spaces. It is generally admitted that the so-called StyleSpace 𝒮 is a more disentangled representation space. We found that the well-classified and mis-classified samples are better separated in this space, even though the generation was not constrained in any way by the classifier. We can visualize this in Figure 3 using t-SNE projection [24].</p>
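          <p>This separation can be inspected with a projection such as the one sketched below, where styles holds the StyleSpace codes of generated samples and correct marks whether the classifier predicted them correctly; the perplexity value is an assumption.</p>
          <preformat>
# Hedged sketch: 2-D t-SNE projection of StyleSpace codes, colored by whether
# the classifier predicted the generated digit correctly.
# styles: (N, d) array of style vectors; correct: (N,) boolean array.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_style_space_separation(styles, correct, perplexity=30):
    emb = TSNE(n_components=2, perplexity=perplexity, init="pca").fit_transform(styles)
    plt.scatter(emb[correct, 0], emb[correct, 1], s=4, label="well-classified")
    plt.scatter(emb[~correct, 0], emb[~correct, 1], s=4, label="mis-classified")
    plt.legend()
    plt.title("t-SNE of StyleSpace codes")
    plt.show()
          </preformat>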
          <p>After training on corrupted data, the classifier (a simple Convolutional Neural Network) reaches an accuracy of 97% on the test data. The metric used to quantify the performance of the generative model is the Fréchet Inception Distance (FID) [25]. The generative model trained on corrupted data reaches an FID of 1.63 (computed by comparing 50,000 generated images, unfiltered and without using truncation, to the 60,000 images of the whole training dataset). This low value means a high generation quality. A few samples are shown in Figure 2, demonstrating the capacity of the generative model to encode various levels and natures of corruption.</p>
        </sec>
        <sec id="sec-3-2">
          <title>3.2. Influential dimensions</title>
          <p>We apply the method described in Section 2.2 to rank dimensions in the learned StyleSpace. In order to verify that several dimensions have a bigger impact on performance than others, we computed the histograms of the two populations (well-classified and mis-classified samples) on each dimension. Figure 4 depicts a selection of histograms. We see that the values for the top dimensions follow different distributions for well-classified vs. mis-classified images, whereas random dimensions do not discriminate, meaning that the corresponding style attribute does not influence performance.</p>
          <p>Figure 5 shows the impact of manipulating the latent codes by shifting values along the most influential dimensions. Each column represents one of the top ten influential dimensions and each line represents a different shift value. We clearly observe various types of image corruption that can be interpreted a posteriori when increasing the shift value: the first three dimensions seem to introduce more noise, dimensions 4, 5 and 10 deform the original shape, dimension 9 lowers the intensity, and dimensions 6, 7 and 8 introduce partial occlusions. Using a generative model allows a large corruption vocabulary, and in particular allows shape deformation, a capacity that is not available in filter-based frameworks like Imagenet-C [23].</p>
          <p>The last three lines of Figure 5 show images corresponding to a steep decrease of the classifier output score (from 1.0 to 0). This is where the class prediction shifts and where the generated image can be considered as a corner case (see Section 3.4).</p>
        </sec>
        <sec id="sec-3-3">
          <title>3.3. Accuracy in the latent space</title>
          <p>As explained in Section 2.4, we can look at the classifier accuracy evolution when fed with populations of various corruption levels sampled in the StyleSpace. Figure 6 shows this evolution for two different classifiers. The accuracy degradation is representative of the robustness to corruptions of a classifier: the classifier trained on clean data sees its accuracy decrease faster than the classifier trained on corrupted data. It also shows that the degradation depends on the class: some classes are more difficult to predict than others.</p>
          <p>Using most or all dimensions of 𝒮 to compute the
distance makes the curve not monotonically decreasing.</p>
          <p>It is better to use fewer dimensions, e.g. 100, as it makes
the accuracy curve decrease faster and monotonically.</p>
          <p>To make the curve clearer, we filtered out samples at
too high distance values, where the generation quality
decreases and the lower number of samples degrades the
accuracy computation.</p>
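          <p>One way to obtain such a curve is sketched below: samples are binned by their distance, along the selected influential dimensions, to a reference style (for instance a class mean), and accuracy is computed per bin; the number of bins and the distance cutoff are assumptions.</p>
          <preformat>
# Hedged sketch: accuracy as a function of the distance, restricted to the
# top influential StyleSpace dimensions, to a reference style (e.g. a class
# mean). Samples beyond max_dist are discarded, since few samples and degraded
# generation quality make the estimate unreliable there.
import numpy as np

def accuracy_vs_distance(styles, correct, reference, top_dims, n_bins=20, max_dist=None):
    """styles: (N, d); correct: (N,) bool; top_dims: indices of e.g. 100 dimensions."""
    diffs = styles[:, top_dims] - reference[top_dims]
    dist = np.linalg.norm(diffs, axis=1)             # distance along influential dimensions only
    if max_dist is not None:
        keep = dist &lt;= max_dist
        dist, correct = dist[keep], correct[keep]
    edges = np.linspace(0.0, dist.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(dist, edges) - 1, 0, n_bins - 1)
    accs = [correct[bin_idx == b].mean() if (bin_idx == b).any() else np.nan
            for b in range(n_bins)]
    return edges, np.array(accs)                     # the slope of this curve compares classifiers
          </preformat>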
        </sec>
        <sec id="sec-3-4">
          <title>3.4. Identification of corner cases</title>
        </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and perspectives</title>
      <p>The current work addresses the relationship between data quality and model performance by exploring the latent feature space of a generative model. Indeed, using our approach, we are able to identify the influential directions which deteriorate the classifier performance and to discover corner cases in this space. The proposed approach is based on ranking the latent space dimensions using the gradient of the classifier output with respect to the StyleSpace input.</p>
      <p>Our results show the impact and the influence of each identified direction in terms of performance degradation on the classifier. These identified directions, separately or jointly, allow a visual account of the degradation, which could help in the interpretability and explainability of deep learning classifiers.</p>
      <p>Despite the first promising conclusions of this work, our approach has been demonstrated only on generated, synthetic images. Its application to real data requires a capacity to encode – or invert – any data into the latent space [26], in order to apply the degradation encoded by the influential directions.</p>
      <p>Another perspective is to evaluate our approach on more complex data to identify other types of degradation attributes. Recent works on image manipulation show that visual attributes can be controlled for more complex images [16, 18, 15, 17] and that generative models can be applied to larger datasets such as ImageNet [27]. These two advances indicate the possibility of scaling up our approach.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work has been supported by the French government under the “Investissements d’avenir” program, as part of the SystemX Technological Research Institute. This work was granted access to the HPC/AI resources of IDRIS under the allocation 2022-AD011013372 made by GENCI.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref-17"><label>[17]</label><mixed-citation>P. Liang, J. W. Vaughan (Eds.), Advances in Neural Information Processing Systems, volume 34, Curran Associates, Inc., 2021, pp. 16331–16345. URL: https://proceedings.neurips.cc/paper/2021/file/880610aa9f9de9ea7c545169c716f477-Paper.pdf.</mixed-citation></ref>
      <ref id="ref-18"><label>[18]</label><mixed-citation>O. Patashnik, Z. Wu, E. Shechtman, D. Cohen-Or, D. Lischinski, StyleCLIP: Text-driven manipulation of StyleGAN imagery, 2021. URL: https://arxiv.org/abs/2103.17249. doi:10.48550/ARXIV.2103.17249.</mixed-citation></ref>
      <ref id="ref-19"><label>[19]</label><mixed-citation>O. Lang, Y. Gandelsman, M. Yarom, Y. Wald, G. Elidan, A. Hassidim, W. T. Freeman, P. Isola, A. Globerson, M. Irani, I. Mosseri, Explaining in style: Training a GAN to explain a classifier in StyleSpace, 2021. URL: https://arxiv.org/abs/2104.13369. doi:10.48550/ARXIV.2104.13369.</mixed-citation></ref>
      <ref id="ref-20"><label>[20]</label><mixed-citation>Y. LeCun, C. Cortes, C. Burges, MNIST handwritten digit database, ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2 (2010).</mixed-citation></ref>
      <ref id="ref-21"><label>[21]</label><mixed-citation>T. Karras, M. Aittala, J. Hellsten, S. Laine, J. Lehtinen, T. Aila, Training generative adversarial networks with limited data, 2020. URL: https://arxiv.org/abs/2006.06676. doi:10.48550/ARXIV.2006.06676.</mixed-citation></ref>
      <ref id="ref-22"><label>[22]</label><mixed-citation>T. Karras, S. Laine, T. Aila, A style-based generator architecture for generative adversarial networks, 2018. URL: https://arxiv.org/abs/1812.04948. doi:10.48550/ARXIV.1812.04948.</mixed-citation></ref>
      <ref id="ref-23"><label>[23]</label><mixed-citation>D. Hendrycks, T. Dietterich, Benchmarking neural network robustness to common corruptions and perturbations, 2019. URL: https://arxiv.org/abs/1903.12261. doi:10.48550/ARXIV.1903.12261.</mixed-citation></ref>
      <ref id="ref-24"><label>[24]</label><mixed-citation>L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579–2605. URL: http://jmlr.org/papers/v9/vandermaaten08a.html.</mixed-citation></ref>
      <ref id="ref-25"><label>[25]</label><mixed-citation>M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, Advances in Neural Information Processing Systems 30 (2017).</mixed-citation></ref>
      <ref id="ref-26"><label>[26]</label><mixed-citation>W. Xia, Y. Zhang, Y. Yang, J.-H. Xue, B. Zhou, M.-H. Yang, GAN Inversion: A Survey, 2022. doi:10.48550/arXiv.2101.05278. arXiv:2101.05278.</mixed-citation></ref>
      <ref id="ref-27"><label>[27]</label><mixed-citation>A. Sauer, K. Schwarz, A. Geiger, StyleGAN-XL: Scaling StyleGAN to large diverse datasets, arXiv preprint arXiv:2202.00273 (2022).</mixed-citation></ref>
    </ref-list>
  </back>
</article>