<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Leveraging generative models to characterize the failure conditions of image classifiers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Adrien Le Coz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <email>adrien.le-coz@irt-systemx.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stéphane Herbin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <email>stephane.herbin@onera.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faouzi Adjed</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <email>faouzi.adjed@irt-systemx.fr</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DTIS, ONERA, Université Paris-Saclay</institution>
          ,
          <addr-line>F-91123 Palaiseau</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IRT SystemX</institution>
          ,
          <addr-line>Palaiseau</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>We address in this work the question of identifying the failure conditions of a given image classifier. To do so, we exploit the capacity of producing controllable distributions of high-quality image data made available by recent Generative Adversarial Networks (StyleGAN2): the failure conditions are expressed as directions of strong performance degradation in the generative model latent space. This strategy of analysis is used to discover corner cases that combine multiple sources of corruption, and to compare in more detail the behavior of different classifiers. The directions of degradation can also be rendered visually by generating data for better interpretability. Some degradations, such as image quality, can affect all classes, whereas others, such as shape, are more class-specific. The approach is demonstrated on the MNIST dataset, complemented with two sources of corruption (noise and blur), and shows a promising way to better understand and control the risks of exploiting Artificial Intelligence components for safety-critical applications.</p>
      </abstract>
      <kwd-group>
        <kwd>AI System Characterization</kwd>
        <kwd>Generative Models</kwd>
        <kwd>Explainable AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In spite of all this on-going activity, the available design tools have difficulty mastering, with an acceptable level of trustworthiness, the complexity of AI-ML components for real-world safety-critical applications. The work presented in this paper contributes to a better understanding of the behavior of an AI-ML component, and to identifying the measurable and/or verifiable conditions influencing success or failure. Its long-term motivation is to close the loop between the specification, design and testing steps by providing more refined analytical tools. The target application domain is computer vision, where AI-ML techniques are now ubiquitous.</p>
      <sec id="sec-1-1">
        <title>Characterizing AI components</title>
        <p>Artificial Intelligence (AI) is getting more mature every year, with potential applications to real-world problems, and possibly to safety-critical systems. Machine Learning (ML) is one of the most prominent sets of AI techniques used to design predictive functions, especially for high-dimensional inputs such as images, video, text or sound, and it generally involves Deep Neural Networks (DNNs).</p>
        <p>The exploitation of ML techniques introduces new
issues to ensure safety and trustworthiness when designing
or integrating AI-based components: data quality
assessment, robustness to adversarial perturbations, formal
verification of DNNs, explainability, DNN calibration, etc.</p>
        <p>These research actions are complemented by the production of a large number of position papers and reports produced by academic, industrial and government organizations or working groups (ISO, SAE, NHTSA, EASA, HLEG of EU, DEEL, etc.); one of the main objectives of which is to renew certification standards so that the various phases of an industrial process (specification, design, validation &amp; verification, deployment, integration, operation, versioning, etc.) can accommodate AI.</p>
        <p>Characterizing an AI-ML component with a test dataset has, however, several limitations: (1) the usual performance indicators are global statistics and cannot express in a fine-grained way the algorithm behavior: they can be used to rank competing solutions - this is currently done in academic benchmarks - but do not explain how one solution behaves compared to others. (2) It is difficult to gather all the good and bad operating conditions into one dataset. There have been some attempts to describe rather exhaustively the possible hazards to families of algorithms [1], but what these attempts in fact revealed was the complexity to master. Test dataset replication experiences have also shown that for high-dimensional data, performance measures can have a large variance [2, 3]. (3) Typical causes of performance degradation of AI-ML components, such as distributional shift [4] and instability to small perturbations [5, 6], are difficult to catch with a single test dataset.</p>
        <p>Another approach to characterize a given component, inspired by software engineering practices, is to define a testing strategy “designed to reveal machine learning bugs” [7]. For instance, [8] exploits a concept of neuron coverage, inspired by test coverage in traditional software testing, to detect erroneous inputs.</p>
        <p>In our approach, we propose to combine these two different strategies, data-driven evaluation and testing, in order to characterize the behavior of a given function: we identify the influential causes of performance degradation by evaluating the performance on sets of generated data that sample various data attributes, corruptions or nuisances.</p>
      </sec>
      <sec id="sec-1-2">
        <title>Main contributions</title>
        <p>We show how to exploit generative models to finely analyze the behavior of classifiers with high-dimensional input in order to: • identify influential directions of performance degradation that can be expressed both in the data space and in a latent feature space of a generative model; • discover corner cases by exploring the directions of degradation in the latent feature space; • compare classifier performance on influential data features.</p>
        <p>We focus in this paper on image classification as one of the paradigmatic decision problems of computer vision, along with object detection and semantic segmentation, and illustrate our method on a corrupted version of the MNIST dataset [20].</p>
      </sec>
      <sec id="sec-1-3">
        <title>Generative models to explore data space</title>
        <p>Designing a probabilistic model in a high-dimensional data space such as image, video or sound, able to faithfully account for their diversity and informative features, is a difficult (impossible?) objective. Generative models such as GANs [9] or generative invertible flows [10] are a series of ML techniques that provide means to give access to such a distribution by direct sampling. What is learned is not the parameters of the probability density but the parameters of a sampling process able to generate data that mimic a given random distribution.</p>
        <p>GANs exploit a representational latent space that can be sampled from a known low-dimension distribution, often Gaussian, that is expected to encode enough information to generate complete images. Generation is then produced by a decoding network that is learned from target data samples. Recent approaches [11, 12, 13] are now able to generate high-quality, high-dimension data, with a photo-realistic rendering when applied to images, and with good diversity and fidelity levels. One possible application of generative models for safety objectives is to augment data for testing various operational conditions, as in [14].</p>
        <p>The latent space can also be used as a way to control the generation process, for instance to edit images [15, 16, 17]. When correctly disentangled, the latent space can be interpreted as a representation space where each dimension encodes some interpretable visual attribute [16, 18]. In the case of face image generation, these attributes could be hairstyle, head orientation, eye color, glasses, etc. Navigating in the representational latent space can also be used to identify the attributes that best characterize a given class [19].</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Proposed approach</title>
      <p>The proposed approach is illustrated in Figure 1, where
we explore how the latent space of a generative model
differentiates between data that are well and poorly
classified by a given classifier. In the following, we will briefly
describe the chosen generative model and its latent space
structure (Section 2.1); explain how to find the
dimensions of the latent space that differentiate well-classified
from mis-classified data (Section 2.2); describe how to
manipulate images to visualize the attributes (Section
2.3); and see how we can estimate the accuracy of the
classifier conditionally to the location of the data in the
latent space (Section 2.4).</p>
      <sec id="sec-2-1">
        <title>2.1. Resources</title>
        <p>The current work is mainly based on two resources, that we present in the next two subsections. They represent the theoretical tools necessary to detail our proposed approach.</p>
        <sec id="sec-2-1-1">
          <title>2.1.1. Classifier and data</title>
          <p>The first input to our approach is a learned image classifier to be analyzed. We assume that we have access to its architecture and weights (“white box”). We also know the domain of application (handwritten digits, faces, indoor scenes, etc.), and have a corresponding dataset available, not necessarily the one used for learning the classifier.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. Generative model</title>
          <p>The second ingredient of our approach is a generative model that can be controlled meaningfully. In our work, we used the StyleGAN2 model [21] for a few reasons: the quality of generated data, its scalability to complex datasets, and the various levels of latent spaces. Indeed, three different latent spaces can be considered. The first latent space, 𝒵, is typically normally distributed, like in many GANs, and is the initial input space of the generator. Samples z ∈ 𝒵 are forwarded to the intermediate latent space 𝒲 using fully connected layers, resulting in a more disentangled representation than 𝒵 [22]. Using learned affine transformations, samples w ∈ 𝒲 are specialized into styles that scale the convolution weights for each feature map of each layer of the generator. A generated image is the result of an initial learned constant tensor that is up-sampled and transformed by residual convolution layers that are modulated by the style vector.</p>
          <p>Images are generated from the style vector s by the generator G(s). The space of styles, called StyleSpace, shows a high degree of disentanglement [16]. This latent space 𝒮 encodes distinct visual attributes along its dimensions and is typically used for image editing. To give an idea of the complexity of the generative model, in the original StyleGAN2 version that generates images of size 1024×1024, 𝒵 and 𝒲 have 512 dimensions, 𝒮 has 9088 dimensions, and the initial constant tensor has a size of 4×4 with 512 channels.</p>
          <p>[Figure 3: (a) t-SNE in 𝒵; (b) t-SNE in 𝒲; (c) t-SNE in 𝒮.]</p>
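          <p>To make the relation between these latent spaces concrete, the short sketch below follows the conventions of the public stylegan2-ada-pytorch implementation (G.mapping, G.synthesis, per-layer affine modules); the checkpoint name and the broadcasting of a single w to every layer are illustrative assumptions, not the exact code used in this work.</p>
          <preformat>
# Hedged sketch of the three StyleGAN2 latent spaces: Z -> W -> S -> image.
# Assumes the NVlabs stylegan2-ada-pytorch generator API; the checkpoint
# "stylegan2_mnist_corrupted.pkl" is a hypothetical file name.
import pickle
import torch

with open("stylegan2_mnist_corrupted.pkl", "rb") as fp:
    G = pickle.load(fp)["G_ema"]              # generator as a torch.nn.Module
G.eval()

z = torch.randn(1, G.z_dim)                   # sample in the input latent space Z
c = torch.zeros(1, G.c_dim)                   # class conditioning (empty if unconditional)
w = G.mapping(z, c)                           # intermediate latent space W: [1, num_ws, w_dim]

# StyleSpace S: concatenate the outputs of every per-layer affine module.
styles = []
for name, module in G.synthesis.named_modules():
    if name.endswith("affine"):               # each synthesis layer owns an 'affine' sub-module
        styles.append(module(w[:, 0]))        # broadcast a single w to every layer, for simplicity
s = torch.cat(styles, dim=1)                  # one long style vector (the StyleSpace code)

img = G.synthesis(w)                          # generated image in data space
          </preformat>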
        </sec>
      </sec>
      <sec id="sec-2-2a">
        <title>2.2. Finding influential dimensions in the latent space</title>
        <p>The dimensions of the latent StyleSpace 𝒮 are expected to encode image attributes, such as shape, thickness, orientation and noise, in a rather disentangled way. We exploit this property to define a simple search method able to identify the most influential dimensions regarding the accuracy of a given classifier.</p>
        <p>Gradient-based approach. The proposed strategy ranks the dimensions according to the gradient of the classifier output with respect to the StyleSpace input. The idea is to score each dimension based on its ability to lower the output score of the true class. More precisely, for each sample s in the StyleSpace for which we know the true class, we generate the corresponding image x = G(s), and then classify it according to f(x). Then we compute the gradient, with respect to the dimension i of the style space, of the c-th classification output: ∇<sub>i</sub> f<sub>c</sub>(G(s)), where c is the index of the true class encoded by s. The gradient can be computed exactly by using the autograd algorithmic differentiation provided in standard Deep Learning software environments – both the classifier and the generator being available in such frameworks. We compute the average gradient over a population of data as the score used to rank the dimensions.</p>
        <p>[Figure 4 panels: (a) top dimensions; (b) random dimensions.]</p>
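        <p>A minimal sketch of this ranking procedure is given below; it assumes a differentiable generate_from_styles(s) function (mapping a StyleSpace vector to an image, for instance built from the affine-module hook of Section 2.1.2) and a PyTorch classifier, both hypothetical names rather than the exact code of this work.</p>
        <preformat>
# Hedged sketch: rank StyleSpace dimensions by the average gradient of the
# true-class output with respect to each style coordinate.
# Assumptions: generate_from_styles(s) is differentiable and returns an image
# batch, classifier(x) returns class scores, and (styles, labels) is a
# population of StyleSpace samples with known true classes.
import torch

def rank_style_dimensions(generate_from_styles, classifier, styles, labels):
    grad_sum = torch.zeros(styles.shape[1])
    for s, y in zip(styles, labels):
        s = s.clone().detach().requires_grad_(True)   # one StyleSpace sample
        x = generate_from_styles(s.unsqueeze(0))      # corresponding image G(s)
        score = classifier(x)[0, y]                   # f_c(G(s)), the true-class output
        score.backward()                              # gradient w.r.t. every dimension i
        grad_sum += s.grad.detach()
    avg_grad = grad_sum / len(styles)
    # Sort so that the most damaging dimensions (most negative average
    # gradient) come first; ranking by absolute value is another option.
    ranking = torch.argsort(avg_grad)
    return ranking, avg_grad
        </preformat>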
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Image manipulation and corner cases</title>
        <p>Starting from an image whose latent space representation – the style vector – is known, we can modify this representation to generate a modified image. In fact, once the influential dimensions are computed (see Section 2.2 above), if we change the values of the style vector for those dimensions, then we modify the corresponding visual attributes of the generated image. Generating data that follow a direction of high performance degradation relies on a simple heuristic: (1) we start from a given point s<sub>0</sub> in the StyleSpace, (2) we increment the influential dimension by a given amount, the sign of the increment being given by the sign of the gradient, and (3) we monitor the classifier output. Note that the starting point s<sub>0</sub> for exploration can be any point in the StyleSpace: it can be a “true” style, computed by mapping to 𝒮 a random z sampled in the input latent space 𝒵, or any other point directly sampled in 𝒮, for instance an average of a given population of s data. We will use in the experiments (Section 3) an “average” digit in the StyleSpace, computed as the mean over a class-conditional population.</p>
        <p>This data space exploration along influential dimensions also allows the discovery of corner cases, defined as the smallest degradation that shifts the classifier output from good to bad classification. The experiments in Section 3 will show several examples of corner cases discovered by this approach.</p>
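        <p>The sketch below implements this exploration under the same assumptions as above (a differentiable generate_from_styles and a classifier); the step size and the maximum number of steps are illustrative values, not parameters reported in this work.</p>
        <preformat>
# Hedged sketch: walk along one influential StyleSpace dimension, against the
# gradient of the true-class score, until the prediction flips; the first
# sample whose prediction differs from the true class is kept as a corner case.
import torch

@torch.no_grad()
def find_corner_case(generate_from_styles, classifier, s0, true_class,
                     dim_idx, grad_sign, step=0.5, max_steps=40):
    for k in range(1, max_steps + 1):
        s = s0.clone()
        s[dim_idx] = s0[dim_idx] - grad_sign * step * k   # move so the true-class score decreases
        x = generate_from_styles(s.unsqueeze(0))
        pred = classifier(x).argmax(dim=1).item()
        if pred != true_class:
            return x, s, k            # smallest tested shift that flips the prediction
    return None                       # no corner case found within max_steps
        </preformat>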
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Accuracy in the latent space</title>
        <p>We can also use the latent space to better understand the classifier accuracy. In Section 2.2 we described how to find influential dimensions in the latent space. The data along those dimensions thus potentially correlate with the classifier accuracy. A classifier can be characterized globally by its accuracy decrease when fed with various amounts of corruption produced by moving in the StyleSpace along influential dimensions. The decreasing slope globally characterizes the resilience of a classifier to corruption and can be used for comparison.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Implementation</title>
          <p>We used the MNIST dataset [20] to evaluate our approach. More precisely, we augmented the original data by
introducing corruptions to simulate poor-quality data
acquisition that may have an influence on class prediction. In
particular, we chose Gaussian Noise and Gaussian Blur
from [23] because they have a significant impact on
classification accuracy for a classifier trained on clean data.</p>
          <p>Data are corrupted in the following way: the first half
of the dataset remains clean, and the second half is first
blurred (with a severity level randomly chosen between 1,
2 and 3), and noise is added (with a severity level also
randomly chosen between 1, 2 and 3). It ensures that most
of the samples remain visually recognizable. Random
samples are shown in Figure 2.</p>
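          <p>A sketch of this corruption protocol is given below; the per-severity blur and noise parameters are placeholders inspired by [23], not the exact values used here.</p>
          <preformat>
# Hedged sketch of the corruption protocol: the first half of the dataset stays
# clean; the second half is first blurred, then noised, each with a severity
# drawn uniformly from {1, 2, 3}. Sigma values per severity are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

BLUR_SIGMA = {1: 0.6, 2: 1.0, 3: 1.4}      # assumed Gaussian Blur severities
NOISE_SIGMA = {1: 0.08, 2: 0.12, 3: 0.18}  # assumed Gaussian Noise severities

def corrupt_second_half(images, seed=0):
    """images: float array in [0, 1] of shape (N, 28, 28)."""
    rng = np.random.default_rng(seed)
    out = images.copy()
    half = len(images) // 2                                  # first half remains clean
    for i in range(half, len(images)):
        blur = BLUR_SIGMA[int(rng.integers(1, 4))]           # severity in {1, 2, 3}
        noise = NOISE_SIGMA[int(rng.integers(1, 4))]
        img = gaussian_filter(out[i], sigma=blur)            # Gaussian Blur first
        img = img + rng.normal(0.0, noise, size=img.shape)   # then Gaussian Noise
        out[i] = np.clip(img, 0.0, 1.0)
    return out
          </preformat>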
          <p>The StyleGAN2 generative model contains three different latent spaces. It is generally admitted that the so-called StyleSpace 𝒮 is a more disentangled representation space. We found that the well-classified and mis-classified samples are better separated in this space, even though the generation was not constrained in any way by the classifier. We can visualize this in Figure 3 using t-SNE projection [24].</p>
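          <p>This separation can be inspected with a projection such as the one sketched below, where styles holds the StyleSpace codes of generated samples and correct marks whether the classifier predicted them correctly; the perplexity value is an assumption.</p>
          <preformat>
# Hedged sketch: 2-D t-SNE projection of StyleSpace codes, colored by whether
# the classifier predicted the generated digit correctly.
# styles: (N, d) array of style vectors; correct: (N,) boolean array.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_style_space_separation(styles, correct, perplexity=30):
    emb = TSNE(n_components=2, perplexity=perplexity, init="pca").fit_transform(styles)
    plt.scatter(emb[correct, 0], emb[correct, 1], s=4, label="well-classified")
    plt.scatter(emb[~correct, 0], emb[~correct, 1], s=4, label="mis-classified")
    plt.legend()
    plt.title("t-SNE of StyleSpace codes")
    plt.show()
          </preformat>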
          <p>After training on corrupted data, the classifier (a simple Convolutional Neural Network) reaches an accuracy of 97% on the test data. The metric used to quantify the performance of the generative model is the Fréchet Inception Distance (FID) [25]. The generative model trained on corrupted data reaches an FID of 1.63 (computed by comparing 50,000 generated images, unfiltered and without using truncation, to the 60,000 images of the whole training dataset). This low value means a high generation quality. A few samples are shown in Figure 2, demonstrating the capacity of the generative model to encode various levels and natures of corruption.</p>
        </sec>
        <sec id="sec-3-2">
          <title>3.2. Influential dimensions</title>
          <p>We apply the method described in Section 2.2 to rank dimensions in the learned StyleSpace. In order to verify that several dimensions have a bigger impact on performance than others, we computed the histograms of the two populations (well-classified and mis-classified samples) on each dimension. Figure 4 depicts a selection of histograms. We see that the values for the top dimensions follow different distributions for well-classified vs. mis-classified images, whereas random dimensions do not discriminate, meaning that the corresponding style attribute does not influence performance.</p>
          <p>Figure 5 shows the impact of manipulating the latent codes by shifting values along the most influential dimensions. Each column represents one of the top ten influential dimensions and each line represents a different shift value. We clearly observe various types of image corruption that can be interpreted a posteriori when increasing the shift value: the first three dimensions seem to introduce more noise, dimensions 4, 5 and 10 deform the original shape, dimension 9 lowers the intensity, and dimensions 6, 7 and 8 introduce partial occlusions. Using a generative model allows a large corruption vocabulary, and in particular allows shape deformation, a capacity that is not available in filter-based frameworks like Imagenet-C [23].</p>
          <p>The last three lines of Figure 5 show images corresponding to a steep decrease of the classifier output score (from 1.0 to 0). This is where the class prediction shifts and where the generated image can be considered as a corner case (see Section 3.4).</p>
        </sec>
        <sec id="sec-3-3">
          <title>3.3. Accuracy in the latent space</title>
          <p>As explained in Section 2.4, we can look at the classifier accuracy evolution when fed with populations of various corruption levels sampled in the StyleSpace. Figure 6 shows this evolution for two different classifiers. The accuracy degradation is representative of the robustness to corruptions of a classifier: the classifier trained on clean data sees its accuracy decrease faster than the classifier trained on corrupted data. It also shows that the degradation depends on the class: some classes are more difficult to predict than others.</p>
          <p>Using most or all dimensions of 𝒮 to compute the
distance makes the curve not monotonically decreasing.</p>
          <p>It is better to use fewer dimensions, e.g. 100, as it makes
the accuracy curve decrease faster and monotonically.</p>
          <p>To make the curve clearer, we filtered out samples at
too high distance values, where the generation quality
decreases and the lower number of samples degrades the
accuracy computation.</p>
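          <p>One way to obtain such a curve is sketched below: samples are binned by their distance, along the selected influential dimensions, to a reference style (for instance a class mean), and accuracy is computed per bin; the number of bins and the distance cutoff are assumptions.</p>
          <preformat>
# Hedged sketch: accuracy as a function of the distance, restricted to the
# top influential StyleSpace dimensions, to a reference style (e.g. a class
# mean). Samples beyond max_dist are discarded, since few samples and degraded
# generation quality make the estimate unreliable there.
import numpy as np

def accuracy_vs_distance(styles, correct, reference, top_dims, n_bins=20, max_dist=None):
    """styles: (N, d); correct: (N,) bool; top_dims: indices of e.g. 100 dimensions."""
    diffs = styles[:, top_dims] - reference[top_dims]
    dist = np.linalg.norm(diffs, axis=1)             # distance along influential dimensions only
    if max_dist is not None:
        keep = dist &lt;= max_dist
        dist, correct = dist[keep], correct[keep]
    edges = np.linspace(0.0, dist.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(dist, edges) - 1, 0, n_bins - 1)
    accs = [correct[bin_idx == b].mean() if (bin_idx == b).any() else np.nan
            for b in range(n_bins)]
    return edges, np.array(accs)                     # the slope of this curve compares classifiers
          </preformat>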
        </sec>
        <sec id="sec-3-4">
          <title>3.4. Identification of corner cases</title>
        </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and perspectives</title>
      <p>The current work addresses the relationship between data quality and model performance by exploring the latent feature space of a generative model. Indeed, using our approach, we are able to identify the influential directions which deteriorate the classifier performance and to discover corner cases in this space. The proposed approach is based on ranking the latent space dimensions using the gradient of the classifier output with respect to the StyleSpace input.</p>
      <p>Our results show the impact and the influence of each identified direction in terms of performance degradation on the classifier. These identified directions, separately or jointly, allow a visual account of the degradation, which could help in the interpretability and explainability of deep learning classifiers.</p>
      <p>Despite the first promising conclusions of this work, our approach has been demonstrated only on generated, synthetic images. Its application to real data requires a capacity to encode – or invert – any data into the latent space [26], in order to apply the degradation encoded by the influential directions.</p>
      <p>Another perspective is to evaluate our approach on more complex data to identify other types of degradation attributes. Recent works on image manipulation show that visual attributes can be controlled for more complex images [16, 18, 15, 17] and that generative models can be applied to larger datasets such as ImageNet [27]. These two advances indicate the possibility of scaling up our approach.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work has been supported by the French government under the “Investissements d’avenir” program, as part of the SystemX Technological Research Institute. This work was granted access to the HPC/AI resources of IDRIS under the allocation 2022-AD011013372 made by GENCI.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref-17"><label>[17]</label><mixed-citation>P. Liang, J. W. Vaughan (Eds.), Advances in Neural Information Processing Systems, volume 34, Curran Associates, Inc., 2021, pp. 16331–16345. URL: https://proceedings.neurips.cc/paper/2021/file/880610aa9f9de9ea7c545169c716f477-Paper.pdf.</mixed-citation></ref>
      <ref id="ref-18"><label>[18]</label><mixed-citation>O. Patashnik, Z. Wu, E. Shechtman, D. Cohen-Or, D. Lischinski, StyleCLIP: Text-driven manipulation of StyleGAN imagery, 2021. URL: https://arxiv.org/abs/2103.17249. doi:10.48550/ARXIV.2103.17249.</mixed-citation></ref>
      <ref id="ref-19"><label>[19]</label><mixed-citation>O. Lang, Y. Gandelsman, M. Yarom, Y. Wald, G. Elidan, A. Hassidim, W. T. Freeman, P. Isola, A. Globerson, M. Irani, I. Mosseri, Explaining in style: Training a GAN to explain a classifier in StyleSpace, 2021. URL: https://arxiv.org/abs/2104.13369. doi:10.48550/ARXIV.2104.13369.</mixed-citation></ref>
      <ref id="ref-20"><label>[20]</label><mixed-citation>Y. LeCun, C. Cortes, C. Burges, MNIST handwritten digit database, ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2 (2010).</mixed-citation></ref>
      <ref id="ref-21"><label>[21]</label><mixed-citation>T. Karras, M. Aittala, J. Hellsten, S. Laine, J. Lehtinen, T. Aila, Training generative adversarial networks with limited data, 2020. URL: https://arxiv.org/abs/2006.06676. doi:10.48550/ARXIV.2006.06676.</mixed-citation></ref>
      <ref id="ref-22"><label>[22]</label><mixed-citation>T. Karras, S. Laine, T. Aila, A style-based generator architecture for generative adversarial networks, 2018. URL: https://arxiv.org/abs/1812.04948. doi:10.48550/ARXIV.1812.04948.</mixed-citation></ref>
      <ref id="ref-23"><label>[23]</label><mixed-citation>D. Hendrycks, T. Dietterich, Benchmarking neural network robustness to common corruptions and perturbations, 2019. URL: https://arxiv.org/abs/1903.12261. doi:10.48550/ARXIV.1903.12261.</mixed-citation></ref>
      <ref id="ref-24"><label>[24]</label><mixed-citation>L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579–2605. URL: http://jmlr.org/papers/v9/vandermaaten08a.html.</mixed-citation></ref>
      <ref id="ref-25"><label>[25]</label><mixed-citation>M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, Advances in Neural Information Processing Systems 30 (2017).</mixed-citation></ref>
      <ref id="ref-26"><label>[26]</label><mixed-citation>W. Xia, Y. Zhang, Y. Yang, J.-H. Xue, B. Zhou, M.-H. Yang, GAN Inversion: A Survey, 2022. doi:10.48550/arXiv.2101.05278. arXiv:2101.05278.</mixed-citation></ref>
      <ref id="ref-27"><label>[27]</label><mixed-citation>A. Sauer, K. Schwarz, A. Geiger, StyleGAN-XL: Scaling StyleGAN to large diverse datasets, arXiv preprint arXiv:2202.00273 (2022).</mixed-citation></ref>
    </ref-list>
  </back>
</article>