Explaining Convolutional Neural Networks by Tagging Filters

Anna Nguyen¹, Daniel Hagenmayer¹, Tobias Weller², Michael Färber¹
¹ Karlsruhe Institute of Technology (KIT), Institute AIFB, Karlsruhe, Germany
² University of Mannheim, Mannheim, Germany
anna.nguyen@kit.edu (A. Nguyen), daniel.hagenmayer@student.kit.edu (D. Hagenmayer), tobi@informatik.uni-mannheim.de (T. Weller), michael.faerber@kit.edu (M. Färber)

AIMLAI'22: Advances in Interpretable Machine Learning and Artificial Intelligence (AIMLAI@CIKM'22), October 21, 2022, Atlanta, Georgia, USA

Abstract
Convolutional neural networks (CNNs) have achieved astonishing performance on various image classification tasks, but it is difficult for humans to understand how a classification comes about. Recent literature proposes methods to explain the classification process to humans. These focus mostly on visualizing feature maps and filter weights, which are not very intuitive for non-experts. In this paper, we propose FilTag, an approach to effectively explain CNNs even to non-experts. The idea is that if images of a class frequently activate a convolutional filter, that filter will be tagged with that class. Based on the tagging, individual image classifications can then be intuitively explained using the tags of the filters that the input image activates. Finally, we show that the tags are useful in analyzing classification errors caused by noisy input images and that the tags can be further processed by machines.

Keywords: CNN, images, explainable AI, semantic interpretability

1. Introduction

Deep convolutional neural networks (CNNs) are the state-of-the-art machine learning technique for image classification [1, 2]. In contrast to traditional feed-forward neural networks, CNNs have layers that perform a convolutional step (see Figure 2 for the relations in a convolution). Filters are used in a convolutional step, which outputs a feature map in which activated neurons highlight certain patterns of the input image. Although CNNs achieve high accuracy on many classification tasks, these models do not provide an explanation (i.e., decisive information) of the classifications. Thus, researchers have recently focused on methods to explain how CNNs classify images.

Figure 1: Explanations of Convolutional Filters. The upper part shows a visual explanation. The lower part contains an example of our tagging approach FilTag.

Related Work. Some of the earliest works on explaining CNNs focus on visualizing the activations of individual neurons [3, 4]. However, these methods cannot explain more complex relationships between multiple neurons, as no human-understandable explanation is used. Olah et al. [5] defined a semantic dictionary by pairing every neuron activation with its abstract visualization using a channel attribution, determining how much each channel contributes to the classification result. This may explain the role of a channel in the classification of an individual image, but it does not explain the role of that channel across all possible input images. Hohman et al. [6] try to overcome this problem by aggregating particularly important neurons and identifying relations between them. Other approaches focus on filters, the discerning feature of CNNs. For example, Zeiler and Fergus [7] visualize the filter weights to illustrate the patterns these filters detect. However, these visualizations are based on the inputs of the layers to which the respective filters belong.
Thus, only the filter patterns of the first layer can be directly associated with patterns on the input image of the network. To overcome this, the method Net2Vec [8] quantifies how concepts are encoded by filters by examining filter embeddings. Alternatively, Network Dissection [9] uses human-labeled visual concepts to bring semantics to the convolutional layers. However, visualizations and filter embeddings only explain the outcome of a model implicitly, whereas we assign explicit tags to filters, which can be understood by non-experts. Most visualizations used for explaining CNNs are similar to the upper example in Figure 1, which visualizes the most activated convolutional filters. Clearly, such visualizations are difficult to understand on their own. Adding an explicit explanation such as a semantic tag (e.g., "dog," "parrot," "cat," or "toucan") as shown in the bottom example would dramatically improve the explanation, including for non-experts.

Figure 2: Terminology of a filter in a convolution.

Contribution. Our contribution is threefold. First, we introduce FilTag, an automatic approach to explain the role of each convolutional filter of a CNN to non-expert humans. We use the fact that each filter is dedicated to a specific set of classes [7, 10, 11, 12]. Indeed, the idea of FilTag is to quantify how much a filter is dedicated to a class, and then tag each convolutional filter with a set of particularly important classes. The lower part of Figure 1 shows an example of what a CNN tagged in this way could look like. In that example, the rightmost filter highlighted in red plays a role in classifying parrots, whereas the filter in the middle only plays a role in classifying birds in general, as both toucans and parrots are birds. This filter extracts features that are specific to these classes (e.g., wings, feathers, etc.). Second, our approach can also be used to explain the classification of an individual image. In the example in Figure 1, the classification of the input image as a parrot would be explained by the union of the tags of the activated filters, which are all animals and in particular include parrot. Third, FilTag is suitable for analyzing classification errors. We analyze our approach with thorough experimentation using multiple CNNs, including VGG16, as well as ImageNet as a data set. All source code is available online at https://github.com/michaelfaerber/FilTag.

2. Approach

In Section 2.1, we propose a method to provide explanations based on the role of each filter in a CNN (independent from concrete input images) using our concept of filter tags. Then, in Section 2.2, we explain how a particular input image can be explained, namely in terms of the filters that it activates.

2.1. Explanations of Filters

Our explanation of filters works in two steps. In the first step, we quantify how much each filter is activated by images of each class. In the second step, we use this information to tag the filters.

Quantifying Filter Activations. Feature maps with high activations can be used as an indication of the importance of the preceding filter for the input image [6, 7]. Traditional explanation approaches focus on one image and therefore use the most activated feature map, while our approach focuses on a set of images of the same class. Given a pre-trained CNN with a set of convolutional layers $M$ with its respective sets of filters $I(\cdot)$ and a labeled data set $D$ with labels $c \in C$ from a set of labels $C$, let $d \in D$ be an input image and $m \in M$ a convolutional layer. First, we collect the activations in the feature map to get the importance of the filters regarding an input image, i.e., the output in the feature map for a given filter (see terminology in Figure 2). Second, we scale these activations per layer to $[0, 1]$. By scaling the activations, we ensure that no image is overrepresented due to overall high activation values. We scale the activations per layer because each layer has its specific pattern compositionality of filters. For example, the first convolutional layers detect simple patterns such as lines and edges, whereas the last layers detect compositional structures which match better to human-understandable objects [7]. Let $a(m, i, d, j)$ be such a scaled activation of the $j$-th element in the feature map calculated from image $d$ and filter $i \in I_m$ in convolutional layer $m$. In order to get a total activation value per feature map, we define $\bar{a}(m, i, d) = \frac{1}{n} \sum_{j=1}^{n} a(m, i, d, j)$, with $0 \le \bar{a}(m, i, d) \le 1$, as the arithmetic mean of the scaled activations in a feature map, where $n$ is the number of activations in the feature map. We do this for all filters $i \in I_m$ and repeat these steps for all layers $m \in M$.

Next, we use the labels as the desired explanation. Let $d_c$ be an input image with label $c$. We define $z_c(m, i) = \frac{1}{|D_c|} \sum_{d_c \in D_c} \bar{a}(m, i, d_c)$, with $0 \le z_c(m, i) \le 1$, as the arithmetic mean of $\bar{a}(m, i, d_c)$ over one class $c$, where $|D_c|$ is the number of images in class $c$. This way, $z_c(m, i)$ is the averaged value of all activations of the images in one class with respect to filter $i$ in layer $m$. Thus, we can rank the classes according to the highest averaged activation of the filter per layer, which is the decisive criterion for the labeling. We therefore compare the resulting values for each feature map. We repeat these steps for all images in $D$ per label class.

Filter Tagging. We tag the filters with the label of the input image class according to their corresponding values of $z_c(m, i)$. We are interested in the feature maps with high activations for a certain class because they indicate important features associated with that class [6]. We define two methods to select those feature maps per class and per layer (because of the mentioned complexity differences between layers): (i) the $k$-best method (choose the $k$ feature maps with the highest activation values) and (ii) the $q$-quantile method (choose the $q$-quantile of feature maps with the highest activation values). These tags serve as an explanation of what the filter does. For example, in Figure 1, the leftmost activated filter has the three tags dog, parrot, and cat, which suggests that this filter plays a role in recognizing animals.
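As a rough illustration of these two steps, the following sketch computes the per-feature-map means $\bar{a}$ and the per-class averages $z_c$ for a Keras model and derives filter tags with the $k$-best and $q$-quantile selection. It is a minimal sketch under our own assumptions, not the released implementation: the function names, the min-max scaling per layer, and the exact quantile handling are choices we made for illustration.

```python
import numpy as np
import tensorflow as tf

def mean_scaled_activations(model, conv_layer_names, images):
    """Per layer: mean activation of each filter for each image,
    with activations scaled to [0, 1] within the layer (a-bar in the text)."""
    extractor = tf.keras.Model(
        inputs=model.input,
        outputs=[model.get_layer(name).output for name in conv_layer_names])
    outputs = extractor.predict(images, verbose=0)
    per_layer = {}
    for name, feats in zip(conv_layer_names, outputs):
        # feats: (num_images, height, width, num_filters)
        lo, hi = feats.min(), feats.max()
        scaled = (feats - lo) / (hi - lo + 1e-12)     # one reading of "scale per layer"
        per_layer[name] = scaled.mean(axis=(1, 2))    # (num_images, num_filters)
    return per_layer

def class_activation_z(model, conv_layer_names, images_by_class):
    """z_c(m, i): mean of a-bar over all images of class c, per layer m and filter i."""
    z = {}
    for c, images in images_by_class.items():
        per_layer = mean_scaled_activations(model, conv_layer_names, images)
        z[c] = {name: acts.mean(axis=0) for name, acts in per_layer.items()}
    return z

def tag_filters(z, k=1, q=None):
    """Tag filters per layer: for each class c, pick the k most activated
    feature maps (k-best) or, if q is given, the top q fraction (q-quantile)."""
    tags = {}  # (layer name, filter index) -> set of class labels
    for c, layers in z.items():
        for name, acts in layers.items():
            if q is None:
                chosen = np.argsort(acts)[-k:]                          # k-best
            else:
                chosen = np.where(acts >= np.quantile(acts, 1 - q))[0]  # q-quantile
            for i in chosen:
                tags.setdefault((name, int(i)), set()).add(c)
    return tags
```

The resulting dictionary maps each (layer, filter) pair to its set of class tags, which is the structure used for the explanations in the following subsections.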
2.2. Explanations of Individual Classifications

While previous visual methods for explaining filters are difficult for humans to understand, textual assignment can lead to unambiguous explanations (as later seen in our experiments in Figure 3). To get an explanation for a given input, we assume that the tags have a higher information value with respect to the classification of the CNN if the tags match the classification output. Therefore, we want to measure the hit of the prediction against the tags of the most activated filters. To do this, we determine the most frequently occurring labels for each image of a class according to the previously mentioned method using the metric Hits@$n$. Hits@$n$ measures how many positive label tags are ranked in the top $n$ positions. For example, in Figure 1, the classification of the input image as a parrot is explained by its high activation of filters tagged with parrot.

2.3. Analysis of Classification Errors

FilTag can be used for error analysis using Hits@$n$. Taking misclassified input images, Hits@$n$ indicates whether the most relevant filters were activated. If Hits@$n$ is high, we can assume that the misclassified class and the original image share similar features. Analyzing the tags, we may find correlations in their semantics. Furthermore, linking the tags and filters to knowledge graphs such as ConceptNet [13] or FAIRnets [14] can bring more insights. ConceptNet is a semantic network with meanings of words, and FAIRnets is a neural network graph with metadata about the architectures. For example, in Figure 1, if we input an image of a car but the most activated filters have tags of animals, we can conclude that the wrong filters were activated.
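The sketch below shows one way Hits@$n$ could be computed for a single image, reusing mean_scaled_activations and the tag dictionary from the sketch in Section 2.1. The ranking of tags by their frequency among the most activated filters and the tie-breaking are our assumptions, not a description of the released code.

```python
from collections import Counter
import numpy as np

def ranked_tags(model, conv_layer_names, image, tags, k=1):
    """Rank tags by how often they occur among the k most activated filters
    of each convolutional layer for a single preprocessed input image."""
    per_layer = mean_scaled_activations(model, conv_layer_names, image[np.newaxis])
    counts = Counter()
    for name, acts in per_layer.items():
        for i in np.argsort(acts[0])[-k:]:   # most activated filters of this layer
            counts.update(tags.get((name, int(i)), set()))
    return [label for label, _ in counts.most_common()]

def hits_at_n(model, conv_layer_names, image, true_label, tags, n=5, k=1):
    """Hits@n: is the true class label among the n most frequent filter tags?"""
    return true_label in ranked_tags(model, conv_layer_names, image, tags, k)[:n]
```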
3. Experiment

3.1. Experimental Setup

Data Set. Following related work, we use ImageNet [16] from ILSVRC 2014 to conduct experiments on the introduced approach. This data set contains over one million images and 1,000 possible class labels including animals, plants, and persons. Each class contains approximately 1,200 images. We use a holdout split, using 80% of the images to tag the filters (while ensuring that there were at least 500 images from each class in the set) and the remaining 20% to test the explanations.

Baseline. We compare our approach with two state-of-the-art visualization methods for explaining neural networks. The selection of the methods was based on their focus on feature visualization. One of the methods provided the fundamental basis of feature visualization and uses minimal regularization [15]; the other method uses optimization objectives [4].

Implementation. We implemented our method in Python3 and used TensorFlow as the deep learning library. The experiments were performed on a server with an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60 GHz, 16 physical cores, 188 GB RAM, and a GeForce GTX 1080 Ti. We used pre-trained neural network models from Keras Applications. The filters of a VGG16 were explained in the experiments using the introduced method. VGG16 was used as the CNN as it is frequently used in various computer vision applications. We also evaluated VGG19 and InceptionNet but omit the results due to page limitations.
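For reference, loading such a pre-trained model and collecting its convolutional layers with Keras Applications could look like the following sketch; the variable names are ours, and the commented preprocessing line is only a reminder of the standard VGG input pipeline.

```python
import tensorflow as tf

# Pre-trained VGG16 with ImageNet weights from Keras Applications.
model = tf.keras.applications.VGG16(weights="imagenet")

# The convolutional layers are the layers whose filters FilTag tags.
conv_layer_names = [layer.name for layer in model.layers
                    if isinstance(layer, tf.keras.layers.Conv2D)]
print(conv_layer_names)  # ['block1_conv1', 'block1_conv2', ..., 'block5_conv3']

# Images are expected at 224x224 with VGG-style preprocessing, e.g.:
# x = tf.keras.applications.vgg16.preprocess_input(images_224)
```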
3.2. Analysis of the Explanations

In this analysis, we study the explanations of the filters using the $k$-best method with $k = 1$, in order to provide a better comparison with the state-of-the-art methods, since they frequently visualize the most activated feature map. Figure 3 shows exemplary visual explanations of the baseline methods and the tags of our approach FilTag. As shown, the visual explanations of the baseline methods [15, 4] do not provide satisfactory comprehension. At first sight, there is not much to understand. Considering our tags, one can imagine what the visualizations display. We additionally include pictures corresponding to our tags to show the information value compared to only visualizations of the filters.

Figure 3: Comparison of filter explanations of the last conv. layer of VGG16 [1]. The visualizations of the baseline methods [15, 4] are ambiguous and difficult to interpret. Our approach FilTag allows a more precise understanding of which features the filters detect. Pictures corresponding to our tags were added to show the information value.

Filter 95 seems to recognize a lampshade, especially a trapezoidal shape. Filter 150 is only tagged with cannon, i.e., the filter is specific to this class. Filter 288 detects the head of a goldfinch, especially with consideration of the yellow and black pattern. Filter 437 and Filter 462 recognize ears of brown dogs and the body of snakes, respectively. This information would be hard to retrieve without the tags. Even without considering the visualizations, one has a good impression of what a filter detects. For example, it is quite impressive that Filter 288 detects this black-and-yellow pattern, which we can follow from the tags goldfinch, toucan, and european fire salamander. Likewise, Filter 95 detects the trapezoid in table lamp, yurt, and lampshade.

In addition to comparing our method to the state-of-the-art methods in CNN explanations, we linked the tags to concepts from ConceptNet [13] to achieve a coarsening of common tags. ConceptNet is a semantic network with meanings of words. This comparison revealed that many tags have both visual and semantic commonalities (e.g., see Filter 437 in Figure 3: rhodesian ridgeback, bloodhound, and redbone are all of type dog). Following this evaluation process, we manually reviewed 100 filters in the context of common visual and semantic commonalities. Here we found 88% conformance with common tags in the filters.
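One possible way to perform such a coarsening is to look up shared IsA hypernyms of a filter's tags via the public ConceptNet web API. This is a sketch under that assumption; the helper name is ours, and the endpoint and parameters reflect the publicly documented API rather than the code used in the paper.

```python
import requests

def conceptnet_hypernyms(term, limit=50):
    """Return the labels of ConceptNet 'IsA' targets for a tag,
    e.g. 'bloodhound' IsA 'dog', which can be used to coarsen filter tags."""
    params = {"start": "/c/en/" + term.replace(" ", "_"), "rel": "/r/IsA", "limit": limit}
    response = requests.get("https://api.conceptnet.io/query", params=params)
    return {edge["end"]["label"] for edge in response.json().get("edges", [])}

# Tags of a filter share a concept if their hypernym sets intersect, e.g.:
# conceptnet_hypernyms("bloodhound") & conceptnet_hypernyms("redbone")
```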
Figure 4: Hits@$n$ with different $k$ and $q$ on ImageNet.

3.3. Impact of Hyperparameters

In the following, we evaluate the impact the hyperparameters $k$ and $q$ have on the correlation of Hits@$n$ and accuracy. If the labels, and thus Hits@$n$, do not correlate with the output of the neural network, and thus with the accuracy, then the filters have not been tagged sensibly enough by our approach to yield an accurate explanation. We interpret Hits@$n$ and accuracy with different hyperparameters $k$ and $q$, respectively. In Figure 4, we compute Hits@$n$ on the test set from ImageNet depending on $k$ and $q$. We can see that Hits@$n$ increases for increasing $k$, $q$, and $n$. For $q = 25\%$ and $n = 50$, we even get a hit rate of 80% over all 1,000 object classes. This result shows that FilTag can be taken as a significant explanation for the classification. For example, we have observed that the class shoji gets the highest hit rate of 98.47%, followed by the classes slot, odometer, and entertainment center, also with around 98%. This correlates with the likelihood of the best classes, which are exactly the same classes: shoji (81.22%), slot (92.30%), odometer (91.73%), and entertainment center (82.89%). Likewise, Hits@$n$ also correlates with the accuracy of the worst classes, which are spatula, schipperke, reel, bucket, and hatchet. These results fit the top-1 accuracy of VGG16 of 74.4% over all classes. The high correlation between Hits@$n$ and accuracy shows that the relevant features, labeled by our approach, are in fact detected in the images, which confirms the hypothesis that the tags are useful for generating explanations by means of our approach. However, for larger values of $q$ we observed that the interpretability decreases because the number of tags increases for each filter. This makes it harder to find similarities between the classes. Thus, there is a trade-off between expressiveness for the classification and interpretability of the filters.

Figure 5: Example images from ImageNet. (a) Mortarboard, (b) Computer.

3.4. Using the Explanations

FilTag can be used for error analysis using Hits@$n$. Taking misclassified input images, Hits@$n$ indicates whether the most relevant filters were activated. If Hits@$n$ is high, we can assume that the misclassified class and the original image share similar features. Analyzing the tags, we may find correlations in their semantics.

Figure 5 (a) shows an image of the class mortarboard in ImageNet. Using VGG16, the class academic gown is predicted with a confidence of 83.8%, while the actual class mortarboard is predicted with a confidence of only 16.2%. Considering the image, we notice that both objects are part of this image, making this result reasonable. Reviewing the activated filters, we observe that filters tagged by FilTag with the tag mortarboard, as well as with the tag academic gown, are usually activated. As a result, we can verify that features of these two classes are extracted and used for the prediction. This allows us to give non-experts an understanding of the reason for the misclassification, as features of the other class are often extracted from this image. Likewise, we can use this information to increase the number of images in which the mortarboard is the actual class but not in the main focus of the image, in order to continue training the network and make the predictions more accurate.

Figure 5 (b) shows an image from the class computer. This image is classified by VGG16 as cash machine with a probability of 99%. Looking at the tagged filters, filters with the tag cash machine are mostly activated, followed by screen, CD player, and file. Considering Figure 5 (b) and having knowledge about the other images of the class computer in ImageNet, the reason this image is not assigned to this class becomes clear. Generally, frontal images of a computer were used for learning the computer class. However, this image does not correspond to the same distribution. Thus, it is difficult for the neural network to assign it correctly. Moreover, it shows an old computer, whereas the other images in ImageNet generally represent rather modern computers. In order to classify this image correctly, further images showing old computers from the side would have to be included to change the distribution and train the VGG16 to classify this image correctly.
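For such a case study, the model's own prediction can be obtained with the standard Keras Applications pipeline and then contrasted with the tags of the activated filters (e.g., via ranked_tags from Section 2.2). The image path below is a placeholder; the snippet only illustrates the mechanics and does not reproduce the reported numbers.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.applications.VGG16(weights="imagenet")

# "mortarboard.jpg" is a placeholder path for an image like Figure 5 (a).
img = tf.keras.preprocessing.image.load_img("mortarboard.jpg", target_size=(224, 224))
x = tf.keras.applications.vgg16.preprocess_input(
    np.expand_dims(tf.keras.preprocessing.image.img_to_array(img), axis=0))

# Top-5 predicted classes with confidences, to be compared with the filter tags.
print(tf.keras.applications.vgg16.decode_predictions(model.predict(x), top=5)[0])
```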
Wo- This image is classified by VGG16 as cash machine with jna, Rethinking the Inception Architecture for Com- a probability of 99%. Looking at the tagged filters, filters puter Vision, in: 2016 IEEE Conference on Com- of the tags cash machine are mostly activate, followed puter Vision and Pattern Recognition, CVPR 2016, by screen, CD player, and file. Considering Figure 5 (b) 2016, pp. 2818–2826. and having knowledge about the other images of the [3] G. Montavon, W. Samek, K.-R. Müller, Methods class computer in ImageNet, the reason this image is not for interpreting and understanding deep neural net- assigned to this class becomes clear. Generally, frontal works, Digital Signal Processing 73 (2018) 1–15. images of a computer were used for the computer class [4] C. Olah, A. Mordvintsev, L. Schubert, Feature Visu- for learning. However, this image does not correspond alization, Distill (2017). to the same distribution. Thus, it is difficult for the [5] C. Olah, A. Satyanarayan, I. Johnson, S. Carter, L. Schubert, K. Ye, A. Mordvintsev, The Building Blocks of Interpretability, Distill (2018). [6] F. Hohman, H. Park, C. Robinson, D. H. P. Chau, Summit: Scaling deep learning interpretability by visualizing activation and attribution summariza- tions, IEEE transactions on visualization and com- puter graphics 26 (2019) 1096–1106. [7] M. D. Zeiler, R. Fergus, Visualizing and Understand- ing Convolutional Networks, in: Computer Vision - ECCV 2014, volume 8689 of Lecture Notes in Com- puter Science, 2014, pp. 818–833. [8] R. Fong, A. Vedaldi, Net2Vec: Quantifying and Ex- plaining How Concepts Are Encoded by Filters in Deep Neural Networks, in: Conference on Com- puter Vision and Pattern Recognition, CVPR 2018, 2018, pp. 8730–8738. [9] D. Bau, B. Zhou, A. Khosla, A. Oliva, A. Torralba, Network Dissection: Quantifying Interpretability of Deep Visual Representations, in: Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017, pp. 3319–3327. [10] R. B. Girshick, J. Donahue, T. Darrell, J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, in: Conference on Computer Vision and Pattern Recognition, CVPR 2014, 2014, pp. 580–587. [11] K. Simonyan, A. Vedaldi, A. Zisserman, Deep In- side Convolutional Networks: Visualising Image Classification Models and Saliency Maps, in: 2nd International Conference on Learning Representa- tions, ICLR 2014, 2014. [12] J. T. Springenberg, A. Dosovitskiy, T. Brox, M. A. Riedmiller, Striving for Simplicity: The All Convo- lutional Net, in: 3rd International Conference on Learning Representations, ICLR 2015, 2015. [13] R. Speer, J. Chin, C. Havasi, ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 4444–4451. [14] A. Nguyen, T. Weller, M. Färber, Y. Sure-Vetter, Mak- ing Neural Networks FAIR, in: Knowledge Graphs and Semantic Web - Second Iberoamerican Confer- ence and First Indo-American Conference, KGSWC 2020, volume 1232 of Communications in Computer and Information Science, 2020, pp. 29–44. [15] D. Erhan, Y. Bengio, A. Courville, P. Vincent, Visu- alizing Higher-Layer Features of a Deep Network, Technical Report, Univeristé de Montréal (2009). [16] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, L. Fei-Fei, Im- ageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision (IJCV) 115 (2015) 211–252.