Explaining Convolutional Neural Networks by Tagging Filters

Anna Nguyen¹, Daniel Hagenmayer¹, Tobias Weller², Michael Färber¹
¹ Karlsruhe Institute of Technology (KIT), Institute AIFB, Karlsruhe, Germany
² University of Mannheim, Mannheim, Germany
anna.nguyen@kit.edu (A. Nguyen), daniel.hagenmayer@student.kit.edu (D. Hagenmayer), tobi@informatik.uni-mannheim.de (T. Weller), michael.faerber@kit.edu (M. Färber)

AIMLAI'22: Advances in Interpretable Machine Learning and Artificial Intelligence (AIMLAI@CIKM'22), October 21, 2022, Atlanta, Georgia, USA

Abstract
Convolutional neural networks (CNNs) have achieved astonishing performance on various image classification tasks, but it is difficult for humans to understand how a classification comes about. Recent literature proposes methods to explain the classification process to humans. These focus mostly on visualizing feature maps and filter weights, which are not very intuitive for non-experts. In this paper, we propose FilTag, an approach to effectively explain CNNs even to non-experts. The idea is that if images of a class frequently activate a convolutional filter, that filter will be tagged with that class. Based on the tagging, individual image classifications can then be intuitively explained using the tags of the filters that the input image activates. Finally, we show that the tags are useful in analyzing classification errors caused by noisy input images and that the tags can be further processed by machines.

Keywords: CNN, images, explainable AI, semantic interpretability

1. Introduction

Deep convolutional neural networks (CNNs) are the state-of-the-art machine learning technique for image classification [1, 2]. In contrast to traditional feed-forward neural networks, CNNs have layers that perform a convolutional step (see Figure 2 for the relations in a convolution). Filters are used in a convolutional step, which outputs a feature map in which activated neurons highlight certain patterns of the input image. Although CNNs achieve high accuracy on many classification tasks, these models do not provide an explanation (i.e., decisive information) of the classifications. Thus, researchers have recently focused on methods to explain how CNNs classify images.

Figure 1: Explanations of Convolutional Filters. The upper part shows a visual explanation. The lower part contains an example of our tagging approach FilTag.

Related Work. Some of the earliest works on explaining CNNs focus on visualizing the activations of individual neurons [3, 4]. However, these methods cannot explain more complex relationships between multiple neurons, as no human-understandable explanation is used. Olah et al. [5] defined a semantic dictionary by pairing every neuron activation with its abstract visualization using a channel attribution, determining how much each channel contributes to the classification result. This may explain the role of a channel in the classification of an individual image, but it does not explain the role of that channel across all possible input images. Hohman et al. [6] try to overcome this problem by aggregating particularly important neurons and identifying relations between them. Other approaches focus on filters, the discerning feature of CNNs. For example, Zeiler and Fergus [7] visualize the filter weights to illustrate the patterns these filters detect. However, these visualizations are based on the inputs of the layers to which the respective filters belong.
Thus, only the filter patterns of the first layer can be directly associated with patterns on the input image of the network. To overcome this, the method Net2Vec [8] quantifies how concepts are encoded by filters by examining filter embeddings. Alternatively, Network Dissection [9] uses human-labeled visual concepts to bring semantics to the convolutional layers. However, visualizations and filter embeddings only explain the outcome of a model implicitly, whereas we assign explicit tags to filters, which can be understood by non-experts. Most visualizations used for explaining CNNs are similar to the upper example in Figure 1, which visualizes the most activated convolutional filters. Clearly, such visualizations are difficult to understand on their own. Adding an explicit explanation such as a semantic tag (e.g., "dog," "parrot," "cat," or "toucan") as shown in the bottom example would dramatically improve the explanation, including for non-experts.

Figure 2: Terminology of a filter in a convolution.

Contribution. Our contribution is threefold. First, we introduce FilTag, an automatic approach to explain the role of each convolutional filter of a CNN to non-expert humans. We use the fact that each filter is dedicated to a specific set of classes [7, 10, 11, 12]. Indeed, the idea of FilTag is to quantify how much a filter is dedicated to a class, and then tag each convolutional filter with a set of particularly important classes. The lower part of Figure 1 shows an example of what a CNN tagged in this way could look like. In that example, the rightmost filter highlighted in red plays a role in classifying parrots, whereas the filter in the middle only plays a role in classifying birds in general, as both toucans and parrots are birds. This filter extracts features that are specific to these classes (e.g., wings, feathers, etc.). Second, our approach can also be used to explain the classification of an individual image. In the example in Figure 1, the classification of the input image as a parrot would be explained by the union of the tags of the activated filters, which are all animals and in particular include parrot. Third, FilTag is suitable for analyzing classification errors. We analyze our approach with thorough experimentation using multiple CNNs, including VGG16, as well as ImageNet as a data set. All source code is available online at https://github.com/michaelfaerber/FilTag.

2. Approach

In Section 2.1, we propose a method to provide explanations based on the role of each filter in a CNN (independent from concrete input images) using our concept of filter tags. Then, in Section 2.2, we explain how a particular input image can be explained, namely in terms of the filters that it activates.

2.1. Explanations of Filters

Our explanation of filters works in two steps. In the first step, we quantify how much each filter is activated by images of each class. In the second step, we use this information to tag the filters.

Quantifying Filter Activations. Feature maps with high activations can be used as an indication of the importance of the preceding filter for the input image [6, 7]. Traditional explanation approaches focus on one image and therefore use the most activated feature map, while our approach focuses on a set of images of the same class. Given a pre-trained CNN with a set of convolutional layers $M$ with its respective sets of filters $I(\cdot)$ and a labeled data set $D$ with labels $c \in C$ from a set of labels $C$, let $d \in D$ be an input image and $m \in M$ a convolutional layer. First, we collect the activations in the feature map to get the importance of the filters regarding an input image, i.e., the output in the feature map for a given filter (see terminology in Figure 2). Second, we scale these activations per layer to $[0, 1]$. By scaling the activations, we ensure that no image is overrepresented due to overall high activation values. We scale the activations per layer because each layer has its specific pattern compositionality of filters. For example, the first convolutional layers detect simple patterns such as lines and edges, whereas the last layers detect compositional structures which match better to human-understandable objects [7]. Let $a(m, i, d, j)$ be such a scaled activation of the $j$-th element in the feature map calculated from image $d$ and filter $i \in I_m$ in convolutional layer $m$. In order to get a total activation value per feature map, we define $\bar{a}(m, i, d) = \frac{1}{n} \sum_{j=1}^{n} a(m, i, d, j)$, with $0 \le \bar{a}(m, i, d) \le 1$, as the arithmetic mean of the scaled activations in a feature map, where $n$ is the number of activations in the feature map. We do this for all filters $i \in I_m$ and repeat these steps for all layers $m \in M$.

Next, we use the labels as the desired explanation. Let $d_c$ be an input image with label $c$. We define $z_c(m, i) = \frac{1}{|D_c|} \sum_{d_c \in D_c} \bar{a}(m, i, d_c)$, with $0 \le z_c(m, i) \le 1$, as the arithmetic mean of $\bar{a}(m, i, d_c)$ over one class $c$, where $|D_c|$ is the number of images in class $c$. This way, $z_c(m, i)$ is the averaged value of all activations of the images in one class with respect to filter $i$ in layer $m$. Thus, we can rank the classes according to the highest averaged activation of the filter per layer, which is the decisive criterion for the labeling. We therefore compare the resulting values for each feature map. We repeat these steps for all images in $D$ per label class.

Filter Tagging. We tag the filters with the label of the input image class according to their corresponding values of $z_c(m, i)$. We are interested in the feature maps with high activations for a certain class because they indicate important features associated with that class [6]. We define two methods to select those feature maps per class and per layer (because of the mentioned complexity differences between layers): (i) the $k$-best method (choose the $k$ feature maps with the highest activation values) and (ii) the $q$-quantile method (choose the $q$-quantile of feature maps with the highest activation values). These tags serve as an explanation of what the filter does. For example, in Figure 1, the leftmost activated filter has the three tags dog, parrot, and cat, which suggests that this filter plays a role in recognizing animals.
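As a rough illustration of these two steps, the following sketch computes the per-feature-map means $\bar{a}$ and the per-class averages $z_c$ for a Keras model and derives filter tags with the $k$-best and $q$-quantile selection. It is a minimal sketch under our own assumptions, not the released implementation: the function names, the min-max scaling per layer, and the exact quantile handling are choices we made for illustration.

```python
import numpy as np
import tensorflow as tf

def mean_scaled_activations(model, conv_layer_names, images):
    """Per layer: mean activation of each filter for each image,
    with activations scaled to [0, 1] within the layer (a-bar in the text)."""
    extractor = tf.keras.Model(
        inputs=model.input,
        outputs=[model.get_layer(name).output for name in conv_layer_names])
    outputs = extractor.predict(images, verbose=0)
    per_layer = {}
    for name, feats in zip(conv_layer_names, outputs):
        # feats: (num_images, height, width, num_filters)
        lo, hi = feats.min(), feats.max()
        scaled = (feats - lo) / (hi - lo + 1e-12)     # one reading of "scale per layer"
        per_layer[name] = scaled.mean(axis=(1, 2))    # (num_images, num_filters)
    return per_layer

def class_activation_z(model, conv_layer_names, images_by_class):
    """z_c(m, i): mean of a-bar over all images of class c, per layer m and filter i."""
    z = {}
    for c, images in images_by_class.items():
        per_layer = mean_scaled_activations(model, conv_layer_names, images)
        z[c] = {name: acts.mean(axis=0) for name, acts in per_layer.items()}
    return z

def tag_filters(z, k=1, q=None):
    """Tag filters per layer: for each class c, pick the k most activated
    feature maps (k-best) or, if q is given, the top q fraction (q-quantile)."""
    tags = {}  # (layer name, filter index) -> set of class labels
    for c, layers in z.items():
        for name, acts in layers.items():
            if q is None:
                chosen = np.argsort(acts)[-k:]                          # k-best
            else:
                chosen = np.where(acts >= np.quantile(acts, 1 - q))[0]  # q-quantile
            for i in chosen:
                tags.setdefault((name, int(i)), set()).add(c)
    return tags
```

The resulting dictionary maps each (layer, filter) pair to its set of class tags, which is the structure used for the explanations in the following subsections.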
2.2. Explanations of Individual Classifications

While previous visual methods for explaining filters are difficult for humans to understand, textual assignment can lead to unambiguous explanations (as later seen in our experiments in Figure 3). To get an explanation for a given input, we assume that the tags have a higher information value with respect to the classification of the CNN if the tags match the classification output. Therefore, we want to measure the hit of the prediction against the tags of the most activated filters. To do this, we determine the most frequently occurring labels for each image of a class according to the previously mentioned method using the metric Hits@$n$. Hits@$n$ measures how many positive label tags are ranked in the top $n$ positions. For example, in Figure 1, the classification of the input image as a parrot is explained by its high activation of filters tagged with parrot.

2.3. Analysis of Classification Errors

FilTag can be used for error analysis using Hits@$n$. Taking misclassified input images, Hits@$n$ indicates whether the most relevant filters were activated. If Hits@$n$ is high, we can assume that the misclassified class and the original image share similar features. Analyzing the tags, we may find correlations in their semantics. Furthermore, linking the tags and filters to knowledge graphs such as ConceptNet [13] or FAIRnets [14] can bring more insights. ConceptNet is a semantic network with meanings of words, and FAIRnets is a neural network graph with metadata about the architectures. For example, in Figure 1, if we input an image of a car but the most activated filters have tags of animals, we can conclude that the wrong filters were activated.
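The sketch below shows one way Hits@$n$ could be computed for a single image, reusing mean_scaled_activations and the tag dictionary from the sketch in Section 2.1. The ranking of tags by their frequency among the most activated filters and the tie-breaking are our assumptions, not a description of the released code.

```python
from collections import Counter
import numpy as np

def ranked_tags(model, conv_layer_names, image, tags, k=1):
    """Rank tags by how often they occur among the k most activated filters
    of each convolutional layer for a single preprocessed input image."""
    per_layer = mean_scaled_activations(model, conv_layer_names, image[np.newaxis])
    counts = Counter()
    for name, acts in per_layer.items():
        for i in np.argsort(acts[0])[-k:]:   # most activated filters of this layer
            counts.update(tags.get((name, int(i)), set()))
    return [label for label, _ in counts.most_common()]

def hits_at_n(model, conv_layer_names, image, true_label, tags, n=5, k=1):
    """Hits@n: is the true class label among the n most frequent filter tags?"""
    return true_label in ranked_tags(model, conv_layer_names, image, tags, k)[:n]
```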
3. Experiment

3.1. Experimental Setup

Data Set. Following related work, we use ImageNet [16] from ILSVRC 2014 to conduct experiments on the introduced approach. This data set contains over one million images and 1,000 possible class labels including animals, plants, and persons. Each class contains approximately 1,200 images. We use a holdout split, using 80% of the images to tag the filters (while ensuring that there were at least 500 images from each class in the set) and the remaining 20% to test the explanations.

Baseline. We compare our approach with two state-of-the-art visualization methods for explaining neural networks. The selection of the methods was based on their focus on feature visualization. One of the methods provided the fundamental basis of feature visualization and uses minimal regularization [15]; the other method uses optimization objectives [4].

Implementation. We implemented our method in Python3 and used TensorFlow as the deep learning library. The experiments were performed on a server with an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60 GHz, 16 physical cores, 188 GB RAM, and a GeForce GTX 1080 Ti. We used pre-trained neural network models from Keras Applications. The filters of a VGG16 were explained in the experiments using the introduced method. VGG16 was used as the CNN as it is frequently used in various computer vision applications. We also evaluated VGG19 and InceptionNet but omit the results due to page limitations.
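For reference, loading such a pre-trained model and collecting its convolutional layers with Keras Applications could look like the following sketch; the variable names are ours, and the commented preprocessing line is only a reminder of the standard VGG input pipeline.

```python
import tensorflow as tf

# Pre-trained VGG16 with ImageNet weights from Keras Applications.
model = tf.keras.applications.VGG16(weights="imagenet")

# The convolutional layers are the layers whose filters FilTag tags.
conv_layer_names = [layer.name for layer in model.layers
                    if isinstance(layer, tf.keras.layers.Conv2D)]
print(conv_layer_names)  # ['block1_conv1', 'block1_conv2', ..., 'block5_conv3']

# Images are expected at 224x224 with VGG-style preprocessing, e.g.:
# x = tf.keras.applications.vgg16.preprocess_input(images_224)
```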
3.2. Analysis of the Explanations

In this analysis, we study the explanations of the filters using the $k$-best method with $k = 1$, in order to provide a better comparison with the state-of-the-art methods, since they frequently visualize the most activated feature map. Figure 3 shows exemplary visual explanations of the baseline methods and the tags of our approach FilTag. As shown, the visual explanations of the baseline methods [15, 4] do not provide satisfactory comprehension. At first sight, there is not much to understand. Considering our tags, one can imagine what the visualizations display. We additionally include pictures corresponding to our tags to show the information value compared to only visualizations of the filters.

Figure 3: Comparison of filter explanations of the last conv. layer of VGG16 [1]. The visualizations of the baseline methods [15, 4] are ambiguous and difficult to interpret. Our approach FilTag allows a more precise understanding of which features the filters detect. Pictures corresponding to our tags were added to show the information value.

Filter 95 seems to recognize a lampshade, especially a trapezoidal shape. Filter 150 is only tagged with cannon, i.e., the filter is specific to this class. Filter 288 detects the head of a goldfinch, especially with consideration of the yellow and black pattern. Filter 437 and Filter 462 recognize ears of brown dogs and the body of snakes, respectively. This information would be hard to retrieve without the tags. Even without considering the visualizations, one has a good impression of what a filter detects. For example, it is quite impressive that Filter 288 detects this black-and-yellow pattern, which we can follow from the tags goldfinch, toucan, and european fire salamander. Likewise, Filter 95 detects the trapezoid in table lamp, yurt, and lampshade.

In addition to comparing our method to the state-of-the-art methods in CNN explanations, we linked the tags to concepts from ConceptNet [13] to achieve a coarsening of common tags. ConceptNet is a semantic network with meanings of words. This comparison revealed that many tags have both visual and semantic commonalities (e.g., see Filter 437 in Figure 3: rhodesian ridgeback, bloodhound, and redbone are all of type dog). Following this evaluation process, we manually reviewed 100 filters in the context of common visual and semantic commonalities. Here we found 88% conformance with common tags in the filters.
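One possible way to perform such a coarsening is to look up shared IsA hypernyms of a filter's tags via the public ConceptNet web API. This is a sketch under that assumption; the helper name is ours, and the endpoint and parameters reflect the publicly documented API rather than the code used in the paper.

```python
import requests

def conceptnet_hypernyms(term, limit=50):
    """Return the labels of ConceptNet 'IsA' targets for a tag,
    e.g. 'bloodhound' IsA 'dog', which can be used to coarsen filter tags."""
    params = {"start": "/c/en/" + term.replace(" ", "_"), "rel": "/r/IsA", "limit": limit}
    response = requests.get("https://api.conceptnet.io/query", params=params)
    return {edge["end"]["label"] for edge in response.json().get("edges", [])}

# Tags of a filter share a concept if their hypernym sets intersect, e.g.:
# conceptnet_hypernyms("bloodhound") & conceptnet_hypernyms("redbone")
```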
Figure 4: Hits@$n$ with different $k$ and $q$ on ImageNet.

3.3. Impact of Hyperparameters

In the following, we evaluate the impact the hyperparameters $k$ and $q$ have on the correlation of Hits@$n$ and accuracy. If the labels, and thus Hits@$n$, do not correlate with the output of the neural network, and thus with the accuracy, then the filters have not been tagged sensibly enough by our approach to yield an accurate explanation. We interpret Hits@$n$ and accuracy with different hyperparameters $k$ and $q$, respectively. In Figure 4, we compute Hits@$n$ on the test set from ImageNet depending on $k$ and $q$. We can see that Hits@$n$ increases for increasing $k$, $q$, and $n$. For $q = 25\%$ and $n = 50$, we even get a hit rate of 80% over all 1,000 object classes. This result shows that FilTag can be taken as a significant explanation for the classification. For example, we have observed that the class shoji gets the highest hit rate of 98.47%, followed by the classes slot, odometer, and entertainment center, also with around 98%. This correlates with the likelihood of the best classes, which are exactly the same classes: shoji (81.22%), slot (92.30%), odometer (91.73%), and entertainment center (82.89%). Likewise, Hits@$n$ also correlates with the accuracy of the worst classes, which are spatula, schipperke, reel, bucket, and hatchet. These results fit the top-1 accuracy of VGG16 of 74.4% over all classes. The high correlation between Hits@$n$ and accuracy shows that the relevant features, labeled by our approach, are in fact detected in the images, which confirms the hypothesis that the tags are useful for generating explanations by means of our approach. However, for larger values of $q$ we observed that the interpretability decreases because the number of tags increases for each filter. This makes it harder to find similarities between the classes. Thus, there is a trade-off between expressiveness for the classification and interpretability of the filters.

Figure 5: Example images from ImageNet. (a) Mortarboard, (b) Computer.

3.4. Using the Explanations

FilTag can be used for error analysis using Hits@$n$. Taking misclassified input images, Hits@$n$ indicates whether the most relevant filters were activated. If Hits@$n$ is high, we can assume that the misclassified class and the original image share similar features. Analyzing the tags, we may find correlations in their semantics.

Figure 5 (a) shows an image of the class mortarboard in ImageNet. Using VGG16, the class academic gown is predicted with a confidence of 83.8%, while the actual class mortarboard is predicted with a confidence of only 16.2%. Considering the image, we notice that both objects are part of this image, making this result reasonable. Reviewing the activated filters, we observe that filters tagged by FilTag with the tag mortarboard, as well as with the tag academic gown, are usually activated. As a result, we can verify that features of these two classes are extracted and used for the prediction. This allows us to give non-experts an understanding of the reason for the misclassification, as features of the other class are often extracted from this image. Likewise, we can use this information to increase the number of images in which the mortarboard is the actual class but not in the main focus of the image, in order to continue training the network and make the predictions more accurate.

Figure 5 (b) shows an image from the class computer. This image is classified by VGG16 as cash machine with a probability of 99%. Looking at the tagged filters, filters with the tag cash machine are mostly activated, followed by screen, CD player, and file. Considering Figure 5 (b) and having knowledge about the other images of the class computer in ImageNet, the reason this image is not assigned to this class becomes clear. Generally, frontal images of a computer were used for learning the computer class. However, this image does not correspond to the same distribution. Thus, it is difficult for the neural network to assign it correctly. Moreover, it shows an old computer, whereas the other images in ImageNet generally represent rather modern computers. In order to classify this image correctly, further images showing old computers from the side would have to be included to change the distribution and train the VGG16 to classify this image correctly.
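For such a case study, the model's own prediction can be obtained with the standard Keras Applications pipeline and then contrasted with the tags of the activated filters (e.g., via ranked_tags from Section 2.2). The image path below is a placeholder; the snippet only illustrates the mechanics and does not reproduce the reported numbers.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.applications.VGG16(weights="imagenet")

# "mortarboard.jpg" is a placeholder path for an image like Figure 5 (a).
img = tf.keras.preprocessing.image.load_img("mortarboard.jpg", target_size=(224, 224))
x = tf.keras.applications.vgg16.preprocess_input(
    np.expand_dims(tf.keras.preprocessing.image.img_to_array(img), axis=0))

# Top-5 predicted classes with confidences, to be compared with the filter tags.
print(tf.keras.applications.vgg16.decode_predictions(model.predict(x), top=5)[0])
```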
Wo- This image is classified by VGG16 as cash machine with jna, Rethinking the Inception Architecture for Com- a probability of 99%. Looking at the tagged filters, filters puter Vision, in: 2016 IEEE Conference on Com- of the tags cash machine are mostly activate, followed puter Vision and Pattern Recognition, CVPR 2016, by screen, CD player, and file. Considering Figure 5 (b) 2016, pp. 2818–2826. and having knowledge about the other images of the [3] G. Montavon, W. Samek, K.-R. Müller, Methods class computer in ImageNet, the reason this image is not for interpreting and understanding deep neural net- assigned to this class becomes clear. Generally, frontal works, Digital Signal Processing 73 (2018) 1–15. images of a computer were used for the computer class [4] C. Olah, A. Mordvintsev, L. Schubert, Feature Visu- for learning. However, this image does not correspond alization, Distill (2017). to the same distribution. Thus, it is difficult for the [5] C. Olah, A. Satyanarayan, I. Johnson, S. Carter, L. Schubert, K. Ye, A. Mordvintsev, The Building Blocks of Interpretability, Distill (2018). [6] F. Hohman, H. Park, C. Robinson, D. H. P. Chau, Summit: Scaling deep learning interpretability by visualizing activation and attribution summariza- tions, IEEE transactions on visualization and com- puter graphics 26 (2019) 1096–1106. [7] M. D. Zeiler, R. Fergus, Visualizing and Understand- ing Convolutional Networks, in: Computer Vision - ECCV 2014, volume 8689 of Lecture Notes in Com- puter Science, 2014, pp. 818–833. [8] R. Fong, A. Vedaldi, Net2Vec: Quantifying and Ex- plaining How Concepts Are Encoded by Filters in Deep Neural Networks, in: Conference on Com- puter Vision and Pattern Recognition, CVPR 2018, 2018, pp. 8730–8738. [9] D. Bau, B. Zhou, A. Khosla, A. Oliva, A. Torralba, Network Dissection: Quantifying Interpretability of Deep Visual Representations, in: Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017, pp. 3319–3327. [10] R. B. Girshick, J. Donahue, T. Darrell, J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, in: Conference on Computer Vision and Pattern Recognition, CVPR 2014, 2014, pp. 580–587. [11] K. Simonyan, A. Vedaldi, A. Zisserman, Deep In- side Convolutional Networks: Visualising Image Classification Models and Saliency Maps, in: 2nd International Conference on Learning Representa- tions, ICLR 2014, 2014. [12] J. T. Springenberg, A. Dosovitskiy, T. Brox, M. A. Riedmiller, Striving for Simplicity: The All Convo- lutional Net, in: 3rd International Conference on Learning Representations, ICLR 2015, 2015. [13] R. Speer, J. Chin, C. Havasi, ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 4444–4451. [14] A. Nguyen, T. Weller, M. Färber, Y. Sure-Vetter, Mak- ing Neural Networks FAIR, in: Knowledge Graphs and Semantic Web - Second Iberoamerican Confer- ence and First Indo-American Conference, KGSWC 2020, volume 1232 of Communications in Computer and Information Science, 2020, pp. 29–44. [15] D. Erhan, Y. Bengio, A. Courville, P. Vincent, Visu- alizing Higher-Layer Features of a Deep Network, Technical Report, Univeristé de Montréal (2009). [16] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, L. Fei-Fei, Im- ageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision (IJCV) 115 (2015) 211–252.