=Paper=
{{Paper
|id=Vol-3884/paper4
|storemode=property
|title=Understanding CNN Hidden Neuron Activations using Concept Induction over Background Knowledge
|pdfUrl=https://ceur-ws.org/Vol-3884/paper4.pdf
|volume=Vol-3884
|authors=Abhilekha Dalal
|dblpUrl=https://dblp.org/rec/conf/semweb/Dalal24
}}
==Understanding CNN Hidden Neuron Activations using Concept Induction over Background Knowledge==
Abhilekha Dalal
Kansas State University, Manhattan KS, USA
Abstract
A major challenge in Explainable AI is interpreting hidden neuron activations accurately. These interpretations
can reveal what a deep learning system perceives as relevant in the input data, thereby addressing the black-box
nature of such systems. The state of the art indicates that hidden node activations can be interpretable by
humans, but there’s a lack of systematic automated methods to verify these interpretations, especially those that
utilize substantial background knowledge and inherently explainable methods. In this proposal, we introduce a
novel model-agnostic post-hoc Explainable AI method based on a Wikipedia-derived concept hierarchy with
approximately 2 million classes. Our approach utilizes OWL-reasoning-based Concept Induction for explanation
generation and compares it with off-the-shelf explainable methods based on pre-trained multimodal models. Our results
demonstrate that our method automatically provides meaningful class expressions as explanations to individual
neurons in the dense layer of a Convolutional Neural Network, outperforming prior work in both quantitative
and qualitative aspects.
Keywords
Explainable AI, Concept Induction, Convolutional Neural Network, Knowledge Graph
1. Introduction
Deep learning has revolutionized various fields such as image classification [1], speech recognition [2],
translation [3], drug design [4], medical diagnosis [5], and climate sciences [6]. However, the opaque
nature of deep learning systems poses challenges in applications involving automated decisions and
safety-critical systems. For instance, concerns arise from incidents like Steve Wozniak’s accusation
of gender discrimination in Apple Card credit limits and biased image search results for "CEOs" [7].
Safety-critical areas like self-driving cars [8], drug discovery [9], and medical diagnosis [10] are also vulnerable to adversarial attacks [11],
including altering classification results [11] and manipulating the order of training images [12]. Some
attacks are hard to detect post facto, posing significant risks [13, 14].
Problem Statement: While statistical evaluations are standard for assessing deep learning per-
formance, they fall short in providing explanations for specific system behaviors [15]. Therefore,
developing robust explanation methods for deep learning systems remains crucial. Despite significant
progress in this area (see Section 4), current approaches often rely on a limited set of predefined ex-
planation categories. This reliance on human-selected categories is problematic, as it assumes, without
evidence, that they are suitable for explaining deep learning systems. Some methods leverage deep
learning models, such as LLMs, to generate explanations [16], introducing another layer of opacity.
Additionally, state-of-the-art explanation systems often require modified deep learning architectures,
which can lead to reduced system performance compared to unmodified versions [17].
Importance: The importance of solving this challenge cannot be overstated. Transparent and
interpretable AI systems are crucial for building trust, especially in domains like healthcare, finance,
and autonomous vehicles. By providing explanations, we empower users, including non-experts,
to understand AI decisions, fostering better acceptance and adoption. Advancing explainable AI
contributes to interdisciplinary collaboration and can enhance societal benefits while mitigating ethical
risks associated with AI deployment. Therefore, it is imperative to address the challenge of developing
transparent and interpretable explanation methods for deep learning systems.

Proceedings of the Doctoral Consortium at ISWC 2024, co-located with the 23rd International Semantic Web Conference (ISWC 2024)
Email: adalal@ksu.edu (A. Dalal)
ORCID: 0000-0002-7047-5074 (A. Dalal)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
The subsequent section presents the research question and objectives, building on the above core
principles. Section 2.1 describes the contributions we have made, focusing on the methods we use or plan
to use to support these contributions, and Section 3 then describes the results obtained thus far.
2. Research Question and Contributions
Research Question: How can we develop an effective approach to explainable deep learning that can
be used to assign human-understandable interpretations to the activations of hidden neurons in the
deep learning model?
This proposal outlines an approach to use Concept Induction, i.e., formal logical deductive reason-
ing [18] to automatically provide meaningful explanations for hidden neuron activation in a Convolu-
tional Neural Network (CNN) architecture for image scene classification (on the ADE20K dataset [19]),
using a class hierarchy consisting of about 2 × 10⁶ classes, derived from Wikipedia, as the pool of
categories [20]. The hypothesis that drives the work outlined in this proposal is stated below.
Hypothesis: Concept Induction analysis with large-scale background knowledge yields meaningful
labels that stably explain neuron activation in the hidden layer of CNN architecture.
2.1. Contributions and Methodology
To test the above-stated hypothesis, the following objectives are outlined, together with the methodology
we have followed or plan to follow:
Objective 1: Employing Concept Induction and a Wikipedia Knowledge Graph to Assign Meaningful
Labels to Hidden Neurons’ Activation.
We explored and evaluated three concrete methods (Concept Induction, CLIP-Dissect [16], GPT-
4 [21]) to generate high-level concepts for explaining hidden neuron activations. Our comprehensive
methodology for Objective 1 is detailed in our paper [22].
1. Prep: Scenario and CNN Training - Utilizing the annotated ADE20K dataset [19], we trained
Resnet50V2 for scene classification, achieving an accuracy of 86.46%. The annotations are only
used for generating label hypotheses, not for CNN training. While the highest possible accuracy isn't
critical for our investigation, the model must be accurate enough to be practically applicable.
2. Concept Induction - The Concept Induction system [18] accepts three inputs: a positive set 𝑃 and a
negative set 𝑁 of images from ADE20K, and a knowledge base 𝐾, all expressed as description logic
theories, where all examples 𝑥 ∈ 𝑃 ∪ 𝑁 occur as individuals (constants) in 𝐾. It returns description
logic class expressions 𝐸 such that 𝐾 |= 𝐸(𝑝) for all 𝑝 ∈ 𝑃 and 𝐾 ̸|= 𝐸(𝑞) for all 𝑞 ∈ 𝑁. For
scalability, we used the heuristic Concept Induction system ECII [23] with the Wikipedia hierarchy [20].
We included the images in the background knowledge by associating object annotations from ADE20K
images with classes in the hierarchy, using the Levenshtein string similarity metric [24] with edit
distance 0 (i.e., exact matches).
3. Generating Label Hypotheses -
a) In Concept Induction, we used 1,370 ADE20K images with our trained ResNet50V2, extract-
ing activations from the dense layer with 64 neurons. Positive examples (𝑃 ) are images
activating the neuron with > 80% of its max activation, negative examples (𝑁 ) are those
activating it with < 20% of its max or not at all. ECII generates the target label for each
neuron based on these sets and background knowledge.
b) CLIP-Dissect employs the top 20,000 English vocabulary words as concepts. Subsequently,
activations from our trained ResNet50v2 model for ADE20K test images were collected,
resulting in a matrix (Number of Images × 64). Utilizing these inputs, CLIP-Dissect assigns a
label to each neuron such that the neuron is most activated when the corresponding concept
is present in the image, resulting in 22 distinct concepts across 64 neurons.
c) GPT-4 - Leveraging GPT-4, we adopt a methodology akin to [25] for concept generation to
differentiate image classes [26]. We input image annotations from positive (𝑃 ) and negative
(𝑁 ) sets into GPT-4 with prompts to discern concepts unique to 𝑃 . The prompt "Generate
top three classes of objects/general scenarios that better represent what images in the
positive set (𝑃 ) have but the images in the negative set (𝑁 ) do not," yields three concepts
per neuron, from which we select one per class for assessment.
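The label-hypothesis construction of step 3(a) can be sketched as follows. This is a minimal illustration assuming dense-layer activations are available as a NumPy array; the function name `split_pos_neg` and the toy data are illustrative, not part of the actual pipeline:

```python
import numpy as np

def split_pos_neg(activations, pos_frac=0.8, neg_frac=0.2):
    """Split image indices into positive/negative example sets per neuron.

    `activations` is an (n_images, n_neurons) array of dense-layer
    activations. Following the thresholds in the text: images activating a
    neuron above 80% of its maximum become positive examples P; images
    below 20% of the maximum (including non-activating ones) become N.
    """
    examples = {}
    for j in range(activations.shape[1]):
        col = activations[:, j]
        max_act = col.max()
        pos = np.where(col > pos_frac * max_act)[0]
        neg = np.where(col < neg_frac * max_act)[0]
        examples[j] = (pos.tolist(), neg.tolist())
    return examples

# toy example: 6 images, 2 neurons
acts = np.array([[10.0, 0.0],
                 [ 9.0, 1.0],
                 [ 1.0, 5.0],
                 [ 0.0, 4.9],
                 [ 5.0, 0.5],
                 [ 8.5, 0.2]])
sets = split_pos_neg(acts)
```

Images falling between the two thresholds (e.g., image 4 for neuron 0 above) belong to neither set, mirroring the gap between the 20% and 80% cut-offs described in the text.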
Objective 2: Automate Concept Label Association for Input Images using Neuron Ensembles and
Non-target Activation Probabilities.
1. Concept Associations and Non-Target Activations - In pursuit of Objective 1, Step 3 generates
labels for each neuron's activation. Each neuron's label is its target concept, with all other labels
considered non-target concepts. This analysis focuses on the top three ECII responses, assessing
neuron activation for non-target concepts at various cut-off values relative to each neuron’s
maximum activation value: > 0, > 20% of max, > 40% of max, and > 60% of max. The goal is
to establish strong associations between concepts and neuron activations, understanding which
concepts trigger specific neurons and to what extent.
2. Neuron Ensembles for Concept Associations - Input information can be distributed across
simultaneously activated neurons, necessitating the examination of neuron ensemble activations
using previously established cut-off values. However, a scale challenge arises with 2⁶⁴ potential
neuron ensembles for just 64 neurons. To address this, we propose combining neurons activated
for semantically related labels (with top-3 responses from ECII). For instance, if "building" activates
both neuron 0 and neuron 63, we assess all images activating both neurons 0 and 63 at the
specified cut-off values. In cases where a concept activates more than two neurons, our analysis
encompasses all possible combinations of pairs, evaluating target and non-target activations. We
proceed with concepts, including neuron ensembles, that exhibit target activation exceeding 80%
for further analysis.
3. Validating Neuron-Concept Associations - After completing Step 1 and Step 2, we obtain
probabilities for non-target concepts across all concepts, including those activating single neu-
rons as well as neuron ensembles. This allows for identifying potential concepts and assessing
associated error margins. To verify or reject these concepts, we revisit the ADE20K dataset. Using
a subset of 1050 randomly chosen images, we conduct a user study via Amazon Mechanical Turk
(MTurk) [27] to annotate images with target concepts. We then cross-reference these designated
concepts with image annotations obtained from the MTurk study. We evaluate the likelihood of
neuron activations for non-target concepts.
4. Developing an Automated System - We propose developing an automated system to streamline
the entire process, enabling scalability to larger datasets and exploration of a broader parameter
range. The system would comprise: (i) concept induction, generating class expressions/responses
ranked by coverage score; (ii) neuron activation, calculating activation for target and non-target
concepts (including neuron ensembles) at various cut-off values; and (iii) concept validation,
validating the generated concepts. This automated system would analyze new images, generating a list of
potential concepts with associated probabilities. Users could review the concepts and select the
most relevant ones for the image. The automated approach offers several advantages, including
speed, efficiency, scalability to larger datasets, and exploration of diverse parameter settings.
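The pairwise ensemble check of step 2 could be sketched roughly as follows; function and variable names are illustrative, and the actual system may differ:

```python
import numpy as np
from itertools import combinations

def ensemble_activation_rates(activations, concept_neurons,
                              cutoffs=(0.0, 0.2, 0.4, 0.6)):
    """For each pair of neurons sharing a concept label, compute the fraction
    of images that activate *both* neurons above each cut-off, where cut-offs
    are expressed as fractions of each neuron's maximum activation."""
    max_act = activations.max(axis=0)
    rates = {}
    for concept, neurons in concept_neurons.items():
        for i, j in combinations(neurons, 2):  # all pairs, as in the text
            for c in cutoffs:
                both = (activations[:, i] > c * max_act[i]) & \
                       (activations[:, j] > c * max_act[j])
                rates[(concept, i, j, c)] = float(both.mean())
    return rates

# toy example: 4 images, and a concept that ECII assigned to two neurons
acts = np.array([[1.0, 1.0],
                 [0.0, 1.0],
                 [1.0, 0.0],
                 [0.0, 0.0]])
rates = ensemble_activation_rates(acts, {"building": [0, 1]})
```

Concepts (including ensembles) whose target-activation rate exceeds 80% would then be retained for the validation step.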
3. Evaluation and Results
Objective 1: The three approaches generate label hypotheses for all studied neurons, which we
validated using new images. We search Google Images using each target label as keywords and collect
200 images per label with the Imageye browser extension.¹ These images are split into 80% for evaluation and 20% for statistical
analysis. We then determine if the target neuron activates when the retrieval label matches the target
¹ https://chrome.google.com/webstore/detail/image-downloader-imageye/agionbommeaifngbhincahgmoflcikhm
Table 1
Generated label hypotheses from all three approaches. Bold denotes neurons whose labels are considered
confirmed (the full version can be found in our work at [22]).

Concept Induction
Neuron | Obtained Label(s) | Images | Coverage | Target % | Non-Target %
0      | building          | 164    | 0.997    | 89.024   | 72.328
1      | cross_walk        | 186    | 0.994    | 88.710   | 28.923
11     | river_water       | 157    | 0.995    | 31.847   | 22.309

CLIP-Dissect
Neuron | Obtained Label(s) | Images | Target % | Non-Target %
0      | restaurants       | 140    | 55.000   | 59.295
3      | dresser           | 171    | 95.322   | 66.199
7      | bathroom          | 153    | 93.333   | 44.113

GPT-4
Neuron | Obtained Label(s) | Images | Target % | Non-Target %
0      | Urban Landscape   | 176    | 54.545   | 59.078
1      | Street Scene      | 164    | 92.073   | 29.884
3      | Bedroom           | 165    | 97.576   | 62.967
Table 2
Statistical evaluation details for all three approaches (full version can be found in our work at [22]).

Concept Induction
Neuron | Label(s)         | Images # | Act. % targ | Act. % non-t | Mean targ | Mean non-t | Median targ | Median non-t | z-score | p-value
0      | building         | 42       | 80.95       | 73.40        | 2.08      | 1.81       | 2.00        | 1.50         | -1.28   | 0.0995
1      | cross_walk       | 47       | 91.49       | 28.94        | 4.17      | 0.67       | 4.13        | 0.00         | -8.92   | <.00001
18     | slope            | 35       | 91.43       | 68.85        | 1.59      | 1.37       | 1.44        | 1.00         | -2.03   | 0.0209
49     | footboard, chain | 32       | 84.38       | 66.41        | 2.63      | 1.67       | 2.30        | 1.17         | -2.58   | 0.0049

CLIP-Dissect
Neuron | Label(s)         | Images # | Act. % targ | Act. % non-t | Mean targ | Mean non-t | Median targ | Median non-t | z-score | p-value
3      | dresser          | 43       | 93.02       | 64.61        | 2.59      | 1.42       | 2.62        | 0.68         | 5.01    | <0.0001
7      | bathroom         | 46       | 89.47       | 41.56        | 2.02      | 1.01       | 2.15        | 0.00         | 5.45    | <0.0001
18     | dining           | 36       | 94.87       | 76.82        | 3.01      | 1.85       | 3.11        | 1.44         | 4.52    | <0.0001

GPT-4
Neuron | Label(s)           | Images # | Act. % targ | Act. % non-t | Mean targ | Mean non-t | Median targ | Median non-t | z-score | p-value
1      | Street Scene       | 42       | 90.50       | 30.40        | 3.80      | 0.70       | 4.20        | 0.00         | -9.62   | <0.0001
14     | Living Room        | 41       | 78.00       | 67.50        | 1.40      | 1.30       | 1.20        | 0.90         | -0.77   | 0.4413
17     | Dining Room        | 40       | 97.50       | 45.90        | 2.20      | 0.60       | 2.50        | 0.00         | -8.29   | <0.0001
31     | Urban Street Scene | 41       | 80.50       | 65.70        | 1.80      | 1.30       | 1.70        | 0.90         | -2.4    | 0.164
label and if any other neurons activate. Table 1 (a selective representation due to space constraints;
the complete version is available at [22]) shows the percentage of target images that activated each
neuron. A target label is confirmed if it activates for ≥ 80% of its target images, regardless of its
activation for non-target images. Details can be found in [22].
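The ≥ 80% confirmation criterion amounts to a simple check over per-neuron activation counts. A minimal sketch, with toy counts in the spirit of Table 1 rather than the paper's exact numbers:

```python
def confirmed_labels(activation_counts, threshold=0.8):
    """activation_counts maps (neuron, label) -> (images_activating, total_target_images).
    A label is confirmed when the neuron fires on >= `threshold` of its target
    images, regardless of its behaviour on non-target images."""
    return {key for key, (hits, total) in activation_counts.items()
            if total > 0 and hits / total >= threshold}

# toy counts (illustrative, not the paper's data)
counts = {(0, "building"):     (146, 164),   # ~89% -> confirmed
          (1, "cross_walk"):   (165, 186),   # ~89% -> confirmed
          (11, "river_water"): ( 50, 157)}   # ~32% -> rejected
confirmed = confirmed_labels(counts)
```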
Statistical Evaluation and Results: After generating confirmed labels from all three approaches,
we assess node labeling using the remaining images, treating each neuron-label pair in Table 1 as a
hypothesis. Concept Induction, CLIP-Dissect, and GPT-4 produce 20, 8, and 27 hypotheses, respectively,
based on confirmed labels. Using the Mann-Whitney U test, we compared activation strengths between
images retrieved using the target label and those retrieved using other keywords. Table 2 shows
a selective representation of the results obtained through the Mann-Whitney U test. Concept Induction
consistently outperforms other methods, as evidenced by Mann-Whitney U results and statistical
analysis. For most neurons, activation values of target images significantly exceed those of non-target
images (with 𝑝 < 0.00001). Concept Induction rejects 19 out of 20 null hypotheses at 𝑝 < 0.05,
CLIP-Dissect rejects all 8 null hypotheses, and GPT-4 rejects 25 out of 27 null hypotheses at 𝑝 < 0.05.
More details in [22].
Objective 2: We will conduct a comprehensive statistical evaluation using the Mann-Whitney U
(MWU) test for each concept across different cut-off values. This evaluation aims to compare the
activation strengths of non-target concepts retrieved through Google Images (from Objective 1) with
those retrieved from the ADE20K dataset. The hypothesis under consideration is that the activation
strength of non-target concepts from Google Images exceeds that from the ADE20K dataset. Conversely,
the null hypothesis (H0) posits that the activation strength of non-target concepts from Google Images
equals that from the ADE20K dataset. For each category of cut-off values, concepts exhibiting a
significant difference in activation strengths (p-value < 0.005) will undergo further validation through
the Wilcoxon signed-rank test across all cut-off values as a collective unit. We refine our approach
and enhance concept label associations’ accuracy by identifying concepts with significantly higher
activation strengths.
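The planned Wilcoxon signed-rank follow-up, pairing the two image sources across all cut-off values as a collective unit, might look like this in SciPy (numbers are illustrative, not the paper's data):

```python
from scipy.stats import wilcoxon

# Median activation strength of one non-target concept at each cut-off
# (0, 20%, 40%, 60% of max), for Google-Images-retrieved vs.
# ADE20K-retrieved images (illustrative values).
google_images = [2.40, 2.10, 1.80, 1.50]
ade20k        = [1.10, 0.95, 0.70, 0.45]

# Paired one-sided test across the cut-offs as a collective unit.
# H1: Google-Images activation strength exceeds that from ADE20K.
stat, p = wilcoxon(google_images, ade20k, alternative="greater")
```

With only four paired cut-offs the signed-rank test has very little power on its own, so in practice the comparison would aggregate over many concepts rather than a single one as shown here.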
4. Related Work
Recent advances in deep learning [28], its wide usage in nearly every field, and its opaque nature
make explainable AI more important than ever, and there are multiple ongoing efforts to demystify
deep learning [29, 30, 31]. Existing explainable methods can be categorized based on input data (feature)
understanding, e.g., feature summarizing [32, 33], or based on the model’s internal unit representation,
e.g., node summarizing [34, 11]. Those methods can be further categorized as model-specific [32] or
model-agnostic [33]. Another kind of approach relies on human interpretation of explanatory data
returned, such as counterfactual questions [35].
We focus on the understanding of internal units of the neural network-based deep learning models.
Prior work has shown that internal units may indeed represent human-understandable concepts [34, 11],
but these approaches often require resource-intensive methods like semantic segmentation [36] or
explicit concept annotations [37]. There has been research utilizing Semantic Web data for explaining
deep learning models [38, 39], and Concept Induction for generating explanations [40, 41]. However,
they mainly focused on analyzing how inputs relate to outputs and generating explanations for the
whole system, while we focused on understanding internal node activations.
CLIP-Dissect [16] pursues a goal similar to ours but takes a different approach. It utilizes the CLIP pre-trained
model, employing zero-shot learning to associate images with labels. Another related work, Label-Free
Concept Bottleneck Models [26], builds upon CLIP-Dissect, using GPT-4 [21] for concept set generation.
However, CLIP-Dissect faces challenges in accurately predicting output labels based on concepts in the
last hidden layer and transferring to other modalities or domain-specific applications. The Label-Free
approach inherits these limitations and may compromise explainability due to its use of a concept
derivation method that lacks inherent explainability.
5. Conclusion
Concept Induction, leveraging large-scale ontological background knowledge, provides meaningful
labeling of hidden neuron activations, validated by statistical analysis. This allows us to pinpoint con-
cepts that strongly trigger neuron responses, effectively explaining neuron activations. Our approach
introduces novel possibilities for diverse label categories. Comparative analysis against CLIP-Dissect
and GPT-4 showcases Concept Induction’s superiority, especially in settings with labeled data. Ulti-
mately, our work aims to thoroughly analyze hidden layers in deep learning systems, facilitating the
interpretation of activations as implicit input features and explaining system input-output behavior.
Moving forward, future work will focus on enhancing Concept Induction’s scalability and efficiency,
enabling its broader applicability across various domains.
Acknowledgments
The author acknowledges advisor Dr. Pascal Hitzler and partial funding under National Science Foundation
grants 2119753 and 2333782.
References
[1] M. Ramprasath, M. V. Anand, S. Hariharan, Image classification using convolutional neural
networks, International Journal of Pure and Applied Mathematics 119 (2018) 1307–1319.
[2] A. Graves, N. Jaitly, Towards end-to-end speech recognition with recurrent neural networks, in:
International conference on machine learning, The Proceedings of Machine Learning Research,
2014, pp. 1764–1772.
[3] M. Auli, M. Galley, C. Quirk, G. Zweig, Joint language and translation modeling with recurrent
neural networks, in: Proceedings of the 2013 Conference on Empirical Methods in Natural
Language Processing, 2013, pp. 1044–1054.
[4] M. H. Segler, T. Kogej, C. Tyrchan, M. P. Waller, Generating focused molecule libraries for drug
discovery with recurrent neural networks, ACS central science 4 (2018) 120–131.
[5] H.-I. Choi, S.-K. Jung, S.-H. Baek, W. H. Lim, S.-J. Ahn, I.-H. Yang, T.-W. Kim, Artificial intelligent
model with neural network machine learning for the diagnosis of orthognathic surgery, Journal
of Craniofacial Surgery 30 (2019) 1986–1989.
[6] Y. Liu, E. Racah, Prabhat, J. Correa, A. Khosrowshahi, D. Lavers, K. Kunkel, M. F. Wehner, W. D.
Collins, Application of deep convolutional neural networks for detecting extreme weather in
climate datasets, 2016. URL: http://arxiv.org/abs/1605.01156. arXiv:1605.01156.
[7] I. A. Hamilton, Apple cofounder Steve Wozniak says Apple Card offered his wife a lower credit
limit, Business Insider (2019).
[8] Z. Chen, X. Huang, End-to-end learning for lane keeping of self-driving cars, in: 2017 IEEE
Intelligent Vehicles Symposium (IV), IEEE, 2017, pp. 1856–1860.
[9] A. S. Rifaioglu, E. Nalbat, V. Atalay, M. J. Martin, R. Cetin-Atalay, T. Doğan, Deepscreen: high
performance drug–target interaction prediction with convolutional neural networks using 2-d
structural compound representations, Chemical science 11 (2020) 2531–2557.
[10] W. Hariri, A. Narin, Deep neural networks for covid-19 detection and diagnosis using images and
acoustic-based techniques: a recent review, Soft computing 25 (2021) 15345–15362.
[11] D. Bau, J.-Y. Zhu, H. Strobelt, A. Lapedriza, B. Zhou, A. Torralba, Understanding the role of
individual units in a deep neural network, Proceedings of the National Academy of Sciences 117
(2020) 30071–30078.
[12] I. Shumailov, Z. Shumaylov, D. Kazhdan, Y. Zhao, N. Papernot, M. A. Erdogdu, R. J. Anderson,
Manipulating SGD with data ordering attacks, in: Advances in Neural Information Processing
Systems (NeurIPS), volume 34, 2021, pp. 18021–18032.
[13] S. Goldwasser, M. P. Kim, V. Vaikuntanathan, O. Zamir, Planting undetectable backdoors in
machine learning models: [extended abstract], in: IEEE Annual Symposium on Foundations of
Computer Science (FOCS), 2022, pp. 931–942. doi:10.1109/FOCS54457.2022.00092.
[14] T. Clifford, I. Shumailov, Y. Zhao, R. Anderson, R. Mullins, ImpNet: Imperceptible and blackbox-
undetectable backdoors in compiled neural networks, 2022. arXiv:2210.00108.
[15] D. Doran, S. Schulz, T. R. Besold, What does explainable AI really mean? a new conceptualization
of perspectives, in: CEUR Workshop Proceedings, volume 2071, CEUR, 2018.
[16] T. Oikarinen, T.-W. Weng, CLIP-Dissect: Automatic description of neuron representations in deep
vision networks, in: International Conference on Learning Representations, ICLR, 2023. URL:
https://openreview.net/forum?id=iPWiwWHc1V.
[17] M. E. Zarlenga, P. Barbiero, G. Ciravegna, G. Marra, F. Giannini, M. Diligenti, Z. Shams, F. Precioso,
S. Melacci, A. Weller, P. Lió, M. Jamnik, Concept embedding models: Beyond the accuracy-
explainability trade-off, in: Advances in Neural Information Processing Systems (NeurIPS), 2022.
[18] J. Lehmann, P. Hitzler, Concept learning in description logics using refinement operators,
Mach. Learn. 78 (2010) 203–250. URL: https://doi.org/10.1007/s10994-009-5146-2. doi:10.1007/
s10994-009-5146-2.
[19] B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso, A. Torralba, Semantic understanding of
scenes through the ADE20K dataset, International Journal of Computer Vision 127 (2019) 302–321.
[20] M. K. Sarker, J. Schwartz, P. Hitzler, L. Zhou, S. Nadella, B. S. Minnery, I. Juvina, M. L. Raymer,
W. R. Aue, Wikipedia knowledge graph for explainable AI, in: B. Villazón-Terrazas, F. Ortiz-
Rodríguez, S. M. Tiwari, S. K. Shandilya (Eds.), Proceedings of the Knowledge Graphs and Semantic
Web Second Iberoamerican Conference and First Indo-American Conference (KGSWC), volume
1232 of Communications in Computer and Information Science, Springer, 2020, pp. 72–87. URL:
https://doi.org/10.1007/978-3-030-65384-2_6. doi:10.1007/978-3-030-65384-2\_6.
[21] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt,
S. Altman, S. Anadkat, et al., GPT-4 technical report, arXiv preprint arXiv:2303.08774 (2023).
[22] A. Dalal, R. Rayan, A. Barua, E. Y. Vasserman, M. K. Sarker, P. Hitzler, On the value of labeled data
and symbolic methods for hidden neuron activation analysis, 2024. arXiv:2404.13567.
[23] M. K. Sarker, P. Hitzler, Efficient concept induction for description logics, in: The Thirty-Third
AAAI Conference on Artificial Intelligence (AAAI) The Thirty-First Innovative Applications of
Artificial Intelligence Conference (IAAI), The Ninth AAAI Symposium on Educational Advances
in Artificial Intelligence (EAAI), AAAI Press, 2019, pp. 3036–3043. URL: https://doi.org/10.1609/
aaai.v33i01.33013036. doi:10.1609/aaai.v33i01.33013036.
[24] V. I. Levenshtein, On the minimal redundancy of binary error-correcting codes, Inf.
Control. 28 (1975) 268–291. URL: https://doi.org/10.1016/S0019-9958(75)90300-9. doi:10.1016/
S0019-9958(75)90300-9.
[25] A. Barua, C. Widmer, P. Hitzler, Concept induction using LLMs: a user experiment for assessment,
2024. URL: https://arxiv.org/abs/2404.11875. arXiv:2404.11875.
[26] T. Oikarinen, S. Das, L. M. Nguyen, T.-W. Weng, Label-free concept bottleneck models, in:
The Eleventh International Conference on Learning Representations, ICLR, 2023. URL: https:
//openreview.net/forum?id=FlCg47MNvBA.
[27] K. Crowston, Amazon mechanical turk: A research tool for organizations and information systems
scholars, in: A. Bhattacherjee, B. Fitzgerald (Eds.), Shaping the Future of ICT Research. Methods
and Approaches, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 210–221.
[28] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444.
[29] D. Gunning, M. Stefik, J. Choi, T. Miller, S. Stumpf, G.-Z. Yang, XAI – explainable artificial
intelligence, Science robotics 4 (2019) eaay7120.
[30] A. Adadi, M. Berrada, Peeking inside the black-box: a survey on explainable artificial intelligence
(xai), IEEE access 6 (2018) 52138–52160.
[31] D. Minh, H. X. Wang, Y. F. Li, T. N. Nguyen, Explainable artificial intelligence: a comprehensive
review, Artificial Intelligence Review (2022) 1–66.
[32] R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, D. Batra, Grad-CAM: Why did you
say that?, 2016. URL: http://arxiv.org/abs/1611.07450. arXiv:1611.07450.
[33] M. T. Ribeiro, S. Singh, C. Guestrin, "Why Should I Trust You?": Explaining the Predictions of Any
Classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, KDD ’16, Association for Computing Machinery, New York, NY,
USA, 2016, p. 1135–1144. URL: https://doi.org/10.1145/2939672.2939778. doi:10.1145/2939672.
2939778.
[34] B. Zhou, D. Bau, A. Oliva, A. Torralba, Interpreting deep visual representations via network
dissection, IEEE transactions on pattern analysis and machine intelligence 41 (2018) 2131–2145.
[35] S. Wachter, B. D. Mittelstadt, C. Russell, Counterfactual explanations without opening the black
box: Automated decisions and the GDPR, CoRR abs/1711.00399 (2017). URL: http://arxiv.org/abs/
1711.00399. arXiv:1711.00399.
[36] T. Xiao, Y. Liu, B. Zhou, Y. Jiang, J. Sun, Unified perceptual parsing for scene understanding, in:
Proceedings of the European conference on computer vision (ECCV), 2018, pp. 418–434.
[37] B. Kim, M. Wattenberg, J. Gilmer, C. J. Cai, J. Wexler, F. B. Viégas, R. Sayres, Interpretability
beyond feature attribution: Quantitative testing with concept activation vectors (TCAV), in:
J. G. Dy, A. Krause (Eds.), Proceedings of the International Conference on Machine Learning
(ICML), volume 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 2673–2682. URL:
http://proceedings.mlr.press/v80/kim18d.html.
[38] R. Confalonieri, T. Weyde, T. R. Besold, F. M. del Prado Martín, Using ontologies to enhance human
understandability of global post-hoc explanations of black-box models, Artificial Intelligence 296
(2021) 103471.
[39] N. Díaz-Rodríguez, A. Lamas, J. Sanchez, G. Franchi, I. Donadello, S. Tabik, D. Filliat, P. Cruz,
R. Montes, F. Herrera, Explainable neural-symbolic learning (x-nesyl) methodology to fuse deep
learning representations with expert knowledge graphs: The monumai cultural heritage use case,
Information Fusion 79 (2022) 58–83.
[40] M. K. Sarker, N. Xie, D. Doran, M. L. Raymer, P. Hitzler, Explaining trained neural networks
with semantic web technologies: First steps, in: T. R. Besold, A. S. d’Avila Garcez, I. Noble
(Eds.), Proceedings of the Twelfth International Workshop on Neural-Symbolic Learning and
Reasoning (NeSy), volume 2003 of CEUR Workshop Proceedings, CEUR-WS.org, 2017. URL: https:
//ceur-ws.org/Vol-2003/NeSy17_paper4.pdf.
[41] T. Procko, T. Elvira, O. Ochoa, N. D. Rio, An exploration of explainable machine learning using
semantic web technology, in: 2022 IEEE 16th International Conference on Semantic Computing
(ICSC), IEEE Computer Society, 2022, pp. 143–146. doi:10.1109/ICSC52841.2022.00029.