<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Knowledge Graphs and Explanations for Improving Detection of Diseases in Images of Grains</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Lenka</forename><surname>Tětková</surname></persName>
							<email>lenhy@dtu.dk</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Section for Cognitive Systems</orgName>
								<orgName type="department" key="dep2">DTU Compute</orgName>
								<orgName type="institution">Technical University of Denmark</orgName>
								<address>
									<postCode>2800</postCode>
									<settlement>Kongens Lyngby</settlement>
									<country key="DK">Denmark</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Knowledge Graphs and Explanations for Improving Detection of Diseases in Images of Grains</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">8CF66ED3A600B57C4D1B268F8DE9C4F0</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:37+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>post-hoc explanations</term>
					<term>convexity of representations</term>
					<term>alignment of representations</term>
					<term>concept-based explainability</term>
					<term>knowledge graphs</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Many research works focus on benchmark datasets and overlook the issues that appear when the methods are applied in real-world settings. The application considered in this work is the detection of diseases and damages in grain kernels from images. This dataset differs substantially from standard benchmark datasets and poses the additional challenge of biological variation in the data. The goal is to improve disease detection and to introduce explainability into the process. We explore how knowledge graphs can be used to improve image classification by exploiting existing metadata and to create collections of data depicting a specific concept. We identify the challenges one faces when applying post-hoc explainability methods to data with biological variation and propose a workflow for choosing the most suitable method for a given application. Moreover, we evaluate the robustness of these methods to naturally occurring small changes in the input images. Finally, we explore the notion of convexity in the representations of neural networks and its implications for the performance of fine-tuned models and for alignment with human representations.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>During my PhD, I have been cooperating with the Danish company FOSS. Their EyeFoss™ instrument is used for objective grain quality assessment based on image classification of grain types and grain damages. Over the years, the company has built a large database of images of grains of various types, mostly healthy kernels, but also a substantial number of kernels exhibiting various diseases or damages. The images were taken over several years at different geographical locations, creating an interesting collection for further research. This application is the overarching topic of my research.</p><p>From all possible research directions, we chose two paths: the variability of grains depending on external conditions, and explainability. The first stems from the need to train a new model for each geographical location, and often for each harvest, because the general appearance of kernels differs too much to be handled by a single model. A human expert usually looks at a batch of kernels as a whole (and often has additional information about the yield at that specific time and location) and adjusts their decision accordingly. The model, on the other hand, classifies single kernels without any further context. This lack of knowledge makes the task very challenging. The need for explainability emerged naturally from contact with customers. The instrument determines the price of grains and may trigger the destruction of the whole yield if a dangerous disease is found, so both farmers and companies buying the grains have to trust that the decisions are fair and well founded. Below, we describe how each of these motivating topics formed research questions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Motivation and Research Questions</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Knowledge Graphs and Metadata</head><p>Knowledge graphs (KGs) might be a good instrument for providing machine learning algorithms with additional knowledge that is not present in the input images themselves. Information about the other grains in the same batch would be useful, since they were exposed to the same conditions; if, for example, one kernel clearly shows the presence of an infectious disease, the rest of the batch is more likely to be infected as well and should be inspected more carefully. Ideally, all available metadata could be included to remove the need for fine-tuning the models for each customer. The metadata we have in mind include, for example, information about the field where the grain was grown (location, weather, soil quality), the history of the field (what was grown there before; what fertilizers and pesticides were used; what diseases and damages were detected in the past, etc.), or how the grain was transported and stored (because poor storage conditions can cause diseases and damages, e.g., mold). All of these factors affect the grains. How could they be used to help with classification? We generalized this special case into a broader topic concerning any image classification task where more information is easily available, for instance, text that appears close to the image on a webpage. Can we use the metadata to improve image classification?</p><p>The second use of KGs connects this motivation with the following one: could we build a knowledge database about grains and then use it to explain the models via concept-based explainability? For example, if we could represent the concept of "pink fusarium" (a fungal infection), we might explore the global functioning of the model with respect to this concept and gain insight into the whole process. KGs could be a great source of information about concepts.
This inspired us to explore whether KGs can be used for concept definition and data collection.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Explainability</head><p>Since explaining the decisions at the pixel level for each image separately would be useful for gaining trust, we decided to explore how post-hoc explainability methods could be applied to grain images. One of the first concerns is robustness: during photo collection and image preprocessing, the grains and the final photos are rotated, centered, and otherwise transformed. Moreover, the lighting conditions depend on the light bulb inside the machine, which might differ slightly between machines. We need to ensure that the explanations are robust to these small, naturally occurring changes. Therefore, the first step is to explore how the explanations change when we change the input image (using standard data-augmentation methods).</p><p>Subsequently, when trying to apply the methods to this specific data, we found many open questions without clear answers in current research. For example: how to choose good hyperparameters; how to visualize the resulting explanations; and how to evaluate their quality with regard to this application? Stimulated by these ambiguities and unknowns, we explore the topic in depth and propose a workflow that could also be used in other applications.</p><p>When faced with a classification problem, one has to make decisions about the architecture and size of the model used for training. Part of this decision is choosing between training from scratch and fine-tuning an existing pretrained model. Which gives better results? Could we predict the performance of the fine-tuned model from the representations created by the pretrained model? We explore the notion of convexity in the context of machine representations for both kinds of models. A better understanding of the inner workings of neural networks is a prerequisite for ensuring alignment between AI and human values.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Related Work</head><p>We provide a general overview of the research relevant to this work. Most references are omitted because of lack of space and can be found in the corresponding papers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Post-hoc Explainability in Image Domain and Quality Evaluation</head><p>Although explainability is important for understanding neural networks, existing methods differ in the quality of the produced explanations, and many saliency methods have been criticized. Therefore, quality-evaluation metrics have been developed. They usually measure to what degree the explanations satisfy certain desiderata. For example, an explanation should reflect the model's predictive behavior (e.g., pixel-flipping <ref type="bibr" target="#b0">[1]</ref>, IROF <ref type="bibr" target="#b1">[2]</ref>), be stable to slight perturbations of the input (sensitivity <ref type="bibr" target="#b2">[3]</ref>), and use only a few features (complexity <ref type="bibr" target="#b3">[4]</ref>). It has been shown that both image classifiers and explanation methods are fragile and that attackers can manipulate the explanations arbitrarily. Rieger and Hansen in <ref type="bibr" target="#b4">[5]</ref> used an aggregate of several explanation methods to defend against attacks on explanations. However, this does not solve the problem for a single method.</p></div>
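The perturbation-based desiderata above share a simple core: remove the features an explanation marks as most relevant and watch the model's confidence drop. The following minimal sketch illustrates that core idea behind pixel-flipping [1]; the flattened-image interface, the `predict` callback, and the baseline value are assumptions made for this example, not the metric's reference implementation.

```python
def pixel_flipping_curve(image, relevance, predict, baseline=0.0, step=1):
    """Perturb pixels in order of decreasing relevance and record the
    model's confidence after each batch of flips. A faithful explanation
    should produce a fast drop in the resulting curve.

    image:     flattened list of pixel values
    relevance: per-pixel relevance scores from the explanation
    predict:   callable returning the target-class confidence
    """
    # Pixel indices sorted from most to least relevant.
    order = sorted(range(len(image)), key=lambda i: relevance[i], reverse=True)
    perturbed = list(image)
    curve = [predict(perturbed)]  # confidence before any flip
    for start in range(0, len(order), step):
        for i in order[start:start + step]:
            perturbed[i] = baseline  # "flip" the pixel to a baseline value
        curve.append(predict(perturbed))
    return curve
```

Aggregating the area under such curves over a dataset yields a single faithfulness number per explanation method.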
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Learning from Hints</head><p>There is a long history of combining separate pieces of information to improve the learning process and the resulting models. We use additional information about a specific image to improve its classification, rather than improving the whole model during training. There is growing interest in including knowledge bases or metadata in the learning process, yielding hybrid models that combine neural networks with symbolic knowledge. Many approaches combine multiple modalities, usually by training a new model jointly on all the data; integration can happen at the input level (early fusion), at the decision level (late fusion), at an intermediate level, or in a combined way (hybrid fusion). In comparison, our approach uses already existing large pretrained models, eliminating the need to process and incorporate the metadata into a complicated pipeline.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Concept-Based Explainability Methods</head><p>As opposed to per-instance explanations, concept-based methods use higher-level attributes, usually referred to as concepts. Various theoretical frameworks have been proposed in recent years, most distinctly divided into post-hoc and inherently interpretable methods. Many methods require pre-defined concepts with examples, but such data are difficult to obtain. For example, concept activation vectors (CAVs) <ref type="bibr" target="#b5">[6]</ref> use the data to determine a direction in the hidden space that represents the concept, and concept activation regions (CARs) <ref type="bibr" target="#b6">[7]</ref> generalize this approach to regions. There are also approaches that aim to discover the concepts a model has learned without the need for labeled concept data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Methods</head><p>This section presents a general overview of the methods used in all the experiments included in this work. For all the details, see the respective papers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Robustness of Explanations to Data-Augmentation Methods [8]</head><p>We choose six augmentation methods and divide them into two categories: invariant (changes in brightness, hue, and saturation) and equivariant (rotation, translation, and scaling). For the invariant methods, we want the explanation of the augmented image to be the same as the explanation of the original image. For the equivariant methods, we compare the explanation of the augmented image with an augmented version of the original explanation (e.g., a rotated explanation). For each method, we choose a symmetric interval determining the strength of the augmentation such that the probability of the correct class drops by at least 10% at one of the endpoints. We choose the ResNet50 architecture and train it in two settings: first using all available data augmentations, and then using only the necessary ones (for centering and clipping the input image). We compare the results for both models to see whether the training augmentation influences the robustness. We evaluate the robustness by computing the correlation between the explanations and compare it to the drop in the probability of the target class (i.e., the robustness of the classifier). We define a robustness score (see <ref type="bibr" target="#b7">[8]</ref>) such that values lower/higher than 1 mean that the explanations are less/more robust than the classifier.</p></div>
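As a minimal sketch of the kind of score described above (the exact definition is in [8]; this version only assumes explanation similarity measured by Pearson correlation of flattened heatmaps and classifier robustness measured by the retained target-class probability):

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long flattened heatmaps."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def robustness_score(expl_orig, expl_aug, p_orig, p_aug):
    """Illustrative robustness score of an explanation under augmentation.

    expl_orig, expl_aug: flattened explanations of the original and
                         augmented (or back-transformed) image
    p_orig, p_aug:       target-class probabilities for the two images
    """
    expl_similarity = pearson(expl_orig, expl_aug)   # explanation robustness
    clf_similarity = p_aug / p_orig                  # classifier robustness
    # Below/above 1: explanations less/more robust than the classifier.
    return expl_similarity / clf_similarity
```

Values near 1 indicate that the explanation degrades at roughly the same rate as the classifier's confidence under the augmentation.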
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Challenges in Explaining Models for Data with Biological Variation [9]</head><p>The grain image data used in this paper were obtained from FOSS's EyeFoss™ image database. We selected two well-known and well-described barley defects that are important for the malting process: pink fusarium infection and skinned barley. We treat each as a binary classification problem and train a simple convolutional network for each. Since one of the goals is to measure how similar the explanations are to human perception, we collected manual annotations of the defects (as binary masks) made by an expert in grain quality evaluation. In <ref type="bibr" target="#b8">[9]</ref>, we identify and discuss many challenges faced when applying explainability methods, both in general and on this particular dataset. These include insufficient evaluation methods, the subjectivity of annotated explanations, many hyperparameters to set, and many possible visualizations. Even slight changes in these choices make a big difference in the resulting explanation. We first evaluate the quality without ground truth using sensitivity <ref type="bibr" target="#b2">[3]</ref>, pixel-flipping <ref type="bibr" target="#b0">[1]</ref>, IROF <ref type="bibr" target="#b1">[2]</ref>, and complexity <ref type="bibr" target="#b3">[4]</ref>, and we replicate the experiments from <ref type="bibr" target="#b7">[8]</ref> (described in the previous paragraph) to compare the results on the two different datasets. Next, we evaluate the similarity to the ground-truth masks using two metrics: the area under the Receiver Operating Characteristic curve (ROC-AUC) and Relevance Mass Accuracy <ref type="bibr" target="#b9">[10]</ref>. To determine the best method, we combine all the results into one final ranking using the mean reciprocal rank (MRR).
All details can be found in <ref type="bibr" target="#b8">[9]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Using Metadata for Classification Improvement <ref type="bibr" target="#b10">[11]</ref></head><p>The idea of this approach is quite simple: we need pretrained classifiers with the same target classes for each of the available data types (one main modality plus any number of metadata). In this work, we use one for images and one for text. We gather logits from both models and combine them just before applying the softmax activation. Jørgensen et al. derive a theorem in <ref type="bibr" target="#b10">[11]</ref> that implies (under certain assumptions) that</p><formula xml:id="formula_0">P(c_i | x_1, ..., x_N) = softmax_i( Σ_{j=1}^{N} z_{x_j} − (N − 1) ln π )</formula><p>where N is the number of combined models, c_i, i ∈ {1, ..., C}, is a class, x_1, ..., x_N are the input data, z_{x_1}, ..., z_{x_N} are logits such that softmax_i(z_{x_j}) = P(c_i | x_j) for all relevant i, j, and π is the vector of class priors, π_i = P(c_i). The paper discusses that these assumptions may not hold in general, but using the formula empirically to combine two classifiers improved accuracy. Moreover, we evaluate the influence of calibrating each classifier before combining them. We compare these results to a linear SVM classifier trained on the concatenated logits.</p></div>
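The combination rule itself is a one-line operation on logits. A self-contained sketch of the fusion formula above (variable and function names are illustrative):

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def fuse_logits(logit_lists, priors):
    """Late fusion of N unimodal classifiers over the same C classes:
    P(c_i | x_1..x_N) = softmax_i( sum_j z_{x_j, i} - (N - 1) * ln(pi_i) )

    logit_lists: list of N logit vectors, one per modality/classifier
    priors:      class prior probabilities pi_i = P(c_i)
    """
    n = len(logit_lists)
    fused = [
        sum(z[i] for z in logit_lists) - (n - 1) * math.log(priors[i])
        for i in range(len(priors))
    ]
    return softmax(fused)
```

With uniform priors the prior term is a constant shift across classes, so the rule reduces to a softmax over summed logits.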
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Concept Definition Using Knowledge Graphs [12]</head><p>We propose a pipeline for collecting personalized concept data. We use knowledge graphs to obtain structural knowledge about a concept of interest. We provide a simple interactive tool to "go up" or "go down" the level of generality of a concept in the KG and to disambiguate among different meanings. In this way, the end user decides which concepts are relevant for a specific application and ensures their correctness. In the next step, we use Wikipedia (for text) and Wikimedia Commons (for images) to collect data linked to each concept in Wikidata. We evaluate the quality of the collected data using CAVs <ref type="bibr" target="#b5">[6]</ref> and CARs <ref type="bibr" target="#b6">[7]</ref>: the accuracy of the classifiers, the effect of the amount of available data, comparison to human-defined concepts, and the alignment between concepts and their subconcepts (i.e., whether the CAVs and CARs of concepts that are close in the knowledge graph, i.e., in human cognition, are also similar in machine representations). For more details, see <ref type="bibr" target="#b11">[12]</ref>.</p></div>
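For illustration, the sketch below computes a simplified concept direction from hidden-layer activations and the alignment between two such directions. Note that TCAV [6] fits a linear classifier to obtain the CAV; the difference-of-means direction used here is a lightweight stand-in chosen only to convey the idea:

```python
def concept_activation_vector(concept_acts, random_acts):
    """Simplified concept direction in a hidden layer.

    concept_acts: activation vectors of images depicting the concept
    random_acts:  activation vectors of random (negative-set) images
    Returns the normalized difference of the two class means; TCAV
    instead takes the normal of a trained linear classifier's boundary.
    """
    dim = len(concept_acts[0])
    mean_c = [sum(a[i] for a in concept_acts) / len(concept_acts) for i in range(dim)]
    mean_r = [sum(a[i] for a in random_acts) / len(random_acts) for i in range(dim)]
    v = [c - r for c, r in zip(mean_c, mean_r)]
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def concept_alignment(cav_a, cav_b):
    """Cosine similarity between two unit-norm concept directions,
    e.g. a concept and one of its sub-concepts from the knowledge graph."""
    return sum(a * b for a, b in zip(cav_a, cav_b))
```

High alignment between a concept and its KG sub-concepts in machine representations is the kind of human-machine agreement evaluated in [12].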
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Convexity of Decision Regions [13]</head><p>The goal is to evaluate to what degree convexity of decision regions is present in the representations throughout the whole model. Convexity is, in general, a binary property, but we define it as a graded score between 0 and 1. We define two types of convexity: Euclidean and graph. The Euclidean score builds on the "standard" definition of convexity: we sample points on the segment (in Euclidean geometry) between two points from the same class and compute how many of them are classified as belonging to that class. The graph convexity is motivated by the observation that representational geometries are often better described as general manifolds. The shortest paths between two points are then geodesics instead of segments. Geodesics are hard to compute, so we approximate them by shortest paths in a graph whose vertices are the available data points and whose edges connect nearby points, weighted by Euclidean distance (we keep only the 10 nearest neighbors). The graph convexity score is then defined as the proportion of "well-classified" vertices on the shortest paths between pairs of points from the same class. Each score captures different properties of the representations. An extensive definition and illustration of both scores can be found in <ref type="bibr" target="#b12">[13]</ref>. We evaluate both convexity scores on five modalities (images, text, audio, human activity recognition, and medical images), multiple models, and all hidden layers. We compare the results for corresponding pretrained and fine-tuned models.</p></div>
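The Euclidean score can be sketched in a few lines. This is an illustrative simplification (the experiments in [13] operate on high-dimensional hidden representations and sample pairs rather than enumerating all of them; the sampling granularity here is a free parameter):

```python
import itertools

def euclidean_convexity(points, classify, label, n_samples=10):
    """Graded Euclidean convexity of one class's decision region.

    points:    representations of data points from the class `label`
    classify:  callable mapping a representation to a predicted label
    Returns the fraction of interpolated points on segments between
    same-class pairs that are classified as that same class (0 to 1).
    """
    same, total = 0, 0
    for p, q in itertools.combinations(points, 2):
        for k in range(1, n_samples + 1):
            t = k / (n_samples + 1)
            # Point at fraction t along the straight segment from p to q.
            mid = [a + t * (b - a) for a, b in zip(p, q)]
            total += 1
            if classify(mid) == label:
                same += 1
    return same / total
```

The graph score replaces the straight segments with shortest paths in a 10-nearest-neighbor graph and counts well-classified vertices along those paths instead.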
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head><p>In all the described papers, the methods section (briefly recapitulated in this work) defines a new notion, a score, or a workflow. These should be seen as results themselves. Moreover, we present an overview of the results of the experiments. The reader is referred to the individual papers for detailed results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Post-hoc Explainability</head><p>We found that LRP composites <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref> and Guided Backpropagation <ref type="bibr" target="#b15">[16]</ref> created the most stable explanations (with respect to data augmentations), while Gradients <ref type="bibr" target="#b16">[17]</ref> and Input x Gradients <ref type="bibr" target="#b16">[17]</ref> were the least stable. When perturbing with the invariant methods, the explanations were more stable (almost as stable as the classifier itself) than when perturbing with the equivariant methods. Training with data augmentation did not increase robustness. The results of robustness to data augmentations on grain images were very similar to the results on ImageNet, suggesting that this metric is quite stable to distribution shifts in the input data.</p><p>The experiments on the images of grains showed that it is hard to evaluate explainability methods even with the evaluation metrics (some methods were better in some aspects and worse in others). After aggregating all the metrics, the three best methods were LRP (EpsilonPlusFlat), SHAP <ref type="bibr" target="#b17">[18]</ref>, and Deconvolution <ref type="bibr" target="#b18">[19]</ref>. However, the presented analysis should be taken predominantly as a framework for evaluating explainability methods on non-standard data, because the results are likely to differ when applied to other images with different properties.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Using Metadata for Classification Improvement</head><p>The proposed fusion scheme improved performance by combining preexisting unimodal classifiers. Compared to a linear SVM classifier, it achieved comparable accuracy with far fewer computational resources. However, calibration of the unimodal classifiers was crucial for the performance of the fusion model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Concept Definition Using Knowledge Graphs</head><p>By using the proposed pipeline and publicly available resources, we can create larger concept databases than the available labeled databases. Importantly, databases defined like this lead to comparable or even better accuracies for CAVs and CARs. We observed lower accuracy and agreement of CAVs and CARs in the early layers of the networks, indicating that explanations derived from the early layers should be viewed critically. We showed that explanations based on the retrieved concept databases are robust to in-distribution shifts (e.g., variations in the negative set) and even, to a certain degree in the later layers, to out-of-distribution shifts (i.e., using a different dataset). However, it is still critical to align the concept definition and database with the user's intention, as the explanation can strongly depend on the context of the concept. Finally, we showed that networks learn a similar relation of concepts to sub-concepts as in human-generated knowledge graphs, suggesting some inherent alignment. This human-machine alignment is essential for successful communication and underscores the promising future of concept-based explainability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Convexity of Decision Regions</head><p>We carried out extensive experiments in multiple domains and on networks trained by self-supervised learning and subsequently fine-tuned on domain-specific labels. We found evidence that both Euclidean and graph convexity are pervasive in pretrained and fine-tuned models. We found that decision-region convexity generally increased after fine-tuning. Importantly, we found evidence that higher convexity of a class decision region after pretraining was associated with higher recall for that class after fine-tuning. This is in line with observations in cognitive science that convexity supports few-shot learning.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions and Next Steps</head><p>Real-world data is a great source of research questions and challenges that need to be solved. We presented several research questions stemming from images of grains, namely using metadata to enhance classification and explaining the models. Although the motivation comes from a specific application, many of the presented results concern general setups and benchmark datasets. A natural next step is to utilize these findings in the application, i.e., on images of grains: specifically, to use the grain metadata to improve classification and to collect concept data for concepts relevant to grain disease detection. We also developed methods for evaluating certain properties of the explanations (robustness to data augmentation) and of the representations (convexity). The next step is to develop training methods that improve these properties.</p></div>		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was supported by the DIREC Bridge project Deep Learning and Automation of Imaging-Based Quality of Seeds and Grains, Innovation Fund Denmark grant number 9142-00001B.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Binder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Montavon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Klauschen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-R</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0130140</idno>
	</analytic>
	<monogr>
		<title level="j">PLOS ONE</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">e0130140</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Rieger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">K</forename><surname>Hansen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2003.08747</idno>
		<title level="m">Irof: a low resource evaluation metric for explanation methods</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">On the (in) fidelity and sensitivity of explanations</title>
		<author>
			<persName><forename type="first">C.-K</forename><surname>Yeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Hsieh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Suggala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">I</forename><surname>Inouye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Ravikumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">U</forename><surname>Bhatt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Weller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Moura</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.00631</idno>
		<title level="m">Evaluating and aggregating feature-based model explanations</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A simple defense against adversarial attacks on heatmap explanations</title>
		<author>
			<persName><forename type="first">L</forename><surname>Rieger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">K</forename><surname>Hansen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">5th Annual Workshop on Human Interpretability in Machine Learning</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav)</title>
		<author>
			<persName><forename type="first">B</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wattenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gilmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wexler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Viegas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="2668" to="2677" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Concept activation regions: A generalized framework for concept-based explanations</title>
		<author>
			<persName><forename type="first">J</forename><surname>Crabbé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Der Schaar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="2590" to="2607" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Robustness of visual explanations to common data augmentation methods</title>
		<author>
			<persName><forename type="first">L</forename><surname>Tětková</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">K</forename><surname>Hansen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE/CVF Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="3714" to="3719" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Tětková</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">S</forename><surname>Dreier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Malm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">K</forename><surname>Hansen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2406.09981</idno>
		<title level="m">Challenges in explaining deep learning models for data with biological variation</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Arras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Osman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2003.07258</idno>
		<title level="m">Ground truth evaluation of neural network explanations with CLEVR-XAI</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Image classification with symbolic hints using limited resources</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Jørgensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Tětková</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">K</forename><surname>Hansen</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0301360</idno>
		<ptr target="https://doi.org/10.1371/journal.pone.0301360" />
	</analytic>
	<monogr>
		<title level="j">PLOS ONE</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page" from="1" to="13" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Knowledge graphs for empirical concept retrieval</title>
		<author>
			<persName><forename type="first">L</forename><surname>Tětková</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">K</forename><surname>Scheidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Fogh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M G</forename><surname>Jørgensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">Å</forename><surname>Nielsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">K</forename><surname>Hansen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Explainable Artificial Intelligence</title>
				<editor>
			<persName><forename type="first">L</forename><surname>Longo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Lapuschkin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Seifert</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham, Switzerland</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Nature</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="160" to="183" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">On convex decision regions in deep network representations</title>
		<author>
			<persName><forename type="first">L</forename><surname>Tětková</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brüsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">K</forename><surname>Scheidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Mager</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">Ø</forename><surname>Aagaard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Foldager</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">S</forename><surname>Alstrøm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">K</forename><surname>Hansen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.17154</idno>
	</analytic>
	<monogr>
		<title level="m">ICLR 2024 Workshop on Representational Alignment (Re-Align)</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
	<note>workshop paper</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Binder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Montavon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Klauschen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-R</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PLOS ONE</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">e0130140</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Towards best practice in explaining neural network decisions with LRP</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kohlbrenner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nakajima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Binder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lapuschkin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2020 International Joint Conference on Neural Networks (IJCNN)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1" to="7" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Springenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dosovitskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brox</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Riedmiller</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6806</idno>
		<title level="m">Striving for simplicity: The all convolutional net</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1312.6034</idno>
		<title level="m">Deep inside convolutional networks: Visualising image classification models and saliency maps</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A unified approach to interpreting model predictions</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Lundberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-I</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 30</title>
				<editor>
			<persName><forename type="first">I</forename><surname>Guyon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><forename type="middle">V</forename><surname>Luxburg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Wallach</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Vishwanathan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Garnett</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="4765" to="4774" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Visualizing and understanding convolutional networks</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Zeiler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Conference on Computer Vision</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="818" to="833" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
