<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">AnyCBMs: How to Turn Any Black Box into a Concept Bottleneck Model ⋆</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Gabriele</forename><surname>Dominici</surname></persName>
							<email>gabriele.dominici@usi.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">Università della Svizzera Italiana</orgName>
								<address>
									<settlement>Lugano</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pietro</forename><surname>Barbiero</surname></persName>
							<email>pietro.barbiero@usi.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">Università della Svizzera Italiana</orgName>
								<address>
									<settlement>Lugano</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Francesco</forename><surname>Giannini</surname></persName>
							<email>francesco.giannini@unisi.it</email>
							<affiliation key="aff1">
								<orgName type="institution">Università di Siena</orgName>
								<address>
									<settlement>Siena</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Martin</forename><surname>Gjoreski</surname></persName>
							<email>martin.gjoreski@usi.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">Università della Svizzera Italiana</orgName>
								<address>
									<settlement>Lugano</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marc</forename><surname>Langheinrich</surname></persName>
							<email>marc.langheinrich@usi.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">Università della Svizzera Italiana</orgName>
								<address>
									<settlement>Lugano</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">AnyCBMs: How to Turn Any Black Box into a Concept Bottleneck Model ⋆</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">E9CE5D0254E96B3A681798A0E0DC99B5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:36+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Interpretability</term>
					<term>Explainable AI</term>
					<term>Concept Learning</term>
					<term>Concept Bottleneck Models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Interpretable deep learning aims at developing neural architectures whose decision-making processes can be understood by their users. Among these techniques, Concept Bottleneck Models enhance the interpretability of neural networks by integrating a layer of human-understandable concepts. These models, however, necessitate training a new model from scratch, consuming significant resources and failing to utilize already trained large models. To address this issue, we introduce "AnyCBM", a method that transforms any existing trained model into a Concept Bottleneck Model with minimal impact on computational resources. We provide both theoretical and experimental insights showing the effectiveness of AnyCBMs in terms of classification performance and the effectiveness of concept-based interventions on downstream tasks.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Numerous national and international regulatory frameworks underscore the transformative potential of artificial intelligence (AI). However, they also warn of the inherent risks associated with such powerful technology, emphasizing the importance of careful monitoring and strict protections. For instance, the recent AI Act <ref type="bibr" target="#b0">[1]</ref> aims to implement detailed regulations for AI systems, ensuring their safety, transparency, and accountability. Similarly, in the US, the federal government issued an executive order that proposes principles for trustworthy AI. Hence, interpretable AI has become a crucial aspect of modern machine learning, addressing concerns over the opaque nature of deep learning (DL) models <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>. The quest for transparency has been driven by the need to understand the decision-making processes of AI systems, particularly in critical areas where the ethical <ref type="bibr" target="#b3">[4]</ref> and legal <ref type="bibr" target="#b4">[5]</ref> implications of these systems' decisions are significant. Concept Bottleneck Models (CBMs) <ref type="bibr" target="#b5">[6]</ref> are a family of differentiable models aiming to increase DL interpretability <ref type="bibr" target="#b6">[7]</ref>. These models map input data (e.g., pixel intensities) to human-understandable concepts (e.g., shapes, colors), and then use these concepts to predict the labels of a downstream classification task. However, existing CBMs necessitate training a new model from scratch even in settings where trained or fine-tuned models already exist. In these scenarios, current CBM architectures would consume significant resources in re-training or fine-tuning possibly large models. As a result, this limitation restricts CBMs' adoption in new domains. 
To bridge this gap, we introduce Any Concept Bottleneck Models (AnyCBMs, Figure <ref type="figure" target="#fig_0">1</ref>), a method to transform any black-box neural architecture into an interpretable CBM. The key innovation of AnyCBMs lies in a neural model that maps black-box embeddings into a set of supervised concepts and then maps the predicted concepts back to black-box embeddings. This allows AnyCBMs to be applied to any layer of a trained black box and to perform concept-based interventions as in standard CBMs. Results demonstrate that AnyCBMs match black-box performance in classification accuracy on downstream tasks and CBM performance in concept accuracy. In addition, AnyCBMs can steer the behaviour of a black-box model by acting on human-understandable concepts as effectively as standard CBMs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head><p>Concept-based models 𝑓 : 𝐶 → 𝑌 learn a map from a concept space 𝐶 to a task space 𝑌 <ref type="bibr" target="#b7">[8]</ref>. If concepts are semantically meaningful, then humans can interpret this mapping by tracing back predictions to the most relevant concepts <ref type="bibr" target="#b6">[7]</ref>. When the features of the input space are hard for humans to reason about (such as pixel intensities), concept-based models work on the output of a concept encoder 𝑔 : 𝑋 → 𝐶 mapping the input space 𝑋 to the concept space 𝐶 <ref type="bibr" target="#b8">[9]</ref>. These architectures are known as Concept Bottleneck Models (CBMs) <ref type="bibr" target="#b5">[6]</ref>. In general, training a CBM may require a dataset where each sample consists of input features 𝑥 ∈ 𝑋 ⊆ R 𝑛 (e.g., an image's pixels), 𝑘 ground truth concepts 𝑐 ∈ 𝐶 ⊆ {0, 1} 𝑘 (i.e., a binary vector with concept annotations, when available) and 𝑜 task labels 𝑦 ∈ 𝑌 ⊆ {0, 1} 𝑜 (e.g., an image's classes). During training, a CBM is encouraged to align its predictions to task labels, i.e., 𝑦 ≈ ŷ = 𝑓 (𝑔(𝑥)). Similarly, the concept predictor can be supervised when concept labels are available, i.e., 𝑐 ≈ ĉ = 𝑔(𝑥). We indicate concept and task predictions as ĉ 𝑖 = (𝑔(𝑥)) 𝑖 and ŷ 𝑗 = (𝑓 (ĉ)) 𝑗 , respectively. When concept labels are not available, they can still be extracted with unsupervised techniques <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11]</ref>, which makes CBMs applicable to a wide range of applications.</p></div>
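The concept-encoder/task-predictor pipeline described above can be sketched in a few lines of PyTorch. The layer sizes, loss weighting, and random data below are illustrative assumptions, not the exact setup used in the paper.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; a real CBM would use an image backbone for g.
N_FEATURES, N_CONCEPTS, N_TASKS = 784, 8, 2

class ConceptBottleneckModel(nn.Module):
    """Minimal CBM: g maps inputs to concept logits, f maps concepts to tasks."""
    def __init__(self):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(),
                               nn.Linear(64, N_CONCEPTS))   # concept predictor g: X -> C
        self.f = nn.Linear(N_CONCEPTS, N_TASKS)             # task predictor f: C -> Y

    def forward(self, x):
        c_logits = self.g(x)
        # Bottleneck: the task head only sees the (sigmoid-squashed) concepts.
        y_logits = self.f(torch.sigmoid(c_logits))
        return c_logits, y_logits

model = ConceptBottleneckModel()
x = torch.randn(16, N_FEATURES)                             # toy inputs
c = torch.randint(0, 2, (16, N_CONCEPTS)).float()           # concept annotations
y = torch.randint(0, 2, (16, N_TASKS)).float()              # task labels

c_logits, y_logits = model(x)
# Joint objective: supervise concepts (c ≈ ĉ) and tasks (y ≈ ŷ) together.
loss = nn.BCEWithLogitsLoss()(c_logits, c) + nn.BCEWithLogitsLoss()(y_logits, y)
loss.backward()
```

Because every task prediction is a function of the concept vector alone, replacing a predicted concept with its ground-truth value (an "intervention") directly changes the downstream prediction.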
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">AnyCBM: Turning Black Boxes into Concept Bottleneck Models</head><p>AnyCBM (Figure <ref type="figure" target="#fig_0">1</ref>) is a method designed to convert any opaque neural network architecture into an interpretable Concept Bottleneck Model (CBM). The fundamental innovation of AnyCBMs is an external model that processes embeddings from a trained black-box model. These embeddings, denoted as ℎ (𝑙) ∈ 𝐻 (𝑙) ⊆ R 𝑙 , are encoded into a set of supervised concepts 𝑐 ∈ 𝐶. Subsequently, these concepts are mapped back into embeddings ℎ (𝑞) ∈ 𝐻 (𝑞) ⊆ R 𝑞 . This process translates the embedding space of the black-box model into a more understandable and interpretable form, where each concept represents a meaningful feature or characteristic that explains the decision-making process of the neural network. The following definition formalizes AnyCBMs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 3.1 (AnyCBM).</head><p>Given a black box model 𝜑 : 𝐻 (𝑙) → 𝐻 (𝑞) and a set of concepts 𝐶, an AnyCBM is a tuple of models (𝜓 𝑐 , 𝜓 𝑦 ) such that the following diagram commutes:</p><formula xml:id="formula_0">𝐻 (𝑙) 𝐻 (𝑞) 𝐶 𝜑 𝜓 𝑐 𝜓 𝑦</formula><p>More specifically, the concept predictor 𝜓 𝑐 : 𝐻 (𝑙) → 𝐶 encodes black box embeddings into concepts, and the task encoder 𝜓 𝑦 : 𝐶 → 𝐻 (𝑞) maps concepts back into black box embeddings. In practice, the commutative diagram describes how the interpretable mapping through 𝐶 via 𝜓 𝑐 and 𝜓 𝑦 should be consistent with the direct transformation of the black box 𝜑. The properties and capabilities of AnyCBMs can also be derived directly from the commutative diagram, as it constrains the relationships among the transformations 𝜓 𝑐 , 𝜓 𝑦 , and 𝜑.</p><p>In the following, we present two practical case studies.</p><p>Case 1: 𝜑 is the identity function on 𝐻 When 𝜑 is the identity function, 𝜑(ℎ (𝑙) ) = ℎ (𝑙) for all ℎ (𝑙) ∈ 𝐻 (𝑙) , and 𝐻 (𝑙) = 𝐻 (𝑞) . The diagram simplifies, and we have:</p><formula xml:id="formula_1">𝜓 𝑦 ∘ 𝜓 𝑐 = id 𝐻</formula><p>Theorem 3.2. If 𝜑 is the identity function on 𝐻, then 𝜓 𝑦 is injective:</p><formula xml:id="formula_2">𝜑 = id 𝐻 =⇒ 𝜓 𝑦 : 𝐶 ˓→ 𝐻 (𝑞) (1)</formula><p>Proof. Assume 𝜓 𝑦 (𝑐 1 ) = 𝜓 𝑦 (𝑐 2 ). Since 𝜓 𝑐 is surjective, there exist ℎ 1 , ℎ 2 ∈ 𝐻 (𝑙) such that 𝜓 𝑐 (ℎ 1 ) = 𝑐 1 and 𝜓 𝑐 (ℎ 2 ) = 𝑐 2 . Then,</p><formula xml:id="formula_3">ℎ 1 = 𝜓 𝑦 (𝜓 𝑐 (ℎ 1 )) = 𝜓 𝑦 (𝑐 1 ) = 𝜓 𝑦 (𝑐 2 ) = 𝜓 𝑦 (𝜓 𝑐 (ℎ 2 )) = ℎ 2 .</formula><p>Thus, 𝑐 1 = 𝑐 2 , proving that 𝜓 𝑦 is injective.</p><p>Significance: This property implies that 𝜓 𝑦 can uniquely reconstruct elements of 𝐻 (𝑙) from 𝐶, despite 𝜓 𝑐 not being injective. 
For example, if 𝜓 𝑐 represents lossy compression, then 𝜓 𝑦 could be an error-correcting decoding where no information is lost despite compression.</p><p>Case 2: independent training In many practical cases, concept predictors and task encoders are independently trained to reduce concept leakage <ref type="bibr" target="#b11">[12]</ref>. In this common setting, we can prove another property of AnyCBMs' task encoders. Theorem 3.3. If 𝜓 𝑐 and 𝜓 𝑦 are independently trained and 𝜑 is a multi-layer neural network, then 𝜓 𝑦 cannot be surjective.</p><p>Proof. Assume for contradiction that 𝜓 𝑦 is surjective. The surjectivity of 𝜓 𝑦 would require that every point in 𝐻 (𝑞) is the image of some point in 𝐶. Given the independent training, the domain of 𝜓 𝑦 is finite, with cardinality 2 𝑘 . Since 𝐻 (𝑞) ⊆ R 𝑞 , the mapping 𝜓 𝑦 : 𝐶 → 𝐻 (𝑞) would have to map a set of finite cardinality 2 𝑘 onto R 𝑞 , which is a contradiction. Hence, 𝜓 𝑦 cannot be surjective.</p><p>Significance: This theorem indicates that the surjectivity of 𝜓 𝑦 depends on the way we train the concept bottleneck. This means that, under independent training, AnyCBMs are not invertible, even when 𝜑 represents an invertible transformation.</p></div>
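The definition above can be sketched as a small training step: a frozen black-box segment 𝜑, a concept predictor 𝜓 𝑐 and a task encoder 𝜓 𝑦 trained so that the detour through concepts agrees with 𝜑 (the commutativity constraint) while 𝜓 𝑐 matches concept annotations. The sizes, stand-in MLP for 𝜑, and unweighted loss sum are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

EMB_L, EMB_Q, N_CONCEPTS = 32, 32, 8  # hypothetical embedding/concept sizes

# Frozen black-box segment phi: H^(l) -> H^(q) (here a stand-in MLP).
phi = nn.Sequential(nn.Linear(EMB_L, 64), nn.ReLU(), nn.Linear(64, EMB_Q))
for p in phi.parameters():
    p.requires_grad = False

psi_c = nn.Linear(EMB_L, N_CONCEPTS)  # concept predictor psi_c: H^(l) -> C
psi_y = nn.Linear(N_CONCEPTS, EMB_Q)  # task encoder    psi_y: C -> H^(q)

h_l = torch.randn(16, EMB_L)                                 # black-box embeddings
c_true = torch.randint(0, 2, (16, N_CONCEPTS)).float()       # concept annotations

c_logits = psi_c(h_l)
h_q_hat = psi_y(torch.sigmoid(c_logits))
# Commutativity: the detour H^(l) -> C -> H^(q) should reproduce phi's output,
# while the predicted concepts are aligned with their annotations.
loss = nn.MSELoss()(h_q_hat, phi(h_l)) + nn.BCEWithLogitsLoss()(c_logits, c_true)
loss.backward()  # updates psi_c and psi_y only; phi stays frozen
```

Because 𝜑's weights never change, the black box keeps its original behaviour; the interpretable detour is bolted on at whatever layer the embeddings are taken from.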
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head><p>Our experiments aim to answer the following questions:</p><p>• How does AnyCBMs' classification performance on concepts and downstream tasks compare to that of standard CBMs and black boxes? • How effective are concept interventions in AnyCBMs compared to concept interventions in CBMs? • Can AnyCBM be trained with a dataset slightly different from the one used to train the black-box model?</p><p>This section describes essential information about the experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Data &amp; task setup</head><p>In our experiments, we use two different datasets commonly used to evaluate CBMs: MNIST even/odd <ref type="bibr" target="#b12">[13]</ref>, where the task is to predict whether handwritten digits are even or odd; and CUB <ref type="bibr" target="#b13">[14]</ref>, where the task is to predict bird species based on bird characteristics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Evaluation</head><p>In our analysis, we use ROC-AUC scores to measure classification performance on concepts and downstream tasks, and to measure the effectiveness of concept-based interventions in improving downstream classification performance. To measure the effectiveness of interventions, we follow an approach similar to the one described by Espinosa Zarlenga et al. <ref type="bibr" target="#b14">[15]</ref>. First, we perturb the latent embeddings by adding small random noise a few layers before predicting concepts, both in AnyCBM and CBM. Then, we intervene on a portion of the concepts with the ground truth. Finally, we test whether AnyCBM can be trained with a different, concept-annotated dataset: we train the black-box model on an MNIST even/odd dataset with RGB images, and then train AnyCBM on a version of MNIST that contains greyscale images with associated concepts. All results are reported using the mean and standard error over five runs with different parameter initializations.</p></div>
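The intervention step (replacing a portion of the predicted concepts with their ground-truth values at test time) can be sketched as follows; the helper name and toy values are hypothetical.

```python
import torch

def intervene(c_pred, c_true, idx):
    """Replace the predicted concepts at positions `idx` with ground truth,
    as a human expert would at test time; the rest are left untouched."""
    c_fixed = c_pred.clone()
    c_fixed[:, idx] = c_true[:, idx]
    return c_fixed

# Toy example: intervene on concepts 1 and 2 of a 3-concept prediction.
c_pred = torch.tensor([[0.9, 0.2, 0.7]])
c_true = torch.tensor([[1.0, 1.0, 0.0]])
out = intervene(c_pred, c_true, [1, 2])
```

The corrected concept vector is then decoded by the task encoder, so the intervention propagates to the downstream prediction.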
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Baselines</head><p>In our experiments, we compare AnyCBMs with standard CBMs and with an end-to-end blackbox model in terms of generalisation performance. We compare AnyCBMs' interventions with the effectiveness of interventions in standard CBMs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Key findings</head><p>AnyCBMs match black box and CBM performance in terms of classification accuracy on concepts and downstream tasks (Table <ref type="table" target="#tab_0">1</ref>). AnyCBMs perform just as well as the original black-box models on which they are based when it comes to accurately completing tasks. Additionally, the accuracy with which these models handle concepts is equal to that of similar Concept Bottleneck Model architectures. This suggests that AnyCBMs could be a valuable tool for making existing black-box models easier to understand. Using AnyCBMs, we might be able to explain how these complex models work and, in particular, which information is encoded inside the models' layers, making them more transparent and accessible for further analysis and improvement.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>AnyCBM interventions are as effective as in Concept Bottleneck Models (Figure 2)</head><p>AnyCBMs are as responsive to concept-based interventions as standard CBMs. This means that when concepts predicted by AnyCBMs are manually changed by human experts at test time, they effectively impact the downstream task accuracy. This finding underlines the ability of AnyCBMs to interact with domain experts, as would be expected of CBMs. In addition, this represents a successful method to steer the behaviour of the model by modifying human-understandable concepts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Downstream task and concept accuracy of AnyCBMs (trained on MNIST Greyscale) compared to CBMs (trained on MNIST Greyscale) and a black box model (trained on MNIST RGB).</p><p>AnyCBM can be trained with a dataset different from the one used to train the black-box model (Table <ref type="table">2</ref>) One can initially train a black-box model with one dataset, which could be larger or more beneficial for addressing the downstream task. Subsequently, the AnyCBM module can be trained on a slightly different dataset that includes concept annotations. As demonstrated in Table <ref type="table">2</ref>, this approach does not compromise the model's task accuracy when both the black-box model and AnyCBM are evaluated on the original dataset. It also predicts concepts in the original dataset with partial accuracy, even under a distribution shift. This indicates that AnyCBM can alleviate a significant constraint of CBMs: the requirement for concept annotations in the dataset used to train the entire model. In addition, the dataset used to train the AnyCBM module may contain only input and concept annotations, without any label annotations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Discussion</head><p>Advantages In the age of large models with billions of parameters, the development of solutions that do not require retraining to enhance their capabilities is crucial. AnyCBMs successfully meet this need, as they do not require altering the weights of a pre-trained black-box model. This enables any black-box model to acquire the extra features of CBMs, such as the interpretability of the latent space and the capacity to change the model's behaviour through concept interventions. Furthermore, we believe that AnyCBM can be trained using a dataset smaller than the one used to train the original black box, because it has a consistently smaller number of parameters. Interestingly, the dataset can even be distinct (for instance, we might train the black-box model with a dataset without concepts while training AnyCBM with a slightly different dataset that has only concept annotations), mitigating the CBMs' constraint of needing concept annotations for the training set used to train the model. Under these circumstances, it might be intriguing to determine whether certain concepts can be accurately predicted from the latent embeddings of black-box models. If some concepts are unpredictable, this could suggest that the black-box model did not grasp that particular concept during its prior training, either because of the dataset employed or because the concept played no relevant role in task prediction.</p><p>Limitations Although the model gains the benefits of CBMs, it also takes on some of their drawbacks. The primary constraint is the necessity for concept data to train the AnyCBM component, although this is somewhat alleviated by the reduced need for concept annotations and the option to utilise an alternate dataset for their extraction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Future work</head><p>We underscore the importance of delving deeper into AnyCBM and its benefits, while also trying to mitigate its drawbacks. For example, it would be intriguing to examine its application in multimodal contexts, where automatic concept extraction could be feasible, as suggested in <ref type="bibr" target="#b10">[11]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion</head><p>This paper introduces Any Concept Bottleneck Models (AnyCBMs), a method for transforming opaque neural networks into interpretable Concept Bottleneck Models (CBMs), allowing for insights into the decision-making process in terms of concept-based explanations and interventions. This paper analyses practical case studies that demonstrate, from both a theoretical and an experimental perspective, the properties and limitations of AnyCBMs in enhancing interpretability while maintaining high classification performance. These results suggest that AnyCBMs could represent a computationally effective solution for enhancing the interpretability of existing trained or fine-tuned black-box neural networks, while also allowing for concept-based interventions in the black-box latent space.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Any Concept Bottleneck Models (AnyCBMs) transform any black box neural architecture into an interpretable CBM, mapping black box embeddings into a set of supervised concepts and then mapping the predicted concepts back to black box embeddings. This allows AnyCBMs to be applied to any layer of a trained black box and to perform concept-based interventions as in standard CBMs.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Task accuracy of AnyCBMs compared to CBMs after intervening on an increasing number of families of concepts on the MNIST and CUB datasets.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Downstream task and concept ROC AUC of AnyCBMs compared to CBMs and a black box model on MNIST and CUB datasets.</figDesc><table><row><cell></cell><cell cols="2">MNIST even/odd</cell><cell>CUB</cell><cell></cell></row><row><cell></cell><cell cols="4">Task ROC AUC Concept ROC AUC Task ROC AUC Concept ROC AUC</cell></row><row><cell>Black box</cell><cell>99.8 ± 0.0</cell><cell>-</cell><cell>90.5 ± 0.3</cell><cell>-</cell></row><row><cell>CBM</cell><cell>99.8 ± 0.0</cell><cell>99.8 ± 0.0</cell><cell>90.0 ± 0.2</cell><cell>83.0 ± 0.2</cell></row><row><cell>Black box +</cell><cell>99.6 ± 0.0</cell><cell>98.8 ± 0.3</cell><cell>90.3 ± 0.2</cell><cell>84.8 ± 0.3</cell></row><row><cell>AnyCBMs</cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This study was funded by TRUST-ME (project 205121L_214991), SmartCHANGE (GA No. 101080965) and XAI-PAC (PZ00P2_216405) projects.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Madiega</surname></persName>
		</author>
		<title level="m">Artificial intelligence act, European Parliament</title>
				<imprint>
			<publisher>European Parliamentary Research Service</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The role of explanations on trust and reliance in clinical decision support systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bussone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Stumpf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>O'sullivan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2015 international conference on healthcare informatics</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="160" to="169" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead</title>
		<author>
			<persName><forename type="first">C</forename><surname>Rudin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="206" to="215" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Durán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">R</forename><surname>Jongsma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Medical Ethics</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="page" from="329" to="335" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Ethical principles in machine learning and artificial intelligence: cases from the field and possible ways forward</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Lo</forename><surname>Piano</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Humanities and Social Sciences Communications</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="1" to="7" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Concept bottleneck models</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">W</forename><surname>Koh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">S</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mussmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pierson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="5338" to="5348" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Interpretation of neural networks is fragile</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ghorbani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Abid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI conference on artificial intelligence</title>
				<meeting>the AAAI conference on artificial intelligence</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="3681" to="3688" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">On completeness-aware concept-based explanations in deep neural networks</title>
		<author>
			<persName><forename type="first">C.-K</forename><surname>Yeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Arik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Pfister</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ravikumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="20554" to="20565" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Ghorbani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wexler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kim</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1902.03129</idno>
		<title level="m">Towards automatic concept-based explanations</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">C</forename><surname>Magister</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kazhdan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liò</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2107.11889</idno>
		<title level="m">Gcexplainer: Human-in-the-loop concept-based explanations for graph neural networks</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Oikarinen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-W</forename><surname>Weng</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2304.06129</idno>
		<title level="m">Label-free concept bottleneck models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Mahinpei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Lage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Doshi-Velez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Pan</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2106.13314</idno>
		<title level="m">Promises and pitfalls of black-box concept learning models</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Entropy-based logic explanations of neural networks</title>
		<author>
			<persName><forename type="first">P</forename><surname>Barbiero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Ciravegna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Giannini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liò</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Melacci</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
		<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="6046" to="6054" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Wah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Branson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Welinder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Perona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Belongie</surname></persName>
		</author>
		<title level="m">The Caltech-UCSD Birds-200-2011 dataset</title>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Concept embedding models</title>
		<author>
			<persName><forename type="first">M</forename><surname>Espinosa Zarlenga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Barbiero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Ciravegna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Marra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Giannini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Diligenti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Shams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Precioso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Melacci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Weller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
