<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Explainability for Speech Models: On the Challenges of Acoustic Feature Selection</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Dennis</forename><surname>Fucci</surname></persName>
							<email>dfucci@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Trento</orgName>
								<address>
									<addrLine>Via Calepina, 14</addrLine>
									<postCode>38122</postCode>
									<settlement>Trento</settlement>
									<region>TN</region>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<addrLine>Via Sommarive, 18</addrLine>
									<postCode>38123</postCode>
									<settlement>Trento</settlement>
									<region>TN</region>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Beatrice</forename><surname>Savoldi</surname></persName>
							<email>bsavoldi@fbk.eu</email>
							<affiliation key="aff1">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<addrLine>Via Sommarive, 18</addrLine>
									<postCode>38123</postCode>
									<settlement>Trento</settlement>
									<region>TN</region>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marco</forename><surname>Gaido</surname></persName>
							<email>mgaido@fbk.eu</email>
							<affiliation key="aff1">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<addrLine>Via Sommarive, 18</addrLine>
									<postCode>38123</postCode>
									<settlement>Trento</settlement>
									<region>TN</region>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Matteo</forename><surname>Negri</surname></persName>
							<email>negri@fbk.eu</email>
							<affiliation key="aff1">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<addrLine>Via Sommarive, 18</addrLine>
									<postCode>38123</postCode>
									<settlement>Trento</settlement>
									<region>TN</region>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mauro</forename><surname>Cettolo</surname></persName>
							<email>cettolo@fbk.eu</email>
							<affiliation key="aff1">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<addrLine>Via Sommarive, 18</addrLine>
									<postCode>38123</postCode>
									<settlement>Trento</settlement>
									<region>TN</region>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Luisa</forename><surname>Bentivogli</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<addrLine>Via Sommarive, 18</addrLine>
									<postCode>38123</postCode>
									<settlement>Trento</settlement>
									<region>TN</region>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">Tenth Italian Conference on Computational Linguistics</orgName>
								<address>
									<addrLine>Dec 04 -06</addrLine>
									<postCode>2024</postCode>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Explainability for Speech Models: On the Challenges of Acoustic Feature Selection</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">C3838FC2307220E849F624C154382F44</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Speech Models</term>
					<term>Explainability</term>
					<term>Feature Attribution</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Spurred by the demand for transparency and interpretability in Artificial Intelligence (AI), the field of eXplainable AI (XAI) has experienced significant growth, marked by both theoretical reflections and technical advancements. While various XAI techniques, especially feature attribution methods, have been extensively explored across diverse tasks, their adaptation for the speech modality is comparatively lagging behind. We argue that a key challenge in feature attribution for speech processing lies in identifying informative acoustic features. In this paper, we discuss the key challenges in selecting the features for speech explanations. Also, in light of existing research, we highlight current gaps and propose future avenues to enhance the depth and informativeness of explanations for speech.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Models are only as interpretable as their features. <ref type="bibr" target="#b0">[1]</ref> Spoken language, as perhaps our most natural form of interaction, is the foundational element of many technologies we interact with in our daily lives <ref type="bibr" target="#b1">[2]</ref>, from virtual assistants to voice dictation <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref>. More recently, the emergence of highly capable speech foundation models <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref> has also facilitated and expanded the adoption of speech technologies on an unprecedented multilingual scale. In light of this proliferation, the need arises to prioritize transparency and interpretability, qualities already demanded in the growing landscape of Machine Learning (ML).</p><p>In response, the field of eXplainable AI (XAI) has risen to prominence, with the aim of facilitating understanding of the rationale behind model decisions and fostering users' trust <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13]</ref>. XAI is also reinforced by the establishment of norms and legal frameworks, as seen in the European Union's General Data Protection Regulation, which enshrines the 'right to explanation', and the AI Act, which emphasizes transparency as a pivotal component of ML applications <ref type="bibr" target="#b13">[14]</ref>.</p><p>XAI encompasses various tasks and methods, such as identifying the model components relevant to specific predictions, understanding the information processed by these components, and determining which input elements guide the model's predictions <ref type="bibr" target="#b14">[15]</ref>. 
The latter task is the focus of feature attribution methods, which provide intuitive explanations by visualizing which input elements (e.g., pixels in an image or words in a sentence) have influenced the model's predictions. These methods assign a score to each input feature, quantifying its importance or contribution to the output: higher scores indicate greater importance of the corresponding input features for generating the output <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b18">19]</ref>. They can help identify potential causes of errors and unexpected behaviors, as well as analyze the model's response to specific input properties. Overall, these explainability methods present the reasons why models make specific predictions by establishing a connection between input and output as a form of intuitive explanation for humans, thereby enhancing interpretability.<note place="foot" n="1">Despite numerous efforts to differentiate the closely related concepts of explainability and interpretability, no consensus exists in the literature on their definitions <ref type="bibr" target="#b19">[20]</ref>. In this paper, we adopt a perspective similar to that of Saeed and Omlin <ref type="bibr" target="#b20">[21]</ref>, where explainability refers to the process of extracting insights from a model's workings through specific techniques, while interpretability refers to the process of understanding those insights, which is crucial to make them actionable.</note> Over time, ongoing efforts have aimed to refine feature attribution techniques and provide more effective explanations <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b22">23]</ref>. However, it is essential to recognize that the effectiveness of feature attribution explanations relies not only on the techniques themselves but also on the informativeness of the input features used as explanatory variables. If an explanation highlights unintelligible or poorly informative features, it does little to enhance the understanding of the model's behavior <ref type="bibr" target="#b0">[1]</ref>. This can undermine key principles in XAI, such as accuracy (the property of correctly reflecting the factors that led the model to a specific decision, including all relevant information) and meaningfulness (the property of offering explanations that are comprehensible to the user) <ref type="bibr" target="#b23">[24]</ref>.<ref type="foot" target="#foot_0">2</ref> In fields involving images or texts, feature representations are typically constrained to pixels and words, respectively. However, for speech, multiple input representations can be adopted, each emphasizing different acoustic aspects. Indeed, a sequence of speech elements does not only convey the meaning of what is said (like words in a text) but also bears a wealth of additional information useful for both human understanding and automatic processing (e.g., intonation, loudness, speaking rate). Consequently, when employing feature attribution methods, the resulting explanations can vary significantly in shape and focus on more or less informative characteristics depending on the type of speech representation used. 
To date, research on feature attribution for speech is notably limited to a few applications, including classification <ref type="bibr" target="#b26">[27,</ref><ref type="bibr" target="#b27">28]</ref> and generative tasks <ref type="bibr" target="#b28">[29,</ref><ref type="bibr" target="#b29">30,</ref><ref type="bibr" target="#b30">31,</ref><ref type="bibr" target="#b31">32]</ref>, which offer a somewhat fragmented picture in the choice of speech representations, thus providing limited insight into the relation between the features considered and the explanations built upon them.</p><p>In light of the above, this paper reflects on the impact of the chosen acoustic features in explaining the rationale behind speech models, aiming to gain a deeper understanding of the trade-offs associated with acoustic features. By first offering a gentle introduction to the rich and multidimensional nature of speech and its digital representation, we identify current gaps and potential avenues for effectively incorporating this multidimensionality into XAI for speech models. Our discussion focuses on two critical factors: i) the amount of information these features provide about the model's behavior, which influences the richness of the explanations, and ii) the level of detail of such information, which determines the granularity of the explanations. We also explore how these aspects impact both the accuracy and meaningfulness of the explanations, ultimately shaping their overall interpretability.</p></div>
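As a minimal illustration of the scoring principle described above, consider a linear model: for an output computed as a weighted sum, the product of each weight and input value is the exact contribution of that feature to the output (a simple instance of the gradient-times-input idea). The weights and inputs below are purely hypothetical and not tied to any speech model.

```python
import numpy as np

# Hypothetical linear "model": output = w @ x. For linear models, the
# gradient-times-input score w_i * x_i is the exact contribution of each
# input feature to the output.
w = np.array([0.5, -1.0, 2.0])   # illustrative weights
x = np.array([2.0, 1.0, 0.5])    # illustrative input features

output = float(w @ x)
attributions = w * x             # one importance score per input feature

# The scores decompose the output exactly (the "completeness" property).
print(attributions.tolist(), output)  # [1.0, -1.0, 1.0] 1.0
```

For nonlinear neural models this exactness no longer holds, which is precisely why the refined attribution techniques cited above exist.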
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">The Correlates of Speech</head><p>To gain deeper insight into the complexities of defining informative features in speech, we explore key characteristics of speech and their implications for modeling.</p><p>Speech is a multifaceted phenomenon. It is grounded in the materiality of sound to convey linguistic content (i.e., what is said), which is modulated by several paralinguistic cues (i.e., how it is said) entailing extensive variation, even for individual speakers <ref type="bibr" target="#b32">[33]</ref>. As such, it comprises several dimensions, which are hard to pin down individually but collectively amount to what we intuitively and simply perceive as spoken language.</p><p>From a linguistic perspective, the spoken communication system consists of combinations of phonemes,<ref type="foot" target="#foot_1">3</ref> which are regarded as the smallest meaningful units of sound <ref type="bibr" target="#b33">[34,</ref><ref type="bibr" target="#b34">35]</ref>. Physically, it involves the continuous flow of sounds shaped by the movements of our phonatory organs, transmitted as sound waves <ref type="bibr" target="#b35">[36]</ref>. Perceptually, we process speech through three primary dimensions <ref type="bibr" target="#b36">[37]</ref>: i) time, or the sequential occurrence of sounds;<ref type="foot" target="#foot_2">4</ref> ii) intensity, corresponding to the energy level of the wave due to the strength of molecular vibration, which we perceive as loudness; iii) frequency, regarding the rate of vibrations produced by the vocal cords, interpreted as pitch, whose modulation is responsible for shaping the type of speech sound.</p><p>These three elements, known as acoustic correlates <ref type="bibr" target="#b37">[38]</ref>, are specific to both speakers and phonemes. 
For example, speakers possess unique characteristics, including pitch and speaking rate <ref type="bibr" target="#b32">[33]</ref>, and also exhibit high variability stemming from various sociodemographic factors such as gender, age, and dialect <ref type="bibr" target="#b38">[39]</ref>. In these cases, the speech content needs to be disentangled from the variability in its delivery. Conversely, language sounds exhibit variability in duration (e.g., the vowels in ship and sheep) and are distinguished by specific frequency ranges <ref type="bibr" target="#b35">[36]</ref>. The frequency dimension also plays a vital role in shaping suprasegmental aspects of speech (broader phenomena that span multiple segments), such as intonation, obtained by varying pitch <ref type="bibr" target="#b40">[40]</ref>. Pitch, for instance, has a distinctive function in tonal languages, where it is used to distinguish lexical or grammatical meaning <ref type="bibr" target="#b41">[41]</ref>. But even in non-tonal languages, these prosodic elements are indispensable for delivering different meanings and intents, as the reader can perceive by reading aloud two contrastive sentences such as "You got the joke right" and "You got the joke, right?", where pauses and prosody play pivotal roles.</p><p>All these factors add to the multidimensionality of speech, which feature engineering strives to encapsulate and which cannot be overlooked in the explanatory process.</p></div>
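The three acoustic correlates can be sketched numerically on a synthetic tone. This is a deliberately simplified illustration (pure numpy, illustrative values only); real speech would call for far more robust estimators, e.g., of pitch.

```python
import numpy as np

sr = 16_000                                   # sampling rate (samples per second)
t = np.arange(sr) / sr
y = 0.3 * np.sin(2 * np.pi * 220.0 * t)       # 1-second synthetic 220 Hz tone

# i) time: the duration spanned by the sample sequence
duration_s = len(y) / sr

# ii) intensity: energy of the wave, perceived as loudness (RMS amplitude here)
rms = float(np.sqrt(np.mean(y ** 2)))

# iii) frequency: dominant rate of vibration, perceived as pitch
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=1 / sr)
pitch_hz = float(freqs[np.argmax(spectrum)])

print(duration_s, round(rms, 3), pitch_hz)    # 1.0 0.212 220.0
```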
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Speech Representations</head><p>While various representations are used to encode speech in a digital format, three main types are commonly given as input to state-of-the-art speech models (for a review, see <ref type="bibr" target="#b42">[42,</ref><ref type="bibr" target="#b43">43]</ref>): namely, waveforms, spectrograms, and mel-frequency cepstral coefficients (MFCCs), shown in Figure <ref type="figure">1</ref>.</p><p>The waveform is the most fundamental representation of a signal, comprising a sequence of samples (e.g., 16,000 per second), each indicating the amplitude of the signal, essentially the fluctuations in air pressure, at a specific point in time. This type of representation is leveraged by models like Wav2vec <ref type="bibr" target="#b5">[6]</ref>.</p><p>The spectrogram results from feature engineering operations that decompose the speech signal into its frequencies, presenting a 2D visualization of frequency distributions over time. These representations are commonly depicted as heatmaps, where color intensity corresponds to the energy of a specific frequency at a given moment. The time unit in spectrograms is a fixed-length window of a few milliseconds (e.g., 25), commonly referred to as a frame, within which a given number of waveform samples are encompassed. Notably, the articulation of sounds produces time-frequency patterns which are visible as darker regions <ref type="bibr" target="#b35">[36]</ref>. Prominent examples of state-of-the-art models leveraging spectrograms are Whisper <ref type="bibr" target="#b8">[9]</ref> and SeamlessM4T <ref type="bibr" target="#b44">[44]</ref>.</p><p>MFCCs offer another 2D representation in which each coefficient captures important details about how the frequency content of the signal changes over time. Like spectrograms, MFCCs provide information about both frequency and time, but in a more compact form. 
MFCCs are commonly used in the implementation of ASR models within popular toolkits like Kaldi<ref type="foot" target="#foot_3">5</ref> <ref type="bibr" target="#b45">[45]</ref> and Mozilla DeepSpeech<ref type="foot" target="#foot_4">6</ref>.</p><p>Overall, though different in nature, these three types of representations are all effectively exploited by current speech models.<ref type="foot" target="#foot_5">7</ref> For human understanding, however, they vary in terms of informativeness with respect to the acoustic correlates discussed in §2. Indeed, although both intensity and frequency are somewhat discernible in waveforms, qualitative distinctions of patterns specific to pitch or phoneme frequencies are rarely feasible <ref type="bibr" target="#b35">[36]</ref>. Comparatively, spectrograms and MFCCs are richer and more descriptive, because they capture the multiple dimensions of time, frequency, and intensity in finer detail. Still, spectrograms are more conducive to phonetic analyses, given the established knowledge in analyzing frequency patterns over time within this representation <ref type="bibr" target="#b35">[36]</ref>. In contrast, MFCCs are rarely used for phonetic analysis <ref type="bibr" target="#b46">[46]</ref>.</p><p>Figure <ref type="figure">1</ref>: Schematic illustration of the primary speech representations used by state-of-the-art speech models for the utterance "This is a waveform". The features were computed using Librosa 0.10.1 <ref type="bibr" target="#b47">[47]</ref>.</p><p>Overall, while weighing the informativeness and selection of speech representations requires a certain expertise in speech processing, being aware of the trade-offs they intrinsically entail is crucial for carefully conducting XAI examinations in speech. Indeed, it is precisely upon such input features, and their trade-offs, that explanations are built.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Richness of Explanations</head><p>Considering the foregoing, there is a causal relationship wherein the explanatory possibilities in speech XAI are inherently limited by the richness of the audio features used, specifically the dimensions they encapsulate. This limitation directly correlates with the richness of the resulting explanations. Also, owing to the compatibility of current models with various representation types, the explanations generated are inevitably confined by the specific input features provided to the model. To exemplify, if models process audio as waveforms, which poorly represent the frequency dimension for human understanding, explanations accounting for that correlate will be out of reach. In fact, previous works by Wu et al. <ref type="bibr" target="#b30">[31]</ref> and Wu et al. <ref type="bibr" target="#b31">[32]</ref>, based on waveforms, solely focus on the temporal dimension to explain ASR.</p><p>In these cases, to avoid limiting the understanding of the models' behavior to one single dimension, it would be advisable to explore alternative techniques that offer deeper insights into how models process other acoustic correlates. For instance, Pastor et al. <ref type="bibr" target="#b27">[28]</ref> integrated counterfactual explanations to specifically investigate whether selected paralinguistic features such as pitch, speaking rate, and background noise influenced the model's prediction. Additionally, various techniques exist to analyze how models extract relevant patterns from waveforms through convolutions <ref type="bibr" target="#b49">[48,</ref><ref type="bibr" target="#b50">49,</ref><ref type="bibr" target="#b51">50]</ref>.</p><p>When the selected input features represent multiple dimensions, as in the case of spectrograms or MFCCs, the decision to account for only one of these dimensions becomes arbitrary. For example, two models tested by Wu et al. 
<ref type="bibr" target="#b30">[31]</ref>, namely, DeepSpeech <ref type="bibr" target="#b52">[51]</ref> and Sphinx <ref type="bibr" target="#b53">[52]</ref>, are fed with spectrograms and MFCCs, respectively. However, explanations based on raw waveforms are provided for these models. This inconsistency between the features used in explanations and those used by the models inevitably offers only a partial overview of the models' behavior and limits the exploration of important acoustic aspects. This, in turn, can impact the accuracy of the explanations, which ideally should encompass all relevant information.</p><p>To prioritize explanation accuracy and conduct analyses considering the crucial role of acoustic correlates such as frequency, it is advisable to take into account all dimensions embedded in the speech representation. This approach is exemplified by the works of Markert et al. <ref type="bibr" target="#b29">[30]</ref>, who provide explanations that account for the most influential elements in MFCCs, as well as Trinh and Mandel <ref type="bibr" target="#b28">[29]</ref> and Becker et al. <ref type="bibr" target="#b26">[27]</ref>, who base the explanations on spectrograms. In the work by Markert et al. <ref type="bibr" target="#b29">[30]</ref>, however, it is challenging to connect the results with specific acoustic parameters due to the complexity of analyzing MFCCs (see §3), which significantly undermines the meaningfulness of the explanations. In contrast, explanations using spectrograms offer valuable insights into how machines process speech, producing both accurate and meaningful results. For instance, Trinh and Mandel <ref type="bibr" target="#b28">[29]</ref> demonstrated that neural ASR models focus on high-energy time-frequency regions for transcription, while Becker et al. 
<ref type="bibr" target="#b26">[27]</ref> found that lower frequency ranges, typically associated with pitch, exhibit higher attribution scores in speaker gender classification tasks, showing some alignment with human speech processing. However, interpreting these insights requires specialized expertise, which can reduce the meaningfulness of explanations for non-experts. This highlights that, even in speech, the balance between accuracy and meaningfulness can vary depending on the context <ref type="bibr" target="#b23">[24]</ref>.</p></div>
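As a sketch of how spectrogram-based attributions of this kind can be produced, occlusion assigns each time-frequency patch a score equal to the output drop observed when that patch is masked. The "model" below is a toy stand-in scorer (it simply sums low-frequency energy), not any of the cited systems, and the spectrogram is random; the sketch only illustrates the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in "model": scores a (freq, time) spectrogram by the energy in its
# lowest 20 frequency bins -- a hypothetical classifier, not a real speech model.
def model_score(spec):
    return spec[:20].sum()

spec = rng.random((201, 101))   # hypothetical magnitude spectrogram

# Occlusion-based attribution: zero out one time-frequency patch at a time and
# record how much the score drops; bigger drops mean more important patches.
def occlusion_map(spec, patch=(20, 20)):
    base = model_score(spec)
    attr = np.zeros_like(spec)
    for f in range(0, spec.shape[0], patch[0]):
        for t in range(0, spec.shape[1], patch[1]):
            occluded = spec.copy()
            occluded[f:f + patch[0], t:t + patch[1]] = 0.0
            attr[f:f + patch[0], t:t + patch[1]] = base - model_score(occluded)
    return attr

attr = occlusion_map(spec)
# Only the low-frequency patches (the only region the toy model reads) receive
# positive scores; every other patch scores zero.
print(bool(attr[:20].min() > 0), bool(np.allclose(attr[20:], 0.0)))  # True True
```

The resulting attribution map lives in the same time-frequency space as the spectrogram itself, which is what makes such explanations amenable to phonetic reading.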
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Granularity of Explanations</head><p>Another critical factor concerning the informativeness of input features is the level of granularity at which the features are considered during the explanatory process. This decision affects the level of detail in the resulting explanations and, consequently, their accuracy (as more detailed explanations may more accurately reflect the model's behavior) and their meaningfulness (as detailed and comprehensive explanations can be more difficult to interpret) <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b23">24]</ref>.</p><p>In the time domain, for example, input features are highly fine-grained. As discussed in §3, spectrograms typically contain frames spanning tens of milliseconds, capturing detailed frequency content within each frame, whereas waveforms are composed of samples taken at much shorter time intervals: as mentioned in §3, there can be 16,000 samples in just one second. This level of detail poses great challenges for (human) comprehension, particularly for a broader audience, since mapping groups of frames or samples in an explanation to recognizable speech units is highly time-consuming and requires specialized expertise.</p><p>Accordingly, to address this issue and make explanations for speech more broadly accessible, previous works have leveraged textual transcripts within the explanation process. More specifically, Wu et al. <ref type="bibr" target="#b31">[32]</ref> and Pastor et al. <ref type="bibr" target="#b27">[28]</ref> resort to the alignment of audio to text, at the level of individual phonemes and words, respectively, and apply explainability techniques to such units. While this approach helps decipher the contribution of input features based on more intuitive linguistic units, it diverges from how current models process speech features in small frames and samples <ref type="bibr" target="#b43">[43]</ref>. 
This divergence risks overlooking the model's behavior and compromises the accuracy and effectiveness of the explanations. For instance, whether ASR systems rely on shorter or longer time intervals than individual words remains unclear <ref type="bibr" target="#b28">[29]</ref>. Analyzing this aspect therefore requires a more granular approach at the time level.</p><p>In light of the above, explanations should be obtained with low-level units to avoid biasing them towards human understanding. The use of audio-transcript alignment to aid the analysis of explanations can be very useful, but it should occur downstream of the explanation process, not upstream. In this way, we can maximize the use of all available units to generate detailed and accurate explanations, and then aggregate the scores of individual frames or samples to create more compact representations at the level of phonemes or words, ensuring flexibility in the meaningfulness of the explanations according to specific needs. This bottom-up approach mirrors practices in the text domain, providing adaptability in defining attribution units that can range from subwords to words or phrases <ref type="bibr" target="#b54">[53,</ref><ref type="bibr" target="#b55">54]</ref>.</p></div>
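The downstream aggregation advocated above can be sketched as follows. The frame-level scores are illustrative values, and the word-to-frame alignment is hypothetical (in practice it would come from a forced aligner).

```python
import numpy as np

# Frame-level attribution scores (e.g., one per 10 ms frame), as produced by
# any feature attribution method; the values here are purely illustrative.
frame_scores = np.array([0.1, 0.3, 0.9, 0.8, 0.2, 0.1, 0.7, 0.6])

# Hypothetical audio-transcript alignment: word -> [start_frame, end_frame).
alignment = {"you": (0, 2), "got": (2, 5), "it": (5, 8)}

# Aggregate fine-grained scores inside each word span (mean here; sum or max
# are equally valid choices depending on the analysis).
word_scores = {w: float(frame_scores[s:e].mean()) for w, (s, e) in alignment.items()}
print({w: round(v, 2) for w, v in word_scores.items()})
# {'you': 0.2, 'got': 0.63, 'it': 0.47}
```

Because the fine-grained scores are computed first, the same explanation can be re-aggregated at phoneme, word, or phrase level without re-running the attribution method.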
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>This paper has examined the role of acoustic features and their selection for explaining speech models. More specifically, we considered a specific subfield of XAI, namely, feature attribution, which connects input features to outputs as a form of explanation. Previous research has not explicitly addressed how to incorporate features into the explanation process within the speech domain, where input is encoded in more varied ways compared to other fields, such as text. This has led to diverse approaches, each with different implications for what can and cannot be explained about model behavior, and with the risk of not fully or accurately representing the model's functioning.</p><p>By discussing the key characteristics of speech and the properties of the most adopted acoustic features, we argue that explanations should ideally encompass all available dimensions, particularly time and frequency, as both are essential for a comprehensive understanding of the models' rationale. We have also discussed challenges associated with aligning explanations at high granularity with human understanding, emphasizing solutions that provide flexibility in the analysis, allowing for adjustments between more or less detail as needed.</p><p>Building on these insights, our ongoing research focuses on developing feature attribution techniques that operate on spectrograms at the finest possible unit level, integrating both time and frequency dimensions. Our aim is to generate explanations that are accurate and meaningful for experts, as well as adaptable for non-expert users. More broadly, we hope that our reflections will be beneficial and thought-provoking for researchers currently working in, or entering, the field of XAI for speech models, thereby contributing to a deeper understanding of the rationale behind these models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Limitations</head><p>While exploring the relationship between the informativeness of speech features and explanations, we have deliberately not delved into the needs of specific stakeholders for XAI applications. Indeed, different stakeholders present varying needs <ref type="bibr" target="#b56">[55,</ref><ref type="bibr" target="#b57">56]</ref>, and considering them is a research avenue of paramount importance for the growth of XAI. However, as XAI for speech is still a nascent area of investigation, we prioritized more fundamental methodological and design decisions that support a comprehensive and detailed low-level understanding of a model's rationale. Accordingly, our reflections might be most appealing to users who engage with speech models and possess expertise in machine learning and/or speech analysis, ranging from developers to speech therapists assisted by speech models <ref type="bibr" target="#b57">[56]</ref>.</p><p>The balance of richness and granularity, which also relates to the interplay between accuracy and meaningfulness, is also relevant to everyday users who interact with speech technologies. However, investigating how explanations can be effectively communicated to and understood by these users in the context of daily speech technology use exceeds the scope of this paper and warrants further exploration.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="3,304.30,84.19,200.01,272.23" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">The properties of accuracy and meaningfulness can be associated with those of faithfulness and plausibility, respectively<ref type="bibr" target="#b24">[25,</ref><ref type="bibr" target="#b25">26]</ref>.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">Throughout the paper, we use the abstract category of phoneme to denote individual speech sounds. However, when discussing their actual realizations, it is more accurate to refer to them as phones<ref type="bibr" target="#b33">[34]</ref>.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">E.g., the order of sounds in /pAt/ (pot) vs. /tAp/ (top) differentiates two words.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">See https://kaldi-asr.org/doc/feat.html.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">See https://deepspeech.readthedocs.io/en/master/DeepSpeech. html.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">We are not aware of any recent study attributing higher system performance to the choice of representation.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Acknowledgments</head><p>We acknowledge the support of the PNRR project FAIR -Future AI Research (PE00000013), under the NRRP MUR program funded by the NextGenerationEU. This paper has also received funding from the European Union's Horizon research and innovation programme under grant agreement No 101135798, project Meetween (My Personal AI Mediator for Virtual MEETtings BetWEEN People).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The Need for Interpretable Features: Motivation and Taxonomy</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zytek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Arnaldo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Berti-Equille</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Veeramachaneni</surname></persName>
		</author>
		<idno type="DOI">10.1145/3544903.3544905</idno>
		<ptr target="https://doi.org/10.1145/3544903.3544905" />
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explor. Newsl</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="1" to="13" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">We need to talk: HCI and the delicate topic of spoken language interaction</title>
		<author>
			<persName><forename type="first">C</forename><surname>Munteanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Oviatt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Brewster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Penn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Whittaker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Rajput</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nanavati</surname></persName>
		</author>
		<idno type="DOI">10.1145/2468356.2468803</idno>
		<ptr target="https://doi.org/10.1145/2468356.2468803" />
	</analytic>
	<monogr>
		<title level="m">CHI &apos;13 Extended Abstracts on Human Factors in Computing Systems, CHI EA &apos;13</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="2459" to="2464" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Continuous Authentication for Voice Assistants</title>
		<author>
			<persName><forename type="first">H</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Fawaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">G</forename><surname>Shin</surname></persName>
		</author>
		<idno type="DOI">10.1145/3117811.3117823</idno>
		<ptr target="https://doi.org/10.1145/3117811.3117823" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking, MobiCom &apos;17</title>
				<meeting>the 23rd Annual International Conference on Mobile Computing and Networking, MobiCom &apos;17<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="343" to="355" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Personal Voice Assistant Security and Privacy-A Survey</title>
		<author>
			<persName><forename type="first">P</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Roedig</surname></persName>
		</author>
		<idno type="DOI">10.1109/JPROC.2022.3153167</idno>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the IEEE</title>
		<imprint>
			<biblScope unit="volume">110</biblScope>
			<biblScope unit="page" from="476" to="507" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Why Do People Use Artificial Intelligence (AI)-Enabled Voice Assistants?</title>
		<author>
			<persName><forename type="first">S</forename><surname>Malodia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Islam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kaur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dhir</surname></persName>
		</author>
		<idno type="DOI">10.1109/TEM.2021.3117884</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Engineering Management</title>
		<imprint>
			<biblScope unit="volume">71</biblScope>
			<biblScope unit="page" from="491" to="505" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations</title>
		<author>
			<persName><forename type="first">A</forename><surname>Baevski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mohamed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Auli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS&apos;20</title>
				<meeting>the 34th International Conference on Neural Information Processing Systems, NIPS&apos;20<address><addrLine>Red Hook, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Curran Associates Inc</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="12449" to="12460" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units</title>
		<author>
			<persName><forename type="first">W.-N</forename><surname>Hsu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bolte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-H</forename><forename type="middle">H</forename><surname>Tsai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lakhotia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mohamed</surname></persName>
		</author>
		<idno type="DOI">10.1109/TASLP.2021.3122291</idno>
		<ptr target="https://doi.org/10.1109/TASLP.2021.3122291" />
	</analytic>
	<monogr>
		<title level="j">IEEE/ACM Transactions on Audio, Speech, and Language Processing</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="3451" to="3460" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kanda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Yoshioka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wei</surname></persName>
		</author>
		<idno type="DOI">10.1109/JSTSP.2022.3188113</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Journal of Selected Topics in Signal Processing</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="1505" to="1518" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Robust Speech Recognition via Large-Scale Weak Supervision</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Brockman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Mcleavey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<idno>JMLR.org</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th International Conference on Machine Learning, ICML&apos;23</title>
				<meeting>the 40th International Conference on Machine Learning, ICML&apos;23</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="28492" to="28518" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Towards A Rigorous Science of Interpretable Machine Learning</title>
		<author>
			<persName><forename type="first">F</forename><surname>Doshi-Velez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kim</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1702.08608</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Machine Learning Interpretability: A Survey on Methods and Metrics</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">V</forename><surname>Carvalho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Pereira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Cardoso</surname></persName>
		</author>
		<idno type="DOI">10.3390/electronics8080832</idno>
		<ptr target="https://www.mdpi.com/2079-9292/8/8/832" />
	</analytic>
	<monogr>
		<title level="j">Electronics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Notions of explainability and evaluation approaches for explainable artificial intelligence</title>
		<author>
			<persName><forename type="first">G</forename><surname>Vilone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Longo</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.inffus.2021.05.009</idno>
		<ptr target="https://doi.org/10.1016/j.inffus.2021.05.009" />
	</analytic>
	<monogr>
		<title level="j">Information Fusion</title>
		<imprint>
			<biblScope unit="volume">76</biblScope>
			<biblScope unit="page" from="89" to="106" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Interpretable Data-Based Explanations for Fairness Debugging</title>
		<author>
			<persName><forename type="first">R</forename><surname>Pradhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Glavic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Salimi</surname></persName>
		</author>
		<idno type="DOI">10.1145/3514221.3517886</idno>
		<ptr target="https://doi.org/10.1145/3514221.3517886" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 International Conference on Management of Data, SIGMOD &apos;22</title>
				<meeting>the 2022 International Conference on Management of Data, SIGMOD &apos;22<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="247" to="261" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The role of explainable AI in the context of the AI Act</title>
		<author>
			<persName><forename type="first">C</forename><surname>Panigutti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hamon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Hupont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fernandez Llorca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fano Yela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Junklewitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Scalzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Mazzini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sanchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Soler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Garrido</surname></persName>
		</author>
		<author>
			<persName><surname>Gomez</surname></persName>
		</author>
		<idno type="DOI">10.1145/3593013.3594069</idno>
		<ptr target="https://doi.org/10.1145/3593013.3594069" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT &apos;23</title>
				<meeting>the 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT &apos;23<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1139" to="1150" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Ferrando</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sarti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bisazza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Costa-jussà</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2405.00208</idno>
		<title level="m">A Primer on the Inner Workings of Transformer-based Language Models</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Gradient-Based Attribution Methods</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ancona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ceolini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Öztireli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gross</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-28954-6_9</idno>
		<ptr target="https://doi.org/10.1007/978-3-030-28954-6_9" />
	</analytic>
	<monogr>
		<title level="m">Explainable AI: Interpreting, Explaining and Visualizing Deep Learning</title>
				<editor>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Montavon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><forename type="middle">K</forename><surname>Hansen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K.-R</forename><surname>Müller</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="169" to="191" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Towards Explainable Artificial Intelligence</title>
		<author>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-R</forename><surname>Müller</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-28954-6_1</idno>
		<ptr target="https://doi.org/10.1007/978-3-030-28954-6_1" />
	</analytic>
	<monogr>
		<title level="m">Explainable AI: Interpreting, Explaining and Visualizing Deep Learning</title>
				<editor>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Montavon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><forename type="middle">K</forename><surname>Hansen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K.-R</forename><surname>Müller</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="5" to="22" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Towards the Unification and Robustness of Perturbation and Gradient Based Explanations</title>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jabbari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Upadhyay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lakkaraju</surname></persName>
		</author>
		<ptr target="https://proceedings.mlr.press/v139/agarwal21c.html" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 38th International Conference on Machine Learning</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Meila</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Zhang</surname></persName>
		</editor>
		<meeting>the 38th International Conference on Machine Learning<address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">139</biblScope>
			<biblScope unit="page" from="110" to="119" />
		</imprint>
	</monogr>
	<note>Proceedings of Machine Learning Research</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Perturbation-based methods for explaining deep neural networks: A survey</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ivanovs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kadikis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ozols</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.patrec.2021.06.030</idno>
		<ptr target="https://doi.org/10.1016/j.patrec.2021.06.030" />
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition Letters</title>
		<imprint>
			<biblScope unit="volume">150</biblScope>
			<biblScope unit="page" from="228" to="234" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Explainable Artificial Intelligence: A Survey</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">K</forename><surname>Došilović</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brčić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Hlupić</surname></persName>
		</author>
		<idno type="DOI">10.23919/MIPRO.2018.8400040</idno>
	</analytic>
	<monogr>
		<title level="m">International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)</title>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="210" to="215" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Explainable AI (XAI): A systematic meta-survey of current challenges and future opportunities</title>
		<author>
			<persName><forename type="first">W</forename><surname>Saeed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Omlin</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.knosys.2023.110273</idno>
		<ptr target="https://doi.org/10.1016/j.knosys.2023.110273" />
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">263</biblScope>
			<biblScope unit="page">110273</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Do Feature Attribution Methods Correctly Attribute Features?</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Booth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shah</surname></persName>
		</author>
		<idno type="DOI">10.1609/aaai.v36i9.21196</idno>
		<ptr target="https://ojs.aaai.org/index.php/AAAI/article/view/21196" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="9623" to="9633" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Amariucai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Qiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Guan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Fu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.17945</idno>
		<title level="m">A Comprehensive and Reliable Feature Attribution Method: Double-sided Remove and Reconstruct (DoRaR)</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Phillips</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fontana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Yates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">K</forename><surname>Greene</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Broniatowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Przybocki</surname></persName>
		</author>
		<idno type="DOI">10.6028/NIST.IR.8312</idno>
		<ptr target="https://doi.org/10.6028/NIST.IR.8312" />
		<title level="m">Four Principles of Explainable Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness?</title>
		<author>
			<persName><forename type="first">A</forename><surname>Jacovi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Goldberg</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.386</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.386" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Chai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Schluter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tetreault</surname></persName>
		</editor>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="4198" to="4205" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Towards Faithful Model Explanation in NLP: A Survey</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Lyu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Apidianaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Callison-Burch</surname></persName>
		</author>
		<idno type="DOI">10.1162/coli_a_00511</idno>
		<ptr target="https://aclanthology.org/2024.cl-2.6" />
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">50</biblScope>
			<biblScope unit="page" from="657" to="723" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">AudioMNIST: Exploring Explainable Artificial Intelligence for audio analysis on a simple benchmark</title>
		<author>
			<persName><forename type="first">S</forename><surname>Becker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vielhaben</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ackermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-R</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lapuschkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.jfranklin.2023.11.038</idno>
		<ptr target="https://doi.org/10.1016/j.jfranklin.2023.11.038" />
	</analytic>
	<monogr>
		<title level="j">Journal of the Franklin Institute</title>
		<imprint>
			<biblScope unit="volume">361</biblScope>
			<biblScope unit="page" from="418" to="428" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features</title>
		<author>
			<persName><forename type="first">E</forename><surname>Pastor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Koudounas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Attanasio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hovy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Baralis</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2024.eacl-long.136" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<editor>
			<persName><forename type="first">Y</forename><surname>Graham</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Purver</surname></persName>
		</editor>
		<meeting>the 18th Conference of the European Chapter of the Association for Computational Linguistics<address><addrLine>St. Julian&apos;s, Malta</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="2221" to="2238" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Directly Comparing the Listening Strategies of Humans and Machines</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">A</forename><surname>Trinh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mandel</surname></persName>
		</author>
		<idno type="DOI">10.1109/TASLP.2020.3040545</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE/ACM Transactions on Audio, Speech, and Language Processing</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="312" to="323" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Visualizing Automatic Speech Recognition - Means for a Better Understanding?</title>
		<author>
			<persName><forename type="first">K</forename><surname>Markert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Parracone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kulakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sperl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Kao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Böttinger</surname></persName>
		</author>
		<idno type="DOI">10.21437/SPSC.2021-4</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication</title>
				<meeting>2021 ISCA Symposium on Security and Privacy in Speech Communication</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="14" to="20" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Explanations for Automatic Speech Recognition</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rajan</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICASSP49357.2023.10094635</idno>
	</analytic>
	<monogr>
		<title level="m">ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme Recognition</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rajan</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICASSP48485.2024.10445989</idno>
	</analytic>
	<monogr>
		<title level="m">ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title>
				<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="10296" to="10300" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Intra-speaker phonetic variation in read speech: comparison with inter-speaker variability in a controlled population</title>
		<author>
			<persName><forename type="first">N</forename><surname>Audibert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Fougeron</surname></persName>
		</author>
		<idno type="DOI">10.21437/Interspeech.2022-10965</idno>
		<ptr target="https://hal.science/hal-03852142" />
	</analytic>
	<monogr>
		<title level="m">Interspeech 2022</title>
		<meeting><address><addrLine>Incheon, South Korea</addrLine></address></meeting>
		<imprint>
			<publisher>ISCA</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="4755" to="4759" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<title level="m" type="main">An Introduction to Phonetics and Phonology</title>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Yallop</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1990">1990</date>
			<publisher>B. Blackwell</publisher>
			<pubPlace>Oxford, UK</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Yule</surname></persName>
		</author>
		<title level="m">The Study of Language</title>
				<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>7 ed</note>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<title level="m" type="main">Acoustic Phonetics</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">N</forename><surname>Stevens</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2000">2000</date>
			<publisher>The MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Effects of degradation of intensity, time, or frequency content on speech intelligibility for normal-hearing and hearing-impaired listeners</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">H</forename><surname>Van Schijndel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Houtgast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Festen</surname></persName>
		</author>
		<idno type="DOI">10.1121/1.1378345</idno>
		<ptr target="https://doi.org/10.1121/1.1378345" />
	</analytic>
	<monogr>
		<title level="j">The Journal of the Acoustical Society of America</title>
		<imprint>
			<biblScope unit="volume">110</biblScope>
			<biblScope unit="page" from="529" to="542" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Acoustic correlates of some phonetic categories</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">N</forename><surname>Stevens</surname></persName>
		</author>
		<idno type="DOI">10.1121/1.384823</idno>
	</analytic>
	<monogr>
		<title level="j">The Journal of the Acoustical Society of America</title>
		<imprint>
			<biblScope unit="volume">68</biblScope>
			<biblScope unit="page" from="836" to="842" />
			<date type="published" when="1980">1980</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Honey</surname></persName>
		</author>
		<idno type="DOI">10.1002/9781405166256.ch6</idno>
		<ptr target="https://doi.org/10.1002/9781405166256.ch6" />
		<title level="m">Sociophonology</title>
				<imprint>
			<publisher>John Wiley &amp; Sons, Ltd</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="92" to="106" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<monogr>
		<title level="m" type="main">Speech Prosody: from Acoustics to Interpretation</title>
		<author>
			<persName><forename type="first">D</forename><surname>Hirst</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2024">2024</date>
			<publisher>Springer</publisher>
			<pubPlace>Berlin, Heidelberg</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">The Diversity of Tone Languages and the Roles of Pitch Variation in Non-tone Languages: Considerations for Tone Perception Research</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">T</forename><surname>Best</surname></persName>
		</author>
		<idno type="DOI">10.3389/fpsyg.2019.00364</idno>
		<ptr target="https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2019.00364" />
	</analytic>
	<monogr>
		<title level="j">Frontiers in Psychology</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<analytic>
		<title level="a" type="main">A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds</title>
		<author>
			<persName><forename type="first">F</forename><surname>Alías</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Socoró</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Sevillano</surname></persName>
		</author>
		<idno type="DOI">10.3390/app6050143</idno>
		<ptr target="https://www.mdpi.com/2076-3417/6/5/143" />
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b43">
	<analytic>
		<title level="a" type="main">A review of deep learning techniques for speech processing</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mehrish</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bharadwaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mihalcea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Poria</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.inffus.2023.101869</idno>
		<ptr target="https://doi.org/10.1016/j.inffus.2023.101869" />
	</analytic>
	<monogr>
		<title level="j">Information Fusion</title>
		<imprint>
			<biblScope unit="volume">99</biblScope>
			<biblScope unit="page">101869</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<monogr>
		<author>
			<orgName type="collaboration">Seamless Communication</orgName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Barrault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-A</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Meglioli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P.-A</forename><surname>Duquenne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Elsahar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Heffernan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hoffman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Klaiber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Licht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Maillard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rakotoarison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">R</forename><surname>Sadagopan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wenzek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Akula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P.-J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">E</forename><surname>Hachem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ellis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M</forename><surname>Gonzalez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Haaheim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hansanti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Howes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-J</forename><surname>Hwang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Inaguma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kalbassi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kallet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kulikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mavlyutov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Peloquin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ramadan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Tufanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vogeti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Andrews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Balioglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Costa-jussà</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Celebi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Elbayad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Guzmán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mourachko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Popuri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ropers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Saleem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tomasello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2308.11596</idno>
		<ptr target="https://arxiv.org/abs/2308.11596" />
		<title level="m">SeamlessM4T: Massively Multilingual &amp; Multimodal Machine Translation</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b45">
	<analytic>
		<title level="a" type="main">The Kaldi Speech Recognition Toolkit</title>
		<author>
			<persName><forename type="first">D</forename><surname>Povey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ghoshal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Boulianne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Burget</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Glembek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hannemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Motlicek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Schwarz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Silovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Stemmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Vesely</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 2011 Workshop on Automatic Speech Recognition and Understanding</title>
		<imprint>
			<publisher>IEEE Signal Processing Society</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1" to="4" />
		</imprint>
	</monogr>
	<note>IEEE Catalog No.: CFP11SRW-USB</note>
</biblStruct>

<biblStruct xml:id="b46">
	<analytic>
		<title level="a" type="main">The Encoding of Vowel Features in Mel-Frequency Cepstral Coefficients</title>
		<author>
			<persName><forename type="first">K</forename><surname>Ikarous</surname></persName>
		</author>
		<idno type="DOI">10.17469/O2104AISV000001</idno>
		<ptr target="https://doi.org/10.17469/O2104AISV000001" />
	</analytic>
	<monogr>
		<title level="m">Il parlato nel contesto naturale [Speech in the natural context]</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Vietti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Spreafico</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Mereu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Galatà</surname></persName>
		</editor>
		<meeting><address><addrLine>Officinaventuno, Milano</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="9" to="18" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b47">
	<monogr>
		<title level="m" type="main">librosa</title>
		<author>
			<persName><forename type="first">B</forename><surname>McFee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>McVicar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Faronbi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Roman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gover</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Balke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Seyfarth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Malek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Lostanlen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>van Niekerk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Cwitkowitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Zalkow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Nieto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ellis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mason</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Steers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Halvachs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Thomé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F.-R</forename><surname>Stöter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bittner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Battenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Yamamoto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Carr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Metsai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sullivan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Friesch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Krishnakumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hidaka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kowalik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Keller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mazur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chabot-Leclerc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hawthorne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ramaprasad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Keum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Monroe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">A</forename><surname>Morozov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Eliasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nullmightybofo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">D</forename><surname>Biberstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sergin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hennequin</surname></persName>
		</author>
		<author>
			<persName><surname>Naktinis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Beantowel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Åsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Malins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hereñú</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Van Der Struijk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Nickel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vollrath</surname></persName>
		</author>
		<author>
			<persName><surname>Sarroff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Xiao-Ming</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Porter</surname></persName>
		</author>
		<author>
			<persName><surname>Kranzler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Voodoohop</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gangi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jinoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Guerrero</surname></persName>
		</author>
		<author>
			<persName><surname>Mazhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Tod-Drme2178</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Baratz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Kostin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">T</forename><surname>Zhuang</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b48">
	<analytic>
		<title/>
		<author>
			<persName><forename type="first">P</forename><surname>Lo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Campr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Semeniuc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Biswal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Moura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Brossier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><surname>Pimenta</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.8252662</idno>
		<ptr target="https://doi.org/10.5281/zenodo.8252662" />
	</analytic>
	<monogr>
		<title level="j">librosa/librosa</title>
		<imprint>
			<biblScope unit="volume">0</biblScope>
			<biblScope unit="issue">10</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b49">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Ravanelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1811.09725</idno>
		<title level="m">Interpretable Convolutional Filters with SincNet</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b50">
	<analytic>
		<title level="a" type="main">Interpretation of convolutional neural networks for speech spectrogram regression from intracranial recordings</title>
		<author>
			<persName><forename type="first">M</forename><surname>Angrick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Herff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Krusienski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Schultz</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.neucom.2018.10.080</idno>
		<ptr target="https://doi.org/10.1016/j.neucom.2018.10.080" />
	</analytic>
	<monogr>
		<title level="j">Neurocomput</title>
		<imprint>
			<biblScope unit="volume">342</biblScope>
			<biblScope unit="page" from="145" to="151" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b51">
	<analytic>
		<title level="a" type="main">IIRI-Net: An interpretable convolutional front-end inspired by IIR filters for speaker identification</title>
		<author>
			<persName><forename type="first">H</forename><surname>Fayyazi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shekofteh</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.neucom.2023.126767</idno>
		<ptr target="https://doi.org/10.1016/j.neucom.2023.126767" />
	</analytic>
	<monogr>
		<title level="j">Neurocomput</title>
		<imprint>
			<biblScope unit="volume">558</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b52">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Hannun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Case</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Casper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Catanzaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Diamos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Elsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Prenger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Satheesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sengupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Coates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.5567</idno>
		<title level="m">Deep Speech: Scaling up end-to-end speech recognition</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b53">
	<analytic>
		<title level="a" type="main">Design of the CMU Sphinx-4 Decoder</title>
		<author>
			<persName><forename type="first">P</forename><surname>Lamere</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kwok</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Walker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gouvea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Raj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wolf</surname></persName>
		</author>
		<idno type="DOI">10.21437/Eurospeech.2003-382</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003)</title>
				<meeting>8th European Conference on Speech Communication and Technology (Eurospeech 2003)</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="1181" to="1184" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b54">
	<analytic>
		<title level="a" type="main">Inseq: An Interpretability Toolkit for Sequence Generation Models</title>
		<author>
			<persName><forename type="first">G</forename><surname>Sarti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Feldhus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sickert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Van Der Wal</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.acl-demo.40</idno>
		<ptr target="https://aclanthology.org/2023.acl-demo.40" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Bollegala</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Huang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Ritter</surname></persName>
		</editor>
		<meeting>the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)<address><addrLine>Toronto, Canada</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="421" to="435" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b55">
	<analytic>
		<title level="a" type="main">Using Captum to Explain Generative Language Models</title>
		<author>
			<persName><forename type="first">V</forename><surname>Miglani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Markosyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Garcia-Olano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kokhlikyan</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.nlposs-1.19</idno>
		<ptr target="https://aclanthology.org/2023.nlposs-1.19" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)</title>
				<editor>
			<persName><forename type="first">L</forename><surname>Tan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Milajevs</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Chauhan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Gwinnup</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Rippeth</surname></persName>
		</editor>
		<meeting>the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)<address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="165" to="173" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b56">
	<analytic>
		<title level="a" type="main">What do we want from Explainable Artificial Intelligence (XAI)? -A stakeholder perspective on XAI and a conceptual model guiding interdisciplinary XAI research</title>
		<author>
			<persName><forename type="first">M</forename><surname>Langer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Oster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Speith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hermanns</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kästner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sesing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Baum</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.artint.2021.103473</idno>
		<ptr target="https://doi.org/10.1016/j.artint.2021.103473" />
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">296</biblScope>
			<biblScope unit="page">103473</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b57">
	<analytic>
		<title level="a" type="main">Speech Therapy Supported by AI and Smart Assistants</title>
		<author>
			<persName><forename type="first">M</forename><surname>Calvano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Curci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pagano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Piccinno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Product-Focused Software Process Improvement</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Kadgien</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Jedlitschka</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Janes</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Lenarduzzi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham, Switzerland</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Nature</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="97" to="104" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
