<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SignalGrad-CAM: beyond image explanation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Samuele Pe</string-name>
          <email>samuele.pe01@universitadipavia.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tommaso Buonocore</string-name>
          <email>tommaso.buonocore@unipv.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanna Nicora</string-name>
          <email>giovanna.nicora@unipv.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enea Parimbelli</string-name>
          <email>enea.parimbelli@unipv.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electrical</institution>
          ,
          <addr-line>Computer</addr-line>
          ,
          <institution>and Biomedical Engineering, University of Pavia</institution>
          ,
          <addr-line>Via Adolfo Ferrata 5, Pavia</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Telfer School of Management, University of Ottawa</institution>
          ,
          <addr-line>55 Laurier Avenue East, Ottawa, Ontario K1N 6N5</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Deep learning models have demonstrated remarkable performance across various domains; however, their black-box nature hinders interpretability and trust. As a result, the demand for explanation algorithms has grown, driving advancements in the field of eXplainable AI (XAI). Nevertheless, relatively few efforts have been dedicated to developing interpretability methods for signal-based models. In this work, we introduce SignalGrad-CAM, a versatile and efficient interpretability tool that extends the principles of Grad-CAM to both 1D- and 2D-convolutional neural networks for signal processing. SGrad-CAM is designed to interpret models for either image or signal processing, supporting both PyTorch and TensorFlow/Keras frameworks, and provides diagnostic and visualization tools to enhance model transparency. The package is also designed for batch processing, ensuring efficiency even in large-scale applications, while maintaining a simple and user-friendly structure. We validated SGrad-CAM on multiple open-source models and datasets, including speech emotion recognition, cardiovascular disease classification, and human activity recognition (HAR). Results suggest that SGrad-CAM consistently highlights salient features in the inputs, aiding model understanding and bias detection.</p>
      </abstract>
      <kwd-group>
        <kwd>Grad-CAM</kwd>
        <kwd>XAI</kwd>
        <kwd>time series</kwd>
        <kwd>CNN</kwd>
        <kwd>HAR</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Background</title>
        <p>
          The rapid evolution of deep learning (DL) architectures has outpaced the development of tools to
interpret their increasingly complex decision-making processes. The opaque nature of these systems
suffers from the critical flaw of alienating users, leading to detrimental consequences in
human-AI decision-making. This phenomenon, often termed “algorithm aversion” [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], fosters distrust in
artificial intelligence (AI), exacerbating the already contentious human-AI relationship. The need for
interpretability tools to demystify the black box and ensure trustworthiness has grown increasingly
urgent, forming the core objective of eXplainable AI (XAI) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], an innovative research trend. XAI offers
diverse solutions emphasizing interpretability, especially in high-stakes decision-making contexts, such
as in healthcare.
        </p>
        <p>
          In image processing, the visual, post-hoc algorithm Gradient-weighted Class Activation Mapping
(Grad-CAM) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] remains a gold standard for visualizing spatial importance in convolutional networks.
By leveraging layer activations and partial derivatives of the output with respect to layer parameters,
Grad-CAM efficiently constructs saliency maps that highlight the image regions that most influence the
model’s decision. Numerous extensions of Grad-CAM have emerged to refine or complement it, such
as HiResCAM [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and Grad-CAM++ [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          While interpretability research in image processing has received significant attention, other input
modalities – i.e. signals [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and 3D volumetric/video data [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] – remain underexplored in the context
of XAI. For signals, basic interpretability can be achieved through example-based methods like those
leveraging Shapelets [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], while more sophisticated techniques often adapt existing algorithms originally
designed for images or tabular data. Examples include perturbation-based approaches like SHAP [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
and gradient-driven methods like the “gradient × input” technique [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] or Class Activation Mapping
(CAM) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>
          Similar to CAM, Grad-CAM and its descendants were initially designed for images and 2D-CNNs,
but their core principles are readily applicable to signal processing [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] since many state-of-the-art
signal processing networks rely on convolutional layers – typically 1D-CNNs that perform temporal
convolutions, but 2D models (which treat signals as pseudo-images) are also common. Extending
Grad-CAM to these domains represents a simple yet understudied approach to generating efficient,
intuitive explanations for signal-based models.
        </p>
        <p>Building on this foundation, we present SignalGrad-CAM (SGrad-CAM, https://github.com/
bmi-labmedinfo/signal_grad_cam), an easy-to-use, versatile Python package for generating class
activation maps. SGrad-CAM supports both 1D- and 2D-CNNs, accommodates image and signal data, and
efficiently processes batched inputs. Beyond producing saliency maps, the package includes diagnostic
tools for quantitative and qualitative evaluation, such as customizable visualization pipelines that enable
users to validate the model’s behavior effectively.</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Related works</title>
        <p>
          Over time, the open-source community has developed numerous Python packages that implement
Grad-CAM and its variants. Building on PyTorch, we can consider, for example, the official Grad-CAM
repository (https://github.com/ramprs/grad-cam) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], pytorch-grad-cam (https://github.com/jacobgil/
pytorch-grad-cam), TorchCAM (https://github.com/frgfm/torch-cam), GradCam-Pytorch-Implementation
(https://github.com/irfanbykara/GradCam-Pytorch-Implementation), and OmniXAI (https://github.
com/salesforce/OmniXAI). When considering TensorFlow or Keras as a framework, we find packages
such as keras-grad-cam (https://github.com/jacobgil/keras-grad-cam), OmniXAI, and Xplique (https:
//github.com/deel-ai/xplique).
        </p>
        <p>
          Many of these tools excel in efficiency – leveraging GPUs and batching acceleration to generate
saliency maps in real time – and in flexibility, allowing users to inspect arbitrary convolutional layers
or compare multiple attribution methods (e.g., HiResCAM [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], Grad-CAM++ [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]). Furthermore, modern
libraries like Xplique and pytorch-grad-cam integrate quantitative evaluation metrics to assess
explanation faithfulness, such as RemOve And Debias (ROAD) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Some of them also provide baselines (e.g.,
RandomCAM) for comparisons and additional tools for sanity checks [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>
          However, despite these advancements, critical limitations persist. First, modality constraints limit
existing implementations: most tools focus exclusively on 2D image data, and their support for other data
modalities (like 1D signals or 3D data) is limited, non-intuitive, or requires manual adjustments. These
shortcomings hinder the adoption of interpretability tools in emerging domains such as computational
pathology (multi-scale 3D imagery), autonomous systems (multi-modal sensor fusion), and real-time
signal processing. Second, framework lock-in hinders reproducibility, as packages like TorchCAM
(PyTorch) and tf-keras-vis (TensorFlow) enforce ecosystem-specific workflows, complicating
cross-framework comparisons. Third, only a few tools are optimized for distributed computation or large
batches, limiting their utility in large-scale applications. Finally, the Grad-CAM algorithm envisions the
computation of gradients with respect to the output scores (logits) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] – i.e., before the application of
Softmax/Sigmoid – but these packages compute gradients directly on the output, regardless of its form.
        </p>
        <p>
          To address these challenges, we present SignalGrad-CAM, a Python package designed for efficiency,
flexibility, and cross-modality compatibility. SGrad-CAM extends Grad-CAM’s core principles with
four key innovations. First, our approach has a modality-agnostic design, natively supporting 1D
(time-series) and 2D (image) data through a unified API. Second, it is compatible with both PyTorch and
TensorFlow/Keras. Third, SignalGrad-CAM can be applied to any custom architecture with at least one
convolutional layer. Fourth and last, by performing back-transformation of probabilistic outputs into
their original logit form before gradient computation, our implementation faithfully adheres to the
original Grad-CAM algorithm [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      <p>The package is designed to be easy to use, with self-explanatory class methods, and capable
of computing multiple CAMs at a time – e.g., for an entire batch of data, across multiple target classes
and layers, and with several algorithm variants (currently vanilla Grad-CAM and HiResCAM).
Moreover, SGrad-CAM is designed to guarantee efficiency by leveraging GPU and batching acceleration.
Finally, the package is equipped with methods for CAM visualization, either standalone or overlaid
with the input data.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. SignalGrad-CAM: overview and implementation details</title>
      <p>SignalGrad-CAM is designed to provide a versatile implementation of Grad-CAM, enabling users to
interpret 2D-CNNs for image analysis and 1D- or 2D-CNNs for signal processing. Our package allows
users to explain models built with either PyTorch or TensorFlow/Keras, offering two specific post-hoc
visual XAI algorithms: Grad-CAM and HiResCAM. Moreover, the code is optimized to ensure efficient
execution, even when generating explanations for multiple batched inputs.</p>
      <p>
        When applied to 2D networks, SGrad-CAM’s output is a bidimensional matrix, identifying the most
relevant regions (features) of the input that influence the classification of the selected class. In the case of
images, as demonstrated by Selvaraju et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the highlighted features typically correspond to highly
informative objects, structures, or patterns in the image, visually indicating the factors influencing the
classification.
      </p>
      <p>For signal processing using 2D-CNNs, the general approach treats the input as a single-channel
image (length × Nchannels). The resulting CAM remains bidimensional, and one axis represents time
(or sample indices) while the other axis corresponds to signal channels [Fig. 1(a-b) and 2(b)]. Since
these algorithms are designed to capture spatial information, their application to signals translates to
capturing temporal information along the first axis and channel correlation along the second one. The
output can thus be interpreted as a visualization of each channel’s importance over time.</p>
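      <p>As an illustration, the following minimal PyTorch sketch (with made-up shapes and a placeholder convolutional layer, not taken from the package) shows how a multichannel signal can be presented to a 2D-CNN as a single-channel pseudo-image of shape length × Nchannels:</p>
      <preformat>
# Minimal sketch: a multichannel signal is reshaped into a single-channel pseudo-image
# so that a 2D-CNN convolves jointly over time and signal channels.
# Shapes and the layer below are illustrative placeholders.
import torch
import torch.nn as nn

n_channels, length = 12, 1000                    # e.g., a 12-channel signal window
signal = torch.randn(n_channels, length)

# (batch=1, image_channels=1, height=length, width=n_channels)
pseudo_image = signal.transpose(0, 1).unsqueeze(0).unsqueeze(0)

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=(7, 3), padding=(3, 1))
activations = conv(pseudo_image)                 # shape: (1, 8, length, n_channels)
print(activations.shape)
      </preformat>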
      <p>Finally, when applying Grad-CAM to 1D-CNNs, the focus shifts from spatial to purely temporal
information. The result is a one-dimensional array of importance scores, allowing users to assess the
average relevance of different time steps in the signal [Fig. 1(c-d)]. This provides insights into
time segments contribute most significantly to the model’s predictions.</p>
      <sec id="sec-2-1">
        <title>2.1. Grad-CAM for signals</title>
        <p>
          In the context of images, the idea behind Grad-CAM (GC) is to retrieve spatial information captured
by convolutional layers, which is typically lost in the final fully connected layers of the network.
The assumption is that the last convolutional layers of a network extract semantic and class-specific
information from the input, making their outputs (i.e., layer activations) good candidates for highlighting
the most valuable spatial information in the image. Each layer produces multiple activation maps (one
per neuron), which need to be combined. The simplest and most straightforward solution proposed by
Selvaraju et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is to compute a weighted average, using the mean gradient of the output score with
respect to the activation map as a weight for the activation map itself.
        </p>
        <p>Concerning 2D-CNNs for either image or signal classification, we followed the algorithm’s
instructions precisely, obtaining the class activation map for a single, unbatched input as follows:
$$\mathrm{CAM}_{GC}(i, j) = \mathrm{ReLU}\left(\sum_{k=1}^{K} \alpha_k \, A_k(i, j)\right), \qquad \alpha_k = \frac{1}{|A_k|} \sum_{(i, j) \in A_k} \frac{\partial s}{\partial A_k(i, j)} \tag{1}$$
where CAM_GC is a bidimensional matrix, A represents the activation tensor (K × height × width, with K
the number of neurons in the layer) of the selected layer, and s is the output score for the desired class.</p>
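        <p>As a minimal, framework-agnostic sketch of Eq. (1) (not the package’s API), assuming the activation maps and the gradients of the class score with respect to them have already been extracted from the model:</p>
        <preformat>
# Minimal NumPy sketch of Eq. (1): combine K activation maps with gradient-derived weights.
# A and dS_dA are placeholders for quantities extracted from a 2D convolutional layer.
import numpy as np

K, H, W = 8, 14, 14                        # illustrative layer dimensions
A = np.random.rand(K, H, W)                # activation maps, one per neuron/filter
dS_dA = np.random.rand(K, H, W)            # gradients of the class score s w.r.t. A

alpha = dS_dA.mean(axis=(1, 2))            # Eq. (1): mean gradient over (i, j) per map
cam = np.maximum((alpha[:, None, None] * A).sum(axis=0), 0.0)   # weighted sum + ReLU
print(cam.shape)                           # (H, W): one importance score per location
        </preformat>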
        <p>We extended the algorithm to the case of 1D convolutions, where the output activations can be
considered 1D vectors instead of 2D feature maps:
$$\mathrm{cam}_{GC}(t) = \mathrm{ReLU}\left(\sum_{k=1}^{K} \alpha_k \, a_k(t)\right), \qquad \alpha_k = \frac{1}{|a_k|} \sum_{t \in a_k} \frac{\partial s}{\partial a_k(t)} \tag{2}$$
where cam_GC is a one-dimensional vector, a represents the activation tensor (K × length, with K the
number of neurons in the layer) of the chosen layer, and s is the output score for the selected class.</p>
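        <p>For the 1D case, a minimal PyTorch sketch of Eq. (2) is given below; the toy model, the hook-based extraction, and the chosen class are illustrative assumptions, not the package’s implementation:</p>
        <preformat>
# Minimal sketch of Eq. (2) for a 1D-CNN: capture activations and gradients of one
# convolutional layer with hooks, then build the temporal importance vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(                            # placeholder 1D-CNN classifier
    nn.Conv1d(3, 16, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 4),
)
target_layer = model[0]

store = {}
target_layer.register_forward_hook(lambda m, i, o: store.update(act=o.detach()))
target_layer.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0].detach()))

x = torch.randn(1, 3, 200)                        # (batch, signal channels, length)
score = model(x)[0, 2]                            # output score s for a chosen class
score.backward()

a, g = store["act"][0], store["grad"][0]          # activations and gradients, each (K, length)
alpha = g.mean(dim=1)                             # one weight per filter
cam = F.relu((alpha[:, None] * a).sum(dim=0))     # weighted sum over filters + ReLU
print(cam.shape)                                  # (length,): importance over time
        </preformat>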
        <p>Many open-source networks do not provide raw score outputs but instead normalize them using
Softmax or Sigmoid functions to convert them into class probabilities. According to the original paper,
Grad-CAM requires the computation of partial derivatives directly on these scores, which may not
always be available. However, the original logits cannot always be recovered exactly from the output
probabilities, since the Softmax normalization discards an additive constant and therefore cannot be
uniquely inverted. To partially solve this issue, we defined an approximate inversion formula:</p>
        <p>
$$s \approx \ln(p) + \mathrm{constant} \tag{3}$$
where p is the vector of output class probabilities and s contains the approximated logit
scores. The second term in the formula is an unknown constant, but this does not pose a problem
during gradient computation, as it cancels out. Note that, even though logits can often be stored during
the forward pass, this approach was chosen because the model is externally provided, so reconstructing
or modifying it can be complex, especially when the architecture is not fully exposed or customizable.</p>
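        <p>To illustrate the idea of Eq. (3), the following sketch (with a placeholder Softmax-terminated model) takes gradients on the logarithm of the predicted probability, standing in for the unavailable logit:</p>
        <preformat>
# Minimal sketch of the approximate logit recovery of Eq. (3): when only probabilities are
# exposed, gradients are computed on log(p) as a stand-in for the unavailable logit s.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 4), nn.Softmax(dim=-1))   # placeholder probabilistic model
x = torch.randn(1, 10)

probs = model(x)                        # only class probabilities are available
approx_logit = torch.log(probs[0, 2])   # s ~ ln(p) + constant, Eq. (3)
approx_logit.backward()                 # gradients now flow from the approximated score
print(model[0].weight.grad.shape)
        </preformat>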
      </sec>
      <sec id="sec-2-2">
        <title>2.2. HiResCAM for signals</title>
        <p>
          HiResCAM (HRC) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] was developed to address a key limitation of Grad-CAM. Grad-CAM relies on
averaging gradients over spatial dimensions to obtain importance weights, but this approach is overly
simplistic. Gradients carry critical information, such as whether certain features require a sign change
or a rescaling. Averaging these values results in significant information loss. Moreover, HRC was proved
to perform better compared to Grad-CAM in various applications, including CT diagnostic imaging.
However, as noted by its authors, HRC’s results are often equivalent to Grad-CAM, depending on the
network selection.
(1)
(2)
(3)
For the 2D-CNN case, we obtained the CAM using the following formula:
︃( 
∑︁
=1
where camHRC is a one-dimensional vector, a represents the activation tensor (K × length, with K
being the number of neurons in the selected layer), and s is the output score for the target class. In
cases where outputs are probabilities, the inverse Softmax/Sigmoid solution described in the previous
section is applied.
        </p>
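        <p>A minimal NumPy sketch contrasting the two combinations in the 1D case (Eq. (2) vs. Eq. (5)), again assuming the activations and gradients have already been extracted:</p>
        <preformat>
# HiResCAM (Eq. 5) vs. Grad-CAM (Eq. 2) for a 1D layer: HiResCAM multiplies gradients and
# activations element-wise before summing, instead of averaging the gradients first.
import numpy as np

K, length = 16, 200
a = np.random.rand(K, length)              # activations of the selected 1D layer
dS_da = np.random.rand(K, length)          # gradients of the class score w.r.t. a

cam_hrc = np.maximum((dS_da * a).sum(axis=0), 0.0)              # HiResCAM, Eq. (5)

alpha = dS_da.mean(axis=1)                 # Grad-CAM weights, Eq. (2)
cam_gc = np.maximum((alpha[:, None] * a).sum(axis=0), 0.0)      # Grad-CAM, Eq. (2)
print(cam_hrc.shape, cam_gc.shape)         # both (length,)
        </preformat>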
        <p>[Fig. 1] includes examples comparing Grad-CAM and HiResCAM, illustrating how the two algorithms
either produce almost identical CAMs (a and c) or highlight similar information, even when the maps
differ (b and d).</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Visualization tools</title>
        <p>The framework offers a range of visualization tools to facilitate a detailed validation of the model
and its outputs. First, each CAM can be stored alongside relevant information about the examined
item: true class, predicted class, and prediction confidence for the selected class [Fig. 1]. The adopted
coloring scheme is jet, a rainbow-like gradient that smoothly transitions between multiple colors – blue
representing low-importance features (with saliency scores near zero) and red highlighting the most
salient ones. To enhance interpretability, a color bar is included, mapping colors to their corresponding
importance values assigned by the algorithm. For signal-related outputs, the horizontal axis can be
adjusted to convert sample indices into time values based on the sampling frequency, while channel
names can be displayed along the vertical axis. SGrad-CAM methods provide additional insights by
allowing users to retrieve raw CAM data, class probabilities, and importance score ranges, enabling
them to customize the display of the results.</p>
        <p>For validating 2D outputs, the package allows users to generate an image by superimposing the
CAM onto the original input [Fig. 2(c)]. This approach enhances interpretability by incorporating key
details such as prediction confidence for the selected class. While particularly useful for image-based
applications, this functionality also benefits the interpretability of signal processing tasks, especially
when dealing with numerous channels (e.g., spectrum signals, where the channel axis is continuous).</p>
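        <p>A minimal matplotlib sketch of this overlay (with synthetic placeholders for the input and the CAM, not the package’s plotting code) is shown below:</p>
        <preformat>
# Minimal sketch of the overlay visualization: a CAM (already resized to the input shape)
# is drawn over the input with the jet colormap, partial transparency, and a color bar.
import numpy as np
import matplotlib.pyplot as plt

image = np.random.rand(128, 128)            # placeholder input image / spectrogram
cam = np.random.rand(128, 128)              # placeholder CAM, same shape as the input

fig, ax = plt.subplots()
ax.imshow(image, cmap="gray")
overlay = ax.imshow(cam, cmap="jet", alpha=0.4)          # blue = low, red = high importance
fig.colorbar(overlay, ax=ax, label="importance score")
ax.set_title("target class / prediction confidence")     # example annotation placeholder
plt.show()
        </preformat>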
        <p>For signals, both 1D- and 2D-CAMs can be used to enrich data by coloring (in a plot) each raw signal
point according to its corresponding importance score [Fig. 2(a, d)]. This output visualization modality
can be displayed for multiple channels: note that 1D-CAMs assign a uniform color per timestep across
all channels, whereas 2D-CAMs generate a unique color sequence for each channel.</p>
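        <p>The point-coloring idea can be sketched as follows for a single channel (synthetic signal and importance scores, illustrative only):</p>
        <preformat>
# Minimal sketch of the point-coloring visualization: each sample of a raw signal is colored
# by its importance score using the jet colormap; the x-axis is converted to seconds.
import numpy as np
import matplotlib.pyplot as plt

fs = 100.0                                     # assumed sampling frequency [Hz]
t = np.arange(300) / fs                        # sample indices converted to time
signal = np.sin(2 * np.pi * 1.5 * t)           # placeholder single-channel signal
cam = np.abs(np.sin(2 * np.pi * 0.5 * t))      # placeholder 1D importance scores

fig, ax = plt.subplots()
ax.plot(t, signal, color="lightgray", linewidth=0.8)        # raw signal for context
points = ax.scatter(t, signal, c=cam, cmap="jet", s=10)     # color each point by importance
fig.colorbar(points, ax=ax, label="importance score")
ax.set_xlabel("time [s]")
plt.show()
        </preformat>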
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Package validation</title>
      <p>To assess the functionality and versatility of SGrad-CAM, we applied the package to evaluate multiple
models. The selected models and tasks span different framework combinations (PyTorch or
TensorFlow/Keras), data types (image or signal), and model architectures (1D- or 2D-CNN). Most of these
models are open-source and publicly available along with their training and test datasets. The following
sections focus on validating models for signal classification. Each experiment provided insights into
the model’s functioning and helped validate SGrad-CAM’s efficacy: CAMs were always self-consistent,
highlighting meaningful areas within a signal – for example, in HAR, the body joints corresponded to
the parts actually involved in the movement.</p>
      <sec id="sec-3-1">
        <title>3.1. Experiments on open-source models</title>
        <p>The first set of tests involved open-source deep learning models. For the 2D case in signal classification,
we examined a hybrid convolutional-recurrent network with attention for speech emotion classification
(https://github.com/Data-Science-kosta/Speech-Emotion-Classification-with-PyTorch), which
leverages the Kaggle RAVDESS Emotional Speech Audio dataset, and a TensorFlow-based residual CNN for
cardiovascular disease classification (https://github.com/HaneenElyamani/ECG-classification) trained
on the PhysioNet PTB-XL dataset.</p>
        <p>In the context of the former model, [Fig. 2(c)] illustrates an example of a raw spectrum image
superimposed with the corresponding CAM, highlighting the most relevant frequency components
for detecting the “surprise” emotion in speech. Similarly, for the latter case, [Fig. 2(b)] shows the
distribution of importance scores across all channels of a sample ECG, indicating where the model
focuses for the classification of “myocardial infarction”. Note that, due to the structure of its kernels, this
model assigns equal importance scores across all channels even though the network is bidimensional.</p>
        <p>For 1D networks in signal classification, we selected a PyTorch deep residual network for ECG
classification (https://github.com/hsd1503/resnet1d), originally developed for the PhysioNet/CinC Challenge
2017, and a TensorFlow/Keras ResNet-18 for classifying audio recordings from the AudioSet dataset
(https://github.com/ZFTurbo/classification_models_1D).</p>
        <p>The CAM visualization for a sample PhysioNet/CinC ECG signal [Fig. 2(d)] enables the identification
of the time interval where atrial fibrillation is detected by the PyTorch model. Lastly, by analyzing the
importance of different time steps for classifying the "Keys jangling" sound in the two signal waveforms
of an AudioSet item [Fig. 2(a)], we observe that the model primarily focuses on the final time steps.
Upon listening to the recording, we confirm that a jangling sound indeed occurs at the end of the audio.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Human Activity Recognition: a real-world use-case</title>
        <p>
          To further evaluate the framework’s capabilities, we tested four models from an ongoing study briefly
introduced in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. This project aimed to compare different strategies for splitting datasets into training
and test sets in the field of Human Activity Recognition (HAR). The data originate from the open-source
NTU RGB+D 120 dataset and were collected using Kinect cameras. Each channel in the signal represents
one of the three coordinates (x, y, or z) of one of 25 body joints. In this context, we constructed multiple
models, including: two TensorFlow-based CNNs (1D and 2D) and two PyTorch hybrid LSTM-CNN
networks (1D and 2D).
        </p>
        <p>CAM visualizations provided valuable insights into the networks’ decision-making processes, helping
to identify potential biases. For instance, when analyzing specific results for the 2D models [Fig. 1(a-b)], we
observe that the models focus on genuinely important features for classifying the thumbs-up gesture
– such as the positions of the right (a) and left (b) fingers – although multiple spurious features
are also considered, e.g., the left knee or head positions. Additionally, we notice a significant shift in the
model’s decision-making strategy when LSTM layers are introduced [Fig. 1(b)]: for example, the CAMs
highlight differently organized patterns as well as different salient features.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>In this work, we introduced SignalGrad-CAM, a versatile interpretability tool for CNN-based deep
learning models applicable to both image and signal data. By extending the principles of Grad-CAM
to support signal-oriented CNNs (1D and 2D) and accommodating multiple frameworks, SGrad-CAM
bridges a critical gap in the field of explainable AI for signal processing. Additionally, it offers diagnostic
and visualization tools that enhance model interpretability and transparency.</p>
      <p>However, some limitations exist. Currently, SGrad-CAM does not support 3D data modalities, which
limits its applicability to volumetric medical imaging or video-based applications. The next logical
extension of this work will involve adapting Grad-CAM to 3D-CNNs, enabling its use in these more
complex domains.</p>
      <p>
        Future work will also focus on further expanding SGrad-CAM’s capabilities by integrating support
for multi-modal and unconventional input types (e.g., dictionaries or other data collections),
as well as further optimizing the algorithms for large-scale applications. Additionally, we plan to
implement other well-known CAM generation algorithms – such as Grad-CAM++ [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] – to provide
more options for model explainability. We also intend to develop additional CAM-based evaluation and
visualization tools to deepen the understanding of model behavior. For example, averaging CAMs over
the entire dataset or generating a “standard deviation” CAM will provide a broader view of model focus
and consistency. Furthermore, we will incorporate metrics to assess the faithfulness of the explanations
themselves, such as ROAD [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>By addressing these challenges, SGrad-CAM aims to further advance trust and transparency in
AI-driven applications, contributing to the growing demand for more interpretable and accountable
machine learning models.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>Samuele Pe is a PhD student enrolled in the National PhD program in Artificial Intelligence, XXXIX
cycle, course on Health and life sciences, organized by Università Campus Bio-Medico di Roma. This
work was supported by the Italian Ministry of Research, under the complementary actions to the NRRP
“Fit4MedRob - Fit for Medical Robotics” Grant (# PNC0000007). Enea Parimbelli acknowledges funding
support provided by the Italian project PRIN PNRR 2022 InXAID - Interaction with eXplainable Artificial
Intelligence in (medical) Decision-making. CUP: H53D23008090001 funded by the European Union
Next Generation EU.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] B. J. Dietvorst, J. P. Simmons, C. Massey, Algorithm aversion: people erroneously avoid algorithms after seeing them err, Journal of Experimental Psychology: General 144 (2015) 114-126. doi:10.1037/xge0000033.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] P. Gohel, P. Singh, M. Mohanty, Explainable AI: current status and future directions, 2021. URL: http://arxiv.org/abs/2107.07045. doi:10.48550/arXiv.2107.07045, arXiv:2107.07045 [cs].</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, in: 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 618-626. URL: https://ieeexplore.ieee.org/document/8237336. doi:10.1109/ICCV.2017.74, ISSN: 2380-7504.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] R. L. Draelos, L. Carin, Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks, 2021. URL: http://arxiv.org/abs/2011.08891. doi:10.48550/arXiv.2011.08891, arXiv:2011.08891 [cs, eess].</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] A. Chattopadhyay, A. Sarkar, P. Howlader, V. N. Balasubramanian, Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 839-847. URL: http://arxiv.org/abs/1710.11063. doi:10.1109/WACV.2018.00097, arXiv:1710.11063 [cs].</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] T. Rojat, R. Puget, D. Filliat, J. Del Ser, R. Gelin, N. Díaz-Rodríguez, Explainable Artificial Intelligence (XAI) on Time Series Data: A Survey, 2021. URL: http://arxiv.org/abs/2104.00950. doi:10.48550/arXiv.2104.00950, arXiv:2104.00950 [cs].</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] L. Hiley, A. Preece, Y. Hicks, Explainable Deep Learning for Video Recognition Tasks: A Framework &amp; Recommendations, 2019. URL: http://arxiv.org/abs/1909.05667. doi:10.48550/arXiv.1909.05667, arXiv:1909.05667 [cs, eess, stat].</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] L. Ye, E. Keogh, Time series shapelets: a novel technique that allows accurate, interpretable and fast classification, Data Mining and Knowledge Discovery 22 (2011) 149-182. URL: https://doi.org/10.1007/s10618-010-0179-5. doi:10.1007/s10618-010-0179-5.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] A. Nayebi, S. Tipirneni, C. K. Reddy, B. Foreman, V. Subbian, WindowSHAP: An Efficient Framework for Explaining Time-series Classifiers based on Shapley Values, 2023. URL: http://arxiv.org/abs/2211.06507. doi:10.48550/arXiv.2211.06507, arXiv:2211.06507 [cs].</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] N. Strodthoff, C. Strodthoff, Detecting and interpreting myocardial infarction using fully convolutional neural networks, Physiological Measurement 40 (2019) 015001. URL: https://dx.doi.org/10.1088/1361-6579/aaf34d. doi:10.1088/1361-6579/aaf34d, publisher: IOP Publishing.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] F. Oviedo, Z. Ren, S. Sun, C. Settens, Z. Liu, N. T. P. Hartono, S. Ramasamy, B. L. DeCost, S. I. P. Tian, G. Romano, A. Gilad Kusne, T. Buonassisi, Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks, npj Computational Materials 5 (2019) 60. URL: https://www.nature.com/articles/s41524-019-0196-x. doi:10.1038/s41524-019-0196-x, publisher: Nature Publishing Group.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Y. Yan, H. Zhou, L. Huang, X. Cheng, S. Kuang, A Novel Two-Stage Refine Filtering Method for EEG-Based Motor Imagery Classification, Frontiers in Neuroscience 15 (2021) 657540. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8440963/. doi:10.3389/fnins.2021.657540.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Y. Rong, T. Leemann, V. Borisov, G. Kasneci, E. Kasneci, A Consistent and Efficient Evaluation Strategy for Attribution Methods, 2022. URL: http://arxiv.org/abs/2202.00449. doi:10.48550/arXiv.2202.00449, arXiv:2202.00449 [cs].</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] R. Tomsett, D. Harborne, S. Chakraborty, P. Gurram, A. Preece, Sanity Checks for Saliency Metrics, Proceedings of the AAAI Conference on Artificial Intelligence 34 (2020) 6021-6029. URL: https://ojs.aaai.org/index.php/AAAI/article/view/6064. doi:10.1609/aaai.v34i04.6064.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] S. Pe, G. Nicora, B. M. Vittoria Guerra, S. Sozzi, E. Parimbelli, Systematic Comparison of Machine Learning for Activity Recognition in Cross-Subject vs. Non-Cross-Subject Scenarios: A Preliminary Analysis, in: 2024 IEEE 8th Forum on Research and Technologies for Society and Industry Innovation (RTSI), 2024, pp. 375-379. URL: https://ieeexplore.ieee.org/abstract/document/10761630. doi:10.1109/RTSI61910.2024.10761630, ISSN: 2687-6817.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>