<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Toward the application of XAI methods in EEG-based systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Apicella</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco isgrò</string-name>
          <email>francesco.isgro@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Pollastro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Prevete</string-name>
          <email>rprevete@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electrical Engineering and Information Technology, University of Naples Federico II</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Laboratory of Artificial Intelligence, Privacy &amp; Applications, AIPA Lab</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Laboratory of Augmented Reality for Health Monitoring</institution>
          ,
          <addr-line>ARHeMLab</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>An interesting case of the well-known Dataset Shift Problem is the classification of Electroencephalogram (EEG) signals in the context of Brain-Computer Interface (BCI). The non-stationarity of EEG signals can lead to poor generalisation performance in BCI classification systems used in diferent sessions, also from the same subject. In this paper, we start from the hypothesis that the Dataset Shift problem can be alleviated by exploiting suitable eXplainable Artificial Intelligence (XAI) methods to locate and transform the relevant characteristics of the input for the goal of classification. In particular, we focus on an experimental analysis of explanations produced by several XAI methods on an ML system trained on a typical EEG dataset for emotion recognition. Results show that many relevant components found by XAI methods are shared across the sessions and can be used to build a system able to generalise better. However, relevant components of the input signal also appear to be highly dependent on the input itself.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;BCI</kwd>
        <kwd>XAI</kwd>
        <kwd>EEG</kwd>
        <kwd>Dataset Shift</kwd>
        <kwd>cross-session</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        information related to the brain’s physiological, functional, and pathological status. EEG signals
are particularly suitable to this aim thanks to their important qualities such as non-invasiveness
and high temporal resolution [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Furthermore, several solutions exploiting EEG acquisition
devices more comfortable and with a low number of electrodes are being proposed [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref9">9, 10, 11, 12</xref>
        ],
allowing an acquisition process less influenced by noise due to the user-device interaction.
Thanks to its properties, the EEG signal is one of the most promising candidates to become one
of the most used communication channels between man and machine.
      </p>
      <p>Several BCI solutions adopting ML methods are proposed in the literature. Generally, EEG
data acquired from persons subjected to well-known stimuli are used in the training stage.
These data are labelled following some established protocol, usually dependent on the task. For
example, in an Emotion Recognition (ER) task, stimuli can be images or videos considered able
to elicit particular emotions. Therefore the labels can be inferred by the stimuli or declared
by the subject, who will say whether or not he felt a specific emotion during the stimulus
administration. If the training stage is successful, the model can generalise on new unlabelled
data, such as new acquisition from another subject or the same subject in another session.</p>
      <p>
        However, one of the main defects of the EEG signal is that its statistical characteristics change
over time. This implies that even under the same conditions and for the same task, significantly
diferent signals can be acquired just as time passes. It is important to highlight that this
phenomenon can also occur using the same stimuli-reaction (e.g., same emotions with the same
stimuli) to the same subject at diferent times, leading to substantially diferent EEG signals
even for the same subject. This problem is even more present among diferent subjects, who,
given the same stimuli and emotions, can produce very diferent acquisitions between them.
For these reasons, EEG is considered a non-stationary signal [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. More in detail, the following
cases in an EEG-based task can arise: i) a model trained on a set of EEG data acquired from
a given subject at a specific time could not work on data acquired from the same subject at
diferent times (Cross-Session generalisation problem), or ii) a model trained on data acquired
from one or more subjects should not work as expected in classifying EEG signals acquired
from a diferent subject at diferent times (Cross-Subject generalisation problem).
      </p>
      <p>
        This type of problem can be treated as an instance of the Dataset Shift problem [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In a
nutshell, Dataset Shift arises when the distribution of the training data difers from the data
distribution used outside of the training stage (that is, running or evaluation stages); therefore
the standard ML assumption [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to have the same data distribution for both training and test
set does not hold. Consequently, standard ML approaches can produce ML systems which
exhibit poor generalisation performances.
      </p>
      <p>
        On another side, a sub-field of Artificial Intelligence, eXplainable Artificial Intelligence (XAI),
wants to explain the behaviour of AI systems, such as ML ones. In general, an explanation gives
information on why an ML model returns an output given a specific input. In particular, several
XAI methods applied to Deep Neural Networks are giving promising results [
        <xref ref-type="bibr" rid="ref15">15, 16, 17, 18</xref>
        ].
      </p>
      <p>In the XAI context, several explanations are built by inspecting the model’s inner mechanism
to understand the input-output relationships, assigning a relevance score to each input
component. However, building an explanation is particularly challenging if the model to inspect is
a DNN; this is mainly for two reasons: i) DNNs ofer excellent performances in several tasks,
but at the price of high inner complexity of the models, leading toward low interpretability,
ii) to help the ML user to understand the system behaviours, typical explanations have to be
feature evaluation</p>
      <p>XAI
method
examines</p>
      <p>ML system prediction
EEG</p>
      <p>Feature
extractor
humanly understandable.</p>
      <p>The general idea of this work is that outputs’ explanations of a trained ML model on given
inputs can help the setup of new models able to overcome/mitigate the dataset shift problem, in
general, and to generalise across subjects/sessions in case of EEG signals, in particular.</p>
      <p>More specifically, in this work, we focus on how several well-known XAI methods proposed
in literature behave in explaining decisions made by an ML system based on EEG input features
(Fig. 1). Notice that several current XAI methods are usually tested on datasets, such as image
and text recognition datasets [17, 19], where the domain shift problem is slight or not present.
Therefore, this work is a first step toward a long term goal consisting in exploiting explanations
made by XAI methods to locate and transform the main characteristics of the input for each
given output, and to build ML systems able to generalise toward diferent data coming from
diferent probability distributions (in this context, sessions and subjects). To this end, in this
paper, we evaluate and analyse the explanations produced by a set of well-known XAI methods
on an ML system trained on data taken from SEED [20], a public EEG dataset for an emotion
classification task. The results obtained show, on oneside, that only some well-known XAI
methods produce reliable explanations in the EEG domain in the analysed task. On another
side, it is shown that the relevant components found in the training data can only be partially
used on data acquired outside of the training stage. Notably, many relevant components found
in the training data are still relevant across the sessions.</p>
      <p>The paper is organised as follows: In Section 1, a brief description of the related works is
reported. In Section 2 the proposed evaluation framework is presented. In Section 3 the obtained
results are discussed. Finally, in Section 4 is devoted to final remarks and future developments.</p>
    </sec>
    <sec id="sec-2">
      <title>1. Related works</title>
      <p>In general, Modern ML approaches, as Deep learning, are characterised by a lack of transparency
of their internal mechanisms, making it not easy for the AI scientist to understand the real
reasons behind the inner behaviours. In this case, the relationships of the classified emotion with
the EEG input are often challenging to understand. In the EEG-based applications, works based
on simple features selection strategies to choose the best EEG features are widely proposed in
the literature, such as [21, 22]. These studies, however, are based on standard feature selection
methods, without exploiting information given by XAI methods. XAI is a branch of AI interested
to “explain” ML behaviours. This is done providing methods for generating possible explanations
of the model’s outputs. XAI methods are gaining prominence in explaining several classification
systems based on several inputs, such as images [17, 23], natural language processing [24],
clinical decision support systems [25], and so on. To the best of our knowledge, however, the
number of research works which attempt to improve the performance of ML models on the
relying on XAI’s methods is enough limited, especially in the context of bio-signal classification
problems. For example, in [26, 27] feature selection procedures are performend on biomedical
data by exploiting Correlation-based Feature Selection and Chaotic Spider Monkey Optimization
methods. In [28] the authors propose to use an occlusion sensitivity analysis strategy [29] to
locate the most relevant cortical areas in a motor imagery task. In [30] the use of XAI methods
to interpret the answer of Epilepsy Detection systems is discussed.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Methods</title>
      <p>Bearing in mind that we want to use the XAI method to alleviate the dataset shift problem in
the BCI context, we conducted a series of experiments having the following goals: 1) testing the
capability of the selected XAI methods to find relevant components for this specific signal; 2)
verifying how much relevant components are dependent on the single sample of the dataset
where the relevance are computed; 3) how much relevant components can be considered shared
among samples of the same session, and finally 4) how much relevant components can be
considered shared between samples of two diferent sessions, where the data shift problem is
typically present.</p>
      <p>In the remaining of this section, a brief description of the tested XAI methods is reported,
followed by the used data and model descriptions. Finally experimental assessment and the
evaluation strategy adopted are reported.</p>
      <sec id="sec-3-1">
        <title>2.1. Investigated XAI Methods</title>
        <p>In this work, we analyse XAI methods proposing explanations in terms of relevance of the input
components on the output returned by a given classifier. More in detail, the following XAI
methods are investigated: Saliency [31], Guided Backpropagation [32], Layer-wise Relevance
Propagation (LRP) [33], Integrated Gradients [34], and DeepLIFT [35].</p>
        <sec id="sec-3-1-1">
          <title>2.1.1. Saliency</title>
          <p>Saliency method is one the of the simplest and more intuitive method to build an explanation of
a ML system. Proposed in [31], Saliency method is based on the gradient of the output function
of the ML system respect to its input. In a nutshell, an explanation of the output (x) of a ML
system fed with an input x ∈ R is built generating a saliency map leveraging on the gradient
x of  with respect to its input computed through backpropagation. The magnitude of the
gradient indicates how much the features need to be changed to afect the class score.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>2.1.2. Guided BackPropagation</title>
          <p>Guided BackPropagation (Guided BP) [32] can be seen as a slight variation of Saliency method
proposed in [31]. The main diference is in the value used as gradient in case of rectified
activation functions (ReLU): in Saliency method, the real gradient is used in computing the
features relevance. Instead, Guided BP starts from the hypothesis that the user is not interested
if a feature "decreases" (i.e., negative value) a neuron activation, but only in the most relevant
ones. Therefore, instead of the true gradient, in guided BP a gradient transformation is used
to prevent backward flow of negative values, avoiding to decrease the neuron activations and
highlighting the most relevant features. Obviously, Guided BP can fail to highlight inputs that
contribute negatively to the output due to "zero-ing" the negative values.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>2.1.3. Layer-wise Relevance Propagation</title>
          <p>Layer-wise Relevance Propagation (LRP) associates a relevance value to each input element
(pixels in case of images) to build explanations for the ML model answer. In a nutshell, the output
(x) of a ML system on an input x ∈ R is decomposed as a sum of relevances on the single

features composing x, i.e. (x) ≃ ∑︀  where  is a score of the local contribution of the
=1
-th feature on the produced output. In particular, positive values denote positive contributions,
while negative values negative contributions. Applied to ANN, this principle can be generalised
across each pair of consecutive layers  and  + 1 of a network composed of  layers such
 ′
that ∑︀  () where  and ′ are the features of the layers  + 1 and  respectively.</p>
          <p>(+1) = ∑︀ 
=1 =1
Since the final network output (x) of an ANN is the output of the -th layer, it results
 ′ 
that (x) = · · · = ∑︀ (+1) = ∑︀ () = · · · = ∑︀ . This rule can be interpreted as a
=1 =1 =1
conservation rule, and leveraging on that diferent methods to compute the relevance have been
proposed, depending on the type of features involved. In case of densely connected layers, the
most known rule is the  −  [33], which takes care of the neuron activations of each layer
to compute the final relevance of each layer.</p>
        </sec>
        <sec id="sec-3-1-4">
          <title>2.1.4. Integrated Gradients</title>
          <p>
            One of the main drawbacks of simple gradient-based method is that the gradient respect to
the input should be small in the neighbourhood of the input features also for relevant ones.
Instead of using only the gradient respect to the original input, [34] proposed to average all the
gradients between the original input x and a baseline input x (that is, an input s.t. (x )
results in a neutral prediction). In this way, if features of inputs closer to the baseline have
higher gradient magnitudes, they are taken into account thanks to the average operator. More
formally, the importance of each feature  computed by Integrated Gradient (IG) is defined as
() = ( −  ) ∫︀ 1=0 (︀  +(−  ))︀  . In other words, IG aggregates the gradients
along the intermediate inputs on the straight-line between the baseline and the input, selected
as  ∈ [
            <xref ref-type="bibr" rid="ref1">0, 1</xref>
            ] changes.
          </p>
        </sec>
        <sec id="sec-3-1-5">
          <title>2.1.5. DeepLIFT</title>
          <p>In [35] a method consisting in assigning feature relevance scores according to the diference
between the neurons activation and a reference activation (such as the baseline for Integrated
Gradient method) is proposed. The authors proposed to compute for each feature a multiplier
entity similar to a partial derivative, but leveraging over finite diferences instead of infinitesimal
ones. Each multiplier can be defined as ΔΔ = ΔΔ and represents the ratio between i) the
Δ
contribution ΔΔ of the diference ∆  =  −  from the reference  of each feature 
to the diference ∆  =  −  between the output  and the reference output  , and ii) the
diference ∆ . Therefore, the authors proposed a set of rules to compute the features relevance
based on the proposed multipliers exploiting a Back Propagation-based approach.
2.2. Data
The SEED dataset consists of EEG signals recorded from 15 subjects stimulated by 15 film clips
carefully chosen to induce negative, neutral and positive emotions. Each film clip has a duration
of approximately 4 minutes. Three sessions of 15 trials were collected for each subject. EEG
signals were recorded in 62 channels using the ESI Neuroscan System1. During our experiments,
we considered the pre-computed diferential entropy (DE) features smoothed by linear dynamic
systems (LDS) for each second, in each channel, over the following five bands: delta (1–3 Hz);
theta (4–7 Hz); alpha (8–13 Hz); beta (14–30 Hz); gamma (31–50 Hz).</p>
          <p>In this work, the relevant components of an EEG signal can be considered taking into account
three diferent aspects of the signal: i) considering each single feature composing the input,
ii) considering each single band composing the EEG signal, that are alpha, beta, theta, and
delta, and iii) considering each single channel/electrode from which the input EEG signal was
acquired. Cases ii) and iii) can be viewed as diferent aggregations of fixed features of the EEG
signals. In the following of this work, we refer generically with the term "components" where it
is not necessary to specify if we are talking about features, bands or channels.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>2.3. Experimental assessment</title>
        <p>To achieve the goals defined at the beginning of this section, the following experiments are made:
ifrstly, to evaluate the capability of the selected XAI methods to find relevant components, we
analysed the explanations of model responses on data coming from the same session where the
1https://compumedicsneuroscan.com
training data was extracted; then, to evaluate how much relevant components can be considered
shared among samples of the same session, we analysed the explanations of the model responses
on data belonging to a session diferent from the training one. Finally, to evaluate if relevant
components can be considered shared between samples of two diferent sessions and how
much relevant components are dependent on the single data sample where the relevance are
computed, the components’ average relevance of data coming from the training session are
used as sorting score and select the components belonging to another session.</p>
        <p>Summarising, the following cases are considered: i) intra-session case: given a model 
trained on data coming from a session , explanations of the responses on input data belonging
to the same session  are built. ii) inter-session case: given a model  trained on data coming
from a session , explanation of responses on inputs belonging to a sessions  diferent from
 are built. Each of these cases can be in turn evaluated considering two diferent relevance:
a) real relevance: we assume that it is possible to compute the relevance of the input, since the
classification output is known; b) presumed relevance: we assume that the relevance of the
input is not available, since we are outside the training stage. In this case, we use the average of
the same component relevance obtained on training data as component relevance.</p>
      </sec>
      <sec id="sec-3-3">
        <title>2.4. Evaluation</title>
        <p>For each case, we investigated the explanations returned by XAI method in order to analyse if the
explanations built can correctly identify the impact that i) each input feature, ii) each electrode,
and iii) each frequency band has on the classification performances. To this aim, we consider as
relevance for each feature the relevance score returned by the XAI method, for each electrode
the mean relevance score of all the feature belonging to the electrode, and for each frequency
bands the mean average score of all the features belonging to the frequency band. Therefore, the
following evaluation strategies are then adopted and repeated considering features, electrodes,
and frequency bands as EEG components in turn: a) analysis of the MoRF (Most Relevant
First) curve, proposed in [33, 36]. In case of evaluating the components relevance returned by
the explanation method, the MoRF curve can be computed as follows: given a classifier, an
input EEG signal x and the respective classification output (x), the EEG components are
iteratively replaced by zeros, following the descending order with respect to the relevance
values returned by the explanation method. In other words, performances were analysed by
removing (i.e. setting to zero) components in a decreasing order of impact on the predictions
supplied by the explanation. In this way, the expected curve is such that more relevant the
identified components are for the classification output, steepest is the curve. Furthermore, the
change in the AOPC (Area Over Perturbation Curve) value is reported for each MoRF iteration.
AOPC is computed as   = 1+1 ⟨ ∑︀ (x(0)) − (x()))⟩ where  is the total number
=0
of iterations, x(0) is the original input, x() is the input at the iteration , and ⟨·⟩ is the average
operator over a set of inputs. MoRFs and AOPCs are reported also considering channels and
bands as characteristics to analyse.</p>
        <p>b) the analysis of the LeRF (Least Relevant First) curve, proposed in [36]. Diferently from the
MoRF curve, in this case the EEG components are iteratively removed following the ascending
order with respect to the relevance values returned by the explanation method. In the resulting
curve, we expect that the classification output should be very close to the original value when
the less relevant components are removed (corresponding to the first iterations), dropping
quickly to zero as the process goes toward the remotion of relevant elements. While the MoRFs
report how much the classifier output is destroyed removing highly relevant components, LeRFs
report how much the least relevant components leave the output intact. These indications
can be combined in the ABPC (Area Between Perturbation Curves, [36]) quantity, defined as
  = 1+1 ⟨ ∑︀ (x() ) − (x() ))⟩ where x() , x() are the values of the
=0
MoRF and LeRF values obtained at the -th iteration step. ABPC is an indicator of how good
the XAI method is. The larger the ABPC value, the better the XAI method. LeRFs and ABPCs
are reported also for channels and bands analysis.</p>
        <p>c) an analysis of the discriminative power of each component alone is made. Signals composed
of only one component following the relevance order given by the XAI method are fed to the
ML system in an iterative manner, and the relative performance curves are plotted.</p>
        <p>All the experiments were carried out only on correctly classified samples.</p>
      </sec>
      <sec id="sec-3-4">
        <title>2.5. Classification model</title>
        <p>The XAI methods are evaluated on a feed-forward fully connected multi layered neural networks.
Hyperparameters were tuned through bayesian optimisation [37]: the number of layers was
constrained to a maximum of 3; for each layer, the number of nodes was searched in the
space {2| ∈ {4, 5, ..., 10}} having the ReLU as activation function. Each experiment was
run having early stopping as convergence criterion with 20 epochs of patience. The 10 % of
the training set was extracted using stratified sampling [ 38] on class labels and considered as
validation set. Network optimisation was performed using Adam optimiser [39], whose learning
rate that was searched in the space {0.1, 0.01, ..., 0.0001}.</p>
        <p>As a result from the model selection stage, the best setting consisted in ANN having 3 layers
with 128, 256 and 128 neurons respectively. The learning rate was set to 0.01, and reduced to
its 10 % whenever the loss on validation set plateaus for 10 consecutive epochs.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Results &amp; discussions</title>
      <p>Since the behaviour of the explored XAI methods resulted in being similar across all the subjects,
we report only the results obtained on just one subject. In Fig. 2, 3, and 4 MoRF and LeRF
curves using the tested XAI methods are reported for both intra-session and inter-session cases,
considering as components to remove at each step features (Fig. 2), bands (Fig. 3), and channels
(Fig. 4), respectively. Results related to the intra-session cases are reported with solid lines,
while those regarding the inter-session case are marked with dotted lines. On the  axis and
 axis are reported the iteration step in the curve generation and the accuracy level reached,
respectively. With blue lines, results scoring the input components using efective relevance are
reported; with orange lines, results scoring the components using averaged relevance computed
on training data are reported; with green lines, results related to random choice.</p>
      <p>All the curves were compared with the random curve obtained by removing the components
in random order. Several interesting points can be highlighted:</p>
      <p>1) In all the cases, LRP, IG and Deep LIFT resulted in being more reliable XAI methods with
respect to Saliency and Guided BP. Indeed, MoRF curves of LRP, IG and Deep LIFT have high
slopes, however similar to each other, diferently from Saliency and Guided BP. In particular,
the latter is the only method among those tested whose explanations do not always seem to
capture the relevant components, especially in the case of intra-session. These considerations
seem consistent with what is reported in LeRF, AOPC, and ABPC.</p>
      <p>2) counterintuitively, in almost all the cases, explanations built in inter-session cases seem to
be more reliable (i.e., highlighting more relevant features) with respect to intra-session cases.
This behaviour can be explained by a more significant "robustness" to input changes of the
trained classifier toward data from the same training session (intra-session case), where with
"robustness" we mean the ability of the classifier to give a much higher score to the chosen
class respect to the other ones. Instead, data coming from diferent sessions (inter-session case)
leads the classifier toward more borderline class scores, therefore minimum perturbations of
the input data can lead to diferent classes, influencing the final performance and the resulting
MoRF curve during the features’ removal.</p>
      <p>3) Although the best XAI methods can locate relevant features/channels/bands for each
input data sample, they don’t seem able to locate a set of relevant components for all the
samples. In other words, the examined XAI methods fail to "generalise" to a set of general
features/channels/bands relevant to the most significant part of the possible inputs. Indeed,
removing the components following the average relevance (obtained in the training stage) in
reverse order (MoRF orange curves) does not lead to a steep drop in performance, as in the
other case (MORF blue curves). Even in some cases, such as using bands as a component to
assign the relevance (Fig. 3), the obtained curves overlap with the random ones, highlighting
that removing bands in random order is almost the same that following the relevance assigned
by the XAI method. This is confirmed by the other evaluation metrics adopted, i.e. MeRF, AOPC
and ABPC curves.</p>
      <p>In Fig. 5 a first analysis of the discriminative power of the components alone is made. Signals
composed of only one component following the relevance order given by the XAI method are
iteratively fed to the ML system. We limit the analysis only to the best XAI methods identified
in the previous step: DeepLIFT, IG and LRP. From the obtained results, it is interesting to notice
that the components considered most relevant for each sample fed to the classifier are enough
to reach high performances. However, considering the average relevance detected during the
training stage, the best components do not seem to lead toward similar performance, although
they are still better than a random choice.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusions</title>
      <p>In this work, the performances of several XAI methods proposed in the literature in the context
of Brain-Computer Interface (BCI) problems using EEG input-based Machine Learning (ML)
algorithms are experimentally evaluated. The focus was on how much the relevant components
selected by XAI methods be shared between diferent samples of the same dataset (in this case,
same session) or samples of diferent datasets (in this case, diferent sessions). The final results
show that the components considered most relevant for each sample fed to the classifier are
enough to achieve high performances. However, the components detected considering the best
average relevance during the training stage do not seem to lead toward performance returned
by components scored according to their efective relevance returned by the XAI method.</p>
      <p>This work is the first step toward developing a BCI system able to exploit XAI methods to
alleviate the dataset shift problem. However, in this work, only data belonging to diferent
sessions but acquired from the same subjects are taken into account. In future work, we plan to
analyse the behaviour of XAI methods with inter-subject classifiers. Several benefits can be
obtained in the EEG-based BCI applications by the proposed project. For example, a BCI system
can work across diferent subjects without retraining the model on each new unseen subject
(subject-independent model). Furthermore, a better understanding of the relationships between
the system inputs and outputs provided by XAI explanations can lead to the developing and
producing more efective EEG acquisition devices.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the European Union - FSE-REACT-EU, PON Research and Innovation
2014-2020 DM1062/2021 contract number 18-I-15350-2 and by the Ministry of University and
Research, PRIN research project "BRIO – BIAS, RISK, OPACITY in AI: design, verification and
development of Trustworthy AI.", Project no. 2020SSKZ7R .
[16] G. Montavon, A. Binder, S. Lapuschkin, W. Samek, K.-R. Müller, Layer-wise relevance
propagation: an overview, Explainable AI: interpreting, explaining and visualizing deep
learning (2019) 193–209.
[17] M. T. Ribeiro, S. Singh, C. Guestrin, " why should i trust you?" explaining the predictions
of any classifier, in: Proceedings of the 22nd ACM SIGKDD international conference on
knowledge discovery and data mining, 2016, pp. 1135–1144.
[18] A. Apicella, F. Isgro, R. Prevete, G. Tamburrini, A. Vietri, Sparse dictionaries for the
explanation of classification systems, in: PIE, 2015, p. 009.
[19] A. Apicella, S. Giugliano, F. Isgrò, R. Prevete, Exploiting auto-encoders and segmentation
methods for middle-level explanations of image classification systems, Knowledge-Based
Systems 255 (2022) 109725.
[20] W.-L. Zheng, B.-L. Lu, Investigating critical frequency bands and channels for EEG-based
emotion recognition with deep neural networks, IEEE Transactions on Autonomous
Mental Development 7 (2015) 162–175. doi:10.1109/TAMD.2015.2431497.
[21] A. Wosiak, A. Dura, Hybrid method of automated eeg signals’ selection using reversed
correlation algorithm for improved classification of emotions, Sensors 20 (2020) 7083.
[22] X. Zheng, X. Liu, Y. Zhang, L. Cui, X. Yu, A portable hci system-oriented eeg feature
extraction and channel selection for emotion recognition, International Journal of Intelligent
Systems 36 (2021) 152–176.
[23] A. Apicella, F. Isgrò, R. Prevete, A. Sorrentino, G. Tamburrini, Explaining classification
systems using sparse dictionaries, ESANN 2019 - Proceedings, 27th European Symposium
on Artificial Neural Networks, Computational Intelligence and Machine Learning (2019)
495 – 500.
[24] K. Qian, M. Danilevsky, Y. Katsis, B. Kawas, E. Oduor, L. Popa, Y. Li, Xnlp: A living survey
for xai research in natural language processing, in: 26th International Conference on
Intelligent User Interfaces-Companion, 2021, pp. 78–80.
[25] T. A. Schoonderwoerd, W. Jorritsma, M. A. Neerincx, K. Van Den Bosch, Human-centered
xai: Developing design patterns for explanations of clinical decision support systems,
International Journal of Human-Computer Studies 154 (2021) 102684.
[26] E. Laxmi Lydia, C. Anupama, N. Sharmili, Modeling of explainable artificial intelligence
with correlation-based feature selection approach for biomedical data analysis, in:
Biomedical Data Analysis and Processing Using Explainable (XAI) and Responsive Artificial
Intelligence (RAI), Springer, 2022, pp. 17–32.
[27] R. P. Selvam, A. S. Oliver, V. Mohan, N. Prakash, T. Jayasankar, Explainable artificial
intelligence with metaheuristic feature selection technique for biomedical data classification,
in: Biomedical Data Analysis and Processing Using Explainable (XAI) and Responsive
Artificial Intelligence (RAI), Springer, 2022, pp. 43–57.
[28] C. Ieracitano, N. Mammone, A. Hussain, F. C. Morabito, A novel explainable machine
learning approach for eeg-based brain-computer interface systems, Neural Computing
and Applications 34 (2022) 11347–11360.
[29] M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in:</p>
      <p>European conference on computer vision, Springer, 2014, pp. 818–833.
[30] P. Rathod, S. Naik, Review on epilepsy detection with explainable artificial intelligence, in:
2022 10th International Conference on Emerging Trends in Engineering and
TechnologySignal and Information Processing (ICETET-SIP-22), IEEE, 2022, pp. 1–6.
[31] K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional networks: Visualising
image classification models and saliency maps, arXiv preprint arXiv:1312.6034 (2013).
[32] J. T. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Striving for simplicity: The all
convolutional net, arXiv preprint arXiv:1412.6806 (2014).
[33] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, W. Samek, On pixel-wise
explanations for non-linear classifier decisions by layer-wise relevance propagation, PloS
one 10 (2015) e0130140.
[34] M. Sundararajan, A. Taly, Q. Yan, Axiomatic attribution for deep networks, in: International
conference on machine learning, PMLR, 2017, pp. 3319–3328.
[35] A. Shrikumar, P. Greenside, A. Kundaje, Learning important features through propagating
activation diferences, in: International conference on machine learning, PMLR, 2017, pp.
3145–3153.
[36] W. Samek, A. Binder, G. Montavon, S. Lapuschkin, K.-R. Müller, Evaluating the visualization
of what a deep neural network has learned, IEEE transactions on neural networks and
learning systems 28 (2016) 2660–2673.
[37] J. Snoek, H. Larochelle, R. P. Adams, Practical bayesian optimization of machine learning
algorithms, Advances in neural information processing systems 25 (2012).
[38] J. Neyman, On the two diferent aspects of the representative method: the method of
stratified sampling and the method of purposive selection, in: Breakthroughs in statistics,
Springer, 1992, pp. 123–150.
[39] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint
arXiv:1412.6980 (2014).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Apicella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Arpaia</surname>
          </string-name>
          , G. Mastrati,
          <string-name>
            <given-names>N.</given-names>
            <surname>Moccaldi</surname>
          </string-name>
          ,
          <article-title>Eeg-based detection of emotional valence towards a reproducible measurement of emotions</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>11</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Apicella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Arpaia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Frosolone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Moccaldi</surname>
          </string-name>
          ,
          <article-title>High-wearable eeg-based distraction detection in motor rehabilitation</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>11</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Arpaia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Criscuolo</surname>
          </string-name>
          , E. De Benedetto,
          <string-name>
            <given-names>N.</given-names>
            <surname>Donato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Duraccio</surname>
          </string-name>
          ,
          <article-title>A wearable ar-based bci for robot control in adhd treatment: Preliminary evaluation of adherence to therapy</article-title>
          ,
          <source>in: 2021 15th International Conference on Advanced Technologies, Systems and Services in Telecommunications (TELSIKS)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>321</fpage>
          -
          <lpage>324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Arpaia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Esposito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Natalizio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Parvis</surname>
          </string-name>
          ,
          <article-title>How to successfully classify eeg in motor imagery bci: a metrological analysis of the state of the art</article-title>
          ,
          <source>Journal of Neural Engineering</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Marshall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Coyle</surname>
          </string-name>
          , S. Wilson, M. Callaghan, Games, gameplay, and
          <article-title>bci: the state of the art</article-title>
          ,
          <source>IEEE Transactions on Computational Intelligence and AI in Games</source>
          <volume>5</volume>
          (
          <year>2013</year>
          )
          <fpage>82</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Apicella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Arpaia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Frosolone</surname>
          </string-name>
          , G. Improta,
          <string-name>
            <given-names>N.</given-names>
            <surname>Moccaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pollastro</surname>
          </string-name>
          ,
          <article-title>Eeg-based measurement system for monitoring student engagement in learning 4.0</article-title>
          ,
          <string-name>
            <surname>Scientific</surname>
            <given-names>Reports</given-names>
          </string-name>
          12 (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F. R.</given-names>
            <surname>Mashrur</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. M. Rahman</surname>
            ,
            <given-names>M. T. I. Miya</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vaidyanathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Anwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sarker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Mamun</surname>
          </string-name>
          ,
          <article-title>Bci-based consumers' choice prediction from eeg signals: An intelligent neuromarketing framework</article-title>
          ,
          <source>Frontiers in Human Neuroscience</source>
          <volume>16</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Subha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Joseph</surname>
          </string-name>
          , R. Acharya U,
          <string-name>
            <surname>C. M. Lim</surname>
          </string-name>
          , et al.,
          <article-title>Eeg signal analysis: a survey</article-title>
          ,
          <source>Journal of medical systems 34</source>
          (
          <year>2010</year>
          )
          <fpage>195</fpage>
          -
          <lpage>212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Arpaia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Callegaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cultrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Esposito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ortolano</surname>
          </string-name>
          ,
          <article-title>Metrological characterization of a low-cost electroencephalograph for wearable neural interfaces in industry 4.0 applications</article-title>
          , in: 2021
          <source>IEEE International Workshop on Metrology for Industry 4</source>
          .0 &amp;
          <string-name>
            <surname>IoT (MetroInd4</surname>
          </string-name>
          . 0&amp;IoT), IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Casson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Duncan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rodriguez-Villegas</surname>
          </string-name>
          ,
          <article-title>Wearable electroencephalography</article-title>
          ,
          <source>IEEE engineering in medicine and biology magazine 29</source>
          (
          <year>2010</year>
          )
          <fpage>44</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Apicella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Arpaia</surname>
          </string-name>
          , E. De Benedetto,
          <string-name>
            <given-names>N.</given-names>
            <surname>Donato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Duraccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Giugliano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Prevete</surname>
          </string-name>
          ,
          <article-title>Enhancement of ssveps classification in bci-based wearable instrumentation through machine learning techniques</article-title>
          ,
          <source>IEEE Sensors Journal</source>
          <volume>22</volume>
          (
          <year>2022</year>
          )
          <fpage>9087</fpage>
          -
          <lpage>9094</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Apicella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Arpaia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Isgrò</surname>
          </string-name>
          , G. Mastrati,
          <string-name>
            <given-names>N.</given-names>
            <surname>Moccaldi</surname>
          </string-name>
          ,
          <article-title>A survey on eeg-based solutions for emotion recognition with a low number of channels</article-title>
          , IEEE Access (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          . doi:
          <volume>10</volume>
          . 1109/ACCESS.
          <year>2022</year>
          .
          <volume>3219844</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Fingelkurts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Fingelkurts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. V.</given-names>
            <surname>Borisov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Darkhovsky</surname>
          </string-name>
          ,
          <article-title>Nonstationary nature of the brain activity as revealed by eeg/meg: methodological, practical and conceptual challenges</article-title>
          ,
          <source>Signal processing 85</source>
          (
          <year>2005</year>
          )
          <fpage>2190</fpage>
          -
          <lpage>2212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Quinonero-Candela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sugiyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schwaighofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Lawrence</surname>
          </string-name>
          ,
          <article-title>Dataset shift in machine learning</article-title>
          , Mit Press,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>A. A.</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. F.</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          R.,
          <string-name>
            <surname>T. G.</surname>
          </string-name>
          ,
          <article-title>Contrastive explanations to classification systems using sparse dictionaries</article-title>
          ,
          <source>Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11751 LNCS</source>
          (
          <year>2019</year>
          )
          <fpage>207</fpage>
          -
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>