<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Subject Agnostic Affective Emotion Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amit Kumar Jaiswal</string-name>
          <email>a.jaiswal@surrey.ac.uk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haiming Liu</string-name>
          <email>h.liu@soton.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prayag Tiwari</string-name>
          <email>prayag.tiwari@ieee.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Information Technology, Halmstad University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Southampton</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Surrey</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper focuses on affective emotion recognition in the subject-agnostic paradigm based on EEG signals. EEG signals exhibit subject instability in subject-agnostic affective brain-computer interfaces (aBCIs), which leads to the problem of distributional shift. This problem can be alleviated by approaches such as domain generalisation and domain adaptation. Typically, methods based on domain adaptation yield better results than domain generalisation methods but demand more computational resources when adapting to new subjects. We propose a novel framework, meta-learning based augmented domain adaptation, for subject-agnostic aBCIs. Our domain adaptation approach is augmented through meta-learning and consists of a recurrent neural network, a classifier, and a distributional shift controller based on a sum-decomposable function. We also show that a neural network implementing a sum-decomposable function can effectively estimate the divergence between different domains. The network for augmented domain adaptation is trained with meta-learning and adversarial learning, so that the controller rapidly adapts to new domains using the target data in a few self-adaptation steps during the test phase. Experiments on a public aBCI dataset show that our approach is effective and achieves performance similar to state-of-the-art domain adaptation methods while avoiding the use of additional computational resources.</p>
      </abstract>
      <kwd-group>
        <kwd>Emotion recognition</kwd>
        <kwd>EEG</kwd>
        <kwd>Domain adaptation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Understanding human emotional experience and its intricate interplay is a challenging task.
Recent technological advancements in brain-computer interaction systems, specifically affective
brain-computer interfaces (aBCIs), have enabled the automatic identification of user emotions and
facilitated a more humanised mode of interaction [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. The
electroencephalography (EEG) signal, in particular, has been utilised in subject-dependent emotion
models to recognise user emotions, with both the training and test data derived from a single
subject [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Despite the potential of EEG signals in aBCIs, the non-stationary nature of these signals
and the structural variability among different subjects pose significant challenges for the
development of subject-independent models. These models often rely on the assumption of
independent and identically distributed samples<sup>1</sup> and typically exhibit poor
generalisation performance in practical aBCI applications due to the problem of domain shift [
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8 ref9">5, 6, 7, 8, 9</xref>
        ]. To address domain shift in subject-agnostic EEG-based emotion recognition, domain
adaptation can be leveraged, which uses data from both the source and target domains to improve
adaptation performance. A key domain adaptation technique maps the two distributions to a shared
feature space in which their marginal distributions are identical. Despite its considerable
success in subject-independent EEG-based emotion recognition [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 10, 9</xref>
        ], domain adaptation
approaches can be computationally intensive and time-consuming, which can lead to suboptimal user
experiences in real-world applications. To address this issue, domain generalisation has emerged,
particularly for scenarios in which multiple source domains are accessible but unlabelled target
samples are not. Subject-agnostic emotion recognition models can then be constructed using
domain generalisation techniques [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ].
Nonetheless, because there is no prior knowledge of the target domain during training, it is
difficult for domain generalisation to achieve performance on par with that of domain adaptation.
A potential approach is to leverage adaptive subspace feature matching (ASFM), which pre-trains
the primary model and uses a limited number of test samples to adjust efficiently [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. While ASFM avoids time-consuming adaptation, most such methods require retaining both
the source and target domains in the test phase, which leads to additional storage requirements
and reduces portability [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
In real-world applications, however, it is crucial that an EEG-based affective model can rapidly
adapt to different subjects while remaining portable.
      </p>
      <p>
        This paper presents a novel framework, meta-learning based augmented domain adaptation
(MeLaDA), for subject-agnostic EEG-based emotion recognition. Unlike ASFM, MeLaDA only requires
the target domain during the test phase. Therefore, MeLaDA can generate predictions more rapidly
than domain adaptation and ASFM. From the perspective of real-world applications, MeLaDA is thus
better suited to constructing emotion models for subject-agnostic aBCIs. MeLaDA is built by
formulating the equivalence between a network with a sum-decomposable structure and the domain
discrepancy metrics used in classical domain adaptation techniques, such as maximum mean discrepancy [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]
or ℋ-divergence [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ]. Using this formulation, we present the MeLaDA framework, which
incorporates a classifier, a feature extractor, and a sum-decomposable structure termed the domain
shift controller. By leveraging adversarial learning and meta-learning, the controller enables
MeLaDA to generalise rapidly to new domains by using the target data in a few self-adaptive steps
during the test phase. The key contributions of our approach are three-fold:
      </p>
      <list list-type="order">
        <list-item>
          <p>MeLaDA provides an approach for developing a subject-agnostic EEG-based emotion recognition model, together with a way to incorporate any domain discrepancy that can be expressed by a sum-decomposable network.</p>
        </list-item>
        <list-item>
          <p>Our proposed framework is designed to be portable and to adapt quickly to different subjects for EEG-driven emotion recognition.</p>
        </list-item>
        <list-item>
          <p>We carried out extensive experiments on the publicly available EEG-based aBCI dataset SEED<sup>2</sup>. The results indicate that our proposed approach outperforms domain generalisation methods. Moreover, MeLaDA exhibits time and storage costs comparable to those of domain generalisation methods.</p>
        </list-item>
      </list>
      <p><sup>1</sup> <ext-link ext-link-type="uri" xlink:href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables">https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables</ext-link></p>
      <p><sup>2</sup> <ext-link ext-link-type="uri" xlink:href="https://bcmi.sjtu.edu.cn/home/seed/seed.html">https://bcmi.sjtu.edu.cn/home/seed/seed.html</ext-link></p>
    </sec>
    <sec id="sec-2">
      <title>2. Motivation</title>
      <p>
        The key motivation behind MeLaDA is to simplify the estimation of domain shift by using a
simple network that only requires the target domain as input. This contrasts with traditional
domain adaptation approaches, which compare the target domain with a specific source domain and
therefore require additional storage for source data and complex methods such as generative
adversarial networks (GANs) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] to represent domain shift during the test phase.
These limitations make domain adaptation approaches difficult to apply in practical EEG-driven
emotion recognition. In a multi-source scenario, however, we show that minimising the discrepancy
between all pairs of domains is equivalent to minimising the discrepancy between each domain and
an implicit domain. Moreover, we prove that any domain shift metric can, in principle, be
represented by a network with a sum-decomposable form.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Prior Work</title>
      <p>
        EEG-based Emotion Recognition: Owing to the inherent non-stationarity of EEG signals and
their variability across individuals, developing a subject-independent model for EEG-based emotion
recognition with conventional machine learning methods is challenging. Recently, attention has been
directed towards affective brain-computer interfaces [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which described the concept of aBCIs
by integrating affective factors into traditional brain-computer interfaces [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Subsequently, aBCIs were applied to EEG-based emotion recognition [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]: in that study, 15 participants watched selected Chinese movie clips chosen to elicit three
emotions, i.e., happy, neutral, and sad, and their EEG signals were recorded to form an EEG emotion
recognition dataset called SEED (<ext-link ext-link-type="uri" xlink:href="https://bcmi.sjtu.edu.cn/home/seed/seed.html">https://bcmi.sjtu.edu.cn/home/seed/seed.html</ext-link>).
Building upon SEED, researchers have made significant advances in developing models for EEG-based
emotion recognition, particularly subject-dependent models. Because such models generalise poorly
to new subjects, researchers have turned their attention to domain adaptation and domain
generalisation techniques for subject-independent EEG-based emotion recognition. Domain adaptation
approaches primarily focus on reducing domain shift by minimising discrepancies between different
domains using established metrics such as maximum mean discrepancy (MMD) [
        <xref ref-type="bibr" rid="ref15 ref16 ref22">15, 16, 22</xref>
        ], the
Kullback-Leibler divergence [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], and ℋ-divergence [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Existing work [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] employed transfer component analysis [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] to minimise the MMD [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] among different domains by constructing a kernel
matrix, which led to the successful development of personalised EEG-based emotion models.
      </p>
      <sec id="sec-3-1">
        <p>
          Domain Adaptation and Domain Generalisation: Adversarial domain adaptation methods
have gained significant attention and emerged as successful approaches across various
applications [
          <xref ref-type="bibr" rid="ref25 ref26 ref27">25, 26, 27</xref>
          ]. These methods draw inspiration from GANs, which use adversarial training to align the
generated distribution with the real distribution. In the field of aBCIs, researchers have also
adopted adversarial domain adaptation approaches with successful outcomes. For instance,
domain-adversarial neural networks (DANN) [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] have been used to build subject-independent models for EEG-based emotion recognition [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. Furthermore, Wasserstein GAN [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] based domain adaptation (WGAN-DA) has been successfully used to build
subject-independent emotion recognition models [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. In practical aBCI scenarios, each subject represents an individual domain. Domain
adaptation (DA) approaches, although computationally intensive for new domains, are commonly
employed in aBCIs. Meanwhile, domain generalisation techniques, which generalise to unseen target
domains without requiring target domain data, have gained traction in aBCIs [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. The domain residual network (DResNet) [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] extends the structure of DANN [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]
for subject-independent EEG-based vigilance estimation and emotion recognition, showing improved
generalisation without target domain data. While domain adaptation methods often yield better
results than domain generalisation techniques in aBCIs, an alternative approach, ASFM [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], has been integrated into EEG-based emotion recognition as the Plug-and-Play domain
adaptation framework [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ].
Meta-learning: Meta-learning, which aims to learn prior knowledge through episode-level
learning [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ], has gained significant traction, particularly in domain generalisation.
Meta-learning for domain generalisation (MLDG) [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] introduced the first meta-learning strategy for domain generalisation. Subsequently,
MetaReg [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ] and
Feature-Critic [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ] were proposed to enhance the generalisation capability of the model by
incorporating auxiliary losses during training. Unlike previous domain generalisation approaches
that design specific models, meta-learning-based schemes focus on a model-agnostic training
strategy that exposes the model to domain shifts during training.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Problem Formulation</title>
      <p>We first describe the key components of our problem setting. The input space for EEG data is denoted by 𝒳 and the output space by 𝒴. A domain D is described as a joint distribution ℙ<sub>XY</sub> over 𝒳 × 𝒴. As this distribution changes due to various factors, we assume that domains are themselves drawn from a distribution 𝒫. However, domains are not directly observable; we can only observe samples S<sub>i</sub> of domains, where each S<sub>i</sub> is a set of pairs {(x<sub>j</sub>, y<sub>j</sub>)}. Inconsistency between domains may lead to poor generalisation. To address this issue, one approach is to employ a functional mapping g that transforms one domain into another while minimising the divergence between the domains. This requires selecting a divergence loss function d(·, ·), which may be defined on either the marginal or the joint distribution. The optimal g is then selected by minimising Equation 1.</p>
      <p>
        <disp-formula><label>(1)</label><tex-math>g^{*} = \arg\min_{g} \; d\big(g(S_i), g(S_j)\big)</tex-math></disp-formula>
The utilisation of a functional mapping g<sup>*</sup> enables the transfer of various domains to a shared feature space, thereby ensuring that the model trained on g<sup>*</sup>(S) does not encounter any domain shift issue. This approach is known as alignment.</p>
      <sec id="sec-4-1">
        <title>4.1. Shift-Independent Domain</title>
        <p>
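In the context of multi-source domain adaptation or domain generalisation, a common
approach to incorporating domain adaptation methods is to jointly minimise the divergence
between every pair of source domains, as expressed by Equation 2, where 𝒮 denotes the set of
observed domain samples.
          <disp-formula><label>(2)</label><tex-math>g^{*} = \arg\min_{g} \sum_{S_i, S_j \in \mathcal{S}} d\big(g(S_i), g(S_j)\big)</tex-math></disp-formula>
When the total divergence in Equation 2 converges to zero, all domains converge to a homogeneous
state. This common domain, characterised by the absence of variations, is referred to as the
shift-independent domain.</p>
        <p>Definition 4.1. A shift-independent domain D<sub>shift</sub> is any g(S<sub>i</sub>) obtained asymptotically, provided that ∑<sub>i≠j</sub> d(g(S<sub>i</sub>), g(S<sub>j</sub>)) → 0.</p>
        <p>Theorem 4.1. Given this asymptotic behaviour, optimising the overall variation ∑<sub>i,j</sub> d(g(S<sub>i</sub>), g(S<sub>j</sub>)) over S<sub>i</sub>, S<sub>j</sub> ∈ 𝒮 is equivalent to optimising the loss ∑<sub>i</sub> ℒ(S<sub>i</sub>) over S<sub>i</sub> ∈ 𝒮, where ℒ(S<sub>i</sub>) = d(g(S<sub>i</sub>), D<sub>shift</sub>).</p>
        <p>Theorem 4.1 implies that it suffices to estimate, with a network, the dissimilarity between each individual domain and the shift-independent domain. In the following sections, we elaborate on the construction of this network.</p>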
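        <p>To make Equations 1 and 2 and Theorem 4.1 concrete, the sketch below fixes one particular divergence, which is an assumption for illustration rather than the paper's choice: d is the squared Euclidean distance between empirical mean embeddings, i.e. the squared MMD with a linear kernel. For this d, the implicit domain of Theorem 4.1 can be taken as the centroid of the domains' mean embeddings, and for K domains the identity ∑<sub>i,j</sub> ‖μ<sub>i</sub> − μ<sub>j</sub>‖² = 2K ∑<sub>i</sub> ‖μ<sub>i</sub> − μ̄‖² holds, so the pairwise objective and the distance-to-implicit-domain objective differ only by a constant factor. The candidate mappings passed to the hypothetical helper best_alignment are toy examples.</p>
        <preformat preformat-type="code">
```python
from statistics import fmean


def mean_embedding(samples):
    """Coordinate-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*samples)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def linear_mmd2(s_i, s_j):
    """Toy d(S_i, S_j): squared distance between empirical means (MMD^2, linear kernel)."""
    return sq_dist(mean_embedding(s_i), mean_embedding(s_j))


def best_alignment(candidates, s_i, s_j):
    """Eq. (1) restricted to a finite family of hypothetical mappings g."""
    return min(candidates,
               key=lambda g: linear_mmd2([g(x) for x in s_i], [g(x) for x in s_j]))


def pairwise_total(means):
    """Sum of d over all ordered pairs of domain mean embeddings (Eq. 2 style)."""
    return sum(sq_dist(m_i, m_j) for m_i in means for m_j in means)


def implicit_total(means):
    """Sum of distances to the implicit (centroid) domain (Theorem 4.1 style)."""
    centre = mean_embedding(means)
    return sum(sq_dist(m, centre) for m in means)


if __name__ == "__main__":
    s_i = [[0.0, 1.0], [2.0, 3.0]]   # toy domain with mean (1, 2)
    s_j = [[5.0, 1.0], [7.0, 3.0]]   # toy domain with mean (6, 2)

    def identity(x):
        return x

    def keep_second(x):
        return [x[1]]

    print(linear_mmd2(s_i, s_j))                                           # 25.0
    print(best_alignment([identity, keep_second], s_i, s_j) is keep_second)  # True

    means = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
    print(pairwise_total(means), 2 * len(means) * implicit_total(means))    # 48.0 48.0
```
        </preformat>
        <p>Note that a degenerate mapping such as g(x) = [0.0] would also drive this divergence to zero; in practice, alignment is optimised jointly with the classification loss, as in the training objective of MeLaDA, which rules out such collapse.</p>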
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Sum Decomposable Component</title>
        <p>
          Permutation invariance is an essential property for a function that captures domain
discrepancy: the arrangement or order of the source domain data should not affect the output.
This property has been studied extensively in prior work [
          <xref ref-type="bibr" rid="ref39 ref40">39,
40</xref>
          ]. Summation is commonly employed to enforce permutation invariance, which leads
to the concept of sum-decomposition. Definition 4.2 formally defines sum-decomposability.
        </p>
        <p>Definition 4.2. A function f is said to be sum-decomposable through ℝ<sup>N</sup> if there exist two functions φ: ℝ → ℝ<sup>N</sup> and ρ: ℝ<sup>N</sup> → ℝ such that f(X) can be expressed as ρ(∑<sub>x∈X</sub> φ(x)).
Theorem 4.2. A continuous map f: ℝ<sup>M</sup> → ℝ is permutation invariant if and only if it can be represented as a continuous sum-decomposition through ℝ<sup>M</sup>.</p>
        <p>Proposition 4.3. Established measures of domain discrepancy (or domain variation), such as Maximum Mean Discrepancy (MMD) or ℋ-divergence, can be equivalently expressed by a function that is sum-decomposable.</p>
        <p>
          The detailed proof of Theorem 4.2 can be found in prior work [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ] on representing arbitrary functions on sets. Note that introducing a summation or
averaging layer makes the permutation-invariant property straightforward to enforce.
Theorem 4.2 indicates that a sum-decomposable network operating in a latent space of adequate
dimensionality can represent any continuous permutation-invariant function, including
d(·, D<sub>shift</sub>) as delineated in Theorem 4.1.
        </p>
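        <p>Definition 4.2 can be illustrated with a hand-picked sum-decomposable function. The sketch below assumes φ(x) = (1, x, x²) and an outer map ρ that turns the three sums into the population variance, so the resulting f is sum-decomposable through ℝ³; in MeLaDA both maps would instead be learned network layers. Because the elements interact only through the summation, reordering the input set cannot change the output.</p>
        <preformat preformat-type="code">
```python
from itertools import permutations


def phi(x):
    """Inner map phi: R -> R^3 (hand-picked for illustration)."""
    return (1.0, x, x * x)


def rho(z):
    """Outer map rho: R^3 -> R; turns (count, sum, sum of squares) into a variance."""
    n, s1, s2 = z
    return s2 / n - (s1 / n) ** 2


def sum_decomposable(xs):
    """f(X) = rho(sum over x in X of phi(x)), as in Definition 4.2."""
    total = (0.0, 0.0, 0.0)
    for x in xs:
        total = tuple(t + p for t, p in zip(total, phi(x)))
    return rho(total)


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0]
    outputs = {sum_decomposable(p) for p in permutations(xs)}
    print(len(outputs))                    # 1: every ordering gives the same value
    print(round(sum_decomposable(xs), 4))  # 1.5556, the population variance of xs
```
        </preformat>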
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. The Proposed Method</title>
      <p>
        In accordance with this theoretical framework, our proposed approach, MeLaDA, integrates a
sum-decomposable domain shift controller with a temporal multi-layer perceptron (MLP) network.
The architecture, depicted in Figure 2, consists of a feature extractor F(θ) and a classifier C(ω)
as the constituent components of the temporal MLP network. The domain shift controller R(ψ) uses
the features extracted by F(θ) to assess the dissimilarity between the current domain and the
shift-independent domain. When the network classifies target domain data, R(ψ) propagates forward
to compute the domain shift and then propagates backward to fine-tune the feature extractor F(θ).
This behaviour resembles an intelligent controller that dynamically adjusts the network according
to how well it generates shift-independent features. Under the guidance of R(ψ), the entire
network can generalise to unseen domains through meta-learning augmented domain adaptation. With
the trained controller, the feature extractor effectively mitigates the domain shift present in
the data from individual subjects. Consequently, samples with identical emotion labels from
different domains exhibit comparable distributions in the shared space. The following sections
outline the design principles of the domain shift controller and our training strategy.
      </p>
      <p><bold>5.1. Model</bold></p>
      <p>
        We now describe our model based on the components above. To satisfy the
permutation-invariance requirement of the domain shift controller, a straightforward approach is
to incorporate a summation layer into a neural network, as in Feature-Critic [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ] networks. The Feature-Critic (FC) network appends a summation layer to the end of a
multi-layer perceptron, disregarding the outer mapping ρ defined in Definition 4.2. Consequently,
it may not fully embody the characteristics of a domain shift controller. Furthermore, our
experimental findings indicate that introducing adversarial elements into the network can enhance
its capabilities. Specifically, we incorporate two gradient reversal layers (GRL) [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] before and after a two-layer MLP, followed by an additional layer to further improve its
performance. The rationale for incorporating GRLs comes from traditional methods for representing
domain discrepancy, such as MMD or adversarial approaches [
        <xref ref-type="bibr" rid="ref25 ref27">25, 27</xref>
        ]. These methods share a common principle: they use the “largest” difference between two
domains to characterise their divergence. To give our network the same capacity to model domain
shift as these traditional methods, we introduce a novel divergence measure, akin to MMD and
ℋ-divergence, which we term the maximum mean norm discrepancy (MMND).
      </p>
      <fig>
        <caption>
          <p>When MeLaDA receives a new domain, the long short-term memory (LSTM) network, acting as the feature extractor, first transforms the data into their feature representation. The controller then examines the disparity between the new domain and the shift-independent domain and computes the loss function ℒ<sub>R</sub>. Guided by the controller, the feature extractor quickly adjusts itself through a series of adaptation steps. Finally, the data x are passed through the updated feature extractor and then forward through the classifier.</p>
        </caption>
      </fig>
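      <p>A gradient reversal layer [28] is the identity in the forward pass and multiplies incoming gradients by −λ in the backward pass. The dependency-free sketch below uses a hypothetical GradientReversal class, not a framework API (deep learning frameworks usually implement this as a custom autograd function), to make the sign flip explicit. It also shows what happens when two GRLs sit on one backward path, assuming the ordering "features, GRL, two-layer MLP, summation, GRL, final layer" described above; the MLP and summation Jacobians are taken as the identity so that only the sign changes are visible.</p>
      <preformat preformat-type="code">
```python
class GradientReversal:
    """Hypothetical gradient reversal layer: identity forward, -lam * gradient backward."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # Forward pass: activations pass through unchanged.
        return list(x)

    def backward(self, grad_output):
        # Backward pass: flip the sign of (and scale) the incoming gradient.
        return [-self.lam * g for g in grad_output]


if __name__ == "__main__":
    grl_low = GradientReversal()   # between feature extractor and MLP (assumed placement)
    grl_high = GradientReversal()  # between summation and final layer (assumed placement)

    print(grl_low.forward([1.0, 2.0]))  # [1.0, 2.0]

    # Toy gradient arriving from the final layer of the controller.
    g_from_top = [0.5, -1.0]
    g_at_mlp = grl_high.backward(g_from_top)   # the MLP sees reversed gradients
    g_at_features = grl_low.backward(g_at_mlp)  # reversed again below the MLP
    print(g_at_mlp)       # [-0.5, 1.0]
    print(g_at_features)  # [0.5, -1.0]
```
      </preformat>
      <p>Under this assumed ordering, gradient descent drives the MLP between the two GRLs to increase the controller's loss (searching for the largest domain difference), while the layers below the lower GRL receive the original gradient sign. The freezing schedule used for stability is not modelled here.</p>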
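      <p>Definition 5.1. Given two domains D<sub>i</sub> and D<sub>j</sub>, the maximum mean norm discrepancy MMND between them is characterised as
        <disp-formula><label>(3)</label><tex-math>\mathrm{MMND}(D_i, D_j) := \max_{\phi \in \mathcal{F}} \left\| \mathbb{E}_{x \in D_i}\, \phi(x) - \mathbb{E}_{x \in D_j}\, \phi(x) \right\|</tex-math></disp-formula>
where each function φ ∈ ℱ maps x into a vector space.</p>
      <p>Based on Theorem 4.1, selecting an implicit domain as the shift-independent domain is a
viable way to circumvent direct comparison between domains. The implicit domain is chosen to
minimise the overall divergence, which makes optimisation efficient. Denoting the mean embedding
of this implicit domain by μ, a point in the latent space 𝒵, our proposed objective function
combines minimisation and maximisation components, as indicated by Equation 4,
        <disp-formula><label>(4)</label><tex-math>\mathcal{L}_{R} = \min_{\mu \in \mathcal{Z}} \sum_{D_i \in \mathcal{D}} \max_{\phi \in \mathcal{F}} \left\| \mathbb{E}_{x \in D_i}\, \phi(x) - \mu \right\|^{2}</tex-math></disp-formula>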
where ℒ<sub>R</sub> denotes the loss computed from the controller’s output. Within the domain shift
controller, the function φ corresponds to the inner mapping φ in Definition 4.2 and is
represented by the initial layers of the network. The subsequent summation layer calculates
the mean value of φ(x), while the final layer of the domain shift controller computes the norm
of the difference. To handle the maximisation and minimisation objectives, we follow the
adversarial min-max principle of GAN [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and include gradient reversal layers in the
controller network. During forward propagation, a GRL acts as an identity map, whereas during
backward propagation it reverses the direction of the gradients. In contrast to the domain
adaptation regulariser used in domain-adversarial networks [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], our domain shift controller
incorporates an additional minimisation task. Consequently, two GRLs are employed: the
left GRL drives the controller to identify the most significant domain difference, while
the right GRL adjusts the feature extractor to generate consistent features across domains.
Note that the structure with two gradient reversal layers may be unstable during training.
To mitigate this, we freeze the part of the network between the two GRLs once a predefined
iteration threshold is reached, which stabilises the training process.
      </p>
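      <p>Equations 3 and 4 can be evaluated directly when the family of feature maps is small and fixed. In the sketch below, that family is a hypothetical two-element set (the identity and the square, on scalar inputs) standing in for the learned initial layers of the controller, expectations are empirical means, and the candidate point μ is supplied by hand rather than optimised; in MeLaDA the maximisation over φ and the minimisation are instead carried out by gradient-based training through the GRLs.</p>
      <preformat preformat-type="code">
```python
from statistics import fmean


def mean_embedding(domain, phi):
    """Empirical E_{x in D}[phi(x)] for a vector-valued feature map phi."""
    return [fmean(col) for col in zip(*(phi(x) for x in domain))]


def norm(v):
    """Euclidean norm of a vector."""
    return sum(c * c for c in v) ** 0.5


def mmnd(d_i, d_j, feature_maps):
    """Eq. (3) with the max taken over a finite family of feature maps."""
    gaps = []
    for phi in feature_maps:
        m_i, m_j = mean_embedding(d_i, phi), mean_embedding(d_j, phi)
        gaps.append(norm([a - b for a, b in zip(m_i, m_j)]))
    return max(gaps)


def controller_loss(domains, feature_maps, mu):
    """Summand structure of Eq. (4) for one fixed candidate mu (no min over mu here)."""
    total = 0.0
    for d in domains:
        total += max(norm([a - m for a, m in zip(mean_embedding(d, phi), mu)]) ** 2
                     for phi in feature_maps)
    return total


def identity(x):
    return (x,)


def square(x):
    return (x * x,)


if __name__ == "__main__":
    d_i, d_j = [1.0, 3.0], [2.0, 2.0]   # same mean, different spread
    print(mmnd(d_i, d_j, [identity]))          # 0.0: first moments agree
    print(mmnd(d_i, d_j, [identity, square]))  # 1.0: second moments differ

    domains = [[0.0, 2.0], [4.0, 6.0]]   # mean features 1.0 and 5.0
    print(controller_loss(domains, [identity], (3.0,)))  # 8.0 at the centroid
    print(controller_loss(domains, [identity], (2.0,)))  # 10.0 away from it
```
      </preformat>
      <p>The first example shows why the maximisation matters: a single feature map can miss a discrepancy that a richer family exposes. The second shows that, for a fixed feature map, the shared point that minimises the summed squared gaps is the centroid of the domains' mean embeddings.</p>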
      <sec id="sec-5-1">
        <title>5.2. Meta-learning based Training Strategy</title>
        <p>
          In this section, we present a meta-learning based strategy for learning the parameters.
To ensure that the domain shift controller generalises, we formulate the training procedure
algorithmically. The overall algorithm can be divided into two distinct components: training the
domain shift controller and training the network. These components are delineated in Algorithm 1
and Algorithm 2, respectively.
Training Procedure for the Domain Shift Controller: To improve the model’s generalisation
capability, we adopt a two-fold approach that optimises the output ℒ<sub>R</sub> of the domain
shift controller to mitigate domain shift while incorporating meta-learning to facilitate
generalisation. First, the available domains are randomly divided into meta-train domains, denoted
𝒟<sub>train</sub>, and meta-test domains (also referred to as meta-validation domains), denoted
𝒟<sub>valid</sub>. The controller is used to optimise the feature extractor F(θ) on the meta-train
domains only. We then assess the efficacy of the optimised feature extractor F(θ′) on the
meta-test domains, evaluating its ability to generalise beyond the training domains. After the
parameter θ is updated to θ′, the classification loss ℓ changes from
ℓ(x<sub>i</sub>, y<sub>i</sub>; θ) to ℓ(x<sub>i</sub>, y<sub>i</sub>; θ′). Inspired by the
approach developed in Feature-Critic networks [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ],
we define the meta loss function as in Equation 5.
        </p>
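        <p>
          <disp-formula><label>(5)</label><tex-math>\mathcal{L}_{\mathrm{meta}} = \sum_{(x_i, y_i) \in \mathcal{D}_{\mathrm{valid}}} \tanh\big(\ell(x_i, y_i; \theta') - \ell(x_i, y_i; \theta)\big)</tex-math></disp-formula>
The overall loss for updating the parameter ψ is given by Equation 6,
          <disp-formula><label>(6)</label><tex-math>\mathcal{L}_{R}(\theta, \psi; \mathcal{D}_{\mathrm{train}}) + \mathcal{L}_{\mathrm{meta}}(\theta', \omega, \psi; \mathcal{D}_{\mathrm{valid}})</tex-math></disp-formula>
where θ′ = θ − α∇<sub>θ</sub>ℒ<sub>R</sub>(θ, ψ; 𝒟<sub>train</sub>). The hyperparameter α, as well as the parameters θ, ω, and ψ corresponding to F, C, and R respectively, are involved in Equation 6. Through the optimisation of Equation 6, the parameter ψ of R(ψ) is ultimately updated.</p>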
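        <p>The meta loss of Equation 5 and the inner update that produces the adapted feature-extractor parameters can be sketched in a few lines. The sketch assumes plain lists of per-sample validation losses before and after the inner step and a plain SGD inner step; the function names and toy numbers are hypothetical. A negative meta loss means that the controller-guided update lowered the validation loss, and because tanh maps every difference into (−1, 1), a few badly fitted samples cannot dominate the meta objective.</p>
        <preformat preformat-type="code">
```python
import math


def inner_step(theta, grad_controller_loss, alpha):
    """theta' = theta - alpha * grad_theta L_R, applied element-wise (plain SGD)."""
    return [t - alpha * g for t, g in zip(theta, grad_controller_loss)]


def meta_loss(loss_after, loss_before):
    """Eq. (5): sum over validation samples of tanh(l(x, y; theta') - l(x, y; theta))."""
    return sum(math.tanh(a - b) for a, b in zip(loss_after, loss_before))


if __name__ == "__main__":
    theta_prime = inner_step([1.0, -2.0], [0.5, 0.5], alpha=0.1)
    print(theta_prime)  # approximately [0.95, -2.05]

    improved = meta_loss([0.5, 0.5], [1.0, 1.0])  # both samples got better
    print(round(improved, 3))                     # -0.924
    print(meta_loss([100.0], [0.0]))              # 1.0: one outlier is capped
```
        </preformat>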
        <p>
          Training Procedure for the MeLaDA framework: In contrast to the training approach
employed by MetaReg [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ] or Feature-Critic networks [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ], which train the auxiliary network before the task network, MeLaDA adopts an
alternating training scheme for the domain shift controller and the temporal MLP network. This is
necessary because the controller network R(ψ) must remain functional during the test phase and
therefore requires continuous updates even while other parts of the network are being trained.
Additionally, to
fully adhere to the principles of meta-learning, we leverage the model-agnostic meta-learning
(MAML) [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] framework to train the network, rather than directly optimising it. We treat the
domain shift controller and classification as two distinct tasks and utilise episodic training [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ]
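to update their respective parameters. The dataset is divided into two subsets,
𝒟<sub>train</sub> (meta-train domains) and 𝒟<sub>valid</sub> (meta-validation domains). The domain
shift controller R(ψ) uses data from 𝒟<sub>train</sub> to compute the domain shift loss
ℒ<sub>R</sub>(θ, ψ; 𝒟<sub>train</sub>), as well as the classification loss
ℒ<sup>train</sup><sub>classif</sub>. These losses are then used to update the parameters of the
feature extractor F(θ). With the updated parameters, F(θ′) processes data from
𝒟<sub>valid</sub>, and the temporal MLP network computes the corresponding classification loss
ℒ<sup>valid</sup><sub>classif</sub>. The overall loss function for optimising F(θ) and C(ω) is
defined as follows:
          <disp-formula><label>(7)</label><tex-math>\mathcal{L}_{R}(\theta, \psi; \mathcal{D}_{\mathrm{train}}) + \mathcal{L}^{\mathrm{train}}_{\mathrm{classif}}(\theta, \omega) + \mathcal{L}^{\mathrm{valid}}_{\mathrm{classif}}(\theta', \omega)</tex-math></disp-formula>
        </p>
        <p><bold>Algorithm 1</bold> Training the Domain Shift Controller</p>
        <p>Input: a set of domains 𝒟 and parameters θ, ω, ψ.</p>
        <p>Output: ψ.</p>
        <list list-type="order">
          <list-item><p>(𝒟<sub>train</sub>, 𝒟<sub>valid</sub>) ← 𝒟 ▷ Random partitioning</p></list-item>
          <list-item><p>ℒ<sub>R</sub>(θ, ψ; 𝒟<sub>train</sub>) ← R(F(𝒟<sub>train</sub>)) ▷ Meta-training stage</p></list-item>
          <list-item><p>θ′ ← θ − α∇<sub>θ</sub>ℒ<sub>R</sub>(θ, ψ; 𝒟<sub>train</sub>) ▷ Meta-training stage</p></list-item>
          <list-item><p>ℒ<sub>meta</sub>(θ′, ω, ψ; 𝒟<sub>valid</sub>) ← ∑<sub>(x<sub>i</sub>, y<sub>i</sub>)∈𝒟<sub>valid</sub></sub> tanh(ℓ(x<sub>i</sub>, y<sub>i</sub>; θ′) − ℓ(x<sub>i</sub>, y<sub>i</sub>; θ)) ▷ Meta-validation stage</p></list-item>
          <list-item><p>Update ψ using ℒ<sub>R</sub> + ℒ<sub>meta</sub> ▷ Optimisation</p></list-item>
        </list>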
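        <p>Algorithm 1 can be traced end to end on a deliberately tiny problem. Everything concrete below is an assumption made for illustration, not the authors' implementation: the "feature extractor" is the scalar map x ↦ θx, the controller loss measures how far each meta-train domain's mean feature lies from a shared point ψ (a one-dimensional stand-in for Equation 4 with φ fixed to the identity), ℓ is a squared error, gradients are central finite differences, and the step sizes are arbitrary. The GRLs are not modelled.</p>
        <preformat preformat-type="code">
```python
import math
import random


def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference derivative of a scalar function f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def controller_loss(theta, psi, domains):
    """Toy L_R: squared gap between each domain's mean feature theta * x and psi."""
    return sum((theta * sum(x for x, _ in d) / len(d) - psi) ** 2 for d in domains)


def sample_losses(theta, samples):
    """Toy per-sample loss l(x, y; theta) for the predictor y_hat = theta * x."""
    return [(theta * x - y) ** 2 for x, y in samples]


def train_controller_step(theta, psi, domains, alpha=0.1, lr=0.05, rng=random):
    """One iteration of Algorithm 1; returns the updated controller parameter psi."""
    shuffled = list(domains)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    d_train, d_valid = shuffled[:half], shuffled[half:]    # line 1: random partitioning
    valid = [s for d in d_valid for s in d]

    def objective(p):
        l_r = controller_loss(theta, p, d_train)           # line 2
        grad = numerical_grad(lambda t: controller_loss(t, p, d_train), theta)
        theta_prime = theta - alpha * grad                 # line 3
        l_meta = sum(math.tanh(a - b) for a, b in          # line 4, Eq. (5)
                     zip(sample_losses(theta_prime, valid), sample_losses(theta, valid)))
        return l_r + l_meta

    return psi - lr * numerical_grad(objective, psi)       # line 5: update psi


if __name__ == "__main__":
    toy_domains = [
        [(1.0, 2.0), (2.0, 4.0)],
        [(0.5, 1.0), (1.5, 3.0)],
        [(2.0, 4.1), (3.0, 6.2)],
        [(1.0, 1.9), (0.0, 0.1)],
    ]
    print(train_controller_step(1.0, 0.0, toy_domains, rng=random.Random(0)))
```
        </preformat>
        <p>Setting alpha to 0 removes the meta term (every tanh argument becomes 0), so the step reduces to plain gradient descent of ψ on the controller loss; this is a quick way to check the plumbing before adding the meta-validation signal.</p>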
        <p>
          The description of MeLaDA so far adopts an alignment-based domain adaptation perspective.
However, the method can also be explained through the lens of meta-learning. Recent domain
generalisation approaches [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] suggest that meta-learning links various tasks by aligning their gradients in a
shared direction. In our scenario, the domain shift controller task is deliberately designed to be
coupled with the “domain generalisation” process. Consequently, when the model encounters a new
domain, the optimisation objective of adapting to this domain aligns with the optimisation
objective driven by the controller.
        </p>
        <p><bold>Algorithm 2</bold> Training the MeLaDA framework</p>
        <p>Input: a set of domains 𝒟 and parameters θ, ω, ψ.</p>
        <p>Output: θ, ω, ψ.</p>
        <list list-type="order">
          <list-item><p>(𝒟<sub>train</sub>, 𝒟<sub>valid</sub>) ← 𝒟 ▷ Random partitioning</p></list-item>
          <list-item><p>ℒ<sub>R</sub>(θ, ψ; 𝒟<sub>train</sub>) ← R(F(𝒟<sub>train</sub>)) ▷ Meta-training stage</p></list-item>
          <list-item><p>ℒ<sup>train</sup><sub>classif</sub>(θ, ω) ← ℓ(C(F(X<sub>train</sub>)), Y<sub>train</sub>) ▷ Meta-training stage</p></list-item>
          <list-item><p>θ′ ← θ − α∇<sub>θ</sub>ℒ<sub>R</sub>(θ, ψ; 𝒟<sub>train</sub>) ▷ Meta-training stage</p></list-item>
          <list-item><p>ℒ<sup>valid</sup><sub>classif</sub>(θ′, ω) ← ℓ(C(F(X<sub>valid</sub>; θ′)), Y<sub>valid</sub>) ▷ Meta-validation stage</p></list-item>
          <list-item><p>Update θ, ω, ψ using ℒ<sub>R</sub> + ℒ<sup>valid</sup><sub>classif</sub> + ℒ<sup>train</sup><sub>classif</sub> ▷ Optimisation</p></list-item>
        </list>
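        <p>One episode of Algorithm 2 can be sketched on a tiny scalar problem in which θ, ω and ψ are updated jointly. All concrete choices here are assumptions for illustration rather than the authors' implementation: the feature is z = θx, the classifier C(ω) is the map z ↦ ωz, both classification losses are mean squared errors, the controller loss measures how far each meta-train domain's mean feature lies from a shared point ψ, and every parameter moves along a central finite-difference gradient of the summed loss from line 6. The GRLs, the freezing schedule, and the LSTM/MLP architecture are omitted.</p>
        <preformat preformat-type="code">
```python
import random


def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference derivative of a scalar function f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def controller_loss(theta, psi, domains):
    """Toy L_R: squared gap between each domain's mean feature theta * x and psi."""
    return sum((theta * sum(x for x, _ in d) / len(d) - psi) ** 2 for d in domains)


def classif_loss(theta, omega, samples):
    """Toy classification-loss stand-in: mean squared error of omega * (theta * x)."""
    return sum((omega * theta * x - y) ** 2 for x, y in samples) / len(samples)


def train_melada_step(theta, omega, psi, domains, alpha=0.1, lr=0.01, rng=random):
    """One episode of Algorithm 2; returns the updated (theta, omega, psi)."""
    shuffled = list(domains)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    d_train, d_valid = shuffled[:half], shuffled[half:]    # line 1: random partitioning
    train = [s for d in d_train for s in d]
    valid = [s for d in d_valid for s in d]

    def total(t, w, p):
        l_r = controller_loss(t, p, d_train)               # line 2
        l_train = classif_loss(t, w, train)                # line 3
        grad = numerical_grad(lambda u: controller_loss(u, p, d_train), t)
        t_prime = t - alpha * grad                         # line 4
        l_valid = classif_loss(t_prime, w, valid)          # line 5
        return l_r + l_valid + l_train

    # line 6: update every parameter with the gradient of the summed loss
    g_theta = numerical_grad(lambda t: total(t, omega, psi), theta)
    g_omega = numerical_grad(lambda w: total(theta, w, psi), omega)
    g_psi = numerical_grad(lambda p: total(theta, omega, p), psi)
    return theta - lr * g_theta, omega - lr * g_omega, psi - lr * g_psi


if __name__ == "__main__":
    toy_domains = [
        [(1.0, 2.0), (2.0, 4.0)],
        [(0.5, 1.0), (1.5, 3.0)],
        [(2.0, 4.1), (3.0, 6.2)],
        [(1.0, 1.9), (0.0, 0.1)],
    ]
    params = (1.0, 1.0, 0.0)
    rng = random.Random(0)
    for _ in range(3):
        params = train_melada_step(*params, toy_domains, rng=rng)
    print(params)
```
        </preformat>
        <p>Because the partition is redrawn in every episode, each domain alternately plays the meta-train and meta-validation role, which is what exposes the shared parameters to domain shift during training.</p>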
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Experiments</title>
      <sec id="sec-6-1">
        <title>6.1. Dataset and Feature Extraction</title>
        <p>
          In our evaluation of the MeLaDA framework, we employ the SEED dataset [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], which was
created for emotion recognition and aBCIs using EEG signals. The dataset contains EEG
signals collected from 15 subjects who watched carefully selected film clips of about
4 minutes each. The clips were chosen to elicit one of three distinct emotions: happiness,
neutrality, or sadness. Each subject took part in the experiment three times, at intervals of
one week. The film clips were selected under strict criteria to ensure that each clip was well
edited, elicited a coherent emotion, and maximised emotional significance. The EEG signals
were recorded with the ESI NeuroScan system using a 62-electrode headset at a sampling
rate of 1000 Hz. The SEED dataset therefore allows us to assess the effectiveness of MeLaDA
for emotion recognition, and its comprehensive, carefully designed stimuli make it a valuable
resource for training, testing, and validating models that interpret emotions from EEG signals.
The feature extraction process follows the strategy used with deep belief networks for
EEG-based emotion recognition [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Because the SEED dataset has already
been preprocessed, we extract the features directly. Specifically, we employ the
differential entropy feature [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ], which has been shown to be effective for EEG-based
emotion recognition in several studies [
          <xref ref-type="bibr" rid="ref21 ref44">21, 44</xref>
          ]. Shi et al. [45] demonstrated that the
differential entropy feature corresponds to the logarithmic spectral energy of a fixed-length
EEG sequence within a specific frequency band. To obtain the spectral energy, we apply the
short-time Fourier transform with a non-overlapping 1-second Hanning window to the EEG
signal over five frequency bands: δ (1–3 Hz), θ (4–7 Hz), α (8–13 Hz), β (14–30 Hz), and
γ (31–50 Hz). We then compute the differential entropy feature. Because EEG-based emotion
recognition tasks are inherently dynamic, we apply the linear dynamic system approach to
smooth the differential entropy features. Each sample has 310 dimensions (62
channels × 5 frequency bands). Since the EEG data are time series, we resample the features
into sequences of 15 time steps using a sliding window shifted in 1-second steps, resulting in
3184 samples per subject.
        </p>
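        <p>
The feature pipeline above can be sketched in NumPy as follows. This is an illustration under assumptions, not the authors' code: the input is a preprocessed EEG array of shape (channels, samples) with an integer sampling rate `fs`, the band energy is taken from the Hanning-windowed FFT of each non-overlapping 1-second segment, and the differential entropy is approximated by the logarithm of that band energy (for a Gaussian band-limited signal, DE = ½ log(2πeσ²), which differs from the log band power only by constants). The linear dynamic system smoothing step is omitted. `make_sequences` assumes that windows never cross trial boundaries, so each trial of n one-second samples yields n − 14 sequences of 15 steps; the function names are hypothetical.

```python
import numpy as np

# Band edges in Hz, as listed in the text: delta, theta, alpha, beta, gamma.
BANDS = [(1, 3), (4, 7), (8, 13), (14, 30), (31, 50)]


def de_features(eeg, fs):
    """Log band energy per non-overlapping 1-s Hanning window, used as a DE proxy.

    eeg: array (channels, samples). Returns an array (n_windows, channels * 5).
    """
    n_ch, n_samp = eeg.shape
    n_win = n_samp // fs  # number of complete 1-second windows
    hann = np.hanning(fs)
    freqs = np.fft.rfftfreq(fs, d=1.0 / fs)  # 1 Hz resolution for a 1-s window
    feats = np.empty((n_win, n_ch, len(BANDS)))
    for w in range(n_win):
        seg = eeg[:, w * fs:(w + 1) * fs] * hann
        power = np.abs(np.fft.rfft(seg, axis=1)) ** 2
        for b, (lo, hi) in enumerate(BANDS):
            mask = np.logical_and(freqs >= lo, hi >= freqs)
            feats[w, :, b] = np.log(power[:, mask].sum(axis=1) + 1e-12)
    return feats.reshape(n_win, n_ch * len(BANDS))


def make_sequences(features, trial_lengths, seq_len=15, stride=1):
    """Slide a seq_len-step window (moved by `stride` samples) within each trial."""
    seqs, start = [], 0
    for n in trial_lengths:
        trial = features[start:start + n]
        for i in range(0, n - seq_len + 1, stride):
            seqs.append(trial[i:i + seq_len])
        start += n
    return np.stack(seqs)
```

With 62 channels the first function returns 310-dimensional vectors. Assuming the 15 per-trial lengths commonly reported for SEED's extracted features (3394 one-second samples per session in total, a figure not given in this paper), the second function yields 3394 − 15 × 14 = 3184 sequences, consistent with the sample count stated above.
        </p>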
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Parameter and Implementation Settings</title>
        <p>
          Following the Plug-and-Play (PnP) approach [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], we adopt the leave-one-subject-out
(LOSO) strategy to assess the generalisation capability of the MeLaDA framework. In each
fold, one subject is selected as the target, and the remaining 14 subjects are used to train
the model. In the test phase, we report the predictions obtained after 10 steps of
self-adaptation. The feature extractor of MeLaDA is a two-layer LSTM network with an output
dimension of 256 and 15 time steps. The classifier is a two-layer MLP with a hidden size of
100. Both the temporal MLP network and the domain shift controller are optimised with the
Adam optimiser using a learning rate of 0.0002 and a weight decay of 0.0001. The parameter  is
set to 0.1. The temporal MLP network is first pre-trained until it exceeds 85% accuracy on the
training set; MeLaDA is then used to jointly train the domain shift controller and the temporal
MLP network. The threshold for freezing a portion of the controller within the gradient reversal
layers is set to 40, and the maximum number of iterations is set to 200.
        </p>
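        <p>
The evaluation protocol and the reported settings can be collected into a small configuration and a leave-one-subject-out split generator. This is a sketch: the class, field and function names are hypothetical, only the settings named in the text are included, and computing the SD as a population standard deviation in `summarise` is an assumption about how the results are aggregated.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class MeLaDASettings:
    """Settings reported in Section 6.2 (field names are hypothetical)."""
    lstm_layers: int = 2
    lstm_output_dim: int = 256
    time_steps: int = 15
    mlp_layers: int = 2
    mlp_hidden: int = 100
    learning_rate: float = 2e-4
    weight_decay: float = 1e-4
    self_adaptation_steps: int = 10
    pretrain_accuracy: float = 0.85
    freeze_threshold: int = 40
    max_iterations: int = 200


def loso_splits(subject_ids):
    """Yield (target, sources): each subject is held out once as the target domain."""
    subject_ids = list(subject_ids)
    for target in subject_ids:
        yield target, [s for s in subject_ids if s != target]


def summarise(fold_accuracies):
    """Mean accuracy (MA) and standard deviation (SD) over the LOSO folds."""
    return mean(fold_accuracies), pstdev(fold_accuracies)
```

For SEED's 15 subjects, `loso_splits(range(1, 16))` produces 15 folds, each training on 14 source subjects and testing on the held-out one.
        </p>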
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Results and Discussion</title>
      <p>
        To assess the efficacy of the proposed MeLaDA framework, we employ the
leave-one-subject-out cross-validation scheme and compare MeLaDA with various domain
adaptation and domain generalisation approaches on the SEED dataset.
The evaluation results, comprising the mean accuracy (MA) and standard deviation (SD), are
presented in Table 1. Compared with the baseline, which aggregates data from all source
domains and trains a single support vector machine (SVM), all other evaluated methods
improve accuracy substantially, by roughly 13% or more relative to the baseline.
Notably, MeLaDA outperforms all domain generalisation methods, and it remains competitive
with the domain adaptation methods.
Although WGAN-DA [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] and the Plug-and-Play method [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] achieve marginally higher accuracy than
MeLaDA, WGAN-DA requires all source domains, and the Plug-and-Play method requires a
subset of domains for adaptation, which limits its fast generalisation capability. Moreover,
as an implementation of a meta-learning strategy, MeLaDA achieves higher accuracy than
MLDG [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] and Feature-Critic [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ] by approximately 7% and 6%, respectively. These results
indicate that MLDG, which directly employs episodic training to generalise the model, is
insufficient to address the subject variability inherent in EEG-based emotion recognition.
Similarly, the Feature-Critic network, despite using a sum-decomposable MLP to simulate
domain shift during the training phase, does not significantly improve the results.
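This suggests that applying the domain shift controller during the testing phase is beneficial.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Mean accuracy (MA) on the SEED dataset under leave-one-subject-out evaluation.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Models</th>
              <th>MA</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>SVM [46]</td>
              <td>0.567</td>
            </tr>
            <tr>
              <td>TCA [46]</td>
              <td>0.640</td>
            </tr>
            <tr>
              <td>TPT [46]</td>
              <td>0.752</td>
            </tr>
            <tr>
              <td>DAN [<xref ref-type="bibr" rid="ref29">29</xref>]</td>
              <td>0.838</td>
            </tr>
            <tr>
              <td>DANN [<xref ref-type="bibr" rid="ref29">29</xref>]</td>
              <td>0.792</td>
            </tr>
            <tr>
              <td>WGAN-DA [<xref ref-type="bibr" rid="ref31">31</xref>]</td>
              <td>0.871</td>
            </tr>
            <tr>
              <td>MeLaDA (Ours)</td>
              <td>0.864</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        As an augmented domain adaptation method, MeLaDA demonstrates the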
capability to predict on a new target set after only a few self-adaptation steps, thereby combining
the advantages of both domain adaptation and domain generalisation. We also examine the
performance of the domain shift controller by measuring the accuracy of self-adaptation during
the test phase. Figure 3 shows the self-adaptation behaviour of MeLaDA when confronted
with a new domain: the model achieves favourable performance within a small
number of self-adaptation steps. The left subplot illustrates the adaptation performance on the
target domain during the training phase, where each self-adaptation process relies solely on
the input data for prediction, without any additional information. The right subplot
compares the performance of the controller with and without the gradient reversal layer (GRL).
This comparison indicates that including the adversarial strategy enhances the stability and
efficiency of the domain shift controller, as evidenced by reduced fluctuations and improved
overall performance, whereas removing the GRL may lead to fluctuations or performance deterioration.
      </p>
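      <p>
Test-phase self-adaptation can be illustrated with a toy model: only unlabelled target inputs are used, and the feature-extractor parameters θ are moved down the gradient of the controller output for a fixed number of steps (10 in the implementation settings) before prediction. The linear feature extractor, linear classifier, softplus-based controller and step size are assumptions made for this sketch, not the networks or values used in the paper; the function names are hypothetical.

```python
import numpy as np


def controller_loss(X, theta, omega):
    """Assumed sum-decomposable controller on features f_theta(x) = x @ theta."""
    return np.logaddexp(0.0, X @ theta @ omega).mean()


def controller_grad_theta(X, theta, omega):
    """Closed-form gradient of controller_loss with respect to theta."""
    s = 1.0 / (1.0 + np.exp(-(X @ theta @ omega)))
    return X.T @ (s[:, None] * omega[None, :]) / len(X)


def self_adapt(X_target, theta, omega, alpha=0.05, steps=10):
    """Adapt theta on unlabelled target data only; no labels or source data are used."""
    for _ in range(steps):
        theta = theta - alpha * controller_grad_theta(X_target, theta, omega)
    return theta


def predict(X, theta, phi):
    """Class predictions of the linear classifier phi on adapted features."""
    return np.argmax(X @ theta @ phi, axis=1)
```

In this toy the controller objective is convex in θ, so with a small enough step it decreases at every self-adaptation step; the real networks give no such guarantee, which is one reason a stabilising mechanism such as the GRL can matter.
      </p>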
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>In this study, we present MeLaDA, an augmented domain adaptation approach for building
a subject-agnostic model for EEG-based emotion recognition that does not require source
domain data in the test phase. The proposed approach adopts a sum-decomposable domain
shift controller to facilitate augmented domain adaptation. By integrating adversarial learning
and meta-learning, MeLaDA generalises to new domains with only a few self-adaptation
iterations. Experimental results on the SEED dataset show that MeLaDA outperforms
traditional domain generalisation methods. This highlights the suitability of MeLaDA for
constructing subject-agnostic affective models, surpassing conventional domain adaptation,
domain generalisation, and ASFM
methods.
      </p>
      <p>… tion from EEG, IEEE Transactions on Affective Computing 10 (2017) 417–429.
[45] L.-C. Shi, Y.-Y. Jiao, B.-L. Lu, Differential entropy feature for EEG-based vigilance estimation,
in: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and
Biology Society (EMBC), IEEE, 2013, pp. 6627–6630.
[46] W.-L. Zheng, B.-L. Lu, Personalizing EEG-based affective models with transfer learning,
in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence,
2016, pp. 2732–2738.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Mühl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Allison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nijholt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chanel</surname>
          </string-name>
          ,
          <article-title>A survey of affective brain computer interfaces: principles, state-of-the-art, and challenges</article-title>
          ,
          <source>Brain-Computer Interfaces</source>
          <volume>1</volume>
          (
          <year>2014</year>
          )
          <fpage>66</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Shanechi</surname>
          </string-name>
          ,
          <article-title>Brain-machine interfaces from motor to mood</article-title>
          ,
          <source>Nature Neuroscience</source>
          <volume>22</volume>
          (
          <year>2019</year>
          )
          <fpage>1554</fpage>
          -
          <lpage>1564</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Jenke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Peer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Buss</surname>
          </string-name>
          ,
          <article-title>Feature extraction and selection for emotion recognition from eeg</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>5</volume>
          (
          <year>2014</year>
          )
          <fpage>327</fpage>
          -
          <lpage>339</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Liotta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>A systematic review on affective computing: Emotion models, databases, and recent advances</article-title>
          ,
          <source>Information Fusion</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sugiyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krauledat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-R.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Covariate shift adaptation by importance weighted cross validation</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>8</volume>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Samek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Meinecke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-R.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Transferring subspaces between subjects in brain-computer interfacing</article-title>
          ,
          <source>IEEE Transactions on Biomedical Engineering</source>
          <volume>60</volume>
          (
          <year>2013</year>
          )
          <fpage>2289</fpage>
          -
          <lpage>2298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sussillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Stavisky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Kao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. I.</given-names>
            <surname>Ryu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. V.</given-names>
            <surname>Shenoy</surname>
          </string-name>
          ,
          <article-title>Making brain-machine interfaces robust to future neural variability</article-title>
          ,
          <source>Nature Communications</source>
          <volume>7</volume>
          (
          <year>2016</year>
          )
          <fpage>13749</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.-P.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-P.</given-names>
            <surname>Jung</surname>
          </string-name>
          ,
          <article-title>Improving eeg-based emotion classification using conditional transfer learning</article-title>
          ,
          <source>Frontiers in Human Neuroscience</source>
          <volume>11</volume>
          (
          <year>2017</year>
          )
          <fpage>334</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Fdez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Guttenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Witkowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pasquali</surname>
          </string-name>
          ,
          <article-title>Cross-subject eeg-based emotion recognition through neural networks with stratified normalization</article-title>
          ,
          <source>Frontiers in Neuroscience</source>
          <volume>15</volume>
          (
          <year>2021</year>
          )
          <fpage>626277</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sourina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Scherer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Müller-Putz</surname>
          </string-name>
          ,
          <article-title>Domain adaptation techniques for eeg-based emotion recognition: a comparative study on two public datasets</article-title>
          ,
          <source>IEEE Transactions on Cognitive and Developmental Systems</source>
          <volume>11</volume>
          (
          <year>2018</year>
          )
          <fpage>85</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.-C. L.</given-names>
            <surname>Lew</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shylouskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Ang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-H.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>EEG-based emotion recognition using spatial-temporal representation via bi-gru</article-title>
          ,
          <source>in: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine &amp; Biology Society (EMBC)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>116</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Dagam: A domain adversarial graph attention model for subject independent eeg-based emotion recognition</article-title>
          ,
          <source>Journal of Neural Engineering</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <article-title>A fast, efficient domain adaptation technique for cross-domain electroencephalography (eeg)-based emotion recognition</article-title>
          ,
          <source>Sensors</source>
          <volume>17</volume>
          (
          <year>2017</year>
          )
          <fpage>1014</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>X.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <article-title>Multi-source domain transfer discriminative dictionary learning modeling for electroencephalogram-based emotion recognition</article-title>
          ,
          <source>IEEE Transactions on Computational Social Systems</source>
          <volume>9</volume>
          (
          <year>2022</year>
          )
          <fpage>1604</fpage>
          -
          <lpage>1612</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. W.</given-names>
            <surname>Tsang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Kwok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Domain adaptation via transfer component analysis</article-title>
          ,
          <source>IEEE Transactions on Neural Networks</source>
          <volume>22</volume>
          (
          <year>2011</year>
          )
          <fpage>199</fpage>
          -
          <lpage>210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <article-title>Deep transfer learning with joint adaptation networks</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2208</fpage>
          -
          <lpage>2217</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ben-David</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Blitzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Crammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kulesza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pereira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Vaughan</surname>
          </string-name>
          ,
          <article-title>A theory of learning from different domains</article-title>
          ,
          <source>Machine Learning</source>
          <volume>79</volume>
          (
          <year>2010</year>
          )
          <fpage>151</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Des Combes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gordon</surname>
          </string-name>
          ,
          <article-title>On learning invariant representations for domain adaptation</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>7523</fpage>
          -
          <lpage>7532</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pouget-Abadie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warde-Farley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ozair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Generative adversarial networks</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>63</volume>
          (
          <year>2020</year>
          )
          <fpage>139</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T. O.</given-names>
            <surname>Zander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jatzev</surname>
          </string-name>
          ,
          <article-title>Context-aware brain-computer interfaces: exploring the information space of user, technical system and environment</article-title>
          ,
          <source>Journal of Neural Engineering</source>
          <volume>9</volume>
          (
          <year>2011</year>
          )
          <fpage>016003</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Investigating critical frequency bands and channels for eeg-based emotion recognition with deep neural networks</article-title>
          ,
          <source>IEEE Transactions on Autonomous Mental Development</source>
          <volume>7</volume>
          (
          <year>2015</year>
          )
          <fpage>162</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Visual domain adaptation with manifold embedded distribution alignment</article-title>
          ,
          <source>in: Proceedings of the 26th ACM international conference on Multimedia</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>402</fpage>
          -
          <lpage>410</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhuang</surname>
          </string-name>
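          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,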
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Supervised representation learning: Transfer learning with deep autoencoders</article-title>
          ,
          <source>in: Twenty-Fourth International Joint Conference on Artificial Intelligence</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gretton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Borgwardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rasch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smola</surname>
          </string-name>
          ,
          <article-title>A kernel method for the two-sample-problem</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>19</volume>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ganin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lempitsky</surname>
          </string-name>
          ,
          <article-title>Unsupervised domain adaptation by backpropagation</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1180</fpage>
          -
          <lpage>1189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tzeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Saenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          ,
          <article-title>Adversarial discriminative domain adaptation</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>7167</fpage>
          -
          <lpage>7176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Wasserstein distance guided representation learning for domain adaptation</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>32</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ganin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ustinova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ajakan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Germain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Laviolette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marchand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lempitsky</surname>
          </string-name>
          ,
          <article-title>Domain-adversarial training of neural networks</article-title>
          ,
          <source>The Journal of Machine Learning Research</source>
          <volume>17</volume>
          (
          <year>2016</year>
          )
          <fpage>2096</fpage>
          -
          <lpage>2030</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-M.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Cross-subject emotion recognition using deep adaptation networks</article-title>
          ,
          <source>in: Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13-16, 2018, Proceedings, Part V</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>403</fpage>
          -
          <lpage>413</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M.</given-names>
            <surname>Arjovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chintala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <article-title>Wasserstein generative adversarial networks</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>214</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
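          ,
          <string-name>
            <given-names>S.-Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,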
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>WGAN domain adaptation for EEG-based emotion recognition</article-title>
          , in: L. Cheng,
          <string-name>
            <given-names>A. C. S.</given-names>
            <surname>Leung</surname>
          </string-name>
          , S. Ozawa (Eds.),
          <source>Neural Information Processing</source>
          , Springer International Publishing, Cham,
          <year>2018</year>
          , pp.
          <fpage>275</fpage>
          -
          <lpage>286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Generalizing to unseen domains: A survey on domain generalization</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>B.-Q.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Reducing the subject variability of EEG signals with adversarial domain generalization</article-title>
          ,
          <source>in: Neural Information Processing: 26th International Conference, ICONIP 2019, Sydney, NSW, Australia, December 12-15, 2019, Proceedings, Part I</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>L.-M.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Plug-and-play domain adaptation for cross-subject EEG-based emotion recognition</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>35</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>863</fpage>
          -
          <lpage>870</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>C.</given-names>
            <surname>Finn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abbeel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Levine</surname>
          </string-name>
          ,
          <article-title>Model-agnostic meta-learning for fast adaptation of deep networks</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1126</fpage>
          -
          <lpage>1135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-Z.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <article-title>Learning to generalize: Meta-learning for domain generalization</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>32</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Balaji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sankaranarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chellappa</surname>
          </string-name>
          ,
          <article-title>MetaReg: Towards domain generalization using meta-regularization</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>31</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <article-title>Feature-critic networks for heterogeneous domain generalization</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3915</fpage>
          -
          <lpage>3924</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaheer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kottur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ravanbakhsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Póczos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Smola</surname>
          </string-name>
          ,
          <article-title>Deep sets</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Guibas</surname>
          </string-name>
          ,
          <article-title>PointNet++: Deep hierarchical feature learning on point sets in a metric space</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>E.</given-names>
            <surname>Wagstaff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fuchs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Engelcke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Posner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <article-title>On the limitations of representing functions on sets</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>6487</fpage>
          -
          <lpage>6494</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-Z.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <article-title>Episodic training for domain generalization</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1446</fpage>
          -
          <lpage>1455</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>R.-N.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Differential entropy feature for EEG-based emotion classification</article-title>
          ,
          <source>in: 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER)</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Identifying stable patterns over time for emotion recogni-</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>