
Towards Subject Agnostic Affective Emotion Recognition

Amit Kumar Jaiswal, University of Surrey, United Kingdom (a.jaiswal@surrey.ac.uk)

Haiming Liu, University of Southampton, United Kingdom (h.liu@soton.ac.uk)

Prayag Tiwari, School of Information Technology, Halmstad University, Sweden (prayag.tiwari@ieee.org)

This paper addresses affective emotion recognition from EEG signals in the subject-agnostic paradigm. EEG signals exhibit subject instability in subject-agnostic affective brain-computer interfaces (aBCIs), which leads to the problem of distributional shift. This problem can be alleviated by approaches such as domain generalisation and domain adaptation. Methods based on domain adaptation typically achieve better results than domain generalisation methods but demand more computational resources whenever new subjects are encountered. We propose a novel framework, meta-learning based augmented domain adaptation, for subject-agnostic aBCIs. Our domain adaptation approach is augmented through meta-learning and consists of a recurrent neural network, a classifier, and a distributional shift controller based on a sum-decomposable function. We also show that a neural network representing a sum-decomposable function can effectively estimate the divergence between different domains. The network for augmented domain adaptation is trained with meta-learning and adversarial learning, so that the controller rapidly adapts to new domains using the target data through a few self-adaptation steps in the test phase. Experiments on a public aBCI dataset show that our approach is effective: it achieves performance similar to state-of-the-art domain adaptation methods while avoiding the use of additional computational resources.

Keywords: Emotion recognition, EEG, Domain adaptation

1. Introduction

Understanding human emotional experience and its intricate interplay is a challenging task. Recent advances in brain-computer interaction systems, specifically affective brain-computer interfaces (aBCIs), have enabled the automatic identification of user emotions and facilitated a more humanised mode of interaction [ 1, 2 ]. Electroencephalography (EEG) signals, in particular, have been used in subject-dependent emotion models to recognise user emotions, with both the training and test data derived from a single subject [ 3, 4 ]. Despite the potential of EEG signals in aBCIs, their non-stationary nature and the structural variability among different subjects pose significant challenges for the development of subject-independent models. Such models typically assume independent and identically distributed (i.i.d.) samples¹ and therefore generalise poorly in practical aBCI applications because of domain shift [ 5, 6, 7, 8, 9 ]. To address domain shift in subject-agnostic EEG-based emotion recognition, domain adaptation can be leveraged; it uses data from both the source and target domains to improve adaptation performance. A key domain adaptation technique maps the two distributions to a shared feature space in which their marginal distributions are identical. Despite its considerable success in subject-independent EEG-based emotion recognition [ 8, 10, 9 ], domain adaptation can be computationally intensive and time-consuming, which leads to suboptimal user experiences in real-world applications. To address this issue, domain generalisation has emerged, particularly for scenarios in which multiple source domains are accessible but unlabelled target samples are lacking.
Subject-agnostic emotion recognition models can be constructed with domain generalisation techniques [ 11, 12 ]. Nonetheless, because no prior knowledge of the target domain is available during training, it is difficult for domain generalisation to match the performance of domain adaptation. A potential compromise is adaptive subspace feature matching (ASFM), which pre-trains the primary model and uses a limited number of test samples to adjust efficiently [ 13 ]. Although ASFM avoids time-consuming adaptation, most such methods require retaining both source and target domains in the test phase, which increases storage requirements and reduces portability [ 14 ]. In real-world applications, however, an EEG-based affective model must be able to adapt rapidly to different subjects while remaining portable.

This paper presents a novel framework, meta-learning based augmented domain adaptation (MeLaDA), for subject-agnostic EEG-based emotion recognition. Unlike ASFM, MeLaDA requires only the target domain during the test phase. It can therefore generate predictions more rapidly than domain adaptation and ASFM, which makes it better suited to building emotion models for subject-agnostic aBCIs in real-world applications. MeLaDA is built on the equivalence between a network with a sum-decomposable structure and the domain discrepancy metrics used in classical domain adaptation techniques, such as maximum mean discrepancy [ 15, 16 ] or ℋ-divergence [ 17, 18 ]. Using this formulation, the MeLaDA framework incorporates a feature extractor, a classifier, and a sum-decomposable structure termed the domain shift controller. By combining adversarial learning and meta-learning, the controller enables MeLaDA to generalise rapidly to new domains using the target data through a few self-adaptation steps during the test phase. The key contributions of our approach are threefold:
1. MeLaDA provides a suitable approach for developing a subject-agnostic EEG-based emotion recognition model, together with a way to incorporate any type of domain discrepancy represented by a sum-decomposable network.
2. The proposed framework is portable and able to adapt quickly to various subjects for EEG-driven emotion recognition.
3. We carried out extensive experiments on the publicly available EEG-based aBCI dataset SEED². The results indicate that our approach outperforms domain generalisation methods while exhibiting time and storage costs comparable to theirs.

¹ https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables

2. Motivation

The key motivation behind MeLaDA is to simplify the estimation of domain shift by using a simple network that requires only the target domain as input. In contrast, traditional domain adaptation approaches compare the target domain with a specific source domain, which requires additional storage for source data and complex methods such as generative adversarial networks (GANs) [ 19 ] to represent domain shift during the test phase. These limitations make domain adaptation difficult to apply in practical EEG-driven emotion recognition. In a multi-source scenario, however, we demonstrate that minimising the discrepancy between all pairs of domains is equivalent to minimising the discrepancy between each domain and an implicit domain. Moreover, we prove that any domain shift metric can, in theory, be represented by a network in sum-decomposable form.

3. Prior Work

EEG-based Emotion Recognition: Owing to the inherent non-stationarity of EEG signals and the variability across individuals, developing a subject-independent model for EEG-based emotion recognition with conventional machine learning methods is challenging. Attention has recently turned to affective brain-computer interfaces [ 1 ], which integrate affective factors into traditional brain-computer interfaces [ 20 ]. Subsequently, aBCIs were applied to EEG-based emotion recognition in [ 21 ], where 15 participants watched selected Chinese movie clips designed to elicit three emotions: happy, neutral, and sad. The authors recorded the participants' EEG signals and curated them into an EEG emotion recognition dataset called SEED. Building on SEED, researchers have made significant advances in EEG-based emotion recognition, particularly with subject-dependent models. Because such models do not transfer well to unseen subjects, researchers have turned to domain adaptation and domain generalisation techniques for subject-independent EEG-based emotion recognition. Domain adaptation approaches primarily reduce domain shift by minimising discrepancies between different domains using established metrics such as maximum mean discrepancy (MMD) [ 15, 16, 22 ], the Kullback-Leibler divergence [ 23 ], and ℋ-divergence [ 17 ]. Existing work [ 21 ] employed transfer component analysis [ 15 ] to minimise the MMD [ 24 ] between domains by constructing a kernel matrix, which led to the successful development of personalised EEG-based emotion models.

Domain Adaptation and Domain Generalisation: Adversarial domain adaptation methods have gained significant attention and have proven successful across various applications [ 25, 26, 27 ]. These methods draw on the idea behind GANs, in which adversarial training aligns a generated distribution with the real one. In aBCIs, researchers have also adopted adversarial domain adaptation with successful outcomes. For instance, domain-adversarial neural networks (DANN) [ 28 ] have been applied to subject-independent EEG-based emotion recognition [ 29 ]. Furthermore, domain adaptation based on the Wasserstein GAN [ 30 ] (WGAN-DA) has been used to build subject-independent emotion recognition models [ 31 ]. In practical aBCI scenarios, each subject represents an individual domain. Domain adaptation (DA) approaches have been commonly employed in aBCIs, although they are computationally intensive for new domains. Domain generalisation techniques, which generalise to unseen target domains without requiring target domain data, have therefore gained traction in aBCIs [ 32 ]. The domain residual network (DResNet) [ 33 ] extends the structure of DANN [ 28 ] for subject-independent EEG-based vigilance estimation and emotion recognition and shows improved generalisation without target domain data. While domain adaptation methods often yield better results than domain generalisation techniques in aBCIs, ASFM [ 13 ] offers an alternative and has been integrated into EEG-based emotion recognition as the Plug-and-Play (PnP) domain adaptation framework [ 34 ].

Meta-learning: Meta-learning, which learns functional prior knowledge through episode-level training [ 35 ], has gained significant traction, particularly in domain generalisation. Meta-learning for domain generalisation (MLDG) [ 36 ] introduced the first meta-learning strategy for domain generalisation. Subsequently, MetaReg [ 37 ] and Feature-Critic [ 38 ] were proposed to enhance the generalisation capability of the model by incorporating auxiliary losses during training. Unlike earlier domain generalisation approaches that design specific models, meta-learning-based schemes focus on a model-agnostic training strategy that exposes the model to domain shifts during training.

² https://bcmi.sjtu.edu.cn/home/seed/seed.html

4. Problem Formulation

We now describe the key components of our problem setting. The input space of EEG data is denoted by $\mathcal{X}$ and the output space by $\mathcal{Y}$. A domain is described as a joint distribution $\mathbb{P}_{XY}$ over $\mathcal{X} \times \mathcal{Y}$. Since this distribution changes with various factors, we assume that domains are themselves drawn from a distribution over domains. Domains are not directly observable; we only observe samples from them, each domain $D_i$ being represented by a set of pairs $\{(x, y)\}$ with $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. Inconsistency between domains can lead to poor generalisation. One way to address this is to learn a mapping $f$ that transforms the domains into a common representation while minimising the divergence between them. This requires choosing a divergence $d(\cdot, \cdot)$ defined on the marginal or joint distributions. The optimal mapping $f^{\ast}$ for a source domain $D_S$ and a target domain $D_T$ is obtained by minimising Equation 1:

$$f^{\ast} = \arg\min_{f} \, d\big(f(D_S), f(D_T)\big) \tag{1}$$

The mapping $f$ thus transfers the different domains to a shared feature space, ensuring that a model trained on $f(\cdot)$ does not encounter any domain shift. This approach is known as alignment.

4.1. Shift-Independent Domain

In multi-source domain adaptation or domain generalisation, a common way to incorporate domain adaptation methods is to minimise the divergence between every pair of source domains concurrently, as expressed by Equation 2, where $\mathcal{D}$ denotes the set of available domains:

$$f^{\ast} = \arg\min_{f} \sum_{D_i, D_j \in \mathcal{D}} d\big(f(D_i), f(D_j)\big) \tag{2}$$

When this sum converges to zero, all domains converge to a homogeneous state. This limiting domain, characterised by the absence of variation across domains, is referred to as the shift-independent domain $D^{\ast}$.

Definition 4.1. A shift-independent domain $D^{\ast}$ is any $f(D_i)$ obtained asymptotically when $\sum_{i \neq j} d\big(f(D_i), f(D_j)\big) \to 0$.

Theorem 4.1. Given this asymptotic behaviour, optimising the overall variation $\sum_{D_i, D_j \in \mathcal{D}} d\big(f(D_i), f(D_j)\big)$ is equivalent to optimising the loss $\sum_{D_i \in \mathcal{D}} \mathcal{L}(D_i)$, where $\mathcal{L}(D_i) = d\big(f(D_i), D^{\ast}\big)$ measures the dissimilarity between each individual domain and the shift-independent domain.

Because each term $\mathcal{L}(D_i)$ depends on a single domain, it can be estimated by a network that takes only that domain as input. In the following sections, we elaborate on the construction of this network.
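As a sanity check on the intuition behind Theorem 4.1, consider the special case in which each domain is summarised by its mean feature vector and $d$ is the squared Euclidean distance between means (a linear-kernel MMD). In this case the pairwise objective of Equation 2 equals $n$ times the summed distance of each domain to a single implicit domain, the average of the $n$ domain means, because $\sum_{i<j}\lVert\mu_i-\mu_j\rVert^2 = n\sum_i \lVert\mu_i-\bar{\mu}\rVert^2$. The sketch below checks this identity numerically; the choice of divergence and the synthetic data are assumptions made for illustration, not the general setting of the theorem.

```python
import random
from itertools import combinations


def mean_vector(samples):
    """Mean feature vector of one domain (a list of equal-length vectors)."""
    dim = len(samples[0])
    return [sum(s[k] for s in samples) / len(samples) for k in range(dim)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors (assumed divergence d)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pairwise_objective(means):
    """Equation-2-style sum of d over all unordered pairs of domains."""
    return sum(sq_dist(mi, mj) for mi, mj in combinations(means, 2))


def implicit_domain_objective(means):
    """Sum of d between each domain and the implicit domain (mean of means)."""
    n, dim = len(means), len(means[0])
    centre = [sum(m[k] for m in means) / n for k in range(dim)]
    return sum(sq_dist(m, centre) for m in means)


rng = random.Random(0)
# Five hypothetical "subjects", each with 20 three-dimensional feature vectors
# scattered around a subject-specific offset.
domains = [
    [[rng.gauss(offset, 1.0) for _ in range(3)] for _ in range(20)]
    for offset in (0.0, 0.5, -1.0, 2.0, 0.3)
]
means = [mean_vector(d) for d in domains]
n = len(means)
print(pairwise_objective(means), n * implicit_domain_objective(means))
```

Minimising either quantity drives all domain means towards one point, which is the role the shift-independent domain plays in Definition 4.1.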

4.2. Sum Decomposable Component

Permutation invariance is an essential requirement for a function that captures domain discrepancy: the order of the source domain data should not affect the output. This property has been studied extensively in prior work [ 39, 40 ]. Summation is commonly used to enforce permutation invariance, which leads to the concept of sum-decomposition. Definition 4.2 gives a formal definition of sum-decomposability.

Definition 4.2. A function $f$ is said to be sum-decomposable via $\mathbb{R}^{l}$ if there exist two functions $\phi: \mathbb{R} \to \mathbb{R}^{l}$ and $\rho: \mathbb{R}^{l} \to \mathbb{R}$ such that $f(X) = \rho\big(\sum_{x \in X} \phi(x)\big)$.

Theorem 4.2. A continuous map $f: \mathbb{R}^{M} \to \mathbb{R}$ is permutation-invariant if and only if it is continuously sum-decomposable via $\mathbb{R}^{M}$.

Proposition 4.3. Established measures of domain discrepancy (or domain variation), such as maximum mean discrepancy (MMD) or ℋ-divergence, can be expressed equivalently by a sum-decomposable function.

A detailed proof of Theorem 4.2 can be found in prior work on representing arbitrary functions on sets [ 41 ]. Note that adding a summation or averaging layer enforces permutation invariance straightforwardly. Theorem 4.2 indicates that a sum-decomposable network operating in a latent space of sufficient dimensionality can represent any permutation-invariant function, including the discrepancy to the shift-independent domain, $d(\cdot, D^{\ast})$, in Theorem 4.1.
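To make Definition 4.2 and Proposition 4.3 concrete, the sketch below builds a set function of the form $\rho\big(\sum_x \phi(x)\big)$ and uses it to compute an MMD-style distance between a domain and the mean embedding of another domain. The fixed polynomial $\phi$, the use of averaging (a summation rescaled by the set size) and the toy domains are illustrative assumptions; in MeLaDA both maps are learned network layers.

```python
import itertools


def phi(x):
    """Inner map phi: R -> R^3 (hypothetical fixed polynomial features)."""
    return [x, x * x, x ** 3]


def pooled(samples):
    """Permutation-invariant pooling: mean of phi(x), i.e. a sum scaled by 1/|X|."""
    feats = [phi(x) for x in samples]
    return [sum(col) / len(feats) for col in zip(*feats)]


def rho(z, reference):
    """Outer map rho: R^3 -> R, Euclidean norm of the gap to a reference embedding."""
    return sum((a - b) ** 2 for a, b in zip(z, reference)) ** 0.5


def discrepancy(samples, reference):
    """Sum-decomposable discrepancy rho(mean phi(x)) of a single domain."""
    return rho(pooled(samples), reference)


# With reference = pooled(other_domain), this is the distance between the two
# domains' mean embeddings under phi (an MMD with kernel k(x, y) = phi(x).phi(y)).
domain = [0.1, -0.4, 1.3, 0.7]
other_domain = [0.0, 0.5, 1.0]
reference = pooled(other_domain)
base = discrepancy(domain, reference)
orders_agree = all(
    abs(discrepancy(list(p), reference) - base) < 1e-12
    for p in itertools.permutations(domain)
)
print(base, orders_agree)
```

Reordering the samples of a domain leaves the output unchanged, which is exactly the permutation-invariance property that Theorem 4.2 ties to sum-decomposability.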

5. The Proposed Method

In accordance with this theoretical framework, our proposed approach, MeLaDA, integrates a sum-decomposable domain shift controller with a temporal multi-layer perceptron (MLP) network. The architecture, depicted in Figure 2, consists of a feature extractor $F(\cdot)$ and a classifier $C(\cdot)$, which together form the temporal MLP network. The domain shift controller $R(\cdot)$ uses the features extracted by $F(\cdot)$ to assess the dissimilarity between the current domain and the shift-independent domain. When the network classifies target domain data, $R$ first propagates forward to compute the domain shift and then propagates backward to fine-tune the feature extractor $F$. This behaviour resembles an intelligent controller that dynamically adjusts the network according to how well it generates shift-independent features. Guided by $R$, the entire network can generalise to unseen domains through meta-learning augmented domain adaptation. With the trained controller, the feature extractor effectively mitigates the domain shift present in data from individual subjects. Consequently, samples with the same emotion label from different domains follow a comparable distribution in the shared space. The following sections describe the design principles of the domain shift controller and give an overview of our training strategy.

5.1. Model

We now describe our model based on the components above. To satisfy the permutation-invariance requirement of the domain shift controller, a straightforward approach is to add a summation layer to a neural network, as in Feature-Critic networks [ 38 ]. The Feature-Critic (FC) network appends a summation layer to the end of a multi-layer perceptron but omits the outer mapping $\rho$ of Definition 4.2; consequently, it may not fully capture the characteristics of a domain shift controller. Furthermore, our experiments indicate that introducing adversarial elements into the network can enhance its capabilities. Specifically, we place two gradient reversal layers (GRLs) [ 28 ] before and after a two-layer MLP, followed by an additional layer that further improves performance. The rationale for the GRLs comes from traditional ways of representing domain discrepancy, such as MMD or adversarial approaches [ 25, 27 ]. These methods share a common principle: the divergence between two domains is given by the "largest" difference between them over a class of functions. To give our network the same capacity for modelling domain shift as these methods, we introduce a divergence measure, akin to MMD and ℋ-divergence, which we term maximum mean norm discrepancy (MND).

Figure 2: When MeLaDA receives a new domain, a long short-term memory (LSTM) feature extractor first transforms the data into its feature representation. The controller then examines the disparity between the new domain and the shift-independent domain and computes the loss $\mathcal{L}_c$. Guided by the controller, the feature extractor adjusts itself through a series of adaptation steps. Finally, the data is passed through the updated feature extractor and forward through the classifier.

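Definition 5.1. Given two domains $D_i$ and $D_j$, the maximum mean norm discrepancy (MND) between them is defined as

$$\mathrm{MND}(D_i, D_j) := \max_{\phi \in \Phi} \big\lVert \mathbb{E}_{x \in D_i}[\phi(x)] - \mathbb{E}_{x \in D_j}[\phi(x)] \big\rVert \tag{3}$$

where each function $\phi \in \Phi$ maps $\mathcal{X}$ into a vector space.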
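Definition 5.1 takes a maximum over a class of feature maps. As a minimal sketch, assuming the class is a small finite set of hand-written maps rather than trained MLP layers, MND can be computed by evaluating the mean-embedding gap under each map and keeping the largest one. The maps and the two toy domains below are hypothetical.

```python
import math


def mean_embedding(samples, fmap):
    """Empirical mean of fmap(x) over one domain."""
    feats = [fmap(x) for x in samples]
    return [sum(col) / len(feats) for col in zip(*feats)]


def mnd(domain_a, domain_b, fmaps):
    """Maximum mean norm discrepancy over a finite family of feature maps.

    Returns the largest Euclidean norm of the difference between the two
    domains' mean embeddings, together with the index of the maximising map.
    """
    best_value, best_index = -1.0, None
    for i, fmap in enumerate(fmaps):
        mu_a = mean_embedding(domain_a, fmap)
        mu_b = mean_embedding(domain_b, fmap)
        gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_a, mu_b)))
        if gap > best_value:
            best_value, best_index = gap, i
    return best_value, best_index


# Hypothetical family: each map sends a scalar feature to R^2.
fmaps = [
    lambda x: [x, 0.0],             # first moment only
    lambda x: [x, x * x],           # first and second moments
    lambda x: [math.tanh(x), 0.0],  # bounded odd feature
]

narrow = [-1.0, 1.0, -1.0, 1.0]  # mean 0, variance 1
wide = [-3.0, 3.0, -3.0, 3.0]    # mean 0, variance 9
value, index = mnd(narrow, wide, fmaps)
print(value, index)  # 8.0 1
```

The two domains share the same mean, so a map that only exposes the first moment sees no shift, whereas the map that also exposes the second moment reveals it. In MeLaDA the maximisation is performed by adversarially trained controller layers rather than by enumeration.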

Based on Theorem 4.1, choosing an implicit domain as the shift-independent domain avoids the need for direct comparisons between domains. The implicit domain is chosen to minimise the overall divergence, which facilitates efficient optimisation. Consequently, our objective contains both minimisation and maximisation components, as shown in Equation 4:

$$\mathcal{L}_c = \sum_{D_i \in \mathcal{D}} \max_{\phi \in \Phi} \min_{v \in \mathcal{V}} \big\lVert \mathbb{E}_{x \in D_i}[\phi(x)] - v \big\rVert_2 \tag{4}$$

where $\mathcal{L}_c$ is the loss given by the controller's output and $\mathcal{V}$ is the latent space in which the implicit domain is represented. Within the domain shift controller, $\phi$ corresponds to the inner mapping of Definition 4.2 and is implemented by the initial layers of the network. The subsequent summation layer computes the mean of $\phi(x)$, and the final layer of the controller computes the norm of the difference. To handle the maximisation and minimisation objectives, we follow the adversarial principle of GANs [ 19 ] and include gradient reversal layers in the controller network: during forward propagation a GRL acts as an identity map, whereas during backward propagation it reverses the direction of the gradients. Unlike the domain adaptation regulariser used in domain-adversarial networks [ 25 ], our domain shift controller has an additional minimisation task. We therefore employ two GRLs: the left GRL helps the controller identify the most significant domain difference, while the right GRL adjusts the feature extractor to generate consistent features across domains. The two-GRL structure can be unstable during training. To mitigate this, we freeze the part of the network between the two GRLs once a predefined iteration threshold is reached, which stabilises the training process.
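The gradient reversal layer and its effect in the two-GRL controller can be summarised in a few lines. The framework-free sketch below only illustrates the forward/backward contract (identity forward, negated and optionally scaled gradient backward), tracks which gradient sign reaches each component under the layout described above, and encodes the freezing rule with the threshold of 40 iterations reported in Section 6.2. The class name, the scaling factor `lam`, and the helpers `sign_seen_by` and `controller_layers_trainable` are hypothetical; in practice a GRL is written as a custom autograd function in a deep-learning framework.

```python
class GradientReversal:
    """Identity in the forward pass; multiplies incoming gradients by -lam."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return [-self.lam * g for g in grad_output]


def sign_seen_by(component):
    """Sign of dL_c/d(output) after it has crossed the GRLs.

    Assumed layout: feature extractor -> GRL_left -> two-layer MLP (phi)
    -> mean pooling -> GRL_right -> final layer. Only the sign contributed by
    the GRLs is tracked; the layers' own Jacobians are ignored.
    """
    grl_left, grl_right = GradientReversal(), GradientReversal()
    grad = [1.0]                      # gradient of L_c at the final layer
    if component == "final_layer":
        return grad[0]
    grad = grl_right.backward(grad)   # crosses the right GRL
    if component == "mlp":
        return grad[0]
    grad = grl_left.backward(grad)    # crosses the left GRL as well
    if component == "feature_extractor":
        return grad[0]
    raise ValueError(f"unknown component: {component}")


def controller_layers_trainable(iteration, freeze_threshold=40):
    """Hypothetical rule: the MLP between the GRLs is frozen from the threshold on."""
    return iteration < freeze_threshold


for name in ("final_layer", "mlp", "feature_extractor"):
    print(name, sign_seen_by(name))
```

A component that receives the gradient with sign −1 effectively performs gradient ascent on $\mathcal{L}_c$ when its optimiser takes a descent step, so the maximisation over $\phi$ in Equation 4 needs no separate optimiser, while the layers on either side of the GRL pair keep minimising.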

5.2. Meta-learning based Training Strategy

In this section, we present a meta-learning based strategy for learning the parameters. To ensure that the domain shift controller generalises, we train it within an episodic meta-learning scheme. The overall algorithm has two components, the training of the domain shift controller and the training of the network, given in Algorithm 1 and Algorithm 2, respectively.

Training Procedure for the Domain Shift Controller: To improve the model's generalisation, we adopt a two-fold approach: we optimise the controller output $\mathcal{L}_c$ to mitigate domain shift, and we use meta-learning to promote generalisation. First, the available domains are randomly divided into meta-train domains $\mathcal{D}_{\text{train}}$ and meta-test domains (also called meta-validation domains) $\mathcal{D}_{\text{valid}}$. The controller is used to optimise the feature extractor $F(\cdot;\theta)$ on the meta-train domains only. We then assess the optimised feature extractor $F(\cdot;\theta')$ on the meta-test domains to evaluate its ability to generalise beyond the training domains. After the parameter $\theta$ is updated to $\theta'$, the classification loss $\ell$ changes from $\ell(x, y; \theta)$ to $\ell(x, y; \theta')$. Following Feature-Critic networks [ 38 ], we define the meta loss in Equation 5.

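$$\mathcal{L}_{\text{meta}} = \sum_{(x, y) \in \mathcal{D}_{\text{valid}}} \tanh\big(\ell(x, y; \theta') - \ell(x, y; \theta)\big) \tag{5}$$

The overall loss for updating the controller is given in Equation 6:

$$\min_{\omega} \; \mathcal{L}_c(\theta, \omega; \mathcal{D}_{\text{train}}) + \mathcal{L}_{\text{meta}}(\theta, \psi, \omega; \mathcal{D}_{\text{valid}}), \qquad \theta' = \theta - \alpha \nabla_{\theta} \mathcal{L}_c(\theta, \omega; \mathcal{D}_{\text{train}}) \tag{6}$$

Equation 6 involves the hyperparameter $\alpha$ together with the parameters $\theta$, $\psi$, and $\omega$ of $F$, $C$, and $R$, respectively. Optimising Equation 6 updates the parameter $\omega$ of the controller $R(\cdot)$.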
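Equation 5 rewards the controller when the update it induces lowers the classification loss on held-out domains. The sketch below spells out that computation for precomputed per-sample losses, together with the plain gradient step that produces $\theta'$. The function names, the step size `alpha`, the flat parameter list and all numbers are hypothetical and chosen only for illustration.

```python
import math


def inner_update(theta, grad, alpha):
    """theta' = theta - alpha * grad: one controller-guided step on meta-train domains."""
    return [t - alpha * g for t, g in zip(theta, grad)]


def meta_loss(losses_before, losses_after):
    """Equation 5: sum over meta-validation samples of tanh(l(theta') - l(theta))."""
    return sum(math.tanh(after - before)
               for before, after in zip(losses_before, losses_after))


theta = [0.5, -1.0]
grad_controller = [0.2, -0.4]  # hypothetical dL_c/dtheta on the meta-train domains
theta_prime = inner_update(theta, grad_controller, alpha=0.1)

# Hypothetical per-sample classification losses on the meta-validation domains,
# evaluated with theta (before) and theta' (after).
before = [0.9, 1.2, 0.7]
after = [0.6, 1.0, 0.8]
print(theta_prime, meta_loss(before, after))
```

Because tanh saturates, a few samples with very large loss changes cannot dominate the sum, and a negative value indicates that the controller-guided update generalised to the held-out domains.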

Training Procedure for the MeLaDA framework: Unlike MetaReg [ 37 ] and Feature-Critic networks [ 38 ], which train the auxiliary network before the task network, MeLaDA trains the domain shift controller and the temporal MLP network alternately. This is necessary because the controller must remain functional during the test phase and therefore requires continuous updates while the other parts of the network are being trained. Additionally, to adhere fully to the principles of meta-learning, we train the network with the model-agnostic meta-learning (MAML) [ 35 ] framework rather than optimising it directly. We treat domain shift control and classification as two distinct tasks and use episodic training [ 42 ] to update their respective parameters. The dataset is divided into two subsets, $\mathcal{D}_{\text{train}}$ (meta-train domains) and $\mathcal{D}_{\text{valid}}$ (meta-validation domains). The domain shift controller $R$ uses data from $\mathcal{D}_{\text{train}}$ to compute the domain shift loss $\mathcal{L}_c(\theta, \omega; \mathcal{D}_{\text{train}})$, and the classification loss $\mathcal{L}_{\text{classif}}^{\text{train}}$ is computed on the same data. These losses are then used to update the parameters of the feature extractor $F(\cdot)$. With the updated parameters, $F(\cdot;\theta')$ processes data from $\mathcal{D}_{\text{valid}}$, and the temporal MLP network computes the corresponding classification loss $\mathcal{L}_{\text{classif}}^{\text{valid}}$. The overall loss for optimising $F(\cdot)$ and $C(\cdot)$ is defined as follows:

$$\mathcal{L}_c(\theta, \omega; \mathcal{D}_{\text{train}}) + \mathcal{L}_{\text{classif}}^{\text{train}}(\theta, \psi) + \mathcal{L}_{\text{classif}}^{\text{valid}}(\theta', \psi) \tag{7}$$

Algorithm 1 Training the Domain Shift Controller
Input: domains $\mathcal{D}$; parameters $\theta$, $\psi$, $\omega$
Output: $\omega$
1: $(\mathcal{D}_{\text{train}}, \mathcal{D}_{\text{valid}}) \leftarrow$ random split of $\mathcal{D}$ ▷ Random partitioning
2: $\mathcal{L}_c(\theta, \omega; \mathcal{D}_{\text{train}}) \leftarrow R(F(\mathcal{D}_{\text{train}}))$ ▷ Meta-training stage
3: $\theta' \leftarrow \theta - \alpha \nabla_{\theta} \mathcal{L}_c(\theta, \omega; \mathcal{D}_{\text{train}})$ ▷ Meta-training stage
4: $\mathcal{L}_{\text{meta}}(\theta, \psi, \omega; \mathcal{D}_{\text{valid}}) \leftarrow \sum_{(x, y) \in \mathcal{D}_{\text{valid}}} \tanh\big(\ell(x, y; \theta') - \ell(x, y; \theta)\big)$ ▷ Meta-validation stage
5: Update $\omega$ using $\mathcal{L}_c + \mathcal{L}_{\text{meta}}$ ▷ Optimisation

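Although MeLaDA has been introduced from an alignment-based domain adaptation perspective, it can also be explained through the lens of meta-learning. Recent domain generalisation approaches [ 36 ] suggest that meta-learning links different tasks by aligning their gradients in a shared direction. In our scenario, the domain shift controller task is deliberately coupled with the "domain generalisation" process. Consequently, when the model encounters a new domain, the optimisation objective of adapting to that domain aligns with the objective driven by the controller.

Algorithm 2 Training the MeLaDA framework
Input: domains $\mathcal{D}$; parameters $\theta$, $\psi$, $\omega$
Output: $\theta$, $\psi$, $\omega$
1: $(\mathcal{D}_{\text{train}}, \mathcal{D}_{\text{valid}}) \leftarrow$ random split of $\mathcal{D}$ ▷ Random partitioning
2: $\mathcal{L}_c(\theta, \omega; \mathcal{D}_{\text{train}}) \leftarrow R(F(\mathcal{D}_{\text{train}}))$ ▷ Meta-training stage
3: $\mathcal{L}_{\text{classif}}^{\text{train}}(\theta, \psi) \leftarrow \ell\big(C(F(X_{\text{train}})), Y_{\text{train}}\big)$ ▷ Meta-training stage
4: $\theta' \leftarrow \theta - \alpha \nabla_{\theta}\big(\mathcal{L}_c(\theta, \omega; \mathcal{D}_{\text{train}}) + \mathcal{L}_{\text{classif}}^{\text{train}}(\theta, \psi)\big)$ ▷ Meta-training stage
5: $\mathcal{L}_{\text{classif}}^{\text{valid}}(\theta', \psi) \leftarrow \ell\big(C(F(X_{\text{valid}}; \theta')), Y_{\text{valid}}\big)$ ▷ Meta-validation stage
6: Update $\theta$, $\psi$, $\omega$ using $\mathcal{L}_c + \mathcal{L}_{\text{classif}}^{\text{valid}} + \mathcal{L}_{\text{classif}}^{\text{train}}$ ▷ Optimisation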
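Both algorithms begin by randomly partitioning the available source domains (one per subject) into meta-train and meta-validation sets. A minimal sketch of that step follows. The function name, the number of meta-validation domains and the seeded generator are assumptions, since the text does not specify the split ratio.

```python
import random


def split_domains(domain_ids, n_valid, seed=None):
    """Randomly partition domain ids into (meta-train, meta-validation) lists."""
    if not 0 < n_valid < len(domain_ids):
        raise ValueError("n_valid must leave at least one domain on each side")
    rng = random.Random(seed)
    shuffled = list(domain_ids)
    rng.shuffle(shuffled)
    return shuffled[n_valid:], shuffled[:n_valid]


# In a leave-one-subject-out fold, 14 source subjects remain (ids 1..14 here).
sources = list(range(1, 15))
for episode in range(3):
    meta_train, meta_valid = split_domains(sources, n_valid=3, seed=episode)
    print(episode, sorted(meta_valid))
```

Drawing a fresh split in every episode exposes the controller to many different train/validation domain shifts, which is the point of the episodic scheme.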

6. Experiments

6.1. Dataset and Feature Extraction

To evaluate MeLaDA, we use the SEED dataset [ 21 ], which was created for EEG-based emotion recognition and aBCI research. It contains EEG signals from 15 subjects who watched carefully selected film clips of about 4 minutes each, chosen to elicit one of three emotions: happiness, neutrality, and sadness. Each subject took part in the experiment three times, at intervals of one week. The film clips were selected under stringent criteria to ensure that each clip was well edited, elicited a coherent emotion, and maximised emotional significance. The EEG signals were recorded with the ESI NeuroScan system using a 62-electrode headset at a sampling rate of 1000 Hz. SEED allows us to assess the performance and effectiveness of MeLaDA for emotion recognition; its comprehensive nature and carefully designed stimuli make it a valuable resource for training, testing, and validating models that interpret emotions from EEG signals. Feature extraction follows the strategy of [ 21 ], which used deep belief networks for EEG-based emotion recognition. Since the SEED dataset has already been preprocessed, we extract the features directly. Specifically, we use the differential entropy feature [ 43 ], which has been shown to be effective for EEG-based emotion recognition in several studies [ 21, 44 ]. Prior work [45] has shown that the differential entropy feature corresponds to the logarithmic spectral energy of a fixed-length EEG sequence within a specific frequency band.
To obtain the spectral energy, we apply the short-time Fourier transform with a non-overlapping 1-second Hanning window to the EEG signal in five frequency bands: 1–3 Hz, 4–7 Hz, 8–13 Hz, 14–30 Hz, and 31–50 Hz. We then compute the differential entropy feature. Given the inherent dynamics of EEG-based emotion recognition tasks, we apply the linear dynamic system method to filter the differential entropy feature. Each sample has 310 dimensions (62 channels × 5 frequency bands). Since the EEG data are time series, we resample the features with a time step of 15 and a 1-second overlap, resulting in 3184 samples per subject.
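The band-wise feature described above can be sketched in plain Python: a Hann-windowed one-second segment is transformed with a discrete Fourier transform, spectral energy is summed inside each of the five bands, and the logarithm is taken. For a Gaussian signal with variance $\sigma^2$, differential entropy equals $\tfrac{1}{2}\log(2\pi e\sigma^2)$, so it is an affine function of the log band energy. This illustration assumes a 200 Hz sampling rate, uses a naive DFT and omits the linear-dynamic-system smoothing; it is not the preprocessing code used for SEED. The last helper reads "a time step of 15 and a 1-second overlap" as length-15 windows shifted by one second (our interpretation), and assumes that the 15 clips of a session give 3,394 one-second feature vectors in total (our assumption about SEED, not stated in the text).

```python
import cmath
import math

BANDS = [(1, 3), (4, 7), (8, 13), (14, 30), (31, 50)]  # Hz, as listed above


def gaussian_de(variance):
    """Differential entropy of a Gaussian with the given variance (nats)."""
    return 0.5 * math.log(2 * math.pi * math.e * variance)


def band_log_energy(segment, fs, bands=BANDS):
    """Log spectral energy per band for one window (Hann window, naive DFT)."""
    n = len(segment)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    x = [s * w for s, w in zip(segment, window)]
    features = []
    for low, high in bands:
        energy = 0.0
        for k in range(n // 2 + 1):
            freq = k * fs / n
            if low <= freq <= high:
                coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                            for t in range(n))
                energy += abs(coeff) ** 2
        features.append(math.log(energy + 1e-12))  # guard against log(0)
    return features


def windows_per_session(total_steps, n_trials, length=15):
    """Stride-1 windows of `length` steps, assuming every trial is >= length."""
    return total_steps - n_trials * (length - 1)


fs = 200                                # assumed sampling rate (Hz)
t = [i / fs for i in range(fs)]         # one-second window
alpha_wave = [math.sin(2 * math.pi * 10 * ti) for ti in t]
feats = band_log_energy(alpha_wave, fs)
print([round(f, 2) for f in feats])
print(62 * len(BANDS))                  # feature dimension per time step
print(windows_per_session(3394, 15))    # assumed 3,394 one-second vectors
```

A pure 10 Hz sinusoid concentrates its energy in the 8–13 Hz band, and the last line reproduces the 3184 samples per subject reported above under the stated assumption.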

6.2. Parameter and Implementation Settings

In line with the Plug-and-Play (PnP) approach [ 34 ], we adopt the leave-one-subject-out (LOSO) strategy to assess the generalisation capability of MeLaDA. In each iteration, one subject is selected as the target, and the remaining 14 subjects are used to train our model. During the test phase, the predictions obtained after 10 steps of self-adaptation are used. The feature extractor of MeLaDA is a two-layer LSTM network with an output dimension of 256 and a time step of 15. The classifier is a two-layer MLP with a hidden size of 100. Both the temporal MLP network and the domain shift controller are optimised with the Adam optimiser with a learning rate of 0.0002 and a weight decay of 0.0001. The parameter is assigned a value of 0.1. The temporal MLP network is first pre-trained until it exceeds 85% accuracy on the training set. MeLaDA is then used to jointly train the domain shift controller and the temporal MLP network. The threshold for freezing the part of the controller between the two gradient reversal layers is set to 40 iterations, and the maximum number of iterations is set to 200.
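The evaluation protocol and the reported settings can be summarised in the following sketch. It enumerates the 15 leave-one-subject-out folds and collects the stated hyperparameters in a plain configuration object. The field names, the `loso_folds` and `fold_summary` helpers, and the use of the population standard deviation for SD are hypothetical choices; the unnamed parameter set to 0.1 is deliberately left out because the text does not identify it.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class MeLaDAConfig:
    lstm_layers: int = 2
    lstm_output_dim: int = 256
    time_steps: int = 15
    classifier_hidden: int = 100
    learning_rate: float = 2e-4
    weight_decay: float = 1e-4
    pretrain_accuracy: float = 0.85   # pre-train the temporal MLP until above this
    freeze_threshold: int = 40        # iterations before freezing between the GRLs
    max_iterations: int = 200
    test_adaptation_steps: int = 10   # self-adaptation steps at test time


def loso_folds(subjects):
    """Yield (target_subject, source_subjects) for leave-one-subject-out."""
    for target in subjects:
        yield target, [s for s in subjects if s != target]


def fold_summary(accuracies):
    """Mean accuracy (MA) and standard deviation (SD) across folds."""
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


cfg = MeLaDAConfig()
folds = list(loso_folds(list(range(1, 16))))
print(len(folds), len(folds[0][1]), cfg.test_adaptation_steps)  # 15 14 10
```

Each fold trains on 14 subjects and reports the accuracy on the held-out subject after the test-time self-adaptation steps; `fold_summary` then yields the MA and SD figures of the kind reported in Table 1.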

7. Results and Discussion

To assess the efficacy of our proposed MeLaDA framework, we employ the leave-one-subject-out cross-validation scheme and compare MeLaDA with various domain adaptation and domain generalisation approaches on the SEED dataset. The evaluation results, comprising the mean accuracy (MA) and standard deviation (SD), are presented in Table 1. Compared with the baseline approach, which aggregates data from all source domains and trains a single model using the support vector machine (SVM), all the evaluated methods exhibit a significant improvement in accuracy of at least 13%. Notably, MeLaDA outperforms all domain generalisation methods. When compared with the domain adaptation methods, MeLaDA still achieves commendable results, although WGAN-DA [ 31 ] and the Plug-and-Play method [ 34 ] exhibit marginally higher accuracy than MeLaDA. It should be noted, however, that WGAN-DA requires all source domains and the Plug-and-Play method requires a subset of domains for adaptation, which limits the fast generalisation capability of the PnP method. Our proposed method, being an implementation of a meta-learning strategy, achieves higher accuracy than MLDG [ 36 ] and Feature-Critic [ 38 ] by approximately 7% and 6%, respectively. These results indicate that MLDG, which directly employs episodic training to generalise the model, is insufficient to address the subject variability inherent in EEG-based emotion recognition. Similarly, the Feature-Critic network, despite using a sum-decomposable MLP to simulate domain shift during the training phase, does not yield a significant improvement. This suggests that applying a domain shift controller during the testing phase is beneficial. As an augmented domain adaptation method, MeLaDA can predict any target set with only a few steps of self-adaptation, thereby combining the advantages of domain adaptation and domain generalisation.

Table 1: Mean accuracy (MA) on the SEED dataset under leave-one-subject-out evaluation.

Models           MA
SVM [46]         0.567
TCA [46]         0.640
TPT [46]         0.752
DAN [ 29 ]       0.838
DANN [ 29 ]      0.792
WGAN-DA [ 31 ]   0.871
MeLaDA (Ours)    0.864

We also examine the performance of the domain shift controller by measuring the accuracy of self-adaptation during the test phase. Figure 3 shows the self-adaptation capability of MeLaDA when confronted with a new domain: the model reaches favourable performance within a limited number of self-adaptation steps. The left subplot illustrates the adaptation performance on the target domain during the training phase, where each self-adaptation process relies solely on the input data for prediction, without requiring any additional information. The right subplot compares the performance of the controller with and without the gradient reversal layer (GRL). Including the adversarial strategy enhances the stability and efficiency of the domain shift controller, as evidenced by reduced fluctuations and improved overall performance, whereas removing the GRL may lead to fluctuations or performance deterioration.
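The sum-decomposable form discussed above for Feature-Critic and the domain shift controller, f(X) = ρ(Σ_{x∈X} φ(x)) as characterised by Deep Sets [39], can be sketched as follows. This is an illustrative toy, not the paper's controller: `phi` and `rho` are hypothetical fixed maps standing in for learned MLPs, and the batch is random data rather than EEG features. It shows that the output depends on a batch only through the pooled sum, so reordering the target samples cannot change it.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def phi(x: Vector) -> Vector:
    """Per-sample encoder. Hypothetical fixed map; a learned controller would use an MLP here."""
    return [math.tanh(v) for v in x] + [v * v for v in x]


def rho(pooled: Vector) -> float:
    """Set-level readout. Hypothetical fixed weights; a learned controller would use a second MLP."""
    return sum((i + 1) * v for i, v in enumerate(pooled)) / len(pooled)


def sum_decomposable(batch: Sequence[Vector]) -> float:
    """f(X) = rho(sum of phi(x) over x in X): the result cannot depend on sample order."""
    pooled = [0.0] * (2 * len(batch[0]))
    for x in batch:
        for i, v in enumerate(phi(x)):
            pooled[i] += v
    return rho(pooled)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for a batch of 32 target-domain feature vectors of size 4.
    batch = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(32)]
    shuffled = batch[:]
    rng.shuffle(shuffled)
    print(sum_decomposable(batch), sum_decomposable(shuffled))  # equal up to rounding
```

Because the pooled sum is order-independent, such a function can score an unordered batch of target samples during self-adaptation; Wagstaff et al. [41] discuss the limits of this representation.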

8. Conclusion

In this study, we present MeLaDA, an augmented domain adaptation approach for building a subject-agnostic model for EEG-based emotion recognition without requiring source domain data in the test phase. Our approach adopts a sum-decomposable domain shift controller to facilitate augmented domain adaptation. By integrating adversarial learning and meta-learning, MeLaDA generalises to new domains with only a few self-adaptation iterations. Experimental results on the SEED dataset show that MeLaDA outperforms traditional domain generalisation methods. This highlights the suitability of MeLaDA for constructing subject-agnostic affective models, surpassing conventional domain adaptation, domain generalisation, and ASFM methods.

References

[1] Mühl, Allison, Nijholt, Chanel, A survey of affective brain computer interfaces: principles, state-of-the-art, and challenges, Brain-Computer Interfaces 1 (2014) 66–84.

[2] M. M. Shanechi, Brain-machine interfaces from motor to mood, Nature Neuroscience 22 (2019) 1554–1564.

[3] Jenke, Peer, Buss, Feature extraction and selection for emotion recognition from EEG, IEEE Transactions on Affective Computing 5 (2014) 327–339.

[4] Wang, Song, Tao, Liotta, Yang, Li, Gao, Sun, Ge, Zhang, et al., A systematic review on affective computing: Emotion models, databases, and recent advances, Information Fusion (2022).

[5] Sugiyama, Krauledat, K.-R. Müller, Covariate shift adaptation by importance weighted cross validation, Journal of Machine Learning Research 8 (2007).

[6] Samek, F. C. Meinecke, K.-R. Müller, Transferring subspaces between subjects in brain-computer interfacing, IEEE Transactions on Biomedical Engineering 60 (2013) 2289–2298.

[7] Sussillo, S. D. Stavisky, J. C. Kao, S. I. Ryu, K. V. Shenoy, Making brain-machine interfaces robust to future neural variability, Nature Communications 7 (2016) 13749.

[8] Y.-P. Lin, T.-P. Jung, Improving EEG-based emotion classification using conditional transfer learning, Frontiers in Human Neuroscience 11 (2017) 334.

[9] Fdez, Guttenberg, Witkowski, Pasquali, Cross-subject EEG-based emotion recognition through neural networks with stratified normalization, Frontiers in Neuroscience 15 (2021) 626277.

[10] Lan, Sourina, Wang, Scherer, G. R. Müller-Putz, Domain adaptation techniques for EEG-based emotion recognition: a comparative study on two public datasets, IEEE Transactions on Cognitive and Developmental Systems 11 (2018) 85–94.

[11] W.-C. L. Lew, D. Wang, K. Shylouskaya, Z. Zhang, J.-H. Lim, K. K. Ang, A.-H. Tan, EEG-based emotion recognition using spatial-temporal representation via Bi-GRU, in: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), IEEE, 2020, pp. 116–119.

[12] Xu, Dang, Wang, Zhou, DAGAM: A domain adversarial graph attention model for subject independent EEG-based emotion recognition, Journal of Neural Engineering (2022).

[13] Chai, Wang, Zhao, Li, Liu, Bai, A fast, efficient domain adaptation technique for cross-domain electroencephalography (EEG)-based emotion recognition, Sensors 17 (2017) 1014.

[14] Gu, Cai, Gao, Jiang, Ning, Qian, Multi-source domain transfer discriminative dictionary learning modeling for electroencephalogram-based emotion recognition, IEEE Transactions on Computational Social Systems 9 (2022) 1604–1612.

[15] S. J. Pan, I. W. Tsang, J. T. Kwok, Yang, Domain adaptation via transfer component analysis, IEEE Transactions on Neural Networks 22 (2011) 199–210.

[16] Long, Zhu, Wang, M. I. Jordan, Deep transfer learning with joint adaptation networks, in: International Conference on Machine Learning, PMLR, 2017, pp. 2208–2217.

[17] Ben-David, Blitzer, Crammer, Kulesza, Pereira, J. W. Vaughan, A theory of learning from different domains, Machine Learning 79 (2010) 151–175.

[18] Zhao, R. T. Des Combes, Zhang, G. Gordon, On learning invariant representations for domain adaptation, in: International Conference on Machine Learning, PMLR, 2019, pp. 7523–7532.

[19] Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio, Generative adversarial networks, Communications of the ACM 63 (2020) 139–144.

[20] T. O. Zander, Jatzev, Context-aware brain-computer interfaces: exploring the information space of user, technical system and environment, Journal of Neural Engineering 9 (2011) 016003.

[21] W.-L. Zheng, B.-L. Lu, Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks, IEEE Transactions on Autonomous Mental Development 7 (2015) 162–175.

[22] Wang, Feng, Chen, Yu, Huang, P. S. Yu, Visual domain adaptation with manifold embedded distribution alignment, in: Proceedings of the 26th ACM International Conference on Multimedia, 2018, pp. 402–410.

[23] Zhuang, X. Cheng, P. Luo, S. J. Pan, He, Supervised representation learning: Transfer learning with deep autoencoders, in: Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.

[24] Gretton, Borgwardt, Rasch, Schölkopf, Smola, A kernel method for the two-sample-problem, Advances in Neural Information Processing Systems 19 (2006).

[25] Ganin, Lempitsky, Unsupervised domain adaptation by backpropagation, in: International Conference on Machine Learning, PMLR, 2015, pp. 1180–1189.

[26] Tzeng, Hoffman, Saenko, T. Darrell, Adversarial discriminative domain adaptation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7167–7176.

[27] Shen, Qu, Zhang, Yu, Wasserstein distance guided representation learning for domain adaptation, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.

[28] Ganin, E. Ustinova, Ajakan, Germain, Larochelle, Laviolette, Marchand, Lempitsky, Domain-adversarial training of neural networks, The Journal of Machine Learning Research 17 (2016) 2096–2030.

[29] Li, Y.-M. Jin, W.-L. Zheng, B.-L. Lu, Cross-subject emotion recognition using deep adaptation networks, in: Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13–16, 2018, Proceedings, Part V 25, Springer, 2018, pp. 403–413.

[30] Arjovsky, Chintala, L. Bottou, Wasserstein generative adversarial networks, in: International Conference on Machine Learning, PMLR, 2017, pp. 214–223.

[31] Luo, S.-Y. Zhang, W.-L. Zheng, B.-L. Lu, WGAN domain adaptation for EEG-based emotion recognition, in: L. Cheng, A. C. S. Leung, S. Ozawa (Eds.), Neural Information Processing, Springer International Publishing, Cham, 2018, pp. 275–286.

[32] Wang, Lan, C. Liu, Ouyang, Qin, Lu, Chen, Zeng, Yu, Generalizing to unseen domains: A survey on domain generalization, IEEE Transactions on Knowledge and Data Engineering (2022).

[33] B.-Q. Ma, Li, W.-L. Zheng, B.-L. Lu, Reducing the subject variability of EEG signals with adversarial domain generalization, in: Neural Information Processing: 26th International Conference, ICONIP 2019, Sydney, NSW, Australia, December 12–15, 2019, Proceedings, Part I 26, Springer, 2019, pp. 30–42.

[34] L.-M. Zhao, X. Yan, B.-L. Lu, Plug-and-play domain adaptation for cross-subject EEG-based emotion recognition, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 2021, pp. 863–870.

[35] Finn, Abbeel, Levine, Model-agnostic meta-learning for fast adaptation of deep networks, in: International Conference on Machine Learning, PMLR, 2017, pp. 1126–1135.

[36] Li, Yang, Y.-Z. Song, Hospedales, Learning to generalize: Meta-learning for domain generalization, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.

[37] Balaji, Sankaranarayanan, Chellappa, MetaReg: Towards domain generalization using meta-regularization, Advances in Neural Information Processing Systems 31 (2018).

[38] Li, Yang, Zhou, T. Hospedales, Feature-critic networks for heterogeneous domain generalization, in: International Conference on Machine Learning, PMLR, 2019, pp. 3915–3924.

[39] Zaheer, Kottur, Ravanbakhsh, Poczos, R. R. Salakhutdinov, A. J. Smola, Deep sets, Advances in Neural Information Processing Systems 30 (2017).

[40] C. R. Qi, Yi, Su, L. J. Guibas, PointNet++: Deep hierarchical feature learning on point sets in a metric space, Advances in Neural Information Processing Systems 30 (2017).

[41] Wagstaff, Fuchs, Engelcke, I. Posner, M. A. Osborne, On the limitations of representing functions on sets, in: International Conference on Machine Learning, PMLR, 2019, pp. 6487–6494.

[42] Li, Zhang, Yang, Liu, Y.-Z. Song, T. M. Hospedales, Episodic training for domain generalization, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1446–1455.

[43] R.-N. Duan, J.-Y. Zhu, B.-L. Lu, Differential entropy feature for EEG-based emotion classification, in: 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER), IEEE, 2013, pp. 81–84.

[44] W.-L. Zheng, J.-Y. Zhu, B.-L. Lu, Identifying stable patterns over time for emotion recognition from EEG, IEEE Transactions on Affective Computing 10 (2017) 417–429.

[45] L.-C. Shi, Y.-Y. Jiao, B.-L. Lu, Differential entropy feature for EEG-based vigilance estimation, in: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2013, pp. 6627–6630.

[46] W.-L. Zheng, B.-L. Lu, Personalizing EEG-based affective models with transfer learning, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016, pp. 2732–2738.