<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Unimodal vs Multimodal Coupling in Emotion Recognition: An Explainable Framework using Physiological States and Transitions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anubhav</string-name>
          <email>anubhav2901@g.ecc.u-tokyo.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kantaro Fujiwara</string-name>
          <email>kantaro@g.ecc.u-tokyo.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The University of Tokyo</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Emotion recognition from physiological signals has traditionally prioritized classification accuracy over interpretability, often relying on complex black-box models such as deep neural networks. In contrast, this work proposes a lightweight and explainable framework for emotion recognition using multimodal physiological data from the DEAP dataset. We focus on all four emotional dimensions, valence, arousal, dominance, and liking, by analyzing binarized state and transition dynamics across seven physiological modalities: EEG, EOG, EMG, GSR, respiration rate, PPG, and temperature. Our framework explores three layers of explainability: modality-wise signal relevance, pairwise multimodal coupling, and the dynamics of states and state transitions. Using Spearman rank correlation with subject-wise and inter-subject aggregation, we identify interpretable physiological patterns and transition signatures correlating with emotional states. Results show that transition-based features from modality couplings consistently outperform unimodal analyses across all emotion dimensions. In the subject-wise setting, the strongest correlations were observed for valence via GSR-Resp (ρ = 0.578, p &lt; 0.0001), arousal via EOG-EMG (ρ = 0.580, p &lt; 0.0001), dominance via EEG-PPG (ρ = 0.594, p &lt; 0.0001), and liking via GSR-Resp (ρ = 0.589, p &lt; 0.0001). However, during inter-subject aggregation, the strength of these correlations diminished, suggesting that emotional signatures in physiological signals exhibit significant inter-individual variability and benefit from subject-specific modeling. These findings underscore the utility of explainable multimodal coupling for real-time, interpretable prediction systems and lay the groundwork for future integration into adaptive, personalized frameworks with the potential to scale toward more generalizable, context-aware emotion recognition systems across diverse user populations.</p>
      </abstract>
      <kwd-group>
        <kwd>Multimodal AI</kwd>
        <kwd>XAI</kwd>
        <kwd>Physiological Signals</kwd>
        <kwd>Emotion Recognition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Affective computing using physiological signals has gained significant momentum in recent years,
driven by advances in wearable sensor technology, increased affordability, and improved signal fidelity.
These developments have enabled continuous, non-invasive monitoring of internal states such as stress,
arousal, and emotional reactivity. The importance of such tools has become particularly salient in the
post-COVID-19 era, where rates of anxiety and depression have risen sharply, prompting renewed
interest in unobtrusive methods for emotional well-being monitoring and regulation. In this context,
physiological emotion recognition systems offer a promising foundation for real-time mental health
support, biofeedback applications, and emotionally intelligent interfaces.</p>
      <p>
        Among various sensing modalities, physiological signals such as electroencephalography (EEG),
electrooculography (EOG), galvanic skin response (GSR), photoplethysmography (PPG), and respiration
rate are increasingly recognized for their ability to provide objective and continuous indicators of
emotional state. Unlike vision or audio-based emotion recognition, these biosignals can be measured
in silent, private, or wearable settings without reliance on external context. In particular, the DEAP
dataset has become a benchmark resource for physiological emotion research, offering a multimodal
collection of EEG and peripheral signals recorded across 32 subjects and annotated using four core
emotion dimensions: valence, arousal, dominance, and liking [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Physiological signal-based models are especially valuable for wearable embedded BCI applications
due to their low latency, low noise, and potential for unobtrusive acquisition. With the rise of real-time
applications, there is a growing demand for interpretable emotion recognition models that balance
performance with explainability and resource efficiency. While earlier studies have shown high
classification accuracy using deep learning models such as Long Short-Term Memory networks (LSTM)
[
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], or non-linear dynamics methods based on reservoir computing [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], these approaches often
lack transparency in revealing why specific physiological features contribute to emotion recognition
outcomes.
      </p>
      <p>
        Recent work in multimodal learning has highlighted the potential of fusing multiple physiological
signals to improve classification robustness [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. However, most models focus only on valence and
arousal, neglecting DEAP’s complete label set, which includes dominance and liking. Furthermore,
fusion techniques treat multimodal input holistically without systematically examining pairwise
modality interactions or temporal dynamics. This omission represents a missed opportunity to explore how
unimodal signals contribute individually and how their pairwise combinations (multimodal couplings)
can reveal interpretable emotion-specific interactions and transition patterns.
      </p>
      <p>In this context, there remains a clear gap in the literature. First, most existing approaches offer
limited support for explainability, particularly in identifying interpretable biomarkers within or across
modalities. Second, the dominance and liking dimensions remain underexplored.
Third, while state dynamics and transitions are implicit in sequential models, few studies extract
these features explicitly in a computationally lightweight and interpretable fashion. In particular, prior
research has not systematically compared unimodal versus multimodal coupling strategies in the context
of emotion recognition.</p>
      <p>To address these gaps, we propose a novel, explainable framework for multimodal physiological
emotion recognition using the DEAP dataset. Our method is centered around three dimensions of
analysis: (1) signal-level representation (unimodal), (2) pairwise modality coupling (multimodal), and (3)
temporal dynamics in terms of binary state transitions. Each signal is binarized based on standardized
trial-level statistics and transformed into interpretable features such as state frequency, transition
count, and directional state-transition sequences. Using Spearman correlation, we relate these features
to all four emotional dimensions: valence, arousal, dominance, and liking, at both intra-subject and
inter-subject levels.</p>
      <p>Our framework yields interpretable correlation scores and visual summaries identifying dominant
modalities and modality pairs associated with specific emotional experiences. We demonstrate that
our transition-based multimodal coupling features correlate more strongly with emotional labels than
unimodal features while preserving computational efficiency. The remainder of this paper is organized as follows:
Section 2 reviews prior work on interpretable emotion recognition. Section 3 describes our proposed
framework. Section 4 presents our empirical findings across the three analytical layers. Section 5
discusses the implications of these findings, and Section 6 concludes the paper with future research
directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Unimodal EEG-Based Emotion Recognition</title>
        <p>
          Emotion recognition using EEG has been extensively studied, particularly within subject-dependent
settings using supervised learning. Traditional approaches extract frequency-domain or time-frequency
features from EEG signals and use classifiers such as SVMs or deep networks. For instance, Nath et
al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and Anubhav and Fujiwara [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] achieved classification accuracies exceeding 93% on valence and
arousal using LSTM-based models. However, these methods are generally black-box in nature and rely
on high-dimensional input, limiting their interpretability.
        </p>
        <p>
          Alternative modeling frameworks such as Reservoir Computing (RC) offer biologically inspired
processing of EEG dynamics with reduced training complexity. Anubhav and Fujiwara proposed a
reservoir splitting strategy [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] to isolate brain-lobe-specific activity. Their subsequent multi-reservoir
framework [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] evaluated EEG generalization across trials, subjects, and contexts. While these approaches
improve eficiency, they are still constrained to spatial EEG features and do not explicitly capture
modality interactions or transitions.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Multimodal Physiological Fusion and Signal Coupling</title>
        <p>
          Multimodal learning strategies integrate EEG with peripheral physiological signals such as EOG,
EMG, GSR, and PPG to capture richer emotional information. Bălan et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] used Random Forests
to classify V-A-D emotional states from EEG and GSR, analyzing feature contributions from each
modality. Gohumpu et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] further demonstrated that incorporating multiple peripheral modalities
enhances both classification performance and interpretability. Yet, most fusion strategies treat modalities
holistically, without analyzing their pairwise interactions.
        </p>
        <p>
          Hierarchical and attention-based fusion models have been introduced to address modality
relationships. Zhang et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] proposed a hierarchical fusion network, while more recent transformer models
such as ST-SHAP [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and ERTNet [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] provide partial explainability through SHAP and attention
weights. These architectures improve accuracy but rely on computationally intensive training and lack
transparency in physiological terms. Saxena et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] summarized such trends, highlighting that the
performance-interpretability trade-off remains unresolved in multimodal emotion recognition.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Interpretability and Temporal Dynamics in Emotion Modeling</title>
        <p>
          Despite increasing focus on explainability, most models fall short of delivering physiologically
meaningful interpretations. Attention mechanisms and SHAP values indicate signal importance but do not
reveal inter-modality dynamics or temporal structure. Shu et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] surveyed entropy and mutual
information metrics as proxies for EEG signal complexity, yet their application in multimodal temporal
analysis remains rare.
        </p>
        <p>Temporal aspects of emotional experience, such as transitions between physiological states, are
underexplored. While deep models like LSTMs implicitly capture sequence dynamics, explicit modeling
of temporal transitions and their relation to emotion has not been central in existing frameworks.
Furthermore, existing approaches primarily address valence and arousal, neglecting dominance and
liking due to modeling complexity.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Our Contribution: Interpretable Multimodal Coupling with Temporal Transitions</title>
        <p>We address these limitations by proposing a fully interpretable and computationally efficient framework
for emotion recognition using all four DEAP emotional labels: valence, arousal, dominance, and
liking. Our approach departs from high-dimensional, opaque models and instead focuses on three core
contributions:
• Binarized signal representation across modalities: We discretize seven physiological
channels to enable direct analysis of state activation and transitions.
• Pairwise modality coupling: Rather than treating the multimodal signal as a fused whole, we
examine synergistic interactions between specific modality pairs to uncover functional
relationships.
• Temporal transition modeling: We statistically quantify cross-time transitions using a
correlation-based framework, making temporal structure and dynamics explicit and interpretable.</p>
        <p>
          Our method does not require deep networks or supervised training, making it suitable for real-time
applications. By analyzing modality transitions and their correlations with emotional states, we offer a
novel pathway toward understanding the physiological basis of emotion in a multimodal and temporally
resolved manner. Importantly, our work builds on conceptual foundations from Energy Landscape
Analysis (ELA), which has been used to model brain state transitions [
          <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
          ]. Unlike traditional ELA,
which relies on maximum entropy modeling and is limited to unimodal EEG, our framework extends
this perspective to multimodal binarized transitions, providing a scalable and interpretable alternative
for emotion research.
        </p>
        <p>To contextualize our contribution, Table 1 summarizes selected studies emphasizing interpretable
frameworks using unimodal and multimodal physiological signals. As shown, most approaches focus
on valence and arousal only and rarely include modality-level transition features or pairwise signal
coupling.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset and Preprocessing</title>
        <p>
          The present study uses the DEAP dataset [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], a widely adopted multimodal emotion corpus that includes
physiological recordings from 32 participants across 40 video trials. Each trial includes self-reported
ratings on four emotional dimensions: valence, arousal, dominance, and liking, scored on a 9-point
Likert scale.
        </p>
        <p>We utilize all seven physiological modalities recorded in the dataset: 32 EEG channels, 2 EOG channels,
2 EMG channels, GSR, respiration (RESP), photoplethysmography (PPG), and skin temperature (TEMP).
To ensure uniformity across modalities and reduce dimensionality, we average signals across the
corresponding channels: EEG, EOG, and EMG signals are each reduced to a single average vector per
trial, resulting in a final 7-dimensional time series per trial.</p>
        <p>All signals are standardized on a per-trial basis using z-score normalization. We do not downsample
the signals and maintain the 128 Hz sampling rate provided in the preprocessed DEAP dataset. The
emotional labels are also standardized within subjects but are preserved in their continuous form for
correlation analysis.</p>
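        <p>As a concrete illustration, the following minimal Python sketch loads one subject from the preprocessed DEAP release and reduces each trial to the seven per-trial modality averages described above. The file name, the pickled layout (a dict with 'data' of shape 40 trials × 40 channels × 8064 samples and 'labels' of shape 40 × 4), and the channel ordering are assumptions based on the standard preprocessed distribution, not details specified in this paper.</p>
        <preformat>
import pickle
import numpy as np

# Assumed channel ordering of the preprocessed DEAP files:
# 0-31 EEG, 32-33 EOG, 34-35 EMG, 36 GSR, 37 RESP, 38 PPG, 39 TEMP.
MODALITIES = {
    "EEG": list(range(0, 32)), "EOG": [32, 33], "EMG": [34, 35],
    "GSR": [36], "RESP": [37], "PPG": [38], "TEMP": [39],
}

def load_subject(path):
    """Load one subject file: (40, 40, 8064) data and (40, 4) labels."""
    with open(path, "rb") as f:
        d = pickle.load(f, encoding="latin1")
    return d["data"], d["labels"]  # labels: valence, arousal, dominance, liking

def to_seven_modalities(trial):
    """Average channels within each modality, giving a (7, 8064) time series."""
    return np.stack([trial[ch].mean(axis=0) for ch in MODALITIES.values()])

def zscore_per_trial(x):
    """Per-trial z-score of each modality (no downsampling; 128 Hz kept)."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

data, labels = load_subject("s01.dat")  # hypothetical local path
trial0 = zscore_per_trial(to_seven_modalities(data[0]))
print(trial0.shape)  # (7, 8064)
        </preformat>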
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Binarization and Feature Representation</title>
        <p>Each of the seven physiological signals is binarized within a trial such that a value of 1 is assigned when
the standardized signal exceeds 0 (i.e., above the mean), and 0 otherwise. This binarization allows us to
extract interpretable features based on binary state logic.</p>
        <p>We extract three categories of features from each modality and modality-pair:
• State-based: Frequency of signal being in state 1 over the trial duration.
• Transition-based: Counts of 0 → 1 and 1 → 0 transitions within each trial.
• State-transition-based: Joint transition features over time, including:
– For unimodal: transitions such as 0 → 0, 0 → 1, 1 → 0, 1 → 1.
– For multimodal pairs: coupled transitions such as 00 → 01, 10 → 11, 01 → 00, etc., over
time-aligned signals.</p>
        <p>This feature structure enables both intra- and inter-modality interpretability.</p>
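        <p>A minimal sketch of these three feature families follows. The feature names (state_00, trans_01, trans_i_j) and the encoding of joint states as integers 0-3 for (00, 01, 10, 11) are our assumptions about the naming convention, adopted here for illustration only.</p>
        <preformat>
import numpy as np

def binarize(z):
    """1 where the z-scored signal exceeds its trial mean (z above 0), else 0."""
    return (z &gt; 0).astype(int)

def state_frequency(b):
    """State-based feature: fraction of samples spent in state 1."""
    return float(b.mean())

def transition_counts(b):
    """Transition-based features: counts of 0 to 1 and 1 to 0 events."""
    d = np.diff(b)
    return {"trans_01": int((d == 1).sum()), "trans_10": int((d == -1).sum())}

def joint_features(b1, b2):
    """Pairwise features over two time-aligned binary signals.

    Joint states are indexed 0..3 for (00, 01, 10, 11), so a name such as
    trans_0_2 reads 00 to 10 (assumed encoding, for illustration).
    """
    joint = 2 * b1 + b2  # 0..3 joint-state sequence
    feats = {f"state_{i:02b}": float((joint == i).mean()) for i in range(4)}
    for i in range(4):
        for j in range(4):
            if i != j:
                n = np.logical_and(joint[:-1] == i, joint[1:] == j).sum()
                feats[f"trans_{i}_{j}"] = int(n)
    return feats
        </preformat>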
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Analysis Framework</title>
        <p>Our methodology is structured along three analytical layers:
• Modality-Level Analysis: We examine both unimodal signals and all pairwise combinations
of modalities. For 7 modalities, this results in 21 pairwise modality combinations. For each
combination, we evaluate the informativeness of state and transition features independently.
• Subject-Level Analysis: We conduct both intra-subject and inter-subject analyses. In the
intra-subject setup, features and emotion labels are correlated per subject and then aggregated. In the
inter-subject analysis, features across all trials and subjects are pooled before correlation analysis.
• Feature-Type Analysis: For each modality or modality-pair, we compute features using the
three schemes above (state-based, transition-based, state-transition-based). This layered setup
allows us to assess the importance of dynamic signal behavior as opposed to static values.</p>
        <p>We use Spearman rank correlation to assess the relationship between extracted features and the four
continuous emotion labels. Spearman’s ρ is chosen due to its non-parametric nature and robustness to
outliers. For each feature and label pair, we compute both the correlation coefficient ρ and the associated
p-value. The feature with the highest ρ (with p &lt; 0.05) for each modality and modality-pair is retained
for further visualization.</p>
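        <p>Under the same assumed naming, this selection step can be sketched as a scan over all 21 modality pairs that retains, per pair, the feature most correlated with one emotion label (using the hypothetical joint_features helper from the sketch above):</p>
        <preformat>
from itertools import combinations
from scipy.stats import spearmanr

def best_feature_per_pair(binary_trials, labels, label_idx, names):
    """For each of the C(7, 2) = 21 modality pairs, keep the joint feature
    with the largest |rho| at p &lt; 0.05 against one emotion label."""
    best = {}
    y = labels[:, label_idx]  # continuous ratings across trials
    for a, b in combinations(range(len(names)), 2):
        rows = [joint_features(t[a], t[b]) for t in binary_trials]
        for key in rows[0]:
            rho, p = spearmanr([r[key] for r in rows], y)
            if p &lt; 0.05:
                prev = best.get((names[a], names[b]))
                if prev is None or abs(rho) &gt; abs(prev[1]):
                    best[(names[a], names[b])] = (key, rho, p)
    return best
        </preformat>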
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. Unimodal Analysis</title>
        <sec id="sec-4-1-1">
          <title>4.1.1. State-Based Analysis</title>
          <p>We begin by analyzing the binary state frequency for each of the seven physiological modalities across
all trials. Figure 1 shows the best-performing modality per subject and emotion label. The heatmaps
indicate that EOG and PPG consistently emerge as top modalities for liking and dominance, while EEG
and GSR dominate in the valence and arousal dimensions.</p>
          <p>To assess statistical significance across modalities, Figure 2 shows boxplots of Spearman correlation
coefficients (ρ) between state frequency and each emotion label across all subjects. Overall, the
distribution of correlation values is centered near zero, with the strongest modality-label combinations
reaching ρ ≈ 0.10. Notably, the corrected p-values from pairwise comparisons indicate no statistically
significant differences between modalities (all p &gt; 0.12), as summarized in Appendix A.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Transition-Based Analysis</title>
          <p>We next evaluated unimodal transitions between states, particularly the counts of 1 → 0 and 0 → 1
events. Figure 3 summarizes the transition feature that yields the highest correlation with each emotion
label.</p>
          <p>From the inter-subject analysis, we observe:
• For arousal, the 1 → 0 transition in GSR exhibits the highest correlation (ρ = 0.10, p &lt; 0.01).
• For liking and valence, the 1 → 0 transition in EOG shows the highest correlations (ρ = 0.09
and ρ = 0.07 respectively, both p &lt; 0.05).
• For dominance, the strongest correlation was again from GSR’s 1 → 0 transition (ρ = 0.05,
p &lt; 0.05).</p>
          <p>These findings highlight that while the absolute strength of unimodal correlations remains modest,
incorporating transition dynamics allows for capturing signal patterns not revealed in state-based
analysis alone.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Multimodal Analysis</title>
        <sec id="sec-4-2-1">
          <title>4.2.1. State-Based Analysis</title>
          <p>In the multimodal setting, we evaluate all C(7, 2) = 21 pairwise combinations of physiological
modalities. For each modality pair, we extract joint binary state frequencies (e.g., the fraction of time
both signals are in state 0).</p>
          <p>Figure 4 visualizes the best-performing state for each pair across emotion labels. In the inter-subject
analysis, we observe:
• Valence: highest correlation from state_00 of EMG–RESP (ρ = 0.09, p &lt; 0.01)
• Arousal: best feature is state_00 of EEG–EOG (ρ = 0.07, p &lt; 0.05)
• Dominance: state_10 of EEG–EOG (ρ = 0.07, p &lt; 0.05)
• Liking: state_01 of EOG–TEMP (ρ = 0.07, p &lt; 0.05)</p>
          <p>The corresponding network diagrams in Figure 5 show the top modality pairs for each label across
intra- and inter-subject aggregations. In particular, EEG–EOG and EOG–TEMP consistently appear as
heavily weighted connections, suggesting strong modality coupling for valence and liking.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Transition-Based Analysis</title>
          <p>Finally, we analyze state-transition dynamics in modality pairs. Transition pairs are defined as
time-aligned 2-tuple changes across two binarized signals. The best transitions across the combined dataset
are visualized in Figure 6.</p>
          <p>• Valence: trans_0_2 from GSR–RESP (ρ = 0.578, p &lt; 0.001)
• Arousal: trans_0_2 from EOG–EMG (ρ = 0.580, p &lt; 0.001)
• Dominance: trans_1_1 from EEG–PPG (ρ = 0.594, p &lt; 0.001)
• Liking: trans_0_2 from GSR–RESP (ρ = 0.589, p &lt; 0.001)</p>
          <p>Inter-subject network structures visualized in Figure 7 further emphasize the strength of
transition-based features, especially in PPG and GSR. These features not only surpass all unimodal results but also
introduce interpretable coupling effects between central (EEG/EOG) and peripheral (GSR/PPG) signals.</p>
          <p>In summary, state-transition-based multimodal features provide the highest interpretability and effect
sizes, demonstrating strong, statistically significant associations with all four emotional labels.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>Our findings show that transition-based multimodal features consistently outperform unimodal and
static state-based features. Notable pairings, such as GSR–RESP (ρ = 0.578, p &lt; 0.001 for valence)
and EEG–PPG (ρ = 0.594, p &lt; 0.001 for dominance), demonstrate clear physiological relevance.
Our framework also highlights the importance of transition patterns, such as 0 → 2 and 1 → 1, as
discriminative features for emotion recognition.</p>
      <p>
        Importantly, our analysis includes dominance and liking, which are often omitted in affective
computing literature, providing interpretable correlations for previously underexplored modality pairs like
EEG–PPG (dominance) and EOG–TEMP (liking) [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. This result extends prior studies that focused on
EEG-only data [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] or employed multimodal fusion without interpretability [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <sec id="sec-5-1">
        <title>5.1. Comparison with Other Methods</title>
        <p>
          In terms of methodology, our framework is computationally efficient and avoids model training,
parameter tuning, or high-dimensional feature extraction. This stands in contrast to deep learning models
such as LSTM [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and reservoir computing approaches [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], making our method suitable for real-time,
low-power applications. Unlike SHAP-based transformer models [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] or attention-based explanations
in ERTNet [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], our correlation-based analysis offers direct interpretability by linking each ρ score
to simple physiological patterns. Furthermore, our framework conceptually aligns with ELA, which
models metastable brain dynamics using binarized EEG [
          <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
          ]. While ELA traditionally requires
complex parameter fitting and is limited to unimodal data, our method offers a tractable and extensible
alternative for future multimodal applications in emotional state modeling.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Predictive Performance</title>
        <p>To complement our correlation-based analysis, we conducted a lightweight predictive comparison using
logistic regression over the four interpretable feature families (unimodal state, unimodal transitions,
multimodal state, multimodal transitions) with both within-subject cross-validation and
leave-one-subject-out (LOSO) evaluation. Within-subject, valence achieved AUROC = 0.557 [0.523, 0.592] with
multimodal transition features, and arousal reached 0.550 [0.515, 0.584] with unimodal transitions,
both significantly above chance. Dominance and liking were weaker (AUROC ≈ 0.52–0.54, confidence
intervals overlapping 0.5). Under LOSO, performance was modest (most AUROCs ≈ 0.51–0.54),
reflecting strong inter-subject variability, in line with our correlation analyses. These results indicate
that the proposed interpretable transition features are not only statistically associated with emotion
labels but are also predictive while remaining computationally lightweight.</p>
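        <p>A baseline of the kind described here can be sketched as follows; the midpoint binarization of ratings (threshold of 5 on the 9-point scale) and the scikit-learn pipeline are our assumptions for illustration, not the exact experimental code.</p>
        <preformat>
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def loso_auroc(X, y_cont, subjects):
    """Leave-one-subject-out AUROC for one interpretable feature family.

    X: (n_trials, n_features); y_cont: continuous ratings, binarized at the
    assumed scale midpoint of 5; subjects: subject id per trial (groups).
    """
    y = (y_cont &gt; 5).astype(int)
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = []
    for tr, te in LeaveOneGroupOut().split(X, y, groups=subjects):
        if len(np.unique(y[te])) &lt; 2:  # AUROC undefined for a single class
            continue
        clf.fit(X[tr], y[tr])
        scores.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
    return float(np.mean(scores))
        </preformat>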
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Implications for Personalization and Generalization</title>
        <p>The predictive trends above dovetail with our correlation findings and reinforce a central takeaway of
this work: subject-specific modeling matters. The clear within-subject gains alongside modest LOSO
performance indicate that interpretable transition features capture stable, person-dependent signatures
that dilute under cross-subject pooling. Practically, this favors lightweight, on-device personalization
(e.g., brief calibration or adaptive thresholds) for real-time use, while encouraging future work on
transfer and normalization strategies that preserve interpretability. Taken together with our correlation
results, these baselines strengthen the case that multimodal transition structure is a meaningful,
human-parsable substrate for emotion modeling even when absolute cross-subject accuracy remains challenging.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Limitations and Future Work</title>
        <p>
          However, several limitations must be acknowledged. First, the binarization process, while useful for
interpretability, simplifies continuous signals and may overlook finer-grained patterns. Second, while
Spearman correlation captures monotonic relationships, it does not model conditional dependencies or
interactions involving three or more modalities. Third, our analysis does not yet incorporate subject-level
variability modeling or personalized baselines, which have shown utility in previous EEG studies [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
Furthermore, we currently analyze each trial as an independent unit, which may limit generalizability
in time-locked or context-rich scenarios.
        </p>
        <p>
          Future work will address these limitations in multiple directions. One promising extension is to
explore multi-threshold or ternary state representations that preserve more information without
compromising interpretability. Additionally, integrating this framework with shallow learning classifiers
(e.g., logistic regression with binary state features) could bridge the gap between interpretable analysis
and predictive modeling. Incorporating personalized baselines or adapting transition features to dynamic
resting-state estimates can further improve robustness across subjects. Lastly, a key future direction
involves explicitly linking our statistical transition framework to multimodal energy landscape models.
This would allow us to define basins of attraction, compute energy gradients across modalities, and
map emotion trajectories in a formalized state-space. Applying this methodology to other benchmark
datasets such as DREAMER [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] or AMIGOS [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] would validate and extend the current interpretations
across acquisition conditions and emotional contexts, offering better generalizability.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This paper presents an interpretable and computationally efficient framework for emotion recognition
using multimodal physiological signals. Using the DEAP dataset, we conduct a comprehensive
correlation-based analysis across four emotion dimensions: valence, arousal, dominance, and liking, by examining
both unimodal and multimodal signal pairings through state- and transition-based features. The
framework operates entirely on binarized representations and interpretable signal dynamics, requiring
no complex learning algorithms, and is therefore well-suited for low-latency, real-time emotion tracking
applications.</p>
      <p>Our results consistently demonstrate that transition-based multimodal features outperform unimodal
and state-based counterparts in both intra- and inter-subject analyses. These findings underscore
the importance of examining temporal dynamics and cross-modal interactions to understand the
physiological basis of emotional experience. The approach not only confirms known relationships, such
as EEG’s relevance to valence and arousal, but also identifies underutilized modalities such as GSR,
TEMP, and PPG as critical in characterizing dominance and liking, which are often overlooked in prior
studies.</p>
      <p>Beyond the empirical results, this study contributes to the field of explainable artificial intelligence
(XAI) by offering a framework grounded in transparent, domain-relevant features. The correlation
outputs are directly interpretable, and the modality-level analysis provides physiologically meaningful
insights. In contrast to black-box deep models, our method reveals interpretable transition structures
that align with human-understandable physiological behavior. The proposed methodology also opens
up new possibilities for bridging physiological emotion recognition with statistical physics-based
frameworks such as energy landscape analysis. Our observations of recurring stable transition motifs
and modality-pair coupling patterns suggest a latent structure in emotional state space that mirrors
metastable attractor dynamics in energy landscapes.</p>
      <p>Future research should explore more expressive discrete encodings, dynamic baseline adjustments,
and integration with shallow predictive models. Importantly, a promising extension involves modeling
the transition networks as multimodal energy landscapes, identifying basins of emotional stability, and
capturing the temporal evolution of emotional states through energy gradients. Such an approach could
provide a powerful formalism for describing emotion dynamics in physiological data while maintaining
interpretability. Extending this analysis to other datasets and contextual conditions will further enhance
its generalizability and applicability to real-world emotion-aware systems.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by JSPS KAKENHI Grant Numbers JP22K18419, JP24K15161, JP25H00451, and
JST Moonshot R&amp;D Grant No. JPMJMS2021.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4o and Grammarly in order to: check
grammar and spelling. After using these tool(s)/service(s), the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Appendix</title>
      <p>This appendix provides comprehensive correlation results referenced in the main paper. We include
intra-subject and inter-subject correlation tables for both state-based and transition-based features,
covering unimodal and multimodal analyses. Additionally, we report normalized state distributions by
modality and emotion label.</p>
      <sec id="sec-9-1">
        <title>A.1. Unimodal State-Based Correlations</title>
        <p>Table 2 presents the top and statistically significant unimodal state-based correlations between
physiological modalities and emotional dimensions across inter-subject aggregation. Notably, the EOG signal
shows weak but consistent correlations for all four emotion labels, especially for dominance (ρ = 0.07,
p = 0.011). GSR signals also correlate with dominance, albeit with lower effect sizes.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>B. Unimodal Transition-Based Correlations</title>
      <p>This section presents the correlation-based results for transition patterns within individual physiological
modalities, independently assessed for their association with emotional dimensions: valence, arousal,
dominance, and liking. For each modality, we compute transition frequencies of the form i → j
(where i, j ∈ {0, 1}) over binarized state representations. The resulting features are evaluated using
Spearman correlation with emotion scores, aggregated at the inter-subject level and presented in
Table 3. Among unimodal analyses, only the EOG and GSR modalities yielded statistically significant
results. The EOG transition from high to low (1 → 0) was weakly correlated with liking (ρ = 0.06,
p = 0.039). However, across all emotion labels, unimodal transition-based features generally showed
weak correlations, reinforcing the necessity of multimodal coupling for robust emotional inference.</p>
    </sec>
    <sec id="sec-11">
      <title>C. Multimodal State-Based Correlations</title>
      <p>This appendix presents detailed inter-subject correlation results for the state-based multimodal emotion
recognition framework using binarized state combinations across modality pairs. For each emotion
label—valence, arousal, dominance, and liking—we report statistically significant correlations between
binary state patterns and emotion ratings, using Spearman’s ρ and the corresponding p-values.</p>
      <sec id="sec-11-1">
        <title>C.1. Top Correlations by Emotion Dimension</title>
        <p>Table 4 summarizes the highest correlations observed for each emotion label in the inter-subject
multimodal state-based analysis, while Table 5 lists the statistically significant (p &lt; 0.05) correlations
observed in the inter-subject analysis for each emotion label. These results reflect the joint state patterns
across modality pairs that are most informative of emotional states. The top multimodal state-based
correlations (inter-subject) per emotion label are:
• Valence: state_00 of EMG–RESP (ρ = 0.09, p = 0.002)
• Arousal: state_01 of GSR–TEMP (ρ = 0.07, p = 0.008)
• Dominance: state_10 of EEG–EOG (ρ = −0.08, p = 0.005)
• Liking: state_01 of EOG–TEMP (ρ = −0.08, p = 0.004)</p>
      </sec>
      <sec id="sec-11-2">
        <title>C.2. Interpretation</title>
        <p>While the overall correlation values are modest (|ρ| &lt; 0.1), the consistency of specific modality pairs
across emotional dimensions, especially involving EEG–EOG, GSR–TEMP, and EMG–RESP, suggests
robust inter-modality state dependencies. Interestingly, state_01 and state_10 emerge as frequently
informative binary configurations, indicating patterns where the two modalities occupy opposite
states. These results complement the intra-subject and transition-based findings in
the main text by reinforcing the value of analyzing simple, interpretable joint states across physiological
signals.</p>
      </sec>
      <sec id="sec-11-3">
        <title>C.3. Multimodal Transition-Based Correlations</title>
        <sec id="sec-11-3-1">
          <title>C.3.1. Intra-Subject</title>
          <p>Table 6 lists the top-performing multimodal transition features based on intra-subject correlation
analysis. These pairs exhibit significantly stronger correlations (ρ &gt; 0.57, p &lt; 0.001), underscoring the
utility of modeling cross-modality transitions for capturing affective states.</p>
          <p>The modality pairs listed in Table 6 are EOG–EMG, EEG–PPG, and GSR–RESP (the latter for both
valence and liking), each reported with its ρ and p-value.</p>
        </sec>
        <sec id="sec-11-3-2">
          <title>C.3.2. Inter-Subject</title>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Koelstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Muhl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Soleymani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yazdani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nijholt</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Patras</surname>
          </string-name>
          ,
          <article-title>Deap: A database for emotion analysis using physiological signals</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>3</volume>
          (
          <year>2012</year>
          )
          <fpage>18</fpage>
          -
          <lpage>31</lpage>
          . doi:10.1109/T-AFFC.2011.15.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nath</surname>
          </string-name>
          , Anubhav,
          <string-name>
            <given-names>M.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sethia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kalra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Indu</surname>
          </string-name>
          ,
          <article-title>A comparative study of subject-dependent and subject-independent strategies for eeg-based emotion recognition using lstm network</article-title>
          ,
          <source>in: Proceedings of the 2020 4th International Conference on Compute and Data Analysis, ACM</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>142</fpage>
          -
          <lpage>147</lpage>
          . URL: https://doi.org/10.1145/3388142.3388167. doi:10.1145/3388142.3388167.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Anubhav</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Nath</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Sethia</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Kalra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Indu</surname>
          </string-name>
          ,
          <article-title>An efficient approach to eeg-based emotion recognition using lstm network</article-title>
          ,
          <source>in: 2020 16th IEEE International Colloquium on Signal Processing &amp; Its Applications (CSPA)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>88</fpage>
          -
          <lpage>92</lpage>
          . URL: https://doi.org/10.1109/CSPA48992.2020.9068691. doi:10.1109/CSPA48992.2020.9068691.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Anubhav</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Fujiwara</surname>
          </string-name>
          ,
          <article-title>Reservoir splitting method for eeg-based emotion recognition</article-title>
          ,
          <source>in: 2023 11th International Winter Conference on Brain-Computer Interface (BCI)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . URL: https://doi.org/10.1109/BCI57258.2023.10078629. doi:10.1109/BCI57258.2023.10078629.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Anubhav</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Fujiwara</surname>
          </string-name>
          ,
          <article-title>Across trials vs subjects vs contexts: A multi-reservoir computing approach for eeg variations in emotion recognition</article-title>
          ,
          <source>in: Proceedings of the 26th International Conference on Multimodal Interaction</source>
          , ACM, New York, NY, USA,
          <year>2024</year>
          , pp.
          <fpage>518</fpage>
          -
          <lpage>525</lpage>
          . URL: https://doi.org/10.1145/3678957.3685730. doi:10.1145/3678957.3685730.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Atkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <article-title>Improving bci-based emotion recognition by combining eeg, peripheral physiological signals, and eye-related measures</article-title>
          ,
          <source>in: Proceedings of the International Conference on Physiological Computing Systems, SCITEPRESS</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>72</lpage>
          . doi:10.5220/0006022100610072.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Balan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rajagopal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Emotion recognition using EEG and peripheral physiological signals: A review</article-title>
          ,
          <source>in: 2019 11th International Conference on Advanced Computing (ICoAC)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>150</lpage>
          . doi:10.1109/ICoAC48765.2019.246841.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gohumpu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <article-title>Emotion recognition with multi-modal peripheral physiological signals</article-title>
          ,
          <source>Frontiers in Computer Science</source>
          <volume>5</volume>
          (
          <year>2023</year>
          ). doi:10.3389/fcomp.2023.1264713.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , C. Cheng, Y. Zhang,
          <article-title>Multimodal Emotion Recognition Using a Hierarchical Fusion Convolutional Neural Network</article-title>
          ,
          <source>IEEE Access</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>7943</fpage>
          -
          <lpage>7951</lpage>
          . doi:10.1109/ACCESS.2021.3049516.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sheng</surname>
          </string-name>
          , W. Liu,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>ST-SHAP: A hierarchical and explainable attention network for emotional EEG representation learning and decoding</article-title>
          ,
          <source>Journal of Neuroscience Methods</source>
          <volume>414</volume>
          (
          <year>2025</year>
          )
          <fpage>110317</fpage>
          . doi:10.1016/j.jneumeth.2024.110317.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>ERTNet: An interpretable transformer-based framework for EEG emotion recognition</article-title>
          ,
          <source>Frontiers in Neuroscience</source>
          <volume>18</volume>
          (
          <year>2024</year>
          ). doi:10.3389/fnins.2024.1320645.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Emotion Recognition and Detection Methods: A Comprehensive Survey</article-title>
          ,
          <source>Journal of Artificial Intelligence and Systems</source>
          <volume>2</volume>
          (
          <year>2020</year>
          )
          <fpage>53</fpage>
          -
          <lpage>79</lpage>
          . doi:10.33969/ais.2020.21005.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>A Review of Emotion Recognition Using Physiological Signals</article-title>
          ,
          <source>Sensors</source>
          <volume>18</volume>
          (
          <year>2018</year>
          )
          <fpage>2074</fpage>
          . doi:10.3390/s18072074.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ezaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Watanabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ohzeki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Masuda</surname>
          </string-name>
          ,
          <article-title>Energy landscape analysis of neuroimaging data</article-title>
          ,
          <source>Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences</source>
          <volume>375</volume>
          (
          <year>2017</year>
          )
          <article-title>20160287</article-title>
          . doi:
          <volume>10</volume>
          .1098/rsta.
          <year>2016</year>
          .
          <volume>0287</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Watanabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Masuda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Megumi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kanai</surname>
          </string-name>
          , G. Rees,
          <article-title>Energy landscape and dynamics of brain activity during human bistable perception</article-title>
          ,
          <source>Nature Communications</source>
          <volume>5</volume>
          (
          <year>2014</year>
          )
          <fpage>4765</fpage>
          . doi:10.1038/ncomms5765.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Katsigiannis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ramzan</surname>
          </string-name>
          ,
          <article-title>DREAMER: A Database for Emotion Recognition Through EEG and ECG Signals From Wireless Low-cost Off-the-Shelf Devices</article-title>
          ,
          <source>IEEE Journal of Biomedical and Health Informatics</source>
          <volume>22</volume>
          (
          <year>2018</year>
          )
          <fpage>98</fpage>
          -
          <lpage>107</lpage>
          . doi:10.1109/JBHI.2017.2688239.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Miranda-Correa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sebe</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Patras</surname>
          </string-name>
          ,
          <article-title>AMIGOS: A Dataset for Affect, Personality and Mood Research on Individuals and Groups</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>12</volume>
          (
          <year>2021</year>
          )
          <fpage>479</fpage>
          -
          <lpage>493</lpage>
          . doi:10.1109/TAFFC.2018.2884461.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>