<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Micro-Expression Recognition Method Based on an Uncertainty-Aware Mixing Strategy and Multimodal Fusion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qian Gao</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weijia Feng</string-name>
          <email>weijiafeng@tjnu.edu.cn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jia Guo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiayi An</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaofeng Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuanxu Chen</string-name>
          <email>chenyuanxu641@pingan.com.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ping An Technology Shenzhen</institution>
          ,
          <addr-line>Rm1201, Bld B, Pingan IFC, Xinyuan South Rd, Chaoyang District, Beijing, 100027</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, Tianjin University of Technology</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Tianjin Normal University</institution>
          ,
          <addr-line>No. 393, Binshui West Road, Xiqing District, Tianjin, 300387</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Micro-expression recognition, a critical research direction in affective computing, holds significant value due to its wide-ranging applications in real-world scenarios such as interrogations, clinical diagnostics, and business negotiations. Currently, micro-expression datasets are limited in scale and challenging to annotate. The significant imbalance in sample sizes across different types of micro-expressions causes models to bias toward majority classes during training, while minority-class samples receive insufficient attention. This results in overfitting and poor generalization performance in existing recognition methods. Furthermore, most methods rely solely on local information from micro-expression sequences, overlooking certain dynamic features, which adversely impacts recognition performance. To address potential overfitting issues, we propose a micro-expression recognition method based on uncertainty awareness and multimodal fusion. By integrating uncertainty estimation to weight mixed samples, our approach guides the multimodal model to focus more on underperforming samples. Additionally, recognition efficiency is further enhanced by incorporating optical flow parameters from micro-expression images. Experimental validation demonstrates that our method achieves significant improvements across multiple key metrics.</p>
      </abstract>
      <kwd-group>
        <kwd>Micro-expression</kwd>
        <kwd>Uncertainty</kwd>
        <kwd>Multimodal fusion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Micro-expression recognition (MER) is a key area in affective computing, holding great significance
due to its wide applications in fields such as interrogation, clinical diagnosis, and business negotiations.
Micro-expressions are brief and subtle facial expressions that typically occur when individuals attempt
to conceal their true emotions. They last for a very short duration (usually no more than 0.5 seconds),
which makes MER an especially challenging task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In recent years, with the development of deep
learning, this field has made remarkable progress. However, current micro-expression datasets are
limited in size and difficult to annotate, making model training prone to overfitting and poor
generalization [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For instance, datasets like CASME, CASME II, and SMIC contain a limited number of samples,
which constrains the performance of deep learning models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Additionally, some micro-expression
samples are difficult to classify due to indistinct features or high similarity with other classes [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Therefore, developing more effective data augmentation methods and feature learning strategies to
improve recognition accuracy and robustness is a key research objective. For example, the MR-UAMF
method addresses class imbalance through uncertainty-aware mixing [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and significantly improves
the recognition accuracy of minority classes. Meanwhile, other studies have introduced attention
mechanisms or improved neural network architectures [6] to enhance focus on critical features and
thus improve recognition performance.
      </p>
      <p>
        Existing micro-expression datasets are small and hard to annotate, causing overfitting and limited
model generalization [7]. Many current methods rely on handcrafted features like LBP and LBP-TOP.
While simple and effective, they struggle with the subtle dynamics of micro-expressions [6], limiting
their ability to leverage deep learning's full potential [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Furthermore, existing approaches often fail to
adequately extract dynamic features, as they make little use of temporal sequence information,
which negatively impacts recognition accuracy and robustness.
      </p>
      <p>
        Micro-expression recognition (MER) methods can be classified into three categories. The first is
traditional feature-based methods, which rely on handcrafted features such as LBP, LBP-TOP, Histogram
of Oriented Gradients (HOG), and optical flow. While these approaches are straightforward and efficient,
they struggle to capture subtle and dynamic changes. For example, Li et al. [8] proposed an
LBP-TOP-based method that integrates temporal and spatial features but performs poorly with complex dynamic
changes. The second is deep learning-based methods, which mainly utilize Convolutional Neural
Networks (CNNs) and their variants, such as 3D CNNs, to automatically learn features. These methods
perform well in feature learning but require large amounts of annotated data, making them prone to
overfitting due to limited dataset sizes. For instance, Zhang et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] introduced a 3D CNN method using
a multi-stream structure to capture spatiotemporal features but encountered overfitting on small datasets.
The last category is data augmentation methods, including Generative Adversarial Networks (GANs)
and synthetic data generation techniques. While they expand datasets, issues such as authenticity
and potential data bias remain. For example, Wang et al. [9] used GANs to generate synthetic
micro-expression data, effectively expanding the dataset, though the realism of the generated data still
requires improvement. Some studies have explored attention mechanisms in MER. For instance, Hao
et al. proposed a hierarchical spatiotemporal attention mechanism that automatically focuses on key
regions and time segments of micro-expressions, significantly improving accuracy. Despite this progress,
research gaps remain. Dataset sizes are still small and annotation is difficult, leading to overfitting and
limited generalization. Additionally, dynamic features are underutilized, and data imbalance remains a
challenge. MR-UAMF, for example, addresses imbalance through uncertainty-aware mixing, improving
minority-class accuracy [
        <xref ref-type="bibr" rid="ref1 ref2 ref4">1, 2, 4</xref>
        ].
      </p>
      <p>To address the aforementioned challenges, we propose a micro-expression recognition method based
on uncertainty awareness and multimodal fusion (MR-UAMF). This approach weights mixed samples
based on their uncertainty, encouraging the model to focus more on samples with lower performance,
thereby mitigating overfitting. A multimodal model is used to process optical flow and image features
separately, fully leveraging the dynamic and spatial information of micro-expressions to enhance
accuracy and robustness. By integrating uncertainty quantification, micro-attention mechanisms,
and a 3D CNN, this approach offers a novel perspective and method for micro-expression recognition,
advancing the field. Extensive experiments across multiple datasets validate the effectiveness and
superiority of our approach.</p>
      <p>The remainder of this paper is structured as follows: Section 2 describes the proposed framework
of MR-UAMF, including the overall structure, core algorithms, and implementation details. Section 3
presents the experimental setup, dataset descriptions, and evaluation metrics. Section 4 interprets experimental results and discusses academic
implications and potential applications. Section 5 summarizes our contributions and outlines future
research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <sec id="sec-2-1">
        <title>2.1. Overall Framework</title>
        <p>Our proposed framework integrates optical flow feature extraction, a focused uncertainty-aware
mixing strategy (FU-MIX), a micro-attention mechanism, and a shallow triple-stream 3D CNN. The
goal is to effectively capture both spatial and temporal features of micro-expressions while addressing
data imbalance and overfitting.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Optical Flow Feature Extraction</title>
        <p>Given the brief and subtle nature of micro-expressions, we first extract optical flow features from the
video sequence to obtain motion information.</p>
        <p>We compute optical flow-guided features using the onset frame and the apex frame. The optical flow
field between the two frames is represented as a tuple:
$$F = \{(u(x, y), v(x, y)) \mid x = 1, 2, \ldots, W;\ y = 1, 2, \ldots, H\},$$
where $W$ and $H$ denote the width and height of the frame, and $u(x, y)$ and $v(x, y)$ are the horizontal
and vertical components of $F$, respectively.</p>
        <p>We also compute optical strain to approximate the intensity of facial deformation. With the displacement vector $\mathbf{u} = [u, v]^{\mathsf{T}}$, the optical strain tensor is
$$\varepsilon = \frac{1}{2}\left[\nabla \mathbf{u} + (\nabla \mathbf{u})^{\mathsf{T}}\right],$$
and the magnitude of the optical strain is
$$|\varepsilon| = \sqrt{\left(\frac{\partial u}{\partial x}\right)^{2} + \left(\frac{\partial v}{\partial y}\right)^{2} + \frac{1}{2}\left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right)^{2}}.$$
Appending the optical strain to the optical flow field, we form the triplet
$$\Theta = \{u, v, \varepsilon\} \in \mathbb{R}^{3}.$$
Each video thus yields three types of optical flow-based representations: the horizontal component $u$,
the vertical component $v$, and the optical strain $\varepsilon$.</p>
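        <p>As an illustrative sketch (not the exact pipeline used in our experiments), the flow field and strain magnitude can be computed with OpenCV and NumPy as follows; the Farneback estimator and its parameters are assumptions, since the text does not fix a particular flow algorithm:</p>
        <preformat>
# Sketch: compute (u, v) between onset and apex frames plus the optical
# strain magnitude. The Farneback estimator is an assumed choice; the text
# does not specify which flow algorithm is used.
import cv2
import numpy as np

def flow_triplet(onset_gray: np.ndarray, apex_gray: np.ndarray) -> np.ndarray:
    """Return the (H, W, 3) cube Theta = {u, v, epsilon}."""
    flow = cv2.calcOpticalFlowFarneback(
        onset_gray, apex_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    u, v = flow[..., 0], flow[..., 1]

    # Spatial derivatives of the flow (np.gradient returns d/dy, then d/dx).
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)

    # |eps| = sqrt((du/dx)^2 + (dv/dy)^2 + 0.5 * (du/dy + dv/dx)^2)
    eps = np.sqrt(du_dx**2 + dv_dy**2 + 0.5 * (du_dy + dv_dx)**2)
    return np.stack([u, v, eps], axis=-1)
        </preformat>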
      </sec>
      <sec id="sec-2-3">
        <title>2.3. FU-MIX: Focused Uncertainty-Aware Mixing</title>
        <p>Uncertainty Estimation. To enhance robustness and generalization, we estimate sample uncertainty
via Bayesian sampling from the model's posterior $p(\theta; D)$. For a given sample $x_i$, its uncertainty $u_i$ is defined as
$$u_i = \int \mathbb{1}\big(y_i \neq \hat{y}_{\theta}(x_i)\big)\, p(\theta; D)\, d\theta,$$
where $\mathbb{1}\big(y_i \neq \hat{y}_{\theta}(x_i)\big)$ indicates misclassification. We approximate this integral using Monte Carlo sampling:
$$u_i \approx \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\big(y_i \neq \hat{y}_{\theta_t}(x_i)\big),$$
where $\theta_t$ is sampled by minimizing expected risk. In practice, historical training trajectory information
is used for the approximation.</p>
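        <p>A minimal sketch of this trajectory-based approximation follows, assuming checkpoints saved at past epochs stand in for posterior draws $\theta_t$; the helper name and loader conventions are illustrative:</p>
        <preformat>
# Sketch: Monte Carlo uncertainty from the training trajectory. Each saved
# checkpoint plays the role of one posterior draw theta_t; u_i is the
# fraction of checkpoints that misclassify sample i.
import torch

@torch.no_grad()
def trajectory_uncertainty(checkpoints, model, loader, device="cpu"):
    """checkpoints: list of state_dicts from past epochs (assumed to
    approximate samples from p(theta; D)). loader must not shuffle."""
    wrong = None
    for state in checkpoints:
        model.load_state_dict(state)
        model.eval()
        errs = []
        for x, y in loader:
            pred = model(x.to(device)).argmax(dim=1).cpu()
            errs.append((pred != y).float())
        errs = torch.cat(errs)
        wrong = errs if wrong is None else wrong + errs
    return wrong / len(checkpoints)  # u_i in [0, 1] per sample
        </preformat>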
        <p>Weighted Mixed Sample Generation. Each sample is assigned a weight $w_i$ proportional to its
uncertainty:
$$w_i = \gamma u_i + \epsilon,$$
where $\gamma$ is a hyperparameter and $\epsilon$ is a small constant ensuring $w_i > 0$. Mixed samples are generated as
$$\tilde{x}_{i,j} = \lambda x_i + (1 - \lambda) x_j, \qquad \tilde{y}_{i,j} = \lambda y_i + (1 - \lambda) y_j,$$
where $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$. The weighted loss function is
$$\mathbb{E}_{(x_i, y_i), (x_j, y_j)}\big[\, w_i \lambda\, \ell(f(\tilde{x}_{i,j}), y_i) + w_j (1 - \lambda)\, \ell(f(\tilde{x}_{i,j}), y_j) \,\big].$$</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Micro-Attention Mechanism</title>
        <p>We adopt a parameter-efficient residual architecture with self-learned multi-scale features to compute
attention maps. Given input $X \in \mathbb{R}^{C \times H \times W}$, three convolutional layers ($1 \times 1$, $3 \times 3$, and $5 \times 5$) produce
feature maps $\{F_1, F_2, F_3\}$.</p>
        <p>These are concatenated to form $F' \in \mathbb{R}^{(C_1 + C_2 + C_3) \times H \times W}$, and the average feature map is generated by
$$A(X) = \frac{1}{C'} \sum_{c=1}^{C'} \big(F'_{c} * k\big),$$
where $k$ is a $1 \times 1$ convolution kernel. The residual attention map is
$$R(X) = 1 + \sigma\big(A(X)\big),$$
and the final output with attention is
$$H(X) = F(X) \cdot \big(1 + \sigma(A(X))\big),$$
where $\sigma$ is a normalization function. If $\sigma(A(X)) \approx 0$, the attention influence is minimized and the
original features pass through largely unchanged.</p>
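        <p>A compact sketch of this attention block follows, with the branch channel counts and the choice of sigmoid for the normalization $\sigma$ as assumptions:</p>
        <preformat>
# Sketch: multi-scale residual micro-attention block.
import torch
import torch.nn as nn

class MicroAttention(nn.Module):
    def __init__(self, channels: int, branch_ch: int = 8):
        super().__init__()
        # Three parallel convolutions yield multi-scale maps F1, F2, F3.
        self.b1 = nn.Conv2d(channels, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(channels, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(channels, branch_ch, kernel_size=5, padding=2)
        # A learned 1x1 kernel k reduces the concatenated maps to A(X).
        self.k = nn.Conv2d(3 * branch_ch, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_cat = torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)  # F'
        a = self.k(f_cat)                      # A(X), one map per position
        return x * (1.0 + torch.sigmoid(a))    # H(X) = F(X) * (1 + sigma(A))

# Usage: att = MicroAttention(16); y = att(torch.randn(2, 16, 28, 28))
        </preformat>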
      </sec>
      <sec id="sec-2-5">
        <title>2.5. 3D CNN</title>
        <p>We design a shallow triple-stream 3D CNN to learn from the optical flow cube Θ. The input is resampled
to 28 × 28 × 3. Each stream includes:
• One 3D convolutional layer with kernel counts of 3, 5, and 8, respectively;
• One max-pooling layer.</p>
        <p>Outputs from the three streams are concatenated along the channel axis, followed by a 2 × 2 average
pooling layer. A fully connected layer with 400 nodes abstracts the features, and a softmax layer
classifies the output into three compound emotion categories.</p>
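        <p>The following sketch mirrors this triple-stream design; treating the 28 × 28 × 3 cube as a three-channel 2D input, along with the specific kernel sizes and pooling strides, are assumptions where the text leaves them open:</p>
        <preformat>
# Sketch: shallow triple-stream network over the optical flow cube Theta.
import torch
import torch.nn as nn

class TripleStreamNet(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        def stream(out_ch):  # one conv layer + one max-pool per stream
            return nn.Sequential(
                nn.Conv2d(3, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=3))
        self.s1, self.s2, self.s3 = stream(3), stream(5), stream(8)
        self.avg = nn.AvgPool2d(kernel_size=2, stride=2)
        # 28 -> 9 after the 3x3 max-pool, 9 -> 4 after the 2x2 avg-pool.
        self.fc = nn.Linear((3 + 5 + 8) * 4 * 4, 400)
        self.out = nn.Linear(400, num_classes)

    def forward(self, theta: torch.Tensor) -> torch.Tensor:
        # theta: (B, 3, 28, 28) with channels {u, v, strain}.
        z = torch.cat([self.s1(theta), self.s2(theta), self.s3(theta)], dim=1)
        z = self.avg(z).flatten(1)
        return self.out(torch.relu(self.fc(z)))  # softmax applied in the loss
        </preformat>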
      </sec>
      <sec id="sec-2-5">
        <title>2.6. Implementation Details</title>
        <p>We use the CASME, CASME II, SAMM, and SMIC datasets for training and evaluation, splitting each
dataset into 80% training and 20% testing. The FU-MIX strategy is employed as a data augmentation technique,
assigning greater weights to underperforming samples based on uncertainty estimates. This mitigates
overfitting and enhances recognition of minority expression categories.</p>
        <p>The 3D CNN extracts optical flow features through three parallel convolutional streams, and the
micro-attention mechanism highlights important regions via adaptive residual weighting. These modules
work synergistically to improve feature discriminability, thereby enhancing the model’s performance
and robustness in micro-expression recognition tasks.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <sec id="sec-3-1">
        <title>3.1. Datasets and Evaluation Metrics</title>
        <p>The datasets used in this paper are as follows. The first dataset is SMIC. It is the earliest spontaneous
micro-expression dataset, featuring recordings from three camera types: High-Speed (HS) at 100 fps,
and Visual (VIS) and Near-Infrared (NIR) at 25 fps. This study uses only HS camera data, with 164
samples from 16 participants across three classes: Negative, Positive, and Surprise.</p>
        <p>The second dataset is CASME. It was collected in a controlled lab at 60 fps and has 195 samples from
19 subjects. However, it suffers from class imbalance. We use 154 samples across four classes: Disgust, Repression,
Surprise, and Tense.</p>
        <p>The third dataset is CASME II. It enhances CASME with higher resolution (200 fps, 280 × 340 pixels),
featuring 248 samples from 26 subjects across five classes: Disgust, Happiness, Repression, Surprise, and
Others.</p>
        <p>The last dataset is SAMM. Gathered in a well-lit, stable setting with a grayscale camera at 200 fps
and a 2040 × 1088 resolution, it comprises 159 samples from 32 diverse participants. After excluding
classes with under 10 samples, such as Fear and Sadness, we utilize 134 samples spanning five classes:
Anger, Contempt, Happiness, Surprise, and Others.</p>
        <p>To comprehensively evaluate the model's performance, we adopt common classification metrics
including Accuracy, Precision, Recall, and F1-score. Given the class imbalance inherent in
micro-expression datasets, we also report two additional metrics: the Unweighted F1-score (UF1), the average of
class-wise F1-scores, and the Unweighted Average Recall (UAR), the average of class-wise recall rates. These
metrics offer a more balanced view of model performance, especially in imbalanced settings.</p>
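        <p>For reference, UF1 and UAR can be computed from per-class counts as in the sketch below (plain NumPy; the function name is illustrative):</p>
        <preformat>
# Sketch: unweighted F1 (UF1) and unweighted average recall (UAR); both
# average over classes with equal weight, regardless of class size.
import numpy as np

def uf1_uar(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int):
    f1s, recalls = [], []
    for c in range(num_classes):
        tp = int(np.sum(np.logical_and(y_pred == c, y_true == c)))
        fp = int(np.sum(np.logical_and(y_pred == c, y_true != c)))
        fn = int(np.sum(np.logical_and(y_pred != c, y_true == c)))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1s.append(f1)
        recalls.append(recall)
    return float(np.mean(f1s)), float(np.mean(recalls))
        </preformat>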
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Comparative Experiments</title>
        <p>This section compares the proposed method with existing micro-expression recognition (MER) methods
across four widely used datasets: SMIC, CASME, CASME II, and SAMM. The compared methods
are categorized as follows: LBP-TOP (Local Binary Patterns from Three Orthogonal Planes), 3DHOG
(Three-Dimensional Histogram of Oriented Gradients), HOOF (Histogram of Oriented Optical Flow),
OFF-ApexNet, STSTNet, Dual-Inception, MACNN, Micro-Attention, and Mini-AORCNN.</p>
        <p>All methods were evaluated under identical settings to ensure fairness: the same number of samples,
class labels, and K-fold cross-validation protocols were used.</p>
        <p>
          We reproduced results for LBP-TOP, 3DHOG, and HOOF under consistent experimental conditions.
We also reproduced the performance of recent deep learning-based MER methods. The handcrafted
feature baselines used Support Vector Machine (SVM) classifiers. Our method outperforms these handcrafted
baselines significantly, demonstrating the advantages of deep learning and uncertainty-aware modeling
in handling the subtle and dynamic nature of micro-expressions [
          <xref ref-type="bibr" rid="ref1 ref2 ref4">1, 2, 4</xref>
          ]. These results demonstrate
that our focused uncertainty-aware method consistently outperforms or matches the state-of-the-art
across all datasets. The simplicity of our design enables robust and discriminative learning even under
limited data scenarios—a significant advantage for micro-expression datasets, which are typically small
in size. Moreover, uncertainty modeling improves generalization by enhancing the model’s ability to
handle non-linearities and minority class variations in real-world applications.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Ablation Studies</title>
        <p>To evaluate the contribution of each component in our proposed micro-expression recognition (MER)
framework, we conducted a series of ablation experiments.</p>
        <p>The first configuration is the baseline model. Without any enhancements, it captures only fundamental
features and struggles with the complex variations inherent in micro-expressions. It achieves an
accuracy of only 69%, highlighting its limited capability in handling subtle emotional cues.</p>
        <p>The second configuration adds the focused uncertainty-aware mixing strategy (FU-MIX). Integrating this
strategy improves the model's adaptability to complex expression dynamics. The accuracy
increases by 10%, demonstrating the effectiveness of uncertainty awareness in enhancing model
robustness and mitigating overfitting.</p>
        <p>The third configuration adds multimodal feature fusion. When the multimodal fusion module is
added—combining both optical flow and spatial image features—the accuracy further improves by
5%, achieving the best performance. This indicates that multimodal fusion enriches feature
representations by integrating both temporal and spatial cues.</p>
        <p>The last configuration incorporates additional modules. Gradually adding auxiliary components such as
the micro-attention mechanism and the shallow triple-stream 3D CNN architecture leads to incremental
gains in accuracy. These modules help the model focus on salient spatiotemporal features and extract
multi-scale representations effectively.</p>
        <p>The ablation study clearly shows that each module contributes positively to the final performance;
the uncertainty-aware mixing strategy plays a central role in improving both accuracy and generalization; and multimodal
fusion and attention mechanisms further boost model performance by enhancing feature expressiveness.
These results validate the rationality and synergy of the components in our framework.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>By examining the table data, it is evident that the proposed method with MR-UAMF excels on the
SMIC dataset, achieving significantly higher metrics than other deep learning methods, while on the other
three datasets its performance is comparable to or better than those methods. Specifically, it achieves
an accuracy of 0.8497 on SMIC and the highest accuracy of 0.8388 on SAMM. Table 2 shows it attains
the highest accuracy of 0.8117 on CASME. On CASME II, as per Table 3, the accuracy reaches 0.8359.
In summary, compared to state-of-the-art methods, the proposed MR-UAMF performs on par with or
better than existing algorithms in most cases, validating its effectiveness for micro-expression recognition.
MR-UAMF is thus well suited to nonlinear problems like micro-expression image recognition, enhancing
the model's nonlinear fitting ability and supporting superior real-world performance.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper proposes a micro-expression recognition method based on uncertainty awareness and
multimodal fusion (MR-UAMF), aiming to address key challenges in the field of micro-expression recognition,
including limited dataset sizes, class imbalance, overfitting, and insufficient utilization of dynamic
features. By introducing a focused uncertainty-aware mixing strategy (FU-MIX), our method weights
samples based on their uncertainty, guiding the model to focus more on underperforming samples,
thereby effectively mitigating overfitting and enhancing recognition performance for minority classes.
Furthermore, by integrating optical flow parameters and spatial features from micro-expression images,
our approach fully leverages both dynamic and static information, further improving recognition
accuracy and model robustness. The synergistic effect of the micro-attention mechanism and a shallow
triple-stream 3D convolutional neural network enables the model to efficiently extract multi-scale
spatiotemporal features, achieving superior performance on complex micro-expression data. Experimental
results demonstrate that MR-UAMF achieves significant performance improvements across four widely
used micro-expression datasets: SMIC, CASME, CASME II, and SAMM. Ablation studies further validate
the contributions of each component, with the uncertainty-aware mixing strategy and multimodal
feature fusion contributing accuracy improvements of 10% and 5%, respectively, confirming the
positive synergistic impact of these components on overall performance.</p>
      <p>Moving forward, we plan to explore micro-expression recognition in video data, investigating the
impact of temporal sequence information on recognition performance.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This study was funded by the NSFC (Grant Nos. 61602345, 62002263); the National Key Research and
Development Program (Grant No. 2019YFB2101900); the TianKai Higher Education Innovation Park
Enterprise R&amp;D Special Project (Grant No. 23YFZXYC00046); and the Tianjin Science and Technology
Program (Grant No. 24YDTPJC00630).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration of Generative AI</title>
      <p>During the preparation of this work, the authors utilized KIMI and DeepSeek for grammar and
spell checking, as well as for text translation and paraphrasing. After using these tools, the authors
reviewed and edited the content as needed and take full responsibility for the publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kauttonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Deep learning for micro-expression recognition: A survey</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>2028</fpage>
          -
          <lpage>2046</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <article-title>A review of research on micro-expression recognition algorithms based on deep learning</article-title>
          ,
          <source>Neural Computing and Applications</source>
          <volume>36</volume>
          (
          <year>2024</year>
          )
          <fpage>17787</fpage>
          -
          <lpage>17828</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sankaranarayana</surname>
          </string-name>
          ,
          <article-title>HTNet for micro-expression recognition</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>602</volume>
          (
          <year>2024</year>
          )
          <fpage>128196</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Arandjelović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Short and long range relation based spatio-temporal transformer for micro-expression recognition</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>1973</fpage>
          -
          <lpage>1985</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>UMix: Improving importance weighting for subpopulation shift via uncertainty-aware mixup</article-title>
          ,
          <source>arXiv preprint arXiv:2209.08928</source>
          (
          <year>2022</year>
          ). URL: https://arxiv.org/abs/2209.08928.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] B. Song, K. Li, Y. Zong, J. Zhu, W. Zheng, Recognizing spontaneous micro-expression using a three-stream convolutional neural network, IEEE Access 7 (2019) 184537–184551.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] B. Xia, W. Wang, S. Wang, E. Chen, Learning from macro-expression: A micro-expression recognition framework, Proceedings of the 28th ACM International Conference on Multimedia (2020) 2936–2944.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Y. Liu, et al., A main directional mean optical flow feature for spontaneous micro-expression recognition, IEEE Transactions on Affective Computing 7 (2015) 299–310.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] M. Peng, Z. Wu, Z. Zhang, T. Chen, From macro to micro expression recognition: Deep learning on small datasets using transfer learning, 2018 13th IEEE International Conference on Automatic Face &amp; Gesture Recognition (FG 2018) (2018) 657–661.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] G. Zhao, M. Pietikainen, Dynamic texture recognition using local binary patterns with an application to facial expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007) 915–928.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] S. Polikovsky, Y. Kameda, Y. Ohta, Facial micro expressions recognition using high speed camera and 3d-gradient descriptor, in: IET Conference, 2009, p. 5.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Y.-J. Liu, J.-K. Zhang, W.-J. Yan, S.-J. Wang, G. Zhao, X. Fu, A main directional mean optical flow feature for spontaneous micro-expression recognition, IEEE Transactions on Affective Computing 7 (2016) 299–310. doi:10.1109/TAFFC.2015.2485205.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Y. S. Gan, S.-T. Liong, W.-C. Yau, Y.-C. Huang, L.-K. Tan, OFF-ApexNet on micro-expression recognition system, Signal Processing: Image Communication 74 (2019) 129–139. doi:10.1016/j.image.2019.02.005.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] S.-T. Liong, Y. S. Gan, J. See, H.-Q. Khor, Y.-C. Huang, Shallow triple stream three-dimensional CNN (STSTNet) for micro-expression recognition, in: 2019 14th IEEE International Conference on Automatic Face &amp; Gesture Recognition (FG 2019), IEEE, 2019, pp. 1–5.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] L. Zhou, Q. Mao, L. Xue, Dual-inception network for cross-database micro-expression recognition, in: 2019 14th IEEE International Conference on Automatic Face &amp; Gesture Recognition (FG 2019), IEEE, 2019, pp. 1–5.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Z. Lai, R. Chen, J. Jia, Y. Qian, Real-time micro expression recognition based on ResNet and atrous convolutions, Journal of Ambient Intelligence and Humanized Computing (2020). doi:10.1007/s12652-020-01779-5.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] C. Wang, M. Peng, T. Bi, T. Chen, Micro-attention for micro-expression recognition, Neurocomputing 410 (2020) 354–362. doi:10.1016/j.neucom.2020.06.005.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] L. Feng, Z. Jiahao, Q. Jiayin, Lightweight micro-expression recognition architecture based on bottleneck transformer, Computer Science 49 (2022) 370–377. doi:10.11896/jsjkx.210500023.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>