<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Pain 139 (2008) 267-274. URL: https://journals.lww.com/
00006396</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/978-3-030-86608-2_13</article-id>
      <title-group>
        <article-title>GraphAU-Pain: Graph-based Action Unit Representation for Pain Intensity Estimation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhiyu Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yang Liu</string-name>
          <email>yang.liu@oulu.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hatice Gunes</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Machine Vision and Signal Analysis, University of Oulu</institution>
          ,
          <addr-line>Oulu</addr-line>
          ,
          <country country="FI">Finland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Cambridge</institution>
          ,
          <addr-line>Cambridge</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>2021</volume>
      <fpage>112</fpage>
      <lpage>119</lpage>
      <abstract>
        <p>Understanding pain-related facial behaviors is essential for digital healthcare in terms of effective monitoring, assisted diagnostics, and treatment planning, particularly for patients unable to communicate verbally. Existing data-driven methods of detecting pain from facial expressions are limited in interpretability and severity quantification. To this end, we propose GraphAU-Pain, leveraging a graph-based framework to model facial Action Units (AUs) and their interrelationships for pain intensity estimation. AUs are represented as graph nodes, with co-occurrence relationships as edges, enabling a more expressive depiction of pain-related facial behaviors. By utilizing a relational graph neural network, our framework offers improved interpretability and significant performance gains. Experiments conducted on the publicly available UNBC dataset demonstrate the effectiveness of GraphAU-Pain, achieving an F1-score of 66.21% and an accuracy of 87.61% in pain intensity estimation. The code is available for re-implementation at github.com/ZW471/GraphAU-Pain.</p>
      </abstract>
      <kwd-group>
        <kwd>Pain Intensity Estimation</kwd>
        <kwd>Facial Expression Analysis</kwd>
        <kwd>Graph Neural Networks</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Pain detection is critical in clinical and caregiving settings for timely assessment and improved patient
management. Current methods like self-reports and observational evaluations have notable
limitations [1]. Self-reports are subjective and depend on patient communication ability, often impaired in
nonverbal individuals, children, or those with cognitive impairments. Observational methods, while
more objective, require extensive training to ensure accuracy and consistency.</p>
      <p>Deep learning has driven interest in automated pain estimation via facial expression analysis,
bypassing advanced medical equipment while providing objective measures. Methods like CNNs [2, 3]
and hybrid frameworks [4, 5, 6] have been applied for pain estimation using facial features.
Transformers [7, 8] perform remarkably in pain prediction from facial videos. However, most rely solely
on image features, neglecting physiological insights, limiting clinical interpretability. Additionally,
undersampling to address dataset imbalance reduces generalizability in diverse populations [9, 10].</p>
      <p>Pain estimation can leverage features tied to facial expressions. Landmark-based methods like nose
tip or eye corner coordinates outperform pixel-based techniques but lack clear physiological links to
pain [11, 12]. The Facial Action Coding System (FACS) [13] maps facial movements into Action Units
(AUs) with intensity levels, aggregating into the Prkachin and Solomon Pain Intensity (PSPI) score [14].
The UNBC dataset [15] provides AU and PSPI-labeled videos, supporting AU-informed methods like
K-Nearest Neighbor [16] and Bayesian Networks [17], which report high accuracy but suffer from
overoptimism due to class imbalance and overlook AU relationships. Recent approaches like
GLA-CNN [18] and Multiple Instance Learning [19] have improved AU relation modeling. However, these
methods rely on pixel-wise AU relationships, struggling with subtle facial changes and generalization
across diverse racial groups, limiting real-world applicability.</p>
      <p>Recent advances in graph neural networks (GNNs), such as Multi-dimensional Edge Feature-based AU
Relation Graph for AU (ME-GraphAU) [20] and Graph Relation Network (GRN) [21], have demonstrated
promising AU prediction performance on the DISFA dataset [22] and BP4D dataset [23]. These methods
use CNN backbones like ResNet [24] or VGG [25] to learn image features for each AU, then construct
a relational graph based on the AU features. Each node in such graphs represents an AU, and each
edge represents the relationship between a pair of AUs [26]. Through message propagation in GNN
layers, the output AU features can capture individual AU information from neighbors as well as
structural information [27]. Such graph-based AU features can then be aggregated to build a full-face
representation informed by AUs and their relationships for downstream tasks [28], inspiring us to
incorporate graph-based AU detection into pain intensity estimation.</p>
      <p>Motivated by the above, we introduce GraphAU-Pain for accurate and interpretable pain intensity
estimation and summarize our three contributions as follows:
• GraphAU-Pain Model. We propose a novel graph-based framework that transforms AU
detection into pain estimation by modeling AU relationships as a dynamic graph structure, enabling
more expressive and interpretable pain assessment compared to traditional image-driven
approaches.
• Cross-Dataset Transfer with Relabeling. We introduce a novel transfer learning strategy
that leverages DISFA-pretrained weights to address UNBC’s limited training data. By creating a
hybrid UNBC+ dataset that combines original annotations with predicted labels for missing AUs,
we enable effective knowledge transfer while preserving dataset integrity, significantly improving
model performance in AU occurrence prediction.
• High Performance &amp; Interpretability. Comprehensive experiments validate GraphAU-Pain
for pain estimation, outperforming GLA-CNN [18] (current state-of-the-art work in this area)
and demonstrating improved interpretability via explicit AU modeling.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Facial Action Unit Detection</title>
        <p>FACS [13] categorizes facial expressions into 44 fundamental components known as AUs, each
corresponding to specific muscle movements with intensity ratings from 0 to 5. For instance, AU1 represents
the “Inner Brow Raiser” and AU2 the “Outer Brow Raiser.” These AUs provide a systematic method for
analyzing facial expressions, including pain and other affective states. Early approaches to AU detection
relied on traditional machine learning methods, with OpenFace [29] being a widely used open-source
tool based on SVMs [30]. Deep learning methods like Deep Region and Multi-label Learning [31]
established early benchmarks but failed to model AU interdependencies. Later attention-based
methods [32, 33] improved performance significantly, though their pixel-wise approach faced limitations in
capturing subtle facial changes and generalizing across diverse racial groups.</p>
        <p>Graph Neural Networks (GNNs) have emerged as a powerful architecture for modeling complex
dependencies between facial landmarks or AUs. A GNN layer passes information between nodes
through edges, allowing nodes to learn their neighborhood information. This makes GNNs essential for
learning relational representations, as demonstrated in fields like knowledge graph construction [34]
and recommender systems [35]. In facial AU analysis, some studies introduced prior AU co-occurrence
knowledge via Graph Convolutional Networks [36, 37], while others like Graph Relation Network
(GRN) [21] explored knowledge-free approaches. GRN constructs a fully connected directed graph
with image features as nodes and uses attention-based edge functions, achieving MAE of 0.7 on
BP4D and 0.2 on DISFA. Unlike GRN, which fully connects all nodes, ME-GraphAU [20] introduces
a novel approach by using a CNN backbone to extract features for each node that represents an AU
and establishing connections between nodes based on feature similarity. This architecture effectively
captures facial feature relationships by modeling AUs as nodes and their interactions as edges, achieving
strong performance with F1 scores of 65.5% on BP4D and 63.1% on DISFA. Despite these advances,
direct adaptation of AU detection models to pain estimation tasks remains challenging due to two key
limitations. First, pain estimation datasets are typically much smaller than those used for AU detection.
Second, these datasets exhibit a significant class imbalance in AU occurrences. Our GraphAU-Pain
framework addresses both challenges through a novel transfer learning strategy, as detailed in Sec. 3.4.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Pain Prediction Based on Facial Expressions</title>
        <p>Early approaches relied on feature extraction and classification techniques like KNN [16] and Random
Forest [38]. These handcrafted feature approaches were limited by variations in head pose, lighting,
and spontaneous expressions.</p>
        <p>Recent deep learning approaches have evolved from traditional CNNs [39, 40] to more sophisticated
architectures that incorporate AU features. While CNNs provide a foundation through pixel-level
analysis, their lack of physiological knowledge limits both interpretability and generalizability. This
limitation has driven the development of more advanced approaches, such as LSTM-based continuous
pain monitoring [41]. However, this promising work faces practical constraints due to its reliance on a
private dataset [42]. Similarly, while AU-based pain prediction has shown potential on BP4D [43], its
binary classification approach may not capture the nuanced spectrum of natural pain experiences.</p>
        <p>The state-of-the-art GLA-CNN [18] represents a significant advancement by combining CNNs with
attention mechanisms to analyze facial pain and AU relationships. On the UNBC dataset, it achieves
36.2% F1-score and 56.5% accuracy. While these metrics show improvement, they remain insufficient
for clinical applications. The model’s fine-grained category scheme for pain levels fails to account for
PSPI’s sensitivity to AU intensity. This limitation becomes apparent in cases where intense AU4 (brow
lowering) occurs without pain [44], leading to confusion between its No Pain (PSPI = 0), Weak Pain
(PSPI = 1), and Mild Pain (PSPI = 2) categories. Furthermore, the
black-box nature of the design limits clinical utility by obscuring how the estimated pain intensity is
derived. These challenges highlight the need for more reliable and interpretable pain detection models,
motivating our development of GraphAU-Pain. Our approach explicitly incorporates AUs through a
graph-based modeling framework, providing enhanced interpretability through transparent AU-pain
intensity relationships and improved accuracy via comprehensive modeling of AU interdependencies.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Material and Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Implementation Overview</title>
        <p>The GraphAU-Pain model is designed to estimate pain intensity based on facial image data, and its
training involves two key steps. First, the full-face and AU representation learning modules are trained
for AU occurrence prediction. These modules are adapted from the AU Relationship-aware Node Feature
Learning (ANFL) component of ME-GraphAU [20]. Because directly training ANFL for AU occurrence
prediction on the UNBC dataset only yielded a 20% average F1-score, we developed a cross-dataset
transfer learning strategy to improve performance. This strategy transfers AU prediction capabilities
from the DISFA dataset model trained by Luo et al. [20] to the UNBC dataset. To address differences in
AU labels between datasets, we created a relabeled UNBC+ dataset to align the AU annotations and
used undersampling to address AU imbalance. Second, with the weights trained for AU prediction,
GraphAU-Pain is then trained for pain intensity estimation with the full UNBC dataset. Facial images
are first processed through the CNN backbone to extract pixel-wise full-face representations. These
representations are then transformed into graph-based structures to learn AU-specific features. Finally,
the full-face and AU-based features are combined to estimate pain intensity. The performance of the
model is evaluated using two metrics: average F1-score and accuracy.</p>
        <p>As illustrated in Fig. 1, the GraphAU-Pain model comprises three sequential modules: Full-face
Representation Learning that extracts high-level facial features through pixel-based global representations,
AU Representation Learning that captures both local and global AU information using graph-based representations,
and a Pain Intensity Classifier that maps the learned features to specific pain intensity levels.</p>
      </sec>
      <sec id="sec-3-2-model">
        <title>3.2. Model</title>
        <sec id="sec-3-1-1">
          <title>Full-face Representation Learning</title>
          <p>GraphAU-Pain uses a ResNet-50 backbone for extracting a
full-face representation from an input image. By inputting a face image x ∈ R^{172×172×3} to the backbone,
we obtain h_b ∈ R^{36×2048}, which represents 36 facial image features of length 2048, each corresponding to a position in the image.</p>
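          <p>For illustration, a minimal PyTorch sketch of this feature-extraction step is given below. The class name FullFaceBackbone and the use of a torchvision ResNet-50 are illustrative assumptions; the released code at github.com/ZW471/GraphAU-Pain contains the actual implementation.</p>
          <preformat>
# Minimal sketch (assumption): extracting the 36 x 2048 full-face representation
# from a ResNet-50 backbone. A 172 x 172 input passes through ResNet-50's
# convolutional stages and yields a 6 x 6 x 2048 feature map, i.e. 36 spatial
# positions with a 2048-dimensional feature each.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class FullFaceBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        base = resnet50(weights=None)  # in practice, initialized from the pretrained checkpoint
        # keep all convolutional stages, drop the average pooling and fc head
        self.stem = nn.Sequential(*list(base.children())[:-2])

    def forward(self, x):                       # x: (B, 3, 172, 172)
        fmap = self.stem(x)                     # (B, 2048, 6, 6)
        return fmap.flatten(2).transpose(1, 2)  # h_b: (B, 36, 2048)

h_b = FullFaceBackbone()(torch.randn(1, 3, 172, 172))
print(h_b.shape)  # torch.Size([1, 36, 2048])
          </preformat>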
        </sec>
        <sec id="sec-3-1-2">
          <title>AU Representation Learning</title>
          <p>After acquiring the backbone feature h_b, the AU representation
module learns two distinct embeddings: h_a ∈ R^{d_AU} (d_AU = 512) encoding individual AU occurrences,
and h_g ∈ R^{d_AU} encoding the complete AU relational graph structure. The module first transforms
h_b through N_AU fully connected layers to generate initial AU representations H_a ∈ R^{N_AU × d_AU}, where
each row represents one AU. These representations serve as node features in a graph where each node
connects to its K = 3 most similar nodes based on dot-product similarity. The graph structure is then
processed through a graph convolutional layer:</p>
          <p>H'_a = ReLU( H_a + BN( A^T FC_1(H_a) + FC_2(H_a) ) ) ∈ R^{N_AU × d_AU},   (1)</p>
          <p>where BN denotes batch normalization, FC represents fully connected layers, and A is the normalized
adjacency matrix. The graph representation h_g ∈ R^{d_AU} is obtained through global sum pooling of the
node embeddings. Although we could also add an edge update module here [20], we omitted it to avoid
overfitting on the UNBC dataset. For each AU i, its occurrence probability p_i is computed as the cosine
similarity between its representation h'_{a,i} and a learnable vector v_i:</p>
          <p>p_i = ReLU(h'_{a,i})^T ReLU(v_i) / ( ‖ReLU(h'_{a,i})‖_2 ‖ReLU(v_i)‖_2 ),   (2)</p>
          <p>where ‖·‖_2 represents the L2 norm. This AU occurrence prediction can serve as either a pretraining task
or an auxiliary training objective alongside the primary pain intensity estimation task. In this work, we
use it as a pretraining task on an undersampled dataset to encourage the AU representation module to
focus on minority classes.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>Pain Intensity Representation Classifier</title>
          <p>After obtaining these three features (the backbone feature h_b, the AU node features H'_a, and the
graph representation h_g), three fully connected layers with ReLU activation map each of them to a common
dimension of 36, producing h'_b, h'_a, and h'_g. (For h_b, the FC layer is applied row-wise.) Finally, a
feature-infusing step on h'_a and h'_b is performed:</p>
          <p>h_ab = ReLU(h'_a^T h'_b) ∈ R^{36}.   (3)</p>
          <p>The final pain intensity classification is obtained by concatenating the interaction feature h_ab with
the h'_g feature and passing the result to an FC layer:</p>
          <p>ŷ = W [h_ab ‖ h'_g] + b ∈ R^{N_pain},   (4)</p>
          <p>where N_pain = 3 for the one-hot encoding of the three-level pain intensity classification used in this
work.</p>
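          <p>To make Eqs. (1)-(4) concrete, a minimal PyTorch sketch of the AU graph module and the pain classifier is given below. Layer names, the per-branch pooling steps, and the element-wise form of the feature-infusing step are illustrative assumptions rather than the exact released implementation (see github.com/ZW471/GraphAU-Pain).</p>
          <preformat>
# Illustrative sketch of the graph-based AU module and pain classifier of Eqs. (1)-(4).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAUPainHead(nn.Module):
    def __init__(self, n_au=8, d=512, d_backbone=2048, n_pain=3, k=3):
        super().__init__()
        self.k = k
        # one FC branch per AU: backbone feature -> initial AU node features H_a
        self.au_fc = nn.ModuleList([nn.Linear(d_backbone, d) for _ in range(n_au)])
        # GCN layer of Eq. (1)
        self.fc1, self.fc2 = nn.Linear(d, d), nn.Linear(d, d)
        self.bn = nn.BatchNorm1d(n_au)
        # learnable vectors v_i of Eq. (2)
        self.v = nn.Parameter(torch.randn(n_au, d))
        # projections to the common 36-d space and the final classifier, Eqs. (3)-(4)
        self.proj_b = nn.Linear(d_backbone, 36)
        self.proj_a = nn.Linear(d, 36)
        self.proj_g = nn.Linear(d, 36)
        self.classifier = nn.Linear(36 + 36, n_pain)

    def build_adjacency(self, H):                    # H: (B, n_au, d)
        sim = H @ H.transpose(1, 2)                  # dot-product similarity
        idx = sim.topk(self.k, dim=-1).indices       # K = 3 most similar nodes
        A = torch.zeros_like(sim).scatter_(-1, idx, 1.0)
        return A / A.sum(dim=-1, keepdim=True)       # normalized adjacency

    def forward(self, h_b):                          # h_b: (B, 36, d_backbone)
        pooled = h_b.mean(dim=1)                     # pooling per AU branch (assumption)
        H_a = torch.stack([fc(pooled) for fc in self.au_fc], dim=1)   # (B, n_au, d)
        A = self.build_adjacency(H_a)
        msg = A.transpose(1, 2) @ self.fc1(H_a) + self.fc2(H_a)       # inner term of Eq. (1)
        H_a_prime = F.relu(H_a + self.bn(msg))                        # Eq. (1)
        h_g = H_a_prime.sum(dim=1)                                    # global sum pooling
        # Eq. (2): cosine similarity between ReLU'd node features and learnable vectors
        p_au = F.cosine_similarity(F.relu(H_a_prime),
                                   F.relu(self.v).expand_as(H_a_prime), dim=-1)
        # Eqs. (3)-(4): map to 36-d, infuse, concatenate, classify
        h_b_p = F.relu(self.proj_b(h_b)).mean(dim=1)      # row-wise FC, then pooled (assumption)
        h_a_p = F.relu(self.proj_a(H_a_prime)).mean(dim=1)
        h_g_p = F.relu(self.proj_g(h_g))
        h_ab = F.relu(h_a_p * h_b_p)                      # element-wise infusion in R^36 (assumption)
        y_hat = self.classifier(torch.cat([h_ab, h_g_p], dim=-1))
        return y_hat, p_au

logits, au_probs = GraphAUPainHead()(torch.randn(2, 36, 2048))
          </preformat>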
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.3. Loss Function</title>
        <p>With over 80% of the frames demonstrating no expression of pain, the class imbalance in the UNBC
dataset poses a significant challenge for deep learning models. To minimize this, we employ a weighted
cross-entropy loss to prioritize underrepresented classes. It is calculated by</p>
        <p>L = − (1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} w_c y_{i,c} log( max(p_{i,c}, ε) ),   (5)</p>
        <p>where N is the number of samples, C is the number of classes, p_{i,c} is the softmax output probability
of the i-th sample belonging to the c-th class, y_{i,c} is a binary indicator (1 if the i-th sample belongs to
the c-th class, otherwise 0), w_c are class weights, and ε = 10^{−8} prevents log(0). The weight of a class
c is calculated by</p>
        <p>w_c = C · (1/occurrence_rate(c)) / ( Σ_{j=1}^{C} 1/occurrence_rate(j) ).   (6)</p>
        <p>In this work, the class weights are 0.07 for No Pain, 0.33 for Mild Pain, and 2.6 for Obvious Pain.</p>
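        <p>A short sketch of this weighted loss and the class-weight computation of Eqs. (5) and (6) is given below; the occurrence rates used are the approximate proportions quoted in Sec. 4.1, so the resulting weights only roughly reproduce the values above.</p>
        <preformat>
# Sketch of the weighted cross-entropy loss of Eqs. (5)-(6), with class weights
# derived from the inverse occurrence rates of the three pain categories.
import torch
import torch.nn.functional as F

def class_weights(occurrence_rates):
    inv = 1.0 / torch.as_tensor(occurrence_rates)
    return len(occurrence_rates) * inv / inv.sum()      # Eq. (6): weights sum to C

def weighted_ce(logits, targets, weights, eps=1e-8):
    # Eq. (5): -1/N * sum_i sum_c w_c * y_ic * log(max(p_ic, eps))
    p = F.softmax(logits, dim=-1).clamp(min=eps)
    y = F.one_hot(targets, num_classes=logits.size(-1)).float()
    return -(weights * y * p.log()).sum(dim=-1).mean()

w = class_weights([0.82, 0.15, 0.03])   # roughly the reported weights [0.07, 0.33, 2.6]
loss = weighted_ce(torch.randn(4, 3), torch.tensor([0, 1, 2, 0]), w)
        </preformat>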
      </sec>
      <sec id="sec-3-3">
        <title>3.4. Transfer Learning</title>
        <p>Our preliminary experimental results showed that directly training GraphAU-Pain on UNBC yielded
unsatisfactory results, with 30–40% F1-score and 60–80% accuracy after trying several settings. To
tackle this problem and improve AU prediction performance through transfer learning, in this work
we initialized the weights of the full-face and AU representation modules with the weights pretrained
on DISFA provided by Luo et al. [20]. We chose the DISFA dataset [22] for pretraining because it is
three times larger than UNBC and provides high-quality AU annotations that align well with UNBC’s
AU annotation scheme (sharing six out of eight AUs with UNBC, three of which are used in the PSPI
calculation), making this dataset suitable for transfer learning. However, since DISFA includes two
additional AUs (AU1 and AU2) not present in UNBC’s original annotations, for fine-tuning the pretrained
weights, we need to label these additional AUs for UNBC to ensure complete AU coverage. To do this,
we pass UNBC’s facial images through the pretrained representation learning modules to predict all
eight AU labels. We then create a hybrid dataset (UNBC+) by keeping UNBC’s original annotations for
the six overlapping AUs while using the predicted values for AU1 and AU2. This relabeling process
ensures that our model can learn from a complete set of AU activations while maintaining the reliability
of UNBC’s original annotations where available.</p>
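        <p>The relabeling step can be sketched as follows; the AU list and the data layout are illustrative assumptions, and Table 1 together with the code repository gives the authoritative AU set and frame-level labels.</p>
        <preformat>
# Sketch of the UNBC+ relabeling: keep UNBC's original annotations for the AUs
# shared with DISFA, and fill AU1/AU2 with predictions of the DISFA-pretrained
# representation modules. AU names and data layout are illustrative.
import torch

SHARED_AUS = ["AU4", "AU6", "AU9", "AU12", "AU25", "AU26"]   # illustrative list of the six shared AUs
PREDICTED_AUS = ["AU1", "AU2"]                               # annotated in DISFA but not in UNBC

def relabel_frame(image, unbc_labels, pretrained_model):
    """Build a complete 8-AU occurrence label for one UNBC frame (sketch)."""
    with torch.no_grad():
        probs = pretrained_model(image.unsqueeze(0)).squeeze(0)       # assumed: 8 AU occurrence probabilities
    pred = dict(zip(SHARED_AUS + PREDICTED_AUS, probs.tolist()))
    labels = {au: int(unbc_labels[au] > 0) for au in SHARED_AUS}      # keep original UNBC annotations
    labels.update({au: int(pred[au] > 0.5) for au in PREDICTED_AUS})  # predicted labels for AU1/AU2
    return labels
        </preformat>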
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Datasets and Labels</title>
        <p>
          GraphAU-Pain was trained on the UNBC-McMaster Shoulder Pain Expression Archive Database [15].
The dataset contains 48,398 colored frames from 25 participants with shoulder problems, showing
facial expressions during pain-inducing actions. The faces were detected with OpenCV’s
haarcascade_frontalface_default classifier and cropped to 172 × 172. Each frame has 10 AU intensities (0–5) and a
PSPI score (0–16), calculated as PSPI = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43 [14].
PSPI pain intensity is categorized into ordinal levels: No Pain (PSPI = 0), Mild Pain (PSPI ∈ [1, 4]),
and Obvious Pain (PSPI ≥ 5). The categories are distributed in a skewed way, consisting of
approximately 82%, 15%, and 3% of the UNBC dataset, respectively. This categorization is more
clinically meaningful and interpretable compared to the method used by Wu et al. [18], as it mitigates
the high sensitivity of PSPI scores to AU variations. Moreover, we replaced AU intensity with
occurrence by capping the AU score at one.
        </p>
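        <p>For clarity, the PSPI computation and the three-level categorization used in this work can be summarized by the following short sketch.</p>
        <preformat>
# PSPI score from AU intensities (AU43, eye closure, is coded 0/1) and the
# three-level categorization used in this work.
def pspi(au):
    return au[4] + max(au[6], au[7]) + max(au[9], au[10]) + au[43]

def pain_category(score):
    if score >= 5:
        return "Obvious Pain"
    if score >= 1:
        return "Mild Pain"
    return "No Pain"

frame = {4: 2, 6: 1, 7: 0, 9: 0, 10: 1, 43: 1}
print(pspi(frame), pain_category(pspi(frame)))   # 5 Obvious Pain
        </preformat>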
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Training and Evaluation Details</title>
        <p>To prepare the model for AU occurrence prediction before pain intensity estimation training, supervised
fine-tuning (SFT) of GraphAU-Pain’s AU representation learning module was performed on the
ANFL component, initialized with the weights pretrained on DISFA for 20 epochs provided by Luo et al. [20]. Our
relabeled UNBC+ dataset was used in the SFT process and undersampled to address data imbalance.
The undersampling process involved randomly removing approximately 90% of facial images with
PSPI = 0 and excluding facial images without active AUs. The full list of frames included in this
subset is made available in the code repository. We use all N_AU = 8 AU labels in the UNBC+ dataset
as listed in Table 1. We trained the module through SFT for 17 epochs with a learning rate of 1e−5,
a batch size of 16, and an Adam optimizer with β_1 = 0.9, β_2 = 0.999, and a weight decay of 5e−4.
The AU representation learning module achieved remarkable results in AU detection, compared to
state-of-the-art results, as detailed in Table 1. The training for pain intensity estimation used the same
hyperparameters as the SFT process but was performed on the full original UNBC dataset.</p>
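        <p>The undersampling used for SFT can be sketched as follows; the frame-record fields are illustrative, and the released frame list in the code repository is authoritative.</p>
        <preformat>
# Sketch of the SFT undersampling: drop roughly 90% of PSPI = 0 frames at random
# and exclude frames without any active AU. Field names are illustrative.
import random

def undersample(frames, keep_no_pain=0.1, seed=0):
    rng = random.Random(seed)
    kept = []
    for f in frames:                      # f: {"pspi": int, "aus": {name: 0/1}}
        if not any(f["aus"].values()):
            continue                      # exclude frames with no active AUs
        if f["pspi"] == 0 and rng.random() > keep_no_pain:
            continue                      # randomly remove ~90% of PSPI = 0 frames
        kept.append(f)
    return kept
        </preformat>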
        <p>The GraphAU-Pain model was trained using a learning rate of 1e−4, a batch size of 64, and an
Adam optimizer with the same hyperparameters used in SFT. The representation learning module
is set to connect edges between an AU node and its 3 most similar AU nodes. The weights learned
through SFT on ANFL were used to initialize the ResNet backbone and the representation learning
module of GraphAU-Pain. The model was trained on the full UNBC dataset for 8 epochs on an NVIDIA
GeForce RTX 4070 GPU (8 GB) and an Intel i7-13900H CPU, with an estimated training time of about 3
minutes per epoch. The evaluation uses the same metrics as the state-of-the-art method GLA-CNN [18]:
accuracy, average F1, average recall, and average precision, where all average values are unweighted.</p>
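        <p>A sketch of the evaluation metrics, computed with scikit-learn using unweighted (macro) averages, is given below.</p>
        <preformat>
# Accuracy plus unweighted (macro-averaged) F1, recall, and precision,
# matching the evaluation protocol of GLA-CNN [18].
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def evaluate(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "precision": precision_score(y_true, y_pred, average="macro"),
    }

print(evaluate([0, 0, 1, 2, 1], [0, 0, 1, 1, 1]))
        </preformat>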
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Model Performance</title>
        <p>Overall, GraphAU-Pain achieves a commendable average F1-score of 66.21% and a high accuracy of
87.61%. Table 2 details the per-class results, showing strong performance for the No Pain category,
which has an F1-score of 93.10%. However, performance declines for the Mild and Obvious categories,
reflecting F1-scores of 51.19% and 54.35%, respectively. This performance gap between No Pain and
the Mild/Obvious categories is largely attributable to the dataset’s pronounced class imbalance. The
log-scaled confusion matrix in Figure 2 further illustrates the distribution of predictions. While No
Pain dominates the diagonal, indicating high accuracy there, some off-diagonal misclassifications occur
between Mild and Obvious, and there is a noticeable bias toward predicting No Pain. Consequently,
while the model is robust in detecting No Pain, additional strategies are needed to better distinguish
between higher pain intensities.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Comparison to SOTA</title>
        <p>To the best of our knowledge, GLA-CNN [18] is the only other method that uses AUs for pain-intensity
estimation on UNBC while focusing on cross-sectional facial image frames. No additional methods
apply exactly the same categorization scheme, so we compare GraphAU-Pain with GLA-CNN and other
models reported in [18]. To align labels, we reclassify pain intensities into four categories: No Pain
(PSPI = 0), Weak Pain (PSPI = 1), Mild Pain (PSPI = 2), and Strong Pain (PSPI ≥ 3). By
altering only the model’s final layer from three to four outputs and keeping other settings unchanged,
GraphAU-Pain shows substantial gains in both accuracy and average F1-score, as shown in Table 3
and Figure 2b. Note that GLA-CNN and the other compared models were trained and evaluated on an
undersampled subset of UNBC [18] to deal with class imbalance, whereas GraphAU-Pain is trained on
the full dataset. Therefore, their published performance might be higher than what they would have achieved on the full dataset.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Ablation Analysis</title>
        <p>The ablation analysis in Table 4 underscores the critical role of graph representation and GNN in the
GraphAU-Pain model. Removing the graph representation (w/o graph rep.) reduces the mean F1-score
from 66.2% to 63.1%, mainly due to the performance drops in No Pain and Mild Pain, highlighting
the value of graph modeling for capturing AU relationships. Similarly, removing the GNN layer (w/o
GNN ) causes a significant drop to 40.3%, emphasizing the importance of graph-based interactions in
learning AU features. The simplest setup (Only ResNet), relying solely on CNNs, achieves the lowest
mean F1-score of 35.2%, demonstrating that CNNs alone fail to effectively model AU correlations
for pain estimation. These results affirm the superiority of graph-based learning methods for pain
estimation.</p>
      </sec>
      <sec id="sec-4-6">
        <title>4.6. Discussion and Future Work</title>
        <p>GraphAU-Pain demonstrates significant potential for advancing automated pain assessment through
several key contributions. By leveraging graph-based learning to model AU relationships, our approach
achieves superior performance compared to existing methods, with an accuracy of 87.61% in the
clinically meaningful three-category classification system. The model’s strong performance in detecting
No Pain (93.10% F1-score) makes it particularly valuable for initial screening applications. Furthermore,
the AU representation learning module provides a more interpretable framework for understanding
how different facial expressions contribute to pain assessment. This could lead to more reliable and
explainable automated pain monitoring systems in clinical settings, potentially reducing the burden on
healthcare providers and improving patient care through continuous, objective pain assessment.</p>
        <p>While the proposed method shows promising results in pain estimation, there remains room for
improvement. Firstly, since PSPI only captures facial expressions, it may not reflect true subjective
pain [50]. Future research could focus on finding alternative pain indicators. Secondly, aligning UNBC
with DISFA through UNBC+ removes three pain-related AUs and adds noise through predicted labels,
potentially impacting performance. A promising direction is to design AU occurrence prediction models
specifically for pain-oriented datasets like UNBC. Lastly, while the AU occurrence-based representation
learning module provides satisfactory representation, AU intensity-based approaches (e.g., GRN [21])
could also be explored since AU intensity better relates to pain intensity. However, this direction may
require more complex models and additional training data to mitigate the impact of data imbalance.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper presents GraphAU-Pain, a GNN-based model combining graph-based AU features with
full-face representation for pain prediction. It surpasses the state-of-the-art methods while enabling
AU-informed pain estimation for clinical transparency. GraphAU-Pain addresses challenges like
limited data and class imbalance in the UNBC dataset through a novel transfer learning strategy. Key
contributions include improved pain classification benchmarks, better interpretability through
AU-based representations, and critical baselines for future AU-based pain estimation. Overall, this work
demonstrates the potential of GNNs for accurate, clinically viable pain estimation solutions.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Y. Liu’s work was supported in part by the Finnish Cultural Foundation for North Ostrobothnia Regional
Fund under Grant 60231712, and in part by the Instrumentarium Foundation under Grant 240016.
Z. Wang was supported by the Churchill College Postgraduate Academic Travel Grant PAT0062 for
conference participation.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] T. Hassan, D. Seus, J. Wollenberg, K. Weitz, M. Kunz, S. Lautenbacher, J. U. Garbas, U. Schmid, Automatic Detection of Pain from Facial Expressions: A Survey, 2021. doi:10.1109/TPAMI.2019.2958341.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] E. Othman, P. Werner, F. Saxen, A. Al-Hamadi, S. Gruss, S. Walter, Automatic vs. Human recognition of pain intensity from facial expression on the X-ITE Pain Database, Sensors 21 (2021). doi:10.3390/s21093273.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] P. Rodriguez, G. Cucurull, J. Gonzalez, J. M. Gonfaus, K. Nasrollahi, T. B. Moeslund, F. X. Roca, Deep Pain: Exploiting Long Short-Term Memory Networks for Facial Expression Classification, IEEE Transactions on Cybernetics 52 (2022) 3314-3324. URL: https://ieeexplore.ieee.org/document/7849133/. doi:10.1109/TCYB.2017.2662199.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] S. El Morabit, A. Rivenq, M.-E.-n. Zighem, A. Hadid, A. Ouahabi, A. Taleb-Ahmed, Automatic Pain Estimation from Facial Expressions: A Comparative Analysis Using Off-the-Shelf CNN Architectures, Electronics 10 (2021) 1926. doi:10.3390/electronics10161926.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] R. Yang, X. Hong, J. Peng, X. Feng, G. Zhao, Incorporating high-level and low-level cues for pain intensity estimation, in: 2018 24th International Conference on Pattern Recognition (ICPR), IEEE, 2018, pp. 3495-3500. doi:10.1109/ICPR.2018.8545244.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] P. D. Barua, N. Baygin, S. Dogan, M. Baygin, N. Arunkumar, H. Fujita, T. Tuncer, R.-S. Tan, E. Palmer, M. M. B. Azizan, N. A. Kadri, U. R. Acharya, Automated detection of pain levels using deep feature extraction from shutter blinds-based dynamic-sized horizontal patches with facial images, Scientific Reports 12 (2022) 17297. doi:10.1038/s41598-022-21380-4.</mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>[41] E. Othman, P. Werner, F. Saxen, A. Al-Hamadi, S. Gruss, S. Walter, Classification networks for continuous automatic pain intensity monitoring in video using facial expression on the X-ITE Pain Database, Journal of Visual Communication and Image Representation 91 (2023). doi:10.1016/j.jvcir.2022.103743.</mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>[42] S. Gruss, M. Geiger, P. Werner, O. Wilhelm, H. C. Traue, A. Al-Hamadi, S. Walter, Multi-Modal Signals for Analyzing Pain Responses to Thermal and Electrical Stimuli, Journal of Visualized Experiments 2019 (2019). URL: https://app.jove.com/t/59057. doi:10.3791/59057.</mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>[43] K. Feghoul, M. Bouazizi, D. Santana, D. Santana Maia, Facial Action Unit Detection using 3D Face Landmarks for Pain Detection, Technical Report, 2023. URL: https://hal.science/hal-04320516v1.</mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>[44] P. Werner, A. Al-Hamadi, K. Limbrecht-Ecklundt, S. Walter, S. Gruss, H. C. Traue, Automatic Pain Assessment with Facial Activity Descriptors, IEEE Transactions on Affective Computing 8 (2017). doi:10.1109/TAFFC.2016.2537327.</mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>[45] Z. Zhao, Q. Liu, S. Wang, Learning deep global multi-scale and local attention features for facial expression recognition in the wild, IEEE Transactions on Image Processing 30 (2021) 6544-6556. doi:10.1109/TIP.2021.3093397.</mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>[46] Z. Liu, J. Ning, Y. Cao, Y. Wei, Z. Zhang, S. Lin, H. Hu, Video swin transformer, in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 3192-3201. doi:10.1109/CVPR52688.2022.00320.</mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>[47] X. Xin, X. Lin, S. Yang, X. Zheng, Pain intensity estimation based on a spatial transformation and attention CNN, PLOS ONE 15 (2020) 1-15. URL: https://doi.org/10.1371/journal.pone.0232412. doi:10.1371/journal.pone.0232412.</mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>[48] R. Yang, S. Tong, M. Bordallo, E. Boutellaa, J. Peng, X. Feng, A. Hadid, On pain assessment from facial videos using spatio-temporal local descriptors, in: 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2016, pp. 1-6. doi:10.1109/IPTA.2016.7820930.</mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>[49] S. Walter, S. Gruss, H. Ehleiter, J. Tan, H. C. Traue, P. Werner, A. Al-Hamadi, S. Crawcour, A. O. Andrade, G. Moreira da Silva, The BioVid heat pain database: data for the advancement and systematic validation of an automated pain recognition system, in: 2013 IEEE International Conference on Cybernetics (CYBCO), 2013, pp. 128-131. doi:10.1109/CYBConf.2013.6617456.</mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>[50] G. D. De Sario, C. R. Haider, K. C. Maita, R. A. Torres-Guzman, O. S. Emam, F. R. Avila, J. P. Garcia, S. Borna, C. J. McLeod, C. J. Bruce, R. E. Carter, A. J. Forte, Using AI to Detect Pain through Facial Expressions: A Review, 2023. doi:10.3390/bioengineering10050548.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>