<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>UP-Phys: Exploring the Effect of Prior Knowledge in Unsupervised Remote Photoplethysmography</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yan Jiang</string-name>
          <email>jiangyan@nuist.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mingyue Cao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hao Yu</string-name>
          <email>yuhao@nuist.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xingyu Liu</string-name>
          <email>xingyu@nuist.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xu Cheng</string-name>
          <email>xcheng@nuist.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Nanjing University of Information Science and Technology</institution>, <addr-line>219 Ningliu Road, Nanjing, Jiangsu, 210044</addr-line>, <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2031</year>
      </pub-date>
      <abstract>
        <p>Remote photoplethysmography (rPPG) is a non-contact method that estimates physiological parameters from facial videos. Although existing supervised rPPG methods have achieved remarkable performance, their success mainly relies on massive and expensive annotated data. Fortunately, many unsupervised rPPG methods have emerged recently to address this issue. However, we find that existing unsupervised rPPG methods are learn-from-scratch. Over the past decade, many downstream tasks in deep learning have achieved great success using fine-tuning strategies. Inspired by this, we explore the effect of prior knowledge in unsupervised rPPG and propose UP-Phys. Moreover, to regulate the backbone to prioritize regions rich in rPPG information, we propose a plug-and-play representation augmentation module (RAM). RAM dynamically enhances salient temporal-spatial information derived from extracted features, effectively reducing the effect of noise brought by lighting, motion, etc. Experiments on two widely used rPPG datasets, UBFC-rPPG and PURE, demonstrate the superiority of our proposed method. In addition, our method achieves 15.79 RMSE accuracy in the 3rd RePSS.</p>
      </abstract>
      <kwd-group>
        <kwd>Remote Photoplethysmography</kwd>
        <kwd>Unsupervised Learning</kwd>
        <kwd>Prior Knowledge</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>learn-from-scratch
expensive, requiring not only the deployment of subjects equipped with contact PPG or ECG
sensors but also careful consideration of various potential environmental factors such as lighting
changes, motion, gestures, and so on while capturing data. In addition, existing supervised
rPPG methods struggle to break through the bottleneck posed by unlabeled data due to their
performance being positively corresponded to the scale of annotated data available, resulting in
less applicability in real scenarios. Fortunately, some unsupervised rPPG methods have been
proposed recently to solve this issue of expensive rPPG data annotations.</p>
      <p>Existing unsupervised rPPG methods [12, 13, 14, 15] can be roughly divided into two categories: contrastive and non-contrastive. In the former category, Sun et al. [13] pioneered the introduction of contrastive learning into unsupervised rPPG with their proposal of Contrast-Phys. This method was developed based on four key observations: spatial similarity in rPPG signals, temporal similarity in rPPG signals, dissimilarity in rPPG signals across different videos, and the HR range constraint. Crucially, Contrast-Phys eliminates the reliance on annotated data and achieves state-of-the-art performance on publicly available academic datasets. For the latter category, Speth et al. [14] extended the contrastive research line into non-contrastive learning and proposed SiNC by discovering periodic signals in video data. SiNC posits that periodicity suffices for learning the minuscule visual features corresponding to the blood volume pulse from unlabeled face videos, which brings novel inspiration to the rPPG community.</p>
      <p>Despite achieving encouraging progress, the aforementioned unsupervised rPPG methods are learn-from-scratch, as shown in Fig. 1 (a). This training strategy may introduce potential issues such as limited generalization, overfitting, and reliance on the scale of data. Moreover, the quality of the rPPG signals predicted by the deep neural network has emerged as a pivotal challenge in elevating the performance ceiling of unsupervised rPPG, as it lacks effective supervision from label information. During the past decade, many downstream tasks in computer vision adopted the fine-tuning strategy [16, 17, 18] and achieved significant success. This success is attributed to the prior knowledge acquired through pretraining, which enables the network to adapt to various datasets more efficiently and attain superior performance. Inspired by this, in this paper, we explore the effect of prior knowledge in unsupervised rPPG and propose UP-Phys, as shown in Fig. 1 (b). Specifically, we utilize Contrast-Phys pre-trained on the MMSE-HR [19] dataset and fine-tune it on other datasets. Compared with the official training protocol of 30 epochs, our UP-Phys undergoes only 1 epoch of fine-tuning, resulting in significant time savings during training. Furthermore, we design a plug-and-play representation augmentation module (RAM) that dynamically enhances salient temporal-spatial information derived from extracted features. This augmentation empowers the network to prioritize regions abundant in rPPG information, consequently reducing the effect of noise brought by lighting, motion, etc. Generally, the main contributions of this paper can be summarized as follows:
• We introduce a novel solution for unsupervised rPPG, termed UP-Phys, which leverages prior knowledge to notably reduce training time.
• We design a plug-and-play representation augmentation module (RAM) that dynamically enhances salient temporal-spatial information derived from extracted features for unsupervised rPPG.
• Experiments on the PURE and UBFC-rPPG datasets demonstrate that our UP-Phys significantly outperforms existing unsupervised rPPG methods, and even surpasses some supervised counterparts. In addition, UP-Phys achieves 15.79 RMSE accuracy in the 3rd RePSS.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Methodology</title>
      <p>The overview of our proposed UP-Phys is shown in Fig.2.</p>
      <sec id="sec-3-1">
        <title>2.1. Preprocessing</title>
        <p>To reduce background noise and interference from irrelevant areas, we adopt the OpenFace toolkit to preprocess the video. Specifically, we begin by determining the minimum and maximum horizontal and vertical coordinates of the generated landmarks to pinpoint the central facial point of each frame. The size of the bounding box is set to 1.2 times the range of the vertical coordinates of the landmarks in the first frame, and this size remains constant for subsequent frames. Then, we crop the face from each frame according to its central facial point and the bounding box, and resize it to 128 × 128. To minimize I/O overhead during training, we convert the video files into Hierarchical Data Format (HDF5).</p>
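        <p>The first-frame bounding-box rule above can be sketched as follows (a minimal numpy illustration; the function name, square-box shape, and (x, y) landmark layout are our assumptions, not OpenFace API):</p>

```python
import numpy as np

def face_bbox(landmarks, scale=1.2):
    """Square crop box from facial landmarks of shape (N, 2) as [x, y].

    The box is centred on the landmark extent and its side is `scale`
    times the vertical landmark range, per the paper's first-frame rule.
    Returns (x0, y0, x1, y1).
    """
    xs, ys = landmarks[:, 0], landmarks[:, 1]
    cx = (xs.min() + xs.max()) / 2.0          # central facial point, x
    cy = (ys.min() + ys.max()) / 2.0          # central facial point, y
    side = scale * (ys.max() - ys.min())      # 1.2 x vertical range
    half = side / 2.0
    return cx - half, cy - half, cx + half, cy + half
```

For subsequent frames, only the centre is recomputed while `side` stays fixed to the first-frame value.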
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Prior Knowledge</title>
        <p>Over the past decade, deep learning has achieved significant success, with many downstream tasks showing impressive results through fine-tuning pre-trained weights. Inspired by this, we introduce the fine-tuning strategy into unsupervised rPPG, as existing methods are learn-from-scratch. Specifically, we utilize Contrast-Phys, pre-trained on the MMSE-HR dataset, and fine-tune it for just one epoch on the UBFC-rPPG dataset to investigate the impact of prior knowledge, as shown in Tab. 1.</p>
        <p>With 25 pre-training videos, the MAE improves by 0.14 but the RMSE worsens by 0.47. This indicates that prior knowledge can help reduce the average error, while the degraded RMSE stems from insufficient prior knowledge. When we increase the pre-training videos to 50, as shown in index 3, both MAE and RMSE achieve significant improvement. Moreover, pre-training with 100 videos yields the best performance, with the lowest MAE of 0.33 and RMSE of 0.65. This indicates that richer prior knowledge significantly enhances the model's prediction accuracy and consistency. In summary, these results demonstrate a clear trend: as the number of pre-training videos increases, the accuracy of the model improves. This emphasizes the benefit of leveraging prior knowledge through pre-training in enhancing the performance of rPPG models.</p>
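        <p>The load-pretrained-then-fine-tune-one-epoch protocol can be sketched in PyTorch as follows (a minimal sketch: the function name is ours, `loader` and `loss_fn` are stand-ins for the Contrast-Phys data pipeline and unsupervised loss, and the checkpoint path is hypothetical):</p>

```python
import torch
from torch import nn

def fine_tune(model: nn.Module, loader, loss_fn, ckpt_path: str, lr: float = 1e-5):
    """Load weights pre-trained on a source dataset (the prior knowledge),
    then fine-tune for exactly one epoch with AdamW, mirroring the paper's
    one-epoch protocol."""
    model.load_state_dict(torch.load(ckpt_path))   # inject prior knowledge
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for batch in loader:                           # a single pass over the data
        loss = loss_fn(model(batch))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

In our setting the checkpoint would hold Contrast-Phys weights from MMSE-HR, and `loader` would yield UBFC-rPPG (or PURE) clips.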
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Representation Augmentation Module</title>
        <p>Existing unsupervised rPPG methods mainly design training strategies to achieve robust learning without annotated data. The quality of the rPPG signals predicted by these methods heavily relies on the features extracted by the backbone. These unsupervised methods rely solely on a 3DCNN and cannot accurately focus on regions with rich rPPG signals in complex environments involving head movement and lighting changes, making it difficult to improve performance. Therefore, we propose a plug-and-play representation augmentation module (RAM) that dynamically enhances salient temporal-spatial information, helping the backbone focus on regions rich in rPPG information.</p>
        <p>Specifically, given the input features F ∈ ℝ^(C×T×H×W), we first apply 3D AdaptiveMaxPool to extract the most salient rPPG knowledge in the horizontal and vertical directions. Subsequently, we utilize a softmax function to transform this rPPG knowledge into a distribution ranging from 0 to 1. The two directional distributions are then multiplied to create the augmentation mask. Finally, this mask is added to the input features to enhance the rPPG information. It is written as follows:</p>
        <p>
          F̀ = F + Softmax(AMP_h(F)) ⊗ Softmax(AMP_w(F)),
          (1)
          where AMP_h and AMP_w denote the 3D AdaptiveMaxPool with output sizes (T, H, 1) and (T, 1, W), respectively, and ⊗ is the broadcast multiplication operation.
        </p>
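        <p>This first step can be sketched in numpy as follows (a minimal sketch of Eq. (1) under our reading of the lost subscripts: the (C, T, H, W) layout and the softmax axes are assumptions, and the adaptive max-pools are emulated by plain max-reductions):</p>

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ram_eq1(F):
    """Eq. (1): add a saliency mask built from directional max-pooling.

    F has shape (C, T, H, W); AMP_h / AMP_w are emulated by max-reducing
    the width / height axes (adaptive max-pool to (T, H, 1) / (T, 1, W)).
    """
    amp_h = F.max(axis=3, keepdims=True)   # (C, T, H, 1)
    amp_w = F.max(axis=2, keepdims=True)   # (C, T, 1, W)
    mask = softmax(amp_h, axis=2) * softmax(amp_w, axis=3)  # broadcasts to F's shape
    return F + mask
```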
        <p>After that, the augmented features F̀ are processed by 3D AdaptiveAvgPool to obtain the directional rPPG knowledge. Then, we concatenate the two directional features along the spatial dimension to investigate the spatial rPPG information. In addition, a basic 3D convolutional block is employed to discover shared rPPG information and reduce the channel dimension, which can be expressed as:</p>
        <p>
          F̆ = Conv(Cat(AAP_h(F̀), AAP_w(F̀))),
          (2)
          where AAP_h and AAP_w denote the 3D AdaptiveAvgPool with output sizes (T, H, 1) and (T, 1, W), respectively; Cat(⋅, ⋅) denotes concatenation along the height dimension; and Conv denotes the basic 3D convolutional block consisting of a pointwise convolution, batch normalization, and an ELU activation.
        </p>
        <p>Further, we split F̆ along the spatial dimension to obtain F̆_h and F̆_w. Based on F̆_h and F̆_w, a pointwise convolution is utilized to restore the channel dimension. Then, sigmoid normalization and multiplication are employed to generate a mask that discriminates rPPG information. Finally, the mask is element-wise multiplied with the input features to augment them, thereby regulating the backbone to concentrate sensitively on the regions rich in rPPG information:</p>
        <p>
          F̂ = [σ(P_{1×1}(F̆_h)) ⊗ σ(P_{1×1}(F̆_w))] ⊙ F,
          (3)
          where F̂ ∈ ℝ^(C×T×H×W) is the augmented features; σ denotes the sigmoid function; P_{1×1} denotes the pointwise convolution; and ⊙ is the element-wise multiplication.
        </p>
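        <p>The gating steps above can be sketched in numpy as follows (a minimal sketch of the pipeline from directional average pooling to the sigmoid gate: the (C, T, H, W) layout and reduction ratio are our assumptions, the pointwise-convolution weights are random stand-ins for learned parameters, and batch normalization is omitted for brevity):</p>

```python
import numpy as np

def elu(x):
    return np.where(x > 0, x, np.exp(np.minimum(x, 0)) - 1.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ram_gate(F, F_enh, reduction=2, seed=0):
    """Directional avg-pooling of the enhanced features F_enh, a shared
    pointwise conv, then two sigmoid gates applied to the input F."""
    rng = np.random.default_rng(seed)
    C, T, H, W = F.shape
    aap_h = F_enh.mean(axis=3, keepdims=True)                        # AAP_h -> (C,T,H,1)
    aap_w = F_enh.mean(axis=2, keepdims=True).transpose(0, 1, 3, 2)  # AAP_w -> (C,T,W,1)
    cat = np.concatenate([aap_h, aap_w], axis=2)                     # Cat on height -> (C,T,H+W,1)
    w_down = rng.standard_normal((C // reduction, C)) * 0.1          # pointwise conv, reduce C
    shared = elu(np.einsum('oc,cthw->othw', w_down, cat))
    f_h, f_w = shared[:, :, :H], shared[:, :, H:]                    # split along height
    w_up = rng.standard_normal((C, C // reduction)) * 0.1            # pointwise conv, restore C
    m_h = sigmoid(np.einsum('oc,cthw->othw', w_up, f_h))                        # (C,T,H,1)
    m_w = sigmoid(np.einsum('oc,cthw->othw', w_up, f_w)).transpose(0, 1, 3, 2)  # (C,T,1,W)
    return (m_h * m_w) * F                                           # element-wise gate
```

Because each sigmoid gate lies in (0, 1), the module can only attenuate, never amplify, the input features.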
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Experiments</title>
      <sec id="sec-4-1">
        <title>3.1. Experimental Setup and Evaluation Protocol</title>
        <p>Datasets. We evaluate the proposed method on two widely used rPPG datasets, UBFC-rPPG [20] and PURE [21]. In addition, we pre-train our method on the MMSE-HR [19] dataset. UBFC-rPPG contains 42 videos in which subjects manipulate their heart rates by engaging in mathematical games. Each video is recorded at 30 frames per second (fps), has a resolution of 640×480, and runs for approximately one minute. Ground truth is collected synchronously using a CMS50E pulse oximeter at a sampling rate of 30 Hz. PURE records videos of 10 subjects across 6 different scenarios, including ones with head movements. Each video lasts one minute, is captured at 30 fps, and has a resolution of 640×480. The ground truth blood volume pulse (BVP) signal is recorded using a fingertip pulse oximeter at 60 Hz. MMSE-HR contains 102 videos from 40 subjects. Each video is recorded at 25 fps, and emotional guidance of the subjects ensures that their heart rates change. Physiological data were collected by the Biopac MP150 data acquisition system at 1 kHz.</p>
        <p>Evaluation Protocol. Following previous works [13, 14], we adopt mean absolute error (MAE), root mean squared error (RMSE), and Pearson correlation coefficient (R) as the evaluation metrics.
Experimental Setup. We implement our UP-Phys on the PyTorch framework with two RTX 2080Ti GPUs. Contrast-Phys [13] is utilized as our baseline. The proposed RAM is added after encoder 1 and encoder 2 of the backbone. We initially pre-train our UP-Phys model on the MMSE-HR dataset, utilizing the AdamW optimizer with a learning rate of 10⁻⁵ for 30 epochs. Subsequently, we fine-tune UP-Phys for only 1 epoch on the dataset to be evaluated. All other settings are kept consistent with those of Contrast-Phys.</p>
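        <p>The three metrics can be computed as follows (a plain numpy sketch; `hr_metrics` is our helper name, operating on per-video heart-rate estimates in beats per minute):</p>

```python
import numpy as np

def hr_metrics(pred, gt):
    """MAE, RMSE and Pearson correlation R between predicted and
    ground-truth heart rates, matching the evaluation protocol."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    err = pred - gt
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    r = np.corrcoef(pred, gt)[0, 1]
    return mae, rmse, r
```

Note that a method can improve MAE while worsening RMSE (as in Tab. 1), since RMSE penalizes a few large errors more heavily.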
        <p>RePSS Setup. We first pre-train our UP-Phys on 209 videos collected from the MMSE-HR and VIPL-HR [22] datasets. Subsequently, we fine-tune our method on the UBFC-rPPG and PURE datasets for 1 epoch. We finally achieve 15.79 RMSE accuracy in the 3rd RePSS.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Intra-Dataset Testing</title>
        <p>We report 3 representative supervised and unsupervised methods for comparison.
Comparison with Unsupervised Methods. As reported in Tab. 2, the performance of our method surpasses current leading unsupervised methods. More precisely, our UP-Phys achieves 0.18 and 0.48 MAE accuracy on the UBFC-rPPG and PURE datasets, respectively, significantly outperforming SiNC [14] by 0.41 and 0.13 on these two datasets. Note that while our UP-Phys is based on Contrast-Phys [13], it significantly outperforms Contrast-Phys. This success is attributed to the pivotal role of prior knowledge and UP-Phys's keen ability to focus on regions abundant in rPPG information, demonstrating the effectiveness of our proposed method.</p>
        <p>Comparison with Supervised Methods. Supervised methods such as Dual-GAN [11] perform well on both datasets, achieving particularly strong results with an MAE of 0.44 and an RMSE of 0.67 on UBFC-rPPG. This can be attributed to the ability of supervised methods to utilize the labeled information in the dataset for training, facilitating the model in learning accurate heart rate estimation patterns. However, even without label information, our proposed UP-Phys significantly surpasses Dual-GAN. The excellent performance of our method benefits from the insightful design around prior knowledge. Interestingly, our method shows the potential of unsupervised rPPG methods, and we believe this design can bring new insights to the rPPG community.</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Ablation Study</title>
        <p>To evaluate the contribution of the designed components, we conduct an ablation experiment on the UBFC-rPPG dataset, as shown in Tab. 3.</p>
        <p>Baseline in index 1 denotes directly training Contrast-Phys [13]. It is observed that the baseline only achieves 0.64 MAE accuracy and 1.00 RMSE accuracy, showing its limited capability to predict accurate HR.</p>
        <p>Effectiveness of RAM. As shown in index 2, by only adding the RAM, the MAE slightly decreases to 0.58, but the RMSE increases to 1.50, indicating that the RAM module improves the prediction accuracy of the model on some samples but introduces large errors on others. With the help of prior knowledge, as shown in index 4, the MAE further decreases to 0.18 and the RMSE to 0.45, achieving superior performance. This indicates that prior knowledge can help RAM significantly reduce prediction errors.</p>
        <p>Effectiveness of Prior Knowledge. As shown in index 3, directly adopting the pre-training alone brings significant improvement. Specifically, the MAE drops from 0.64 to 0.33 and the RMSE drops from 1.00 to 0.65. This accuracy even surpasses existing unsupervised rPPG methods, showing the effectiveness of prior knowledge.</p>
        <p>Generally, the above observations and analysis demonstrate the effectiveness of our proposed components.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion</title>
      <p>This paper introduces a novel unsupervised method termed UP-Phys that leverages prior knowledge to notably reduce training time and improve HR estimation accuracy. Furthermore, we design a plug-and-play representation augmentation module (RAM) that dynamically enhances salient temporal-spatial information derived from extracted features. This augmentation empowers the network to prioritize regions abundant in rPPG information, consequently reducing the effect of noise brought by lighting, motion, etc. Experiments on the PURE and UBFC-rPPG datasets demonstrate the effectiveness of our method. In addition, our method achieves 15.79 RMSE accuracy in the 3rd RePSS.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Acknowledgements</title>
      <p>This research is funded in part by the National Natural Science Foundation of China (Grant No. 61802058, 61911530397), in part by the Open Project Program of the State Key Laboratory of CAD&amp;CG, Zhejiang University (under Grant A2318), and in part by the Postgraduate Research &amp; Practice Innovation Program of Jiangsu Province (Grant No. KYCX24_1514).</p>
      <p>[10] R. Song, H. Chen, J. Cheng, C. Li, Y. Liu, X. Chen, PulseGAN: Learning to generate realistic pulse waveforms in remote photoplethysmography, IEEE Journal of Biomedical and Health Informatics 25 (2021) 1373–1384.
[11] H. Lu, H. Han, S. K. Zhou, Dual-GAN: Joint BVP and noise modeling for remote physiological measurement, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12404–12413.
[12] J. Gideon, S. Stent, The way to my heart is through contrastive learning: Remote photoplethysmography from unlabelled video, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3995–4004.
[13] Z. Sun, X. Li, Contrast-Phys: Unsupervised video-based remote physiological measurement via spatiotemporal contrast, in: European Conference on Computer Vision, Springer, 2022, pp. 492–510.
[14] J. Speth, N. Vance, P. Flynn, A. Czajka, Non-contrastive unsupervised learning of physiological signals from video, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14464–14474.
[15] M. Cao, X. Cheng, X. Liu, Y. Jiang, H. Yu, J. Shi, ST-Phys: Unsupervised spatio-temporal contrastive remote physiological measurement, IEEE Journal of Biomedical and Health Informatics (2024).
[16] H. Yu, X. Cheng, W. Peng, TOPLight: Lightweight neural networks with task-oriented pretraining for visible-infrared recognition, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 3541–3550.
[17] H. Yu, X. Cheng, W. Peng, W. Liu, G. Zhao, Modality unifying network for visible-infrared person re-identification, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 11185–11195.
[18] X. Liu, X. Cheng, H. Chen, H. Yu, G. Zhao, Differentiable auxiliary learning for sketch re-identification, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, 2024, pp. 3747–3755.
[19] Z. Zhang, J. M. Girard, Y. Wu, X. Zhang, P. Liu, U. Ciftci, S. Canavan, M. Reale, A. Horowitz, H. Yang, et al., Multimodal spontaneous emotion corpus for human behavior analysis, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3438–3446.
[20] S. Bobbia, R. Macwan, Y. Benezeth, A. Mansouri, J. Dubois, Unsupervised skin tissue segmentation for remote photoplethysmography, Pattern Recognition Letters 124 (2019) 82–90.
[21] R. Stricker, S. Müller, H.-M. Gross, Non-contact video-based pulse rate measurement on a mobile service robot, in: The 23rd IEEE International Symposium on Robot and Human Interactive Communication, IEEE, 2014, pp. 1056–1062.
[22] X. Niu, S. Shan, H. Han, X. Chen, RhythmNet: End-to-end heart rate estimation from face via spatial-temporal representation, IEEE Transactions on Image Processing 29 (2019) 2409–2423.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Alikhani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Seppänen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Atrial fibrillation detection from face videos by fusing subtle variations</article-title>
          ,
          <source>IEEE Transactions on Circuits and Systems for Video Technology</source>
          <volume>30</volume>
          (
          <year>2019</year>
          )
          <fpage>2781</fpage>
          -
          <lpage>2795</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>McDuf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Blackford</surname>
          </string-name>
          ,
          <article-title>iPhys: An open non-contact imaging-based physiological measurement toolbox, in: 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)</article-title>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>6521</fpage>
          -
          <lpage>6524</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Allen</surname>
          </string-name>
          ,
          <article-title>Photoplethysmography and its application in clinical physiological measurement</article-title>
          ,
          <source>Physiological measurement 28</source>
          (
          <year>2007</year>
          )
          <fpage>R1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Facial-video-based physiological signal measurement: Recent advances and affective applications</article-title>
          ,
          <source>IEEE Signal Processing Magazine</source>
          <volume>38</volume>
          (
          <year>2021</year>
          )
          <fpage>50</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Sabour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Benezeth</surname>
          </string-name>
          , P. De Oliveira,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chappe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Ubfc-phys: A multimodal database for psychophysiological studies of social stress</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>622</fpage>
          -
          <lpage>636</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Birla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>PATRON: Exploring respiratory signal derived from non-contact face videos for face anti-spoofing</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>187</volume>
          (
          <year>2022</year>
          )
          <fpage>115883</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Birla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Sunrise: Improving 3d mask face anti-spoofing for short videos using pre-emptive split and merge</article-title>
          ,
          <source>IEEE Transactions on Dependable and Secure Computing</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McDuf</surname>
          </string-name>
          ,
          <article-title>Deepphys: Video-based physiological measurement using convolutional attention networks</article-title>
          ,
          <source>in: Proceedings of the european conference on computer vision (ECCV)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>349</fpage>
          -
          <lpage>365</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Remote photoplethysmograph signal measurement from facial videos using spatio-temporal networks</article-title>
          , arXiv preprint arXiv:1905.02419 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>