<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fusion of vision transformers and convolutional networks for advanced face anti-spoofing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhanseri Ikram</string-name>
          <email>ikram.zhanseri@outlook.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bauyrzhan Omarov</string-name>
          <email>bauyrzhan313@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Al-Farabi Kazakh National University</institution>
          ,
          <addr-line>Almaty</addr-line>
          ,
          <country country="KZ">Kazakhstan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Face anti-spoofing systems play a crucial role in securing biometric authentication frameworks against presentation attacks. The growing complexity of spoofing techniques demands the development of advanced detection methods that can effectively generalize across various attack forms and environmental conditions. In response to these challenges, a new architecture fusing Vision Transformers (ViT), ConvNeXT, and Swin Transformer is proposed for advanced face anti-spoofing. The method combines global contextual modeling with local feature extraction and multi-scale analysis. Detailed evaluations on the OULU-NPU and CASIA-MFSD datasets demonstrate competitive performance across various protocols, with notable improvements in generalization to unseen environmental conditions. Feature space visualizations reveal improved class separability post-fusion, emphasizing the effectiveness of the combined approach. Cross-dataset experiments highlight domain-generalization challenges in bidirectional evaluations between OULU-NPU and CASIA-MFSD. The proposed method advances the state-of-the-art in face anti-spoofing, offering insights into feature fusion strategies and avenues for future research in cross-domain generalization.</p>
      </abstract>
      <kwd-group>
        <kwd>Face anti-spoofing</kwd>
        <kwd>machine learning</kwd>
        <kwd>computer vision</kwd>
        <kwd>transformers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Face recognition technologies have evolved rapidly and are now essential to various security
systems, from personal devices to large-scale surveillance networks. However, the widespread
adoption of these technologies has also led to the emergence of face spoofing attacks, where
attackers use photos, videos, masks, or other facial representations to deceive recognition systems
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Face anti-spoofing, a critical component of robust biometric systems, has
undergone significant advancements in recent years, primarily fueled by the rapid evolution of
deep learning methodologies.
      </p>
      <p>
        The fusion of ViT and Convolutional Neural Networks (CNNs) represents a promising route for
addressing these challenges. Vision Transformers, introduced by [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], have demonstrated
remarkable performance in various computer vision tasks by applying self-attention mechanisms
to model long-range dependencies. Conversely, CNNs excel at extracting hierarchical local features
and have been the cornerstone of many successful face anti-spoofing approaches [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The
synergistic integration of these architectures aims to harness their complementary strengths,
potentially yielding a more detailed and nuanced representation of facial characteristics pertinent
to spoofing detection.
      </p>
      <p>
        The proposed methodology employs a multi-stream architecture that first processes input images
through a ViT and then through two parallel pathways. In this context, the ViT captures
subtle, global features that may be indicative of spoofing attacks. The proposed approach builds
upon recent advancements in face anti-spoofing research, including multi-modal fusion techniques
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], attention mechanisms [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and domain generalization strategies [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>The remainder of this paper is organized as follows: Section 2 provides an overview of related
works, the advancements and challenges in the domain of face anti-spoofing. In Section 3, we detail
the materials and methods used in our study, including the problem statement, the proposed
method, evaluation metrics, loss functions, and datasets. Section 4 presents the experimental
results, demonstrating the performance of our approach on various benchmarks. Finally, Section 5
offers a discussion of the findings, their implications, and potential directions for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>Face anti-spoofing research has advanced considerably, transitioning from traditional handcrafted
feature-based approaches to deep learning-driven methodologies.</p>
      <p>
        Early face anti-spoofing techniques primarily relied on texture analysis to differentiate between
genuine and spoofed facial presentations. Local Binary Patterns (LBP) and its variants were
extensively employed to capture micro-textural patterns [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Subsequent works explored more
sophisticated descriptors such as SURF [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and HOG [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to boost the discriminative power of
extracted features. While these methods demonstrated good results in controlled environments,
their performance often degraded under variable lighting conditions and against high-quality
spoofing attacks.
      </p>
      <p>
        Deep learning brought about a paradigm shift in face anti-spoofing research. CNNs became
powerful tools for automatically learning hierarchical features from raw input images. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
proposed a CNN architecture specifically designed for face anti-spoofing, incorporating a
pixelwise supervision strategy to improve localization capabilities. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] introduced a multi-stream CNN
framework that concurrently processed color, depth, and infrared information to bolster spoofing
detection accuracy, thus outperforming traditional methods, particularly in scenarios involving
diverse spoofing techniques. Recognizing the potential of temporal cues in distinguishing between
genuine and spoofed facial presentations, researchers began incorporating motion analysis into
anti-spoofing frameworks. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] proposed a 3D CNN architecture to capture spatio-temporal
features from video sequences. Long Short-Term Memory (LSTM) networks were employed by [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
to model the temporal dynamics of facial movements, demonstrating strong robustness against
video replay attacks and 3D mask impersonations.
      </p>
      <p>
        The integration of attention mechanisms into face anti-spoofing models has gained significant
interest due to their ability to focus on salient regions. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] introduced a spatial attention module to
emphasize discriminative facial areas for spoofing detection. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] proposed a channel attention
mechanism to adaptively recalibrate feature maps, improving the model's sensitivity to subtle
spoofing artifacts.
      </p>
      <p>
        A continuous challenge in face anti-spoofing lies in the domain shift between training and
testing distributions. To address this, several works have explored domain generalization
techniques. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] proposed a multi-adversarial domain generalization framework to learn
domain-invariant features. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] introduced a meta-learning approach to simulate domain shift during
training, thereby increasing the model's generalization capabilities. The recent success of ViT in
various computer vision tasks has sparked interest in their application to face anti-spoofing. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
adapted the ViT architecture for spoofing detection, demonstrating competitive performance with
CNN-based counterparts. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] proposed a hybrid CNN-Transformer model that applied both local
and global feature representations for advanced spoofing detection. Recognizing the limitations of
single-modality approaches, researchers have explored the fusion of multiple information sources
for robust spoofing detection. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] proposed a multi-modal framework that combined visible light,
infrared, and depth information. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] introduced a cross-modal fusion strategy that used
complementary cues from different sensing modalities, showing that multi-modal approaches can
address diverse spoofing scenarios and environmental variations.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and methods</title>
      <p>
        The proposed architecture in Figure 1 consists of ViT, ConvNeXT [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], and Swin Transformer [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]
to create a robust face anti-spoofing system. ConvNeXT introduces a pure ConvNet approach that
incorporates design elements from transformers, achieving performance competitive with
state-of-the-art vision transformers while maintaining the efficiency and inductive biases of CNNs. Swin
Transformer proposes a hierarchical vision transformer that utilizes shifted windows, enabling
efficient modeling of image features at various scales. The methodology uses the strengths of each
component to address the complex challenge of distinguishing genuine from spoofed facial
presentations.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Problem statement</title>
        <p>Face anti-spoofing systems aim to differentiate between bona fide facial presentations and
fraudulent attempts using various spoofing techniques, such as printed photographs, digital
displays, or 3D masks. The challenge lies in capturing both fine-grained textural details and global
contextual information while maintaining robustness across diverse environmental conditions and
attack modalities. Formally, given an input image I ∈ ℝ^(H×W×C), where H, W, and C represent
height, width, and channels respectively, the objective is to learn a function f: ℝ^(H×W×C) → {0, 1},
where 0 denotes a spoofed presentation and 1 indicates a genuine facial image.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Proposed method</title>
        <p>The proposed architecture comprises three main components: a ViT for global feature extraction,
a ConvNeXT module for local feature refinement, and a Swin Transformer for multi-scale feature
analysis. The fusion of the outputs from ConvNeXT and Swin Transformer yields the final
classification decision.</p>
        <p>The ViT module partitions the input image I into N non-overlapping patches, each of size
P×P. In our experiments we use 8×8 patches on a 224×224×3 image, so N = (224/8)² = 784. Let
f_ViT denote the ViT model and z_i the output of its i-th layer, where the total number of layers is
12 and B is the batch size. Of the 12 layers, only the last 4 are used, since they represent
higher-level information, and their outputs are concatenated along the feature dimension.</p>
        <p>The ConvNeXT module processes the concatenated high-level feature maps from ViT using a
series of depthwise separable convolutions and inverted bottleneck layers. Each block computes
y = x + PointwiseConv(GELU(PointwiseConv(DWConv(x)))),
where DWConv and PointwiseConv represent depthwise and pointwise convolutions, respectively,
and GELU is the Gaussian Error Linear Unit activation function.</p>
        <p>The Swin Transformer operates on the same concatenated feature maps, employing shifted
window-based self-attention to capture multi-scale contextual information:
Attention(Q, K, V) = SoftMax(QKᵀ/√d + B)V,
where Q, K, V are the query, key, and value matrices, d is the dimension of the queries/keys, and B
is the relative position bias. Prior to the Swin Transformer, features are passed through a
convolutional layer to reduce the number of parameters, thereby facilitating more efficient
training; this is necessary because the Swin Transformer, unlike ConvNeXT, is comparatively slow
in processing.</p>
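        <p>As a quick sanity check on the patching arithmetic, the patch count N can be computed directly (a pure-Python sketch; the helper name is ours):

```python
def num_patches(height: int, width: int, patch: int) -> int:
    """Number of non-overlapping patch-by-patch tiles covering an image."""
    assert height % patch == 0 and width % patch == 0, "image must tile evenly"
    return (height // patch) * (width // patch)

# The configuration stated in the text: 8x8 patches on a 224x224 image.
print(num_patches(224, 224, 8))  # 784, matching N
```

The same helper confirms that 16×16 patches, the default in the original ViT, would give 196 tokens instead.</p>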
        <p>The outputs from ConvNeXT and Swin Transformer undergo global average pooling (GAP),
and the two pooled vectors are averaged before being fed to the Sigmoid function, so that both
branches make an equal contribution to the final decision.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Metrics and loss</title>
        <p>
          Performance metrics used to evaluate the effectiveness of anti-spoofing systems include
APCER, BPCER, and ACER [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
        <p>APCER measures the rate at which the system incorrectly classifies a spoofing attack as a
genuine attempt, i.e., the proportion of spoofing attempts that are wrongly accepted as legitimate:
APCER = FP / (FP + TN).
BPCER measures the rate at which the system incorrectly classifies a genuine attempt as a
spoofing attack; this metric reflects the system's ability to correctly recognize legitimate users:
BPCER = FN / (FN + TP).
ACER is the average of APCER and BPCER, and is used to find a balance between the two metrics:
ACER = (APCER + BPCER) / 2.
For the loss, we used Binary Cross-Entropy:
L = −(1/N) Σᵢ [yᵢ log(pᵢ) + (1 − yᵢ) log(1 − pᵢ)],
where N is the total number of samples, yᵢ the true label, and pᵢ the predicted probability.</p>
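        <p>The three metrics can be computed from binary decisions with a short, dependency-free sketch (the encoding 0 = attack, 1 = bona fide follows the problem statement; the function name is ours):

```python
def apcer_bpcer_acer(labels, preds):
    """Compute APCER, BPCER and ACER for binary decisions.

    labels: ground truth, 0 = spoof (attack), 1 = genuine (bona fide).
    preds:  model decisions in the same encoding.
    """
    attacks = [p for y, p in zip(labels, preds) if y == 0]
    bona_fide = [p for y, p in zip(labels, preds) if y == 1]
    apcer = sum(p == 1 for p in attacks) / len(attacks)      # attacks accepted
    bpcer = sum(p == 0 for p in bona_fide) / len(bona_fide)  # genuine rejected
    return apcer, bpcer, (apcer + bpcer) / 2

# One of four attacks accepted, one of four genuine users rejected:
print(apcer_bpcer_acer([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1, 1, 0]))
# (0.25, 0.25, 0.25)
```
</p>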
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Dataset</title>
        <p>
          The proposed method was evaluated on two widely recognized datasets in the face anti-spoofing
domain: OULU-NPU [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] in Figure 2 and CASIA-MFSD [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] in Figure 3. These datasets provide
diverse spoofing scenarios and environmental conditions, enabling detailed assessment of
anti-spoofing algorithms.
        </p>
        <p>OULU-NPU combines 4,950 real access and spoofing videos from 55 subjects, captured using six
mobile devices with front-facing cameras. The dataset provides four protocols evaluating
generalization across unseen environmental conditions, attack types, input sensors, and a
combination thereof. Spoofing attacks include print and video-replay using two printers and two
display devices. Environmental variations encompass three sessions with different illumination and
background settings.</p>
        <p>CASIA-MFSD contains 600 video clips of genuine and attack attempts from 50 subjects. The
dataset features three image quality categories: low-quality, normal-quality, and high-quality.
Spoofing attacks are categorized into three types: warped photo attacks, cut photo attacks, and
video attacks. The dataset was collected under varying illumination conditions and with different
digital devices, presenting challenges in terms of image quality and attack diversity. The dataset
provides seven scenarios derived from three main protocols: low quality, high quality, and an
overall protocol that mixes all training samples against all test samples. For this dataset we used
only the third protocol, which corresponds to the seventh scenario.</p>
        <p>
          For both datasets we took every 25th frame, keeping 5 frames in total per video. Face detection was
performed using MTCNN [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] library.
        </p>
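        <p>The frame-sampling rule above can be sketched as follows (assuming sampling starts from the first frame; the function name is ours):

```python
def sample_frames(total_frames: int, step: int = 25, max_frames: int = 5):
    """Indices of frames kept per video: every `step`-th frame,
    capped at `max_frames` frames in total."""
    return list(range(0, total_frames, step))[:max_frames]

print(sample_frames(300))  # [0, 25, 50, 75, 100]
```
</p>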
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Experimental setup</title>
        <p>In this research, the experiments were executed on an NVIDIA RTX 4090 GPU with 24 GB of
VRAM. The batch size was 16, which occupied about 17 GB of GPU memory. The optimizer was Adam
with an initial learning rate of 0.00001, reduced by the ReduceLROnPlateau scheduler after every 3
epochs without a decrease in loss, for 20 epochs in total. For augmentation we used flipping,
rotation, random cropping, blurring, and brightness changes.</p>
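        <p>The scheduling behaviour described above can be sketched in plain Python (the reduction factor 0.1 is an assumption; the text states only the initial rate and the 3-epoch patience):

```python
class PlateauScheduler:
    """Minimal sketch of ReduceLROnPlateau-style logic: multiply the
    learning rate by `factor` after `patience` consecutive epochs
    without a decrease in loss."""

    def __init__(self, lr=1e-5, patience=3, factor=0.1):
        self.lr, self.patience, self.factor = lr, patience, factor
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        if loss >= self.best:          # no improvement this epoch
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        else:                          # loss decreased
            self.best = loss
            self.bad_epochs = 0
        return self.lr

sched = PlateauScheduler()
for epoch_loss in [1.0, 0.9, 0.9, 0.9, 0.9]:
    lr = sched.step(epoch_loss)
print(lr)  # learning rate reduced tenfold after three flat epochs
```

In the actual experiments this behaviour corresponds to PyTorch's torch.optim.lr_scheduler.ReduceLROnPlateau with patience=3, applied on top of Adam.</p>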
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment results</title>
      <p>The proposed method was evaluated extensively on the OULU-NPU and
CASIA-MFSD datasets, with performance metrics including ACER, APCER, and BPCER. The experimental
results are presented in Tables 1, 2, and 3, showing the model's performance across various
protocols and cross-dataset scenarios.</p>
      <sec id="sec-4-1">
        <title>4.1. Results on OULU-NPU</title>
        <p>
          Table 1 reports APCER, BPCER, and ACER for the four OULU-NPU protocols, comparing the
proposed method against STDN [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], CDCN [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], DC-CDN [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ], and NAS-FAS [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. [Table 1: per-protocol comparison of methods on OULU-NPU, protocols 1-4.]
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results on CASIA-MFSD and feature analysis</title>
        <p>Table 2 presents the performance on the CASIA-MFSD dataset. The proposed method achieved
an APCER of 1.68%, BPCER of 1.68%, and ACER of 1.68%, indicating high performance in detecting
presentation attacks while maintaining a low false rejection rate for genuine presentations.</p>
        <p>Figure 4 presents a visual representation of the feature spaces generated by different
components of the proposed architecture. The top graph illustrates the Principal Component
Analysis (PCA) of the ViT output, while the bottom graph depicts the PCA after fusion of
ConvNeXT and Swin Transformer blocks. Both visualizations use the same samples from the
CASIA-MFSD dataset. The ViT output on top graph provides a relatively clustered distribution of
features, with considerable overlap between live faces and various types of spoofing attacks. The
feature space lacks clear separation between classes, indicating that the ViT alone struggles to
distinguish between genuine and fake presentations consistently. In contrast, the fused output of
ConvNeXT and Swin Transformer blocks on the bottom graph demonstrates a markedly improved
feature distribution. The live face samples, shown in blue and gray, form a distinct cluster,
well-separated from the spoofing attack samples. The fake face presentations, including
paper-based and video replay attacks, are grouped together but distinctly apart from the live face
cluster.</p>
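        <p>The PCA projections in Figures 4 and 5 can be reproduced with a few lines of NumPy; the sketch below uses synthetic features rather than the paper's embeddings, purely to illustrate the projection step:

```python
import numpy as np

def pca_2d(features):
    """Project row-vector features onto their first two principal components."""
    centered = features - features.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 64))  # stand-in for fused, pooled features
proj = pca_2d(feats)
print(proj.shape)  # (100, 2)
```

In the paper's setting, each row of `feats` would be one pooled feature vector per face image, and the live and spoof classes would then be scatter-plotted in the projected plane.</p>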
        <p>Figure 5 illustrates the challenge of domain shift in cross-dataset scenarios. The PCA
visualization shows the feature distribution when the model, trained on CASIA-MFSD, is tested on
OULU-NPU. The plot reveals four distinct clusters corresponding to live faces and different types of
spoofing attacks. Notably, the printed fake faces (orange cluster) are well-separated from other
categories, suggesting robust detection of this attack type across datasets. However, the live faces
(blue), video replay attacks (red), and another type of printed fake (green) show some overlap,
indicating potential challenges in distinguishing these categories in cross-dataset scenarios. The
significant shift in feature distribution between the training and testing datasets is evident from the
distinct grouping of samples on the right side of the plot.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>The experimental results and feature space visualizations provide valuable insights into the
performance and characteristics of the proposed face anti-spoofing method. The fusion of ViT,
ConvNeXT, and Swin Transformer demonstrates promising capabilities in addressing the
challenges of face presentation attack detection across various scenarios.</p>
      <p>On the OULU-NPU dataset, the proposed method shows competitive performance across all four
protocols. Notably, in Protocol 1, which evaluates generalization to unseen environmental
conditions, the method achieves the lowest APCER among all compared approaches, suggesting
that the fusion of global and local feature extraction mechanisms effectively mitigates the impact of
varying illumination and background conditions. The method's performance in Protocols 2-4
remains competitive, indicating robustness to unseen attack types and input sensors. The high
performance on the CASIA-MFSD dataset further confirms the method's effectiveness in handling
diverse spoofing techniques and image qualities. The lowest APCER on this dataset is particularly
noteworthy, showing a strong ability to detect all presentation attacks without false acceptances.</p>
      <p>
        The PCA visualizations in Figure 4 provide crucial insights into the feature learning process.
The ViT output alone shows limited separation between live faces and spoofing attacks, indicating
that global context modeling is insufficient for robust anti-spoofing. However, the fused output of
ConvNeXT and Swin Transformer blocks demonstrates a marked improvement in class
separability, indicating the importance of combining global contextual information with local
textural features and multi-scale analysis for effective spoofing detection. The clear separation
between live faces and various attack types in the fused feature space aligns with the strong
performance metrics observed on individual datasets. It suggests that the proposed architecture
successfully learns discriminative features that generalize well across different spoofing techniques
and environmental conditions within a single dataset. The cross-dataset experiments reveal both
strengths and limitations of the proposed method. When trained on OULU-NPU and tested on
CASIA-MFSD, the model achieves an ACER, which, while not optimal, indicates some degree of
generalization. On the other hand, training on CASIA-MFSD and testing on OULU-NPU yields a
better ACER, suggesting that the features learned from CASIA-MFSD may be more generalizable.
Figure 5 visualizes the domain shift problem inherent [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] in cross-dataset scenarios. The distinct
clustering of samples from the test dataset separate from the training dataset distribution
highlights the challenge of domain adaptation in face anti-spoofing. The clear separation of printed
fake faces in this scenario is encouraging, indicating that certain attack types may be more
consistently detectable across domains.
      </p>
      <p>
        While the proposed method demonstrates strong performance within individual datasets, the
cross-dataset results reveal room for improvement in domain generalization. The disparity in
crossdataset performance depending on the training set suggests that the model's generalization
capabilities are influenced by the diversity and characteristics of the training data. Future work
should focus on addressing the domain shift problem, potentially through techniques such as
adversarial domain adaptation or meta-learning approaches [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In summary, the proposed face anti-spoofing method, which combines Vision Transformer,
ConvNeXT, and Swin Transformer, demonstrates high results in detecting presentation attacks
across diverse scenarios. Experimental evaluations on OULU-NPU and CASIA-MFSD datasets
reveal competitive performance, particularly in generalizing to unseen environmental conditions
and attack types. Feature space analysis through PCA visualizations shows the importance of
fusing global and local feature representations. The clear separation between genuine and spoofed
samples in the fused feature space correlates with the strong performance metrics observed on
individual datasets. The proposed method offers a balance between capturing fine-grained textures
and modeling long-range dependencies, crucial for robust spoofing detection.</p>
      <p>However, cross-dataset experiments expose challenges in domain generalization, with varying
performance when transferring between OULU-NPU and CASIA-MFSD. Future work should focus
on improving cross-dataset generalization through advanced domain adaptation techniques or
meta-learning approaches. Additionally, investigating the individual contributions of each
architectural component may lead to further optimizations in the fusion strategy. While the
proposed method exhibits strong performance within individual datasets, addressing domain shift
remains a critical challenge for real-world deployment of face anti-spoofing systems.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Google Gemini for grammar and
spelling checking. After using this tool, the authors reviewed and edited the content as needed and
take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Galbally</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Fierrez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Biometric anti-spoofing methods: A survey in face recognition</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>2</volume>
          ,
          <fpage>1530</fpage>
          -
          <lpage>1552</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Dosovitskiy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolesnikov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weissenborn</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unterthiner</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Houlsby</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>An image is worth 16x16 words: Transformers for image recognition at scale</article-title>
          .
          <source>In International Conference on Learning Representations.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Searching central difference convolutional networks for face anti-spoofing</article-title>
          .
          <source>In Proceedings of the IEEE/CVF Conference on Computer Vision</source>
          and Pattern Recognition
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escalera</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S. Z.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>A dataset and benchmark for large-scale multi-modal face anti-spoofing</article-title>
          .
          <source>In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Cross-domain face presentation attack detection via multi-domain disentangled representation learning</article-title>
          .
          <source>In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Shao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yuen</surname>
            ,
            <given-names>P. C.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Multi-adversarial discriminative deep domain generalization for face presentation attack detection</article-title>
          .
          <source>In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Määttä</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hadid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pietikäinen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Face spoofing detection from single images using micro-texture analysis</article-title>
          .
          <source>In 2011 International Joint Conference on Biometrics (IJCB).</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Pinto</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedrini</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>W. R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Rocha</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Face spoofing detection through visual codebooks of spectral temporal cubes</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          ,
          <volume>24</volume>
          (
          <issue>12</issue>
          ),
          <fpage>4726</fpage>
          -
          <lpage>4740</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Komulainen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hadid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pietikäinen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Face spoofing detection using dynamic texture</article-title>
          .
          <source>In Asian Conference on Computer Vision.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Wen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Face anti-spoofing: Model matters, so does data</article-title>
          .
          <source>In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jourabloo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Learning deep models for face anti-spoofing: Binary or auxiliary supervision</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xia</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>3D face mask presentation attack detection based on intrinsic image analysis</article-title>
          .
          <source>IET Biometrics</source>
          ,
          <volume>8</volume>
          (
          <issue>5</issue>
          ),
          <fpage>342</fpage>
          -
          <lpage>351</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Learning temporal features using LSTM-CNN architecture for face anti-spoofing</article-title>
          .
          <source>In 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR).</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Lei</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Deep spatial gradient and temporal depth learning for face anti-spoofing</article-title>
          .
          <source>In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Face anti-spoofing with human material perception</article-title>
          .
          <source>In European Conference on Computer Vision.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Single-side domain generalization for face anti-spoofing</article-title>
          .
          <source>In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Shao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yuen</surname>
            ,
            <given-names>P. C.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Regularized fine-grained meta face anti-spoofing</article-title>
          .
          <source>In Proceedings of the AAAI Conference on Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Searching central difference convolutional networks for face anti-spoofing</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yuen</surname>
            ,
            <given-names>P. C.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Remote photoplethysmography correspondence feature for 3D mask face presentation attack detection</article-title>
          .
          <source>In European Conference on Computer Vision.</source>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escalera</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S. Z.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>CASIA-SURF: A large-scale multi-modal benchmark for face anti-spoofing</article-title>
          .
          <source>IEEE Transactions on Biometrics, Behavior, and Identity Science</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ),
          <fpage>182</fpage>
          -
          <lpage>193</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>George</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Marcel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Learning one class representations for face presentation attack detection using multi-channel convolutional neural networks</article-title>
          .
          <source>IEEE Transactions on Information Forensics and Security</source>
          ,
          <volume>16</volume>
          ,
          <fpage>361</fpage>
          -
          <lpage>375</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feichtenhofer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2022</year>
          ).
          <article-title>A ConvNet for the 2020s</article-title>
          .
          <source>In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Swin transformer: Hierarchical vision transformer using shifted windows</article-title>
          .
          <source>In Proceedings of the IEEE/CVF International Conference on Computer Vision.</source>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24] ISO/IEC 30107-3:2023. (
          <year>2023</year>
          ).
          <article-title>Information technology - Biometric presentation attack detection - Part 3: Testing and reporting (Edition 2)</article-title>
          . International Organization for Standardization.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Boulkenafet</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Komulainen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hadid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>OULU-NPU: A mobile face presentation attack database with real-world variations</article-title>
          .
          <source>In 2017 12th IEEE International Conference on Automatic Face &amp; Gesture Recognition (FG 2017).</source>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lei</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S. Z.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>A face antispoofing database with diverse attacks</article-title>
          .
          <source>In 2012 5th IAPR International Conference on Biometrics (ICB).</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Qiao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Joint face detection and alignment using multitask cascaded convolutional networks</article-title>
          . arXiv. https://doi.org/10.48550/arXiv.1604.02878.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stehouwer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>On disentangling spoof trace for generic face anti-spoofing</article-title>
          .
          <source>In Proceedings of the European Conference on Computer Vision</source>
          (pp.
          <fpage>406</fpage>
          -
          <lpage>422</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , et al. (
          <year>2020</year>
          ).
          <article-title>Searching central difference convolutional networks for face anti-spoofing</article-title>
          .
          <source>In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          (pp.
          <fpage>5295</fpage>
          -
          <lpage>5305</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Dual-cross central difference network for face anti-spoofing</article-title>
          .
          <source>In Proceedings of the International Joint Conference on Artificial Intelligence</source>
          (pp.
          <fpage>1281</fpage>
          -
          <lpage>1287</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S. Z.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>NAS-FAS: Static-dynamic central difference network search for face anti-spoofing</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>43</volume>
          (
          <issue>9</issue>
          ),
          <fpage>3005</fpage>
          -
          <lpage>3023</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Boulkenafet</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Komulainen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hadid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Face anti-spoofing using speeded-up robust features and Fisher vector encoding</article-title>
          .
          <source>IEEE Signal Processing Letters</source>
          ,
          <volume>24</volume>
          (
          <issue>2</issue>
          ),
          <fpage>141</fpage>
          -
          <lpage>145</lpage>
          . https://doi.org/10.1109/LSP.2017.2654306.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Atoum</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jourabloo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Face anti-spoofing using patch and depth-based CNNs</article-title>
          .
          <source>In 2017 IEEE International Joint Conference on Biometrics (IJCB)</source>
          (pp.
          <fpage>319</fpage>
          -
          <lpage>328</lpage>
          ). IEEE. https://doi.org/10.1109/BTAS.2017.8272724.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lei</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Improving Face Anti-Spoofing by 3D Virtual Synthesis</article-title>
          .
          <source>In 2019 International Conference on Biometrics (ICB)</source>
          (pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ). IEEE. https://doi.org/10.1109/ICB45273.2019.8987385.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Chu</surname>
            ,
            <given-names>W.-S.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Rethinking domain generalization for face anti-spoofing: Separability and alignment</article-title>
          .
          <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          . arXiv:2303.13662 [cs.CV]. https://doi.org/10.48550/arXiv.2303.13662.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kot</surname>
            ,
            <given-names>A. C.</given-names>
          </string-name>
          (
          <year>2022</year>
          ).
          <article-title>Learning Meta Pattern for Face Anti-Spoofing</article-title>
          .
          <source>IEEE Transactions on Information Forensics and Security</source>
          ,
          <volume>17</volume>
          ,
          <fpage>1201</fpage>
          -
          <lpage>1213</lpage>
          . https://doi.org/10.1109/TIFS.2022.3158551.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>