=Paper= {{Paper |id=Vol-3800/paper2 |storemode=property |title=Research on the effectiveness of concatenated embeddings in facial verification |pdfUrl=https://ceur-ws.org/Vol-3800/paper2.pdf |volume=Vol-3800 |authors=Denys Khanin,Viktor Otenko,Volodymyr Khoma |dblpUrl=https://dblp.org/rec/conf/csdp/KhaninOK24 }} ==Research on the effectiveness of concatenated embeddings in facial verification== https://ceur-ws.org/Vol-3800/paper2.pdf
Research on the effectiveness of concatenated embeddings in facial verification⋆

Denys Khanin1,*,†, Viktor Otenko1,† and Volodymyr Khoma1,†

1 Lviv Polytechnic National University, Information Security Department, 12 Stepana Bandery str., 79000 Lviv, Ukraine



Abstract
In the era of digital authentication, facial verification systems have become a cornerstone of security protocols across various applications. This study explores the performance synergy from concatenated embeddings in enhancing biometric authentication accuracy. By leveraging the Celebrities in Frontal-Profile (CFP) dataset, we investigate whether the fusion of embeddings generated by models such as VGG-Face, Facenet, OpenFace, ArcFace, and SFace can result in a more robust authentication process. Our approach is rooted in the hypothesis that the diverse strengths of these models, when combined, can address the limitations inherent in single-model systems, thus providing a more comprehensive solution to facial verification. The approach involves computing the L2 distance between normalized concatenated embeddings of an input face image and an anchor, thereby determining the authenticity of the individual. Experiments are designed to compare the performance of singular model embeddings against concatenated embeddings, employing metrics such as accuracy, False Acceptance Rate (FAR), and False Rejection Rate (FRR). One of the critical aspects of our research is the implementation of Z-Score normalization and L2 normalization processes to standardize the embeddings from different models. These normalization techniques are vital in ensuring that the diverse outputs from various models are effectively combined, maintaining balance and consistency in the feature vectors. Additionally, our methodology includes a comprehensive evaluation framework that meticulously analyzes the trade-offs between computational efficiency and performance gains achieved through model concatenation. The findings of this research could significantly contribute to the development of more secure and reliable facial verification systems by using multiple existing models without the need for new model research, design, and training. This approach not only optimizes resource utilization but also provides a scalable solution that can be readily adapted to existing systems, enhancing their security measures without extensive overhauls. Furthermore, the study’s insights into the integration of model outputs could pave the way for future innovations in biometric authentication, encouraging the development of hybrid systems that combine the best attributes of various neural network architectures. This research underscores the potential of concatenated embeddings in revolutionizing facial verification technology. By harnessing the power of multiple neural network models, we can create a system that delivers superior accuracy and robustness, addressing the pressing need for advanced security solutions. This study sets the stage for further exploration into multi-model integration, offering a promising direction for future advancements in biometric authentication.

Keywords
facial verification, biometric authentication, neural networks, concatenated embeddings, machine learning, deep learning, model fusion, facial recognition, verification accuracy, security systems



1. Introduction

In today’s digital landscape, facial verification [1] systems have become pivotal in ensuring the security and authenticity of individual identities across various applications, from mobile device security to access controls in sensitive environments. The adoption of facial recognition technology is driven by its non-intrusive nature and the unique, hard-to-replicate characteristics of the human face, positioning it as a front-runner in biometric authentication methods. Furthermore, the integration of socio-cyber-physical systems security frameworks provides a comprehensive approach to enhancing cybersecurity measures, as highlighted by Yevseiev et al. in their detailed monograph on socio-cyber-physical systems security [2].

This research investigates the potential of enhancing facial verification accuracy through concatenated embeddings from multiple neural network [3] models. Utilizing the CFP dataset [4], we aim to determine whether the integration of various model embeddings can produce a more robust and secure biometric authentication system. By examining the performance synergy of these concatenated embeddings in comparison to singular model outputs, this study aims to contribute to the development of more advanced and reliable facial verification techniques with the existing set of models for facial verification.
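At its core, the verification decision studied here reduces to thresholding an L2 distance between an input embedding and an anchor embedding. A minimal pure-Python sketch of that rule (the embedding values and threshold below are purely illustrative, not from the paper):

```python
from math import sqrt

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length embedding vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(input_embedding, anchor_embedding, threshold):
    """Accept the claimed identity when the input is close enough to the anchor."""
    return l2_distance(input_embedding, anchor_embedding) <= threshold

# Toy 3-dimensional embeddings; real model outputs have 128-4096 dimensions.
print(verify([0.1, 0.2, 0.3], [0.1, 0.25, 0.3], threshold=0.1))  # True
```

In practice the threshold is not chosen by hand: the study derives it from the Equal Error Rate point described in the methodology.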



CSDP-2024: Cyber Security and Data Protection, June 30, 2024, Lviv, Ukraine
∗ Corresponding author.
† These authors contributed equally.
denys.o.khanin@lpnu.ua (D. Khanin); viktor.i.otenko@lpnu.ua (V. Otenko); volodymyr.v.khoma@lpnu.ua (V. Khoma)
ORCID: 0009-0001-4009-0202 (D. Khanin); 0000-0003-4781-7766 (V. Otenko); 0000-0001-9391-6525 (V. Khoma)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org, ISSN 1613-0073)
1.1. Background

The evolution of biometric authentication technologies has been significantly influenced by advancements in machine learning and deep learning [5], particularly in the domain of facial recognition. Neural network models, such as VGG-Face, Facenet, OpenFace, ArcFace, and SFace, represent the forefront of research and development in this field. These models are designed to extract and analyze facial features [6] from images, transforming them into numerical representations known as embeddings. These embeddings capture the unique aspects of an individual’s facial structure, enabling systems to perform verification tasks with high degrees of accuracy. The success of these models is predicated on their ability to learn complex patterns and variations in facial features across diverse datasets, under various conditions of lighting, pose, and expression.

In facial verification technology, using just one neural network model comes with certain limitations. Different models excel in various aspects, such as accuracy, speed of processing, and their ability to handle changes in lighting or facial features [7]. The drive for better performance and reliability in these systems often requires large and varied datasets for training, which can be resource-intensive. Additionally, there is a constant need to develop and test new model architectures that can effectively transform facial images into useful numerical data, known as embeddings. This scenario suggests that combining several neural network models might offer a more efficient solution. By leveraging the unique strengths of multiple models, such an approach could potentially overcome the common challenges in facial verification. This sets the stage for investigating how the integration of outputs from different models could lead to improvements in system performance.

1.2. Problem statement

The hypothesis driving this research emerges from a critical challenge within the realm of facial verification systems: the limitations of using single-model architectures in achieving consistently high accuracy across diverse conditions. This issue underscores the necessity of exploring alternative strategies that can leverage the strengths of existing technologies without the need for constant model retraining, dataset updates, or the development of new architectures [8]. Moreover, an exploratory survey by Hlushchenko and Dudykevych on access control paradigms highlights the evolving landscape of policy management and its implications for biometric security systems [9].

Current facial verification systems often rely on a singular neural network model, which may excel under specific conditions but fall short in others. This reliance poses a significant problem as it demands continuous updates to the model and its underlying dataset to address emerging challenges and maintain system performance. Such an iterative cycle of development is resource-intensive, requiring substantial investments in data collection, processing, and computational power. Additionally, the creation of new model architectures to improve feature extraction and classification accuracy further complicates the process, making it unsustainable in the long run. The hypothesis presented in this study arises from these challenges, proposing the use of concatenated embeddings [10] from multiple models as a means to bypass the constraints of singular model dependency. This approach aims to explore whether integrating the diverse capabilities of established models can offer a more robust and accurate solution for facial verification, thus addressing the core issues associated with the current methodologies.

1.3. Objectives of the research

This research focuses on key goals designed to explore improvements in facial verification systems:

• To create a system to test both individual and combined embeddings from models such as VGG-Face, Facenet, OpenFace, ArcFace, and SFace, leveraging the CFP dataset for comprehensive analysis.
• To measure the effectiveness of each model and their combinations using accuracy, FAR, and FRR [11].
• To analyze system performance across single and combined model embeddings to identify the most effective strategies for facial verification.
• To extract insights for potential system enhancements and recognize any associated challenges with multi-model embeddings.

2. Related works

The problem of enhancing the accuracy and robustness of facial verification systems has been a focal point in numerous studies due to ongoing challenges such as spoofing, adversarial attacks, and varying conditions of image capture. Ding and Tao (2018) addressed the limitations of traditional face recognition approaches by introducing Trunk-Branch Ensemble Convolutional Neural Networks for video-based face recognition, which improved recognition accuracy but still faced challenges in handling dynamic and complex environments [12]. Nagrath et al. (2021) highlighted the need for lightweight and efficient neural networks like MobileNetV2 for real-time applications, but their study also pointed out the difficulties in maintaining high accuracy under real-time constraints [13].

Li et al. (2018) focused on enhancing deep learning features with facial texture features for improved recognition performance, but the integration of different feature extraction techniques remained complex and computationally intensive [14].

Moon et al. (2016) developed a face recognition system based on convolutional neural networks using multiple distance faces, which further emphasized the necessity of integrating various models to enhance system robustness, yet it also indicated the increased computational demands [15].

Yang et al. (2019) explored federated machine learning for face verification, addressing privacy and security concerns while maintaining high verification accuracy. Their research underscored the challenge of managing decentralized data and the need for efficient data integration techniques [16].



Bhuiyan et al. (2017) presented a noise-resistant network for face recognition under noisy conditions, which highlighted the ongoing challenge of achieving robust performance in diverse real-world scenarios [17].

Recent advancements have shown that despite significant improvements in facial verification technologies, several unresolved issues persist. Gao et al. (2018) discussed privacy-preserving techniques in face recognition, which remain a critical concern in the deployment of these systems [18]. Furthermore, the study by Hanmandlu et al. (2013) on Elastic Bunch Graph Matching for face recognition identified the need for better handling of pose and illumination variations [19].

The integration of multiple models to leverage their unique strengths and mitigate individual weaknesses is a promising approach, as highlighted by recent research on hybrid and ensemble methods. However, this integration introduces new challenges, such as increased computational complexity and the need for sophisticated normalization techniques to ensure consistent and reliable performance [20, 21]. Additionally, Brydinskyi et al. provide a comparative analysis of modern deep-learning models for speaker verification, demonstrating the critical role of model selection and combination in enhancing verification accuracy [22].

These studies collectively underscore the necessity of integrating multiple models to leverage their unique strengths and mitigate individual weaknesses, aligning with our research objective of using concatenated embeddings to enhance facial verification systems’ accuracy and robustness. The proposed approach builds on the foundations laid by these works, aiming to address their limitations through the strategic combination of diverse neural network models.

3. Methodology

3.1. Dataset

The CFP dataset plays a pivotal role in our study, offering a nuanced exploration of facial verification across varying poses. Its construction and attributes are as follows:

• Size and Volume: The dataset consists of images of 500 individuals, with 10 frontal images per individual.
• Resolutions and Quality: Including a mix of resolutions and qualities, the dataset mirrors the variability encountered in real-world applications, ranging from high-definition to lower-quality images, challenging the adaptability of verification systems to varying image fidelity.
• Diversity of Conditions: It spans a broad spectrum of real-life conditions: different lighting scenarios from natural daylight to artificial and low-light environments, varied backgrounds from simple to cluttered scenes, and a wide range of facial expressions and poses, especially focusing on extreme profile views that pose a significant challenge to current algorithms.
• Source: Images are sourced from the internet, capturing “in the wild” conditions that include a balanced representation of genders, ethnicities, and professions. This approach ensures the dataset reflects the complexity and diversity of facial appearances and expressions in everyday life.

CFP dataset examples are shown below in Fig. 1: 4 random face images for each of the 3 individuals from the dataset.

Figure 1: CFP dataset example images of individuals

3.2. Models

This study employs various neural network models, each with unique architectures and characteristics, to determine the effectiveness of concatenated systems in facial verification. The models utilized include VGG-Face, Facenet, Facenet512, OpenFace, ArcFace, and SFace, each designed to extract and analyze facial features from images, transforming them into numerical representations known as embeddings. A comparison of the architecture, embedding dimensions, training focus, and key features of each model is given in Table 1.

3.3. Concatenation system

The concatenation system forms a pivotal component of our methodology, designed to harness the collective strengths of multiple facial recognition models. This approach seeks to enhance the robustness and accuracy of facial verification by leveraging the diverse feature representations extracted by different models. The process involves several key steps, each contributing to the formation of a comprehensive feature set that is used for facial verification:

1. Model Selection: The first step involves selecting a set of neural network models, such as VGG-Face, Facenet, OpenFace, ArcFace, and SFace, each known for its unique approach to capturing facial features. This diversity is crucial for assembling a wide-ranging feature set.
2. Output Extraction: For each model, we extract the output embeddings that represent the facial features identified by that model. These embeddings are the high-dimensional vectors that encapsulate the model’s interpretation of the facial features.
3. Z-Score Normalization [28]: To standardize the embeddings from different models, we apply Z-Score normalization to each embedding vector. This normalization process adjusts the embeddings so that they have a mean of 0 and a standard deviation of 1. This step is essential for mitigating the variance in scale and distribution of the embeddings across different models, ensuring that no single model’s output disproportionately influences the concatenated feature vector.
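The Z-Score normalization of step 3 can be sketched as follows (a pure-Python illustration of the standard formula; the actual pipeline may use a vectorized library instead):

```python
from math import sqrt

def z_score_normalize(embedding):
    """Shift an embedding to zero mean and scale it to unit standard deviation."""
    n = len(embedding)
    mean = sum(embedding) / n
    std = sqrt(sum((x - mean) ** 2 for x in embedding) / n)  # population std
    return [(x - mean) / std for x in embedding]

normalized = z_score_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# `normalized` now has mean 0 and standard deviation 1
```

After this step, every model's embedding contributes on a comparable scale, regardless of its original value range.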


Table 1
Neural network model characteristics

Model | Architecture | Embedding dimension | Training focus | Key features
VGG-Face [23] | VGG-16 | 4096 | Facial recognition | Deep convolutional layers; trained on a large facial image dataset; uses small (3×3) convolution filters to capture fine facial details
Facenet [24] | Inception-ResNet v1 | 128 | Triplet loss function | Compact embeddings that optimize the distance between similar/dissimilar faces; uses a triplet-based loss function to enhance verification accuracy
Facenet512 | Inception-ResNet v1 | 512 | Extended triplet loss function | Higher-dimensional embeddings that capture more nuanced features; an extension of Facenet with increased embedding size for a richer representation
OpenFace [25] | nn4.small2 | 128 | Real-time recognition | Balances accuracy and computational efficiency; suitable for real-time applications; a lightweight model designed for practical use on modest hardware
ArcFace [26] | ResNet-100 | 512 | Additive angular margin loss | Enhances discriminative power; improves geometric accuracy of the feature space; uses additive angular margin loss to manage class margins
SFace [27] | Xception-39 | 128 | Scale variations | Efficient handling of scale issues; rapid and accurate recognition; performs well on high-resolution images; notable efficiency and accuracy, especially on large datasets
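The embedding dimensions listed in Table 1 directly determine the length of any fused feature vector: it is simply the sum of the dimensions of the models being concatenated. A small sketch (dimensions copied from Table 1; the helper function is illustrative):

```python
# Embedding dimensions as reported in Table 1.
EMBEDDING_DIMS = {
    "VGG-Face": 4096,
    "Facenet": 128,
    "Facenet512": 512,
    "OpenFace": 128,
    "ArcFace": 512,
    "SFace": 128,
}

def concatenated_dim(models):
    """Length of the fused feature vector for a chosen subset of models."""
    return sum(EMBEDDING_DIMS[m] for m in models)

print(concatenated_dim(["Facenet", "ArcFace"]))  # 640
print(concatenated_dim(EMBEDDING_DIMS))          # 5504 when all six models are fused
```

Larger fused vectors can capture more features but raise storage and distance-computation costs, which is exactly the efficiency/performance trade-off this study evaluates.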


4. Concatenation: Following normalization, the embeddings from all selected models are concatenated into a single, comprehensive feature vector. This concatenated vector represents a fusion of the diverse facial features recognized by the individual models, capturing a broader spectrum of facial characteristics than any single model could.
5. L2 Normalization [29]: The concatenated feature vector undergoes L2 normalization, which scales the vector to have a unit norm. This normalization step is critical for preparing the feature vector for similarity calculations, ensuring that the magnitude of the vector does not affect the distance measurements.
6. EER Determination: Upon calculating the L2 distances between facial image pairs, we identify the Equal Error Rate (EER), the point where the FAR and the FRR converge. Determining the EER is essential, as it represents an optimal balance point for the system’s decision threshold, minimizing both false positives and false negatives. This optimal threshold is then used to distinguish between matches and non-matches across the entire dataset, allowing for the proper measurement of verification metrics such as accuracy, FAR, and FRR.

3.4. Evaluation metrics

To analyze the performance of our facial verification systems, including both single and combined models, we use three main metrics: accuracy, FAR, and FRR. These metrics help us understand the systems’ performance in correctly identifying faces.

False Acceptance Rate: FAR measures the likelihood that the system incorrectly verifies an impostor as a genuine user. It is crucial for evaluating the security aspect of the facial verification system, with lower values indicating higher security. FAR is calculated as:

    FAR = FP / (FP + TN),    (1)

where FP is the number of false positives, and TN is the number of true negatives.

False Rejection Rate: FRR assesses the frequency at which the system wrongly rejects an authentic match. This metric is important for understanding the usability of the system, as a high FRR may lead to user frustration. Lower FRR values are desirable, indicating better performance. FRR is calculated as:

    FRR = FN / (TP + FN),    (2)

where FN represents false negatives, and TP denotes true positives.

Accuracy: This metric measures the overall effectiveness of the facial verification system. It is calculated as the ratio of correctly identified instances (both true positives and true negatives) to the total number of instances. High accuracy indicates that the system is effective in correctly verifying facial identities. The formula for accuracy is given by:

    Accuracy = (TP + TN) / (TP + TN + FP + FN),    (3)

where TP represents true positives, TN denotes true negatives, FP stands for false positives, and FN signifies false negatives.

Together, these metrics provide a comprehensive overview of the system’s performance, offering insights into its accuracy, security, and usability. By evaluating these metrics, we can make informed decisions on optimizing model configurations and improving facial verification systems.

3.5. Technical setup

Experiments were conducted on a defined technical framework comprising specific hardware and software components.

Hardware Configuration: MacBook Pro 16 with an M1 Pro processor and 16GB RAM, offering enough


computational power for handling neural network operations.
    Software Configuration:

    •   Python 3.11: Selected for its widespread support for data analysis and machine learning tasks.
    •   Tensorflow-metal 1.1.0: Optimized for the M1 Pro, enhancing machine learning computation speeds.
    •   OpenCV-python 4.9.0: Utilized for image processing tasks such as loading, resizing, and cropping.
    •   Deepface 0.0.83: A library providing access to several facial recognition model weights (VGG-Face, Facenet, OpenFace, ArcFace, SFace) and their functionalities, streamlining the embedding extraction.

4. Experiments and results

4.1. Data preprocessing
Data preprocessing is a crucial initial phase in our experiment, ensuring facial images are properly conditioned for analysis by the various neural network models. Here is an outline of the preprocessing steps undertaken:
    Loading Images: Images are first loaded in RGB color space, retaining the essential color information that is crucial for accurate analysis of facial features.
    Scaling Pixel Values: To standardize the images, pixel values for each color channel are scaled to a range from 0 to 255.
    Model-Specific Normalization: Depending on each model's requirements, specific normalization techniques are applied to the image data to match the conditions under which the models were trained [30].
    For the Facenet model:

        img = (img - mean(img)) / std(img),    (4)

    where mean and std are the mean and standard deviation of the image's pixel values, respectively.
    For the Facenet512 and ArcFace models:

        img = img / 127 - 1.    (5)

    For the VGGFace model:

        img = img - (93.5940, 104.7624, 129.18633),    (6)

    where this formula represents the subtraction of the mean values for each color channel (R, G, B) computed on the VGGFace1 training data.
    For the OpenFace and SFace models:

        img = img / 255.    (7)

4.2. Singular models evaluation
In the evaluation phase of our experiments, each neural network model was assessed individually to establish its performance on the CFP dataset. A crucial part of this assessment involved determining the EER for each model, which provides a threshold at which the rate of false acceptances is equal to the rate of false rejections.
    The process began with the calculation of distances between facial embeddings for both genuine and impostor pairs. Following this, we computed the EER for each model, which then served as a basis for determining the corresponding accuracy at the EER point and the best overall accuracy achieved by the model. These metrics give us insight into the models' capabilities in facial verification tasks under the diverse conditions presented by the CFP dataset.
    The results of the singular model evaluations are summarized in Table 2.

Table 2
Singular model metrics on the CFP dataset

    Model         EER(%)    EER Accuracy(%)    Best Accuracy(%)
    VGG-Face      4.7       95.28              95.28
    Facenet       3.4       96.62              97.45
    Facenet512    3.15      96.85              97.37
    OpenFace      18.3      81.70              81.72
    ArcFace       5.95      94.07              94.65
    SFace         18.5      81.42              81.80

Analyzing the results, we observe a wide range in performance across the different models. Models such as Facenet and Facenet512 show promising EER values and high accuracy, indicating their robustness in handling facial verification. Conversely, models like OpenFace and SFace demonstrate the challenges of achieving high accuracy under the diverse conditions of the CFP dataset.

4.3. Concatenated clusters evaluation
The exploration of concatenated clusters is an integral part of the research, aimed at harnessing the collective strengths of multiple neural network models to enhance facial verification accuracy. This section discusses the evaluation of clusters formed by all possible combinations of the six distinct models: VGG-Face, Facenet, Facenet512, OpenFace, ArcFace, and SFace. Each cluster is identified by a unique ID for ease of reference and comparative analysis.
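The 57 clusters evaluated in Table 3 are all combinations of two or more of the six models. A minimal sketch of this enumeration, which reproduces the ordering of the cluster IDs in Table 3 (pairs first, then triples, and so on):

```python
from itertools import combinations

MODELS = ["VGG-Face", "Facenet", "Facenet512", "OpenFace", "ArcFace", "SFace"]

# Every combination of two or more models: C(6,2)+C(6,3)+...+C(6,6) = 57 clusters.
clusters = [combo
            for k in range(2, len(MODELS) + 1)
            for combo in combinations(MODELS, k)]

for cluster_id, models in enumerate(clusters):
    print(cluster_id, ",".join(models))

print(len(clusters))  # 57
```

With the model list in the order above, cluster ID 5 comes out as (Facenet, Facenet512) and ID 13 as (OpenFace, SFace), matching the IDs used in the discussion below.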
Figure 2: Visual analysis of performance metrics for Facenet and Facenet512 cluster (Cluster ID 5)

The evaluation methodology began with the determination of the EER for each cluster. EER serves as a crucial metric for assessing the balance between security and user convenience. By employing this threshold, we derived the EER-based accuracy and the best accuracy achievable across a range of thresholds, thereby quantifying the models' verification capabilities. The metrics graphs and threshold range analysis examples are shown in Fig. 2.
    Following the graphical analysis, the performance results for each cluster are presented in Table 3. This table arranges the EER accuracy and the best accuracy observed for each of the 57 cluster combinations.
    In the evaluation of concatenated clusters, our data indicates that selected clusters achieve a marginal increase in accuracy over the highest-performing individual model, Facenet512. Specifically, clusters 5, 9, 11, 25, 26, 27, 45, and 55 demonstrate a modest enhancement, improving upon the best singular model's accuracy by approximately 0.23%. While this improvement showcases the potential advantages of model concatenation, it is crucial to consider the computational trade-offs associated with such a strategy.

5. Discussions

5.1. Insights from the results
The study's exploration into the performance of facial verification models, individually and in combined clusters, has revealed several key insights:

    •   Impact of Model Pairing on Performance: Our findings highlight a notable trend where clusters combining models with lower initial accuracy see significant performance boosts. For instance, pairing OpenFace with SFace (Cluster ID 13) resulted in a 4.33% increase in accuracy, achieving an 86.13% rate. This contrasts with clusters of high-performing models, which, on average, only show about a 0.5% improvement in accuracy. This observation suggests that strategic pairing, especially involving models with varied strengths, can effectively compensate for individual weaknesses.
    •   Variable Outcomes from Mixed Model Clusters: Not all model combinations lead to positive outcomes. In some cases, such as the cluster of Facenet and VGG-Face (Cluster ID 0), the resulting accuracy was slightly lower than that of the Facenet model on its own. This points to the complexity of model interactions within clusters and indicates that combining models does not guarantee enhanced performance and may produce suboptimal results in certain configurations.
    •   Considerations on Computational Efficiency: While some model clusters achieve minor improvements in accuracy, like Facenet with Facenet512 (Cluster ID 5) with a 0.23% increase, the requisite computational resources grow significantly. This raises important considerations about the cost-benefit ratio of employing concatenated models, especially when the gains in performance are marginal compared to the added computational demand.
    •   Maintaining High Accuracy and Security: It is noteworthy that both the individual and the clustered models achieving the highest performance were able to maintain their accuracy without any false acceptances on the CFP dataset. This demonstrates their potential in scenarios demanding high security, where maintaining accuracy without compromising on false acceptance rates is crucial.
Table 3
Cluster models metrics on the CFP dataset
       Cluster-ID   Models                                                  EER Accuracy(%)      Best Accuracy(%)
       0            VGG-Face,Facenet                                                       95.63                  95.63
       1            VGG-Face,Facenet512                                                    95.97                  96.18
       2            VGG-Face,OpenFace                                                      95.33                  95.33
       3            VGG-Face,ArcFace                                                       95.88                  95.88
       4            VGG-Face,SFace                                                         95.55                  95.55
       5            Facenet,Facenet512                                                     97.13                  97.68
       6            Facenet,OpenFace                                                       95.25                  95.53
       7            Facenet,ArcFace                                                        95.25                  96.35
       8            Facenet,SFace                                                          95.57                  96.13
       9            Facenet512,OpenFace                                                    96.87                  97.28
       10           Facenet512,ArcFace                                                     96.65                  97.43
       11           Facenet512,SFace                                                       97.20                  97.33
       12           OpenFace,ArcFace                                                       93.53                  94.83
       13           OpenFace,SFace                                                         85.83                  86.13
       14           ArcFace,SFace                                                          93.67                  94.62
       15           VGG-Face,Facenet,Facenet512                                            96.13                  96.35
       16           VGG-Face,Facenet,OpenFace                                              95.57                  95.63
       17           VGG-Face,Facenet,ArcFace                                               95.93                  96.03
       18           VGG-Face,Facenet,SFace                                                 95.67                  95.67
       19           VGG-Face,Facenet512,OpenFace                                           95.90                  96.25
       20           VGG-Face,Facenet512,ArcFace                                            96.13                  96.52
       21           VGG-Face,Facenet512,SFace                                              95.92                  96.23
       22           VGG-Face,OpenFace,ArcFace                                              95.68                  95.98
       23           VGG-Face,OpenFace,SFace                                                95.35                  95.35
       24           VGG-Face,ArcFace,SFace                                                 95.82                  95.92
       25           Facenet,Facenet512,OpenFace                                            96.68                  97.55
       26           Facenet,Facenet512,ArcFace                                             96.65                  97.53
       27           Facenet,Facenet512,SFace                                               97.30                  97.57
       28           Facenet,OpenFace,ArcFace                                               94.90                  96.13
       29           Facenet,OpenFace,SFace                                                 94.38                  94.62
       30           Facenet,ArcFace,SFace                                                  95.98                  96.15
       31           Facenet512,OpenFace,ArcFace                                            96.80                  97.23
       32           Facenet512,OpenFace,SFace                                              96.57                  97.23
       33           Facenet512,ArcFace,SFace                                                 96.55                  97.30
       34           OpenFace,ArcFace,SFace                                                 93.53                  94.37
       35           VGG-Face,Facenet,Facenet512,OpenFace                                   95.97                  96.52
       36           VGG-Face,Facenet,Facenet512,ArcFace                                    96.20                  96.72
       37           VGG-Face,Facenet,Facenet512,SFace                                      96.12                  96.47
       38           VGG-Face,Facenet,OpenFace,ArcFace                                      95.77                  96.07
       39           VGG-Face,Facenet,OpenFace,SFace                                        95.47                  95.70
       40           VGG-Face,Facenet,ArcFace,SFace                                         95.97                  96.03
       41           VGG-Face,Facenet512,OpenFace,ArcFace                                   96.08                  96.55
       42           VGG-Face,Facenet512,OpenFace,SFace                                     95.97                  96.32
       43           VGG-Face,Facenet512,ArcFace,SFace                                      96.10                  96.63
       44           VGG-Face,OpenFace,ArcFace,SFace                                        95.63                  95.88
       45           Facenet,Facenet512,OpenFace,ArcFace                                    96.92                  97.35
       46           Facenet,Facenet512,OpenFace,SFace                                      96.78                  97.43
       47           Facenet,Facenet512,ArcFace,SFace                                       96.57                  97.43
       48           Facenet,OpenFace,ArcFace,SFace                                         94.93                  95.88
       49           Facenet512,OpenFace,ArcFace,SFace                                      96.80                  97.10
       50           VGG-Face,Facenet,Facenet512,OpenFace,ArcFace                             96.12                  96.72
       51           VGG-Face,Facenet,Facenet512,OpenFace,SFace                             95.98                  96.53
       52           VGG-Face,Facenet,Facenet512,ArcFace,SFace                              96.13                  96.75
       53           VGG-Face,Facenet,OpenFace,ArcFace,SFace                                95.73                  96.07
       54           VGG-Face,Facenet512,OpenFace,ArcFace,SFace                             96.07                  96.55
       55           Facenet,Facenet512,OpenFace,ArcFace,SFace                              96.95                  97.35
       56           VGG-Face,Facenet,Facenet512,OpenFace,ArcFace,SFace                     96.08                  96.67


    •   Strategic Composition of Clusters for Optimal Performance: The analysis further reveals that the most successful clusters often include a combination of the top two performing models along with a lower-performing one. This composition suggests that the diverse feature recognition capabilities of the combined models contribute to a more comprehensive analysis, thereby enhancing the overall system's performance.
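The cluster construction behind these observations — Z-score normalizing each model's embedding and concatenating the results before an L2 comparison — can be sketched as follows. This is a minimal illustration, not the authors' exact implementation; the embeddings here are random stand-ins with Facenet-like (128-d) and Facenet512-like (512-d) dimensionalities:

```python
import numpy as np

def zscore(v):
    """Standardize one embedding vector to mean 0 and standard deviation 1."""
    return (v - v.mean()) / v.std()

def concat_embedding(per_model_vectors):
    """Fuse one face's embeddings from several models into a single vector."""
    return np.concatenate([zscore(v) for v in per_model_vectors])

def l2_distance(a, b):
    return float(np.linalg.norm(a - b))

# Stand-in embeddings for two faces, as if produced by two different models.
rng = np.random.default_rng(0)
face_a = [rng.normal(size=128), rng.normal(size=512)]
face_b = [rng.normal(size=128), rng.normal(size=512)]

fused_a = concat_embedding(face_a)   # 640-d fused vector
fused_b = concat_embedding(face_b)
print(l2_distance(fused_a, fused_b))
```

The Z-score step keeps a model with a larger output scale from dominating the fused L2 distance; the verification decision then compares that distance against a threshold chosen at the EER point.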
5.2. Challenges encountered
Throughout this research, we encountered several challenges that impacted both the implementation of our experiments and the analysis of results:

    •   Input Data Normalization: For effective performance, each neural network model requires input data to be normalized according to the specific training data it was developed with. This normalization process involved adjusting the color space and scaling for each model to match its training conditions. We successfully applied model-specific normalization for most of the models, ensuring that the input data closely mirrored the conditions under which the models were originally trained.
    •   Z-Score Normalization for Model Output Embeddings: Given the variance in scale and distribution of embeddings across different models, a significant challenge was standardizing these embeddings for consistent comparison. By implementing Z-score normalization on each embedding vector, the embedding was adjusted to have a mean of 0 and a standard deviation of 1. This crucial step allowed us to mitigate the disparities across model outputs.
    •   Heavy Computation Without Heavy Server Resources: The computation required for generating embeddings for 57 clusters, along with the individual model evaluations, was significant. To manage this, a caching mechanism [31] was implemented for embeddings after system setup. This strategy enabled the reuse of embeddings across the different cluster and singular model experiments, saving dozens of hours of computational time.
    •   Poor Accuracy for the OpenFace and SFace Models: The lower-than-expected accuracy of the OpenFace and SFace models raised concerns. This may have resulted from inaccurate normalization information or deviations from the default training data used in these model weights. While this paper did not directly address enhancements to these models' accuracy, identifying the potential causes paves the way for future improvements.
    •   Average Dataset Size and Resolution: While the CFP dataset was sufficiently comprehensive for our experimental purposes, its size and the variability of its data presented limitations. A larger and more diverse dataset could potentially reveal insights and issues not observed with the CFP dataset used in this study. This acknowledgment serves as a recommendation for future research to explore more extensive datasets for a deeper analysis.

6. Conclusions and future work
The findings from the experiments offer valuable insights into the performance synergy of employing concatenated model clusters for facial verification systems. While several clusters achieved incremental improvements in accuracy, the requisite increase in computational resources was significant. For applications prioritizing computational efficiency, singular models like Facenet or Facenet512, which provide high accuracy without substantial computational overhead, might be more advisable. Specifically, the cluster combining Facenet and Facenet512 (Cluster ID 5) presents a compelling option, marginally outperforming the accuracy of the singular Facenet model by 0.23% and achieving a 97.68% accuracy rate. This slight improvement might justify the additional computational resources in scenarios where maximizing accuracy is paramount.
    In contexts where the verification system can accommodate extended inference times and has access to extended computational power, employing model clusters could be beneficial. For verification systems bound by computational and time constraints yet seeking to improve upon the accuracy provided by singular fast-inference models like OpenFace, forming clusters with other rapid-inference models offers a strategic solution. For example, pairing OpenFace with SFace led to a significant 4.33% accuracy increase over the singular OpenFace model, achieving an 86.13% accuracy rate. This strategy allows for a balanced enhancement in accuracy while maintaining the essential high-speed inference capabilities, suitable for applications where both efficiency and accuracy are valued.
    The exploration of concatenated model clusters in facial verification creates numerous opportunities for future research. A promising direction involves analyzing the specific features within each model's embeddings that most influence verification decisions. By identifying and prioritizing these impactful features, it may be possible to filter out less relevant or noisy features from model embeddings [32]. This approach holds potential not only for singular model systems but could notably enhance the performance of clustered model systems by focusing on the combination of the most determinant features for L2 distance calculations.
    Future research could also explore the efficiency of alternative distance metrics such as Cosine [33] and L1 distances. These metrics may produce different distributions, thresholds, and ultimately accuracies for model clusters, offering new insights into the optimization of verification systems. Additionally, further investigations could evaluate how these systems scale and perform on larger, higher-quality datasets with more varied conditions, potentially uncovering benefits not observed in the current dataset.
    Given that certain models are highly dependent on the alignment of facial images, integrating dynamic alignment techniques tailored to each model within a cluster could improve accuracy. This personalized approach to face alignment may optimize each model's contribution to the cluster. The initial success of combining lower-performing models with fast inference rates suggests a valuable strategy for developing efficient verification systems suited for embedded environments. Future work could focus on identifying and testing combinations of efficient models to create a verification system that balances accuracy with the computational speed necessary for real-time applications in constrained environments.
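The alternative distance metrics discussed above (L2, L1, and Cosine) can be sketched for a pair of embedding vectors as follows; this is an illustrative comparison only, with a verification decision in each case reducing to comparing the distance against a metric-specific threshold:

```python
import numpy as np

def l2(a, b):
    """Euclidean (L2) distance used in the experiments above."""
    return float(np.linalg.norm(a - b))

def l1(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return float(np.sum(np.abs(a - b)))

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 for identical directions, up to 2 for opposite.
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 1.0, 0.0])
print(l2(a, b), l1(a, b), cosine_distance(a, b))
```

Because cosine distance ignores vector magnitude while L1 and L2 do not, the three metrics induce different score distributions over the same embeddings, and hence different EER thresholds.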
In conclusion, the decision to employ singular models or concatenated clusters should be guided by the specific requirements and constraints of the facial verification system in question. The strategic composition of clusters, balancing between computational efficiency and marginal gains in accuracy, remains a critical consideration for the deployment of robust and effective biometric authentication solutions. Additionally, the model of a decoy system based on dynamic attributes for cybercrime investigation, as proposed by Vasylyshyn et al., offers a novel approach to enhancing system security and can be integrated into future research to address emerging threats [34].

References

[1]  G. Alfarsi, et al., Techniques for Face Verification: Literature Review, 2019 International Arab Conference on Information Technology (ACIT) (2019) 107–112. doi: 10.1109/ACIT47987.2019.8990975.
[2]  S. Yevseiev, et al., Models of Socio-cyber-physical Systems Security: Monograph, PC TECHNOLOGY CENTER (2023).
[3]  M. Zulfiqar, et al., Deep Face Recognition for Biometric Authentication, 2019 International Conference on Electrical, Communication, and Computer Engineering (ICECCE) (2019) 1–6. doi: 10.1109/ICECCE47252.2019.8940725.
[4]  S. Sengupta, et al., Frontal to Profile Face Verification in the Wild, IEEE Conference on Applications of Computer Vision (2016).
[5]  Y. LeCun, Y. Bengio, G. Hinton, Deep Learning, Nature 521 (2015) 436–444. doi: 10.1038/nature14539.
[6]  C. Ding, D. Tao, Robust Face Recognition via Multimodal Deep Face Representation, IEEE Transactions on Multimedia 17(11) (2015) 2049–2058. doi: 10.1109/TMM.2015.2477042.
[7]  M. Egmont-Petersen, D. de Ridder, H. Handels, Image Processing with Neural Networks—a Review, Pattern Recognit. 35(10) (2002) 2279–2301. doi: 10.1016/S0031-3203(01)00178-9.
[8]  N. Polyzotis, et al., Data Lifecycle Challenges in Production Machine Learning: A Survey, SIGMOD Rec. 47(2) (2018) 17–28. doi: 10.1145/3299887.3299891.
[9]  P. Hlushchenko, V. Dudykevych, Exploratory Survey of Access Control Paradigms and Policy Management Engines, in: Proceedings of the 7th International Workshop on Computer Modeling and Intelligent Systems, vol. 3702 (2024) 263–279.
[10] X. Wang, et al., Automated Concatenation of Embeddings for Structured Prediction, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing 1 (2021) 2643–2660. doi: 10.18653/v1/2021.acl-long.206.
[11] R. Tronci, G. Giacinto, F. Roli, Designing Multiple Biometric Systems: Measures of Ensemble Effectiveness, Eng. Appl. Artificial Intel. 22(1) (2009) 66–78. doi: 10.1016/j.engappai.2008.04.007.
[12] C. Ding, D. Tao, Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition, IEEE Transactions on Pattern Analysis
[13] … Cities and Society 66 (2021). doi: 10.1016/j.scs.2020.102692.
[14] Y. Li, et al., Improving Deep Learning Feature with Facial Texture Feature for Face Recognition, Wireless Personal Commun. 103 (2018) 1195–1206. doi: 10.1007/s11277-018-5377-2.
[15] H. Moon, C. Seo, S. Pan, A Face Recognition System Based on Convolution Neural Network Using Multiple Distance Face, Soft Comput. 21(17) (2016) 4995–5002. doi: 10.1007/s00500-016-2095-0.
[16] Q. Yang, et al., Federated Machine Learning: Concept and Applications, ACM Trans. Intel. Syst. Technol. 10(2) (2019) 1–19. doi: 10.1145/3298981.
[17] M. Bhuiyan, S. Khushbu, M. Islam, A Deep Learning Based Assistive System to Classify COVID-19 Face Mask for Human Safety with YOLOv3, International Conference on Computing and Networking Technology (2017). doi: 10.1109/ICCNT.2017.8211490.
[18] C.-Z. Gao, et al., Privacy-preserving Naive Bayes Classifiers Secure Against the Substitution-then-comparison Attack, Inf. Sci. 444 (2018) 72–88. doi: 10.1016/j.ins.2018.02.003.
[19] M. Hanmandlu, D. Gupta, S. Vasikarla, Face Recognition Using Elastic Bunch Graph Matching, Applied Imagery Pattern Recognition Workshop (AIPR): Sensing for Control and Augmentation, IEEE (2013) 1–7. doi: 10.1109/AIPR.2013.6749303.
[20] R. Ghiass, et al., Infrared Face Recognition: A Comprehensive Review of Methodologies and Datasets, Pattern Recognit. 47 (2014) 2807–2824. doi: 10.1016/j.patcog.2014.03.005.
[21] C. Kotropoulos, I. Pitas, Face Authentication Using Morphological Dynamic Link Architecture, Audio- and Video-based Biometric Person Authentication (1997) 169–176. doi: 10.1007/3-540-64473-6_22.
[22] V. Brydinskyi, et al., Comparison of Modern Deep Learning Models for Speaker Verification, Appl. Sci. 14(4) (2024). doi: 10.3390/app14041329.
[23] Q. Cao, et al., VGGFace2: A Dataset for Recognising Faces across Pose and Age, 13th IEEE International Conference on Automatic Face & Gesture Recognition (2018) 67–74. doi: 10.1109/FG.2018.00020.
[24] F. Schroff, D. Kalenichenko, J. Philbin, FaceNet: A Unified Embedding for Face Recognition and Clustering, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015) 815–823. doi: 10.1109/CVPR.2015.7298682.
[25] T. Baltrušaitis, P. Robinson, L.-P. Morency, OpenFace: An Open Source Facial Behavior Analysis Toolkit, IEEE Winter Conference on Applications of Computer Vision (WACV) (2016) 1–10. doi: 10.1109/WACV.2016.7477553.
[26] J. Deng, et al., ArcFace: Additive Angular Margin Loss for Deep Face Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 44(10) (2022) 5962–5979. doi: 10.1109/TPAMI.2021.3087709.
[27] J. Wang, et al., SFace: An Efficient Network for Face Detection in Large Scale Variations, ArXiv (2018).
[28] A. Jain, K. Nandakumar, A. Ross, Score Normalization in Multimodal Biometric Systems, Pattern Recognit. 38(12) (2005) 2270–2285. doi: 10.1016/j.patcog.2005.01.012.
       and Machine Intelligence 40(4) (2018) 1002–1014. doi:            [29]   V. Perlibakas, Distance Measures for PCA-based Face
       10.1109/TPAMI.2017. 2700390.                                            Recognition, Pattern Recognit. Lett. 25 (2004) 711–724.
[13]   P. Nagrath, et al., SSDMNV2: A Real-time DNN-based                      doi: 10.1016/j.patrec.2004.01.011.
       Face Mask Detection System Using Single Shot                     [30]   F. Günther, S. Fritsch, Neuralnet: Training of Neural
       Multibox Detector and MobileNetV2, Sustainable                          Networks, R. J., 2(30) (2010). doi: 10.32614/RJ-2010-
                                                                               006.

[31]   L. Fasnacht, Mmappickle: Python 3 Module to Store Memory-mapped Numpy Array in Pickle Format, J. Open Source Softw. 3(651) (2018). doi: 10.21105/joss.00651.
[32]   H. Ye, et al., Towards Robust Neural Graph Collaborative Filtering via Structure Denoising and Embedding Perturbation, ACM Transactions on Information Systems 41 (2022) 1–28. doi: 10.1145/3568396.
[33]   H. Nguyen, L. Bai, Cosine Similarity Metric Learning for Face Verification, Asian Conference on Computer Vision (ACCV) (2010) 709–720. doi: 10.1007/978-3-642-19309-5_55.
[34]   S. Vasylyshyn, et al., A Model of Decoy System Based on Dynamic Attributes for Cybercrime Investigation, Eastern-European J. Enterp. Technol. 1(9(121)) (2023) 6–20. doi: 10.15587/1729-4061.2023.273363.



