=Paper=
{{Paper
|id=Vol-3800/paper2
|storemode=property
|title=Research on the effectiveness of concatenated embeddings in facial verification
|pdfUrl=https://ceur-ws.org/Vol-3800/paper2.pdf
|volume=Vol-3800
|authors=Denys Khanin,Viktor Otenko,Volodymyr Khoma
|dblpUrl=https://dblp.org/rec/conf/csdp/KhaninOK24
}}
==Research on the effectiveness of concatenated embeddings in facial verification==
Denys Khanin1,*,†, Viktor Otenko1,† and Volodymyr Khoma1,†
1 Lviv Polytechnic National University, Information Security Department, 12 Stepana Bandery str., 79000 Lviv, Ukraine
Abstract
In the era of digital authentication, facial verification systems have become a cornerstone of security
protocols across various applications. This study explores the performance synergy from concatenated
embeddings in enhancing biometric authentication accuracy. By leveraging the Celebrities in Frontal-
Profile dataset (CFP), we investigate whether the fusion of embeddings generated by models such as VGG-
Face, Facenet, OpenFace, ArcFace, and SFace can result in a more robust authentication process. Our
approach is rooted in the hypothesis that the diverse strengths of these models, when combined, can address
the limitations inherent in single-model systems, thus providing a more comprehensive solution to facial
verification. The approach involves computing the L2 distance between normalized concatenated
embeddings of an input face image and an anchor, thereby determining the authenticity of the individual.
Experiments are designed to compare the performance of singular model embeddings against concatenated
embeddings, employing metrics such as accuracy, False Acceptance Rate (FAR), and False Rejection Rate
(FRR). One of the critical aspects of our research is the implementation of Z-Score normalization and L2
normalization processes to standardize the embeddings from different models. These normalization
techniques are vital in ensuring that the diverse outputs from various models are effectively combined,
maintaining balance and consistency in the feature vectors. Additionally, our methodology includes a
comprehensive evaluation framework that meticulously analyses the trade-offs between computational
efficiency and performance gains achieved through model concatenation. The findings of this research
could significantly contribute to the development of more secure and reliable facial verification systems by
using multiple existing models without the need for new model research, designing, and training. This
approach not only optimizes resource utilization but also provides a scalable solution that can be readily
adapted to existing systems, enhancing their security measures without extensive overhauls. Furthermore,
the study’s insights into the integration of model outputs could pave the way for future innovations in
biometric authentication, encouraging the development of hybrid systems that combine the best attributes
of various neural network architectures. This research underscores the potential of concatenated
embeddings in revolutionizing facial verification technology. By harnessing the power of multiple neural
network models, we can create a system that delivers superior accuracy and robustness, addressing the
pressing need for advanced security solutions. This study sets the stage for further exploration into multi-
model integration, offering a promising direction for future advancements in biometric authentication.
Keywords
facial verification, biometric authentication, neural networks, concatenated embeddings, machine learning, deep learning, model fusion, facial recognition, verification accuracy, security systems
1. Introduction

In today’s digital landscape, facial verification [1] systems have become pivotal in ensuring the security and authenticity of individual identities across various applications, from mobile device security to access controls in sensitive environments. The adoption of facial recognition technology is driven by its non-intrusive nature and the unique, hard-to-replicate characteristics of the human face, positioning it as a front-runner in biometric authentication methods. Furthermore, the integration of socio-cyber-physical systems security frameworks provides a comprehensive approach to enhancing cybersecurity measures, as highlighted by Yevseiev et al. in their detailed monograph on socio-cyber-physical systems security [2].

This research investigates the potential of enhancing facial verification accuracy through concatenated embeddings from multiple neural network [3] models. Utilizing the CFP dataset [4], we aim to determine whether the integration of various model embeddings can produce a more robust and secure biometric authentication system. By examining the performance synergy of these concatenated embeddings in comparison to singular model outputs, this study aims to contribute to the development of more advanced and reliable facial verification techniques with the existing set of models for facial verification.

CSDP-2024: Cyber Security and Data Protection, June 30, 2024, Lviv, Ukraine
∗ Corresponding author.
† These authors contributed equally.
denys.o.khanin@lpnu.ua (D. Khanin); viktor.i.otenko@lpnu.ua (V. Otenko); volodymyr.v.khoma@lpnu.ua (V. Khoma)
0009-0001-4009-0202 (D. Khanin); 0000-0003-4781-7766 (V. Otenko); 0000-0001-9391-6525 (V. Khoma)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
1.1. Background

The evolution of biometric authentication technologies has been significantly influenced by advancements in machine learning and deep learning [5], particularly in the domain of facial recognition. Neural network models, such as VGG-Face, Facenet, OpenFace, ArcFace, and SFace, represent the forefront of research and development in this field. These models are designed to extract and analyze facial features [6] from images, transforming them into numerical representations known as embeddings. These embeddings capture the unique aspects of an individual’s facial structure, enabling systems to perform verification tasks with high degrees of accuracy. The success of these models is predicated on their ability to learn complex patterns and variations in facial features across diverse datasets, under various conditions of lighting, pose, and expression.

In facial verification technology, using just one neural network model comes with certain limitations. Different models excel in various aspects, such as accuracy, speed of processing, and their ability to handle changes in lighting or facial features [7]. The drive for better performance and reliability in these systems often requires large and varied datasets for training, which can be resource-intensive. Additionally, there is a constant need to develop and test new model architectures that can effectively transform facial images into useful numerical data, known as embeddings. This scenario suggests that combining several neural network models might offer a more efficient solution. By leveraging the unique strengths of multiple models, such an approach could potentially overcome the common challenges in facial verification. This sets the stage for investigating how the integration of outputs from different models could lead to improvements in system performance.

1.2. Problem statement

The hypothesis driving this research emerges from a critical challenge within the realm of facial verification systems: the limitations of using single-model architectures in achieving consistently high accuracy across diverse conditions. This issue underscores the necessity of exploring alternative strategies that can leverage the strengths of existing technologies without the need for constant model retraining, dataset updates, or the development of new architectures [8]. Moreover, an exploratory survey by Hlushchenko and Dudykevych on access control paradigms highlights the evolving landscape of policy management and its implications for biometric security systems [9].

Current facial verification systems often rely on a singular neural network model, which may excel under specific conditions but fall short in others. This reliance poses a significant problem, as it demands continuous updates to the model and its underlying dataset to address emerging challenges and maintain system performance. Such an iterative cycle of development is resource-intensive, requiring substantial investments in data collection, processing, and computational power. Additionally, the creation of new model architectures to improve feature extraction and classification accuracy further complicates the process, making it unsustainable in the long run. The hypothesis presented in this study arises from these challenges, proposing the use of concatenated embeddings [10] from multiple models as a means to bypass the constraints of singular model dependency. This approach aims to explore whether integrating the diverse capabilities of established models can offer a more robust and accurate solution for facial verification, thus addressing the core issues associated with the current methodologies.

1.3. Objectives of the research

This research focuses on key goals designed to explore improvements in facial verification systems:

• To create a system to test both individual and combined embeddings from models such as VGG-Face, Facenet, OpenFace, ArcFace, and SFace, leveraging the CFP dataset for comprehensive analysis.
• To measure the effectiveness of each model and their combinations using accuracy, FAR, and FRR [11].
• To analyze system performance across single and combined model embeddings to identify the most effective strategies for facial verification.
• To extract insights for potential system enhancements and recognize any associated challenges with multi-model embeddings.

2. Related works

The problem of enhancing the accuracy and robustness of facial verification systems has been a focal point in numerous studies due to ongoing challenges such as spoofing, adversarial attacks, and varying conditions of image capture. Ding and Tao (2018) addressed the limitations of traditional face recognition approaches by introducing Trunk-Branch Ensemble Convolutional Neural Networks for video-based face recognition, which improved recognition accuracy but still faced challenges in handling dynamic and complex environments [12]. Nagrath et al. (2021) highlighted the need for lightweight and efficient neural networks like MobileNetV2 for real-time applications, but their study also pointed out the difficulties in maintaining high accuracy under real-time constraints [13].

Li et al. (2018) focused on enhancing deep learning features with facial texture features for improved recognition performance, but the integration of different feature extraction techniques remained complex and computationally intensive [14].

Moon et al. (2016) developed a face recognition system based on convolutional neural networks using multiple distance faces, which further emphasized the necessity of integrating various models to enhance system robustness, yet it also indicated the increased computational demands [15].

Yang et al. (2019) explored federated machine learning for face verification, addressing privacy and security concerns while maintaining high verification accuracy. Their research underscored the challenge of managing decentralized data and the need for efficient data integration techniques [16].
Bhuiyan et al. (2017) presented a noise-resistant network for face recognition under noisy conditions, which highlighted the ongoing challenge of achieving robust performance in diverse real-world scenarios [17].

Recent advancements have shown that despite significant improvements in facial verification technologies, several unresolved issues persist. Gao et al. (2018) discussed privacy-preserving techniques in face recognition, which remain a critical concern in the deployment of these systems [18]. Furthermore, the study by Hanmandlu et al. (2013) on Elastic Bunch Graph Matching for face recognition identified the need for better handling of pose and illumination variations [19].

The integration of multiple models to leverage their unique strengths and mitigate individual weaknesses is a promising approach, as highlighted by recent research on hybrid and ensemble methods. However, this integration introduces new challenges, such as increased computational complexity and the need for sophisticated normalization techniques to ensure consistent and reliable performance [20, 21]. Additionally, Brydinskyi et al. provide a comparative analysis of modern deep-learning models for speaker verification, demonstrating the critical role of model selection and combination in enhancing verification accuracy [22].

These studies collectively underscore the necessity of integrating multiple models to leverage their unique strengths and mitigate individual weaknesses, aligning with our research objective of using concatenated embeddings to enhance facial verification systems’ accuracy and robustness. The proposed approach builds on the foundations laid by these works, aiming to address their limitations through the strategic combination of diverse neural network models.

3. Methodology

3.1. Dataset

The CFP dataset plays a pivotal role in our study, offering a nuanced exploration of facial verification across varying poses. Its construction and attributes are as follows:

• Size and Volume: The dataset consists of images of 500 individuals, with 10 frontal images per individual.
• Resolutions and Quality: Including a mix of resolutions and qualities, the dataset mirrors the variability encountered in real-world applications, ranging from high-definition to lower-quality images, challenging the adaptability of verification systems to varying image fidelity.
• Diversity of Conditions: It spans a broad spectrum of real-life conditions: different lighting scenarios from natural daylight to artificial and low-light environments, varied backgrounds from simple to cluttered scenes, and a wide range of facial expressions and poses, especially focusing on extreme profile views that pose a significant challenge to current algorithms.
• Source: Images are sourced from the internet, capturing “in the wild” conditions that include a balanced representation of genders, ethnicities, and professions. This approach ensures the dataset reflects the complexity and diversity of facial appearances and expressions in everyday life.

CFP dataset examples are shown below in Fig. 1: 4 random face images for each of the 3 individuals from the dataset.

Figure 1: CFP dataset example images of individuals

3.2. Models

This study employs various neural network models, each with unique architectures and characteristics, to determine the effectiveness of concatenated systems in facial verification. The models utilized include VGG-Face, Facenet, Facenet512, OpenFace, ArcFace, and SFace, each designed to extract and analyze facial features from images, transforming them into numerical representations known as embeddings. A comparison of architecture, embedding dimensions, training focus, and key features of each model is described in Table 1.

3.3. Concatenation system

The concatenation system forms a pivotal component of our methodology, designed to harness the collective strengths of multiple facial recognition models. This approach seeks to enhance the robustness and accuracy of facial verification by leveraging the diverse feature representations extracted by different models. The process involves several key steps, each contributing to the formation of a comprehensive feature set that is used for facial verification:

1. Model Selection: The first step involves selecting a set of neural network models, such as VGG-Face, Facenet, OpenFace, ArcFace, and SFace, each known for its unique approach to capturing facial features. This diversity is crucial for assembling a wide-ranging feature set.
2. Output Extraction: For each model, we extract the output embeddings that represent the facial features identified by that model. These embeddings are the high-dimensional vectors that encapsulate the model’s interpretation of the facial features.
3. Z-Score Normalization [28]: To standardize the embeddings from different models, we apply Z-Score normalization to each embedding vector. This normalization process adjusts the embeddings so that they have a mean of 0 and a standard deviation of 1. This step is essential for mitigating the variance in scale and distribution of the embeddings across different models, ensuring that no single model’s output disproportionately influences the concatenated feature vector.
Table 1
Neural network model characteristics

Model | Architecture | Embedding Dimension | Training Focus | Key Features
VGG-Face [23] | VGG-16 | 4096 | Facial Recognition | Deep convolutional layers; trained on a large facial image dataset; uses small (3×3) convolution filters; captures fine facial details
Facenet [24] | Inception-ResNet v1 | 128 | Triplet Loss Function | Compact embeddings that optimize the distance between similar/dissimilar faces; uses a triplet-based loss function to enhance verification accuracy
Facenet512 | Inception-ResNet v1 | 512 | Extended Triplet Loss Function | Higher-dimensional embeddings capture more nuanced features; an extension of Facenet with increased embedding size for a richer representation
OpenFace [25] | nn4.small2 | 128 | Real-time Recognition | Balances accuracy and computational efficiency; suitable for real-time applications; a lightweight model designed for practical use on modest hardware
ArcFace [26] | ResNet-100 | 512 | Additive Angular Margin Loss | Enhances discriminative power; improves geometric accuracy of the feature space; uses additive angular margin loss to manage class margins
SFace [27] | Xception-39 | 128 | Scale Variations | Efficient handling of scale issues; rapid and accurate recognition; performs well on high-resolution images; notable efficiency and accuracy, especially on large datasets
4. Concatenation: Following normalization, the embeddings from all selected models are concatenated into a single, comprehensive feature vector. This concatenated vector represents a fusion of the diverse facial features recognized by the individual models, capturing a broader spectrum of facial characteristics than any single model could.
5. L2 Normalization [29]: The concatenated feature vector undergoes L2 normalization, which scales the vector to have a unit norm. This normalization step is critical for preparing the feature vector for similarity calculations, ensuring that the magnitude of the vector does not affect the distance measurements.
6. EER Determination: Upon calculating the L2 distances between facial image pairs, we identify the Equal Error Rate (EER), the point where the FAR and the FRR converge. Determining the EER is essential, as it represents an optimal balance point for the system’s decision threshold, minimizing both false positives and false negatives. This optimal threshold is then used to distinguish between matches and non-matches across the entire dataset, allowing for the proper measurement of verification metrics such as accuracy, FAR, and FRR.

3.4. Evaluation metrics

To analyze the performance of our facial verification systems, including both single and combined models, we use three main metrics: accuracy, FAR, and FRR. These metrics help us understand the systems’ performance in correctly identifying faces.

False Acceptance Rate: FAR measures the likelihood that the system incorrectly verifies an impostor as a genuine user. It is crucial to evaluate the security aspect of the facial verification system, with lower values indicating higher security. FAR is calculated as:

FAR = FP / (FP + TN), (1)

where FP is the number of false positives, and TN is the number of true negatives.

False Rejection Rate: FRR assesses the frequency at which the system wrongly rejects an authentic match. This metric is important for understanding the usability of the system, as a high FRR may lead to user frustration. Lower FRR values are desirable, indicating better performance. FRR is calculated as:

FRR = FN / (TP + FN), (2)

where FN represents false negatives, and TP denotes true positives.

Accuracy: This metric measures the overall effectiveness of the facial verification system. It is calculated as the ratio of correctly identified instances (both true positives and true negatives) to the total number of instances. High accuracy indicates that the system is effective in correctly verifying facial identities. The formula for accuracy is given by:

Accuracy = (TP + TN) / (TP + TN + FP + FN), (3)

where TP represents true positives, TN denotes true negatives, FP stands for false positives, and FN signifies false negatives.

Together, these metrics provide a comprehensive overview of the system’s performance, offering insights into its accuracy, security, and usability. By evaluating these metrics, we can make informed decisions on optimizing model configurations and improving facial verification systems.

3.5. Technical setup

Experiments were conducted on a defined technical framework comprising specific hardware and software components.

Hardware Configuration: MacBook Pro 16 with an M1 Pro processor and 16GB RAM, offering enough
computational power for handling neural network operations.

Software Configuration:

• Python 3.11: Selected for its widespread support for data analysis and machine learning tasks.
• Tensorflow-metal 1.1.0: Optimized for the M1 Pro, enhancing machine learning computation speeds.
• OpenCV-python 4.9.0: Utilized for image processing tasks such as loading, resizing, and cropping.
• Deepface 0.0.83: A library providing access to several facial recognition model weights (VGG-Face, Facenet, OpenFace, ArcFace, SFace) and their functionalities, streamlining the embedding extraction.

4. Experiments and results

4.1. Data preprocessing

Data preprocessing is a crucial initial phase in our experiment, ensuring facial images are properly conditioned for analysis by various neural network models. Here’s an outline of the preprocessing steps undertaken:

Loading Images: Images are first loaded in RGB color space, retaining their essential color information, which is crucial for accurate analysis of facial features.

Scaling Pixel Values: To standardize the images, pixel values for each color channel are scaled to a range from 0 to 255.

Model-Specific Normalization: Depending on each model’s requirements, specific normalization techniques are applied to the image data to match the conditions under which the models were trained [30].

For the Facenet model:

img = (img − mean(img)) / std(img), (4)

where mean and std are the mean and standard deviation of the image’s pixel values, respectively.

For the Facenet512 and ArcFace models:

img = img / 127.5 − 1, (5)

For the VGGFace model:

img = img − [93.5940, 104.7624, 129.18633], (6)

this formula represents the subtraction of mean values for each color channel (R, G, B) based on VGGFace1 training data.

For the OpenFace and SFace models:

img = img / 255. (7)

4.2. Singular models evaluation

In the evaluation phase of our experiments, each neural network model was assessed individually to establish its performance on the CFP dataset. A crucial part of this assessment involved determining the EER for each model, which provides a threshold at which the rate of false acceptances is equal to the rate of false rejections.

The process began with the calculation of distances between facial embeddings for both genuine and impostor pairs. Following this, we computed the EER for each model, which then served as a basis for determining the corresponding accuracy at the EER point and the best overall accuracy achieved by the model. These metrics give us insight into the models’ capabilities in facial verification tasks under the diverse conditions presented by the CFP dataset.

The results of the singular model evaluations are summarized in Table 2.

Table 2
Singular model metrics on the CFP dataset

Model | EER(%) | EER Accuracy(%) | Best Accuracy(%)
VGG-Face | 4.7 | 95.28 | 95.28
Facenet | 3.4 | 96.62 | 97.45
Facenet512 | 3.15 | 96.85 | 97.37
OpenFace | 18.3 | 81.70 | 81.72
ArcFace | 5.95 | 94.07 | 94.65
SFace | 18.5 | 81.42 | 81.80

Analyzing the results, we observe a wide range in performance across different models. Models such as Facenet and Facenet512 show promising EER values and high accuracy, indicating their robustness in handling facial verification. Conversely, models like OpenFace and SFace demonstrate the challenges of achieving high accuracy in diverse CFP dataset conditions.

4.3. Concatenated clusters evaluation

The exploration of concatenated clusters is an integral part of the research, aimed at harnessing the collective strengths of multiple neural network models to enhance facial verification accuracy. This section discusses the evaluation of clusters formed by all possible combinations of six distinct models: VGG-Face, Facenet, Facenet512, OpenFace, ArcFace, and SFace. Each cluster is identified by a unique ID for ease of reference and comparative analysis.
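To make the model-specific normalizations of Section 4.1 concrete, they can be sketched as small elementwise functions. This is an illustrative sketch, not the authors' code: the function names and the flat-list image representation are assumptions, and the 127.5 scale factor in Eq. (5) is assumed from the standard mapping of [0, 255] pixels to [−1, 1].

```python
# Sketch of the model-specific input normalizations (Eqs. 4-7).
# `img` is a flat list of pixel values here; real inputs are H x W x 3
# arrays, but the arithmetic is elementwise either way.
from statistics import mean, pstdev

def normalize_facenet(img):
    # Eq. (4): zero mean, unit standard deviation over the whole image
    m, s = mean(img), pstdev(img)
    return [(p - m) / s for p in img]

def normalize_facenet512_arcface(img):
    # Eq. (5): scale [0, 255] pixels to [-1, 1]
    return [p / 127.5 - 1 for p in img]

# Per-channel means from the VGGFace1 training data (Eq. 6)
VGGFACE1_MEANS = (93.5940, 104.7624, 129.18633)

def normalize_vggface(pixels):
    # Eq. (6): subtract the training-set mean from each color channel;
    # `pixels` is a list of (R, G, B) tuples in this sketch
    return [tuple(c - m for c, m in zip(px, VGGFACE1_MEANS)) for px in pixels]

def normalize_openface_sface(img):
    # Eq. (7): scale [0, 255] pixels to [0, 1]
    return [p / 255.0 for p in img]
```

In practice these transforms are applied to whole image arrays after loading and resizing; the per-pixel arithmetic shown is the same.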
16
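The EER determination described in Sections 3.4 and 4.2 can be sketched as a simple threshold sweep over the distances of genuine (same-person) and impostor (different-person) pairs. The function names and sweep granularity below are illustrative assumptions, not the authors' implementation; both distance lists must be non-empty.

```python
# Sketch of FAR/FRR/accuracy (Eqs. 1-3) and an approximate EER search.

def metrics_at(threshold, genuine, impostor):
    # A pair is "accepted" when its distance falls below the threshold.
    tp = sum(d < threshold for d in genuine)    # genuine pairs accepted
    fn = len(genuine) - tp                      # genuine pairs rejected
    fp = sum(d < threshold for d in impostor)   # impostor pairs accepted
    tn = len(impostor) - fp                     # impostor pairs rejected
    far = fp / (fp + tn)                        # Eq. (1)
    frr = fn / (tp + fn)                        # Eq. (2)
    acc = (tp + tn) / (tp + tn + fp + fn)       # Eq. (3)
    return far, frr, acc

def find_eer(genuine, impostor, steps=1000):
    # Sweep candidate thresholds over the observed distance range and
    # keep the one where FAR and FRR are closest: an approximate EER.
    lo, hi = min(genuine + impostor), max(genuine + impostor)
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    best = min(candidates,
               key=lambda t: abs(metrics_at(t, genuine, impostor)[0]
                                 - metrics_at(t, genuine, impostor)[1]))
    far, frr, acc = metrics_at(best, genuine, impostor)
    return best, (far + frr) / 2, acc
```

With the threshold fixed at the EER point, Eq. (3) then yields the EER accuracy of the kind reported in Tables 2 and 3.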
Figure 2: Visual analysis of performance metrics for Facenet and Facenet512 cluster (Cluster ID 5)
The evaluation methodology began with the determination of the EER for each cluster. EER serves as a crucial metric for assessing the balance between security and user convenience. By employing this threshold, we derived the EER-based accuracy and the best accuracy achievable across a range of thresholds, thereby quantifying the models’ verification capabilities. The metrics graphs and threshold range analysis examples are shown in Fig. 2.

Following the graphical analysis, the performance results for each cluster are presented in Table 3. This table arranges the EER accuracy and the best accuracy observed for each cluster in 57 combinations.

In the evaluation of concatenated clusters, our data indicates that selected clusters achieve a marginal increase in accuracy over the highest-performing individual model, Facenet512. Specifically, clusters 5, 9, 11, 25, 26, 27, 45, and 55 demonstrate a modest enhancement, improving upon the best singular model’s accuracy by approximately 0.23%. While this improvement showcases the potential advantages of model concatenation, it is crucial to consider the computational trade-offs associated with such a strategy.

5. Discussions

5.1. Insights from the results

The study’s exploration into the performance of facial verification models, individually and in combined clusters, has revealed several key insights:

• Impact of Model Pairing on Performance: Our findings highlight a notable trend where clusters combining models with lower initial accuracy see significant performance boosts. For instance, pairing OpenFace with SFace (Cluster ID 13) resulted in a 4.33% increase in accuracy, achieving an 86.13% rate. This contrasts with clusters of high-performing models, which, on average, only show about a 0.5% improvement in accuracy. This observation suggests that strategic pairing, especially involving models with varied strengths, can effectively compensate for individual weaknesses.
• Variable Outcomes from Mixed Model Clusters: Not all model combinations lead to positive outcomes. In some cases, such as the cluster of Facenet and VGG-Face (Cluster ID 0), the resulting accuracy was slightly lower than that of the Facenet model on its own. This points to the complexity of model interactions within clusters and indicates that combining models does not guarantee enhanced performance and may result in suboptimal results in certain configurations.
• Considerations on Computational Efficiency: While some model clusters achieve minor improvements in accuracy, like Facenet with Facenet512 (Cluster ID 5) with a 0.23% increase, the requisite computational resources increase significantly. This raises important considerations about the cost-benefit ratio of employing concatenated models, especially when the gains in performance are marginal compared to the added computational demand.
• Maintaining High Accuracy and Security: It is noteworthy that both individual and clustered models achieving the highest performance were able to maintain their accuracy without any false acceptances on the CFP dataset. This demonstrates their potential in scenarios demanding high security, where maintaining accuracy without compromising on false acceptance rates is crucial.
Table 3
Cluster models metrics on the CFP dataset
Cluster-ID Models EER Accuracy(%) Best Accuracy(%)
0 VGG-Face,Facenet 95.63 95.63
1 VGG-Face,Facenet512 95.97 96.18
2 VGG-Face,OpenFace 95.33 95.33
3 VGG-Face,ArcFace 95.88 95.88
4 VGG-Face,SFace 95.55 95.55
5 Facenet,Facenet512 97.13 97.68
6 Facenet,OpenFace 95.25 95.53
7 Facenet,ArcFace 95.25 96.35
8 Facenet,SFace 95.57 96.13
9 Facenet512,OpenFace 96.87 97.28
10 Facenet512,ArcFace 96.65 97.43
11 Facenet512,SFace 97.20 97.33
12 OpenFace,ArcFace 93.53 94.83
13 OpenFace,SFace 85.83 86.13
14 ArcFace,SFace 93.67 94.62
15 VGG-Face,Facenet,Facenet512 96.13 96.35
16 VGG-Face,Facenet,OpenFace 95.57 95.63
17 VGG-Face,Facenet,ArcFace 95.93 96.03
18 VGG-Face,Facenet,SFace 95.67 95.67
19 VGG-Face,Facenet512,OpenFace 95.90 96.25
20 VGG-Face,Facenet512,ArcFace 96.13 96.52
21 VGG-Face,Facenet512,SFace 95.92 96.23
22 VGG-Face,OpenFace,ArcFace 95.68 95.98
23 VGG-Face,OpenFace,SFace 95.35 95.35
24 VGG-Face,ArcFace,SFace 95.82 95.92
25 Facenet,Facenet512,OpenFace 96.68 97.55
26 Facenet,Facenet512,ArcFace 96.65 97.53
27 Facenet,Facenet512,SFace 97.30 97.57
28 Facenet,OpenFace,ArcFace 94.90 96.13
29 Facenet,OpenFace,SFace 94.38 94.62
30 Facenet,ArcFace,SFace 95.98 96.15
31 Facenet512,OpenFace,ArcFace 96.80 97.23
32 Facenet512,OpenFace,SFace 96.57 97.23
33 Facenet512,ArcFace,SFace 96.55 97.30
34 OpenFace,ArcFace,SFace 93.53 94.37
35 VGG-Face,Facenet,Facenet512,OpenFace 95.97 96.52
36 VGG-Face,Facenet,Facenet512,ArcFace 96.20 96.72
37 VGG-Face,Facenet,Facenet512,SFace 96.12 96.47
38 VGG-Face,Facenet,OpenFace,ArcFace 95.77 96.07
39 VGG-Face,Facenet,OpenFace,SFace 95.47 95.70
40 VGG-Face,Facenet,ArcFace,SFace 95.97 96.03
41 VGG-Face,Facenet512,OpenFace,ArcFace 96.08 96.55
42 VGG-Face,Facenet512,OpenFace,SFace 95.97 96.32
43 VGG-Face,Facenet512,ArcFace,SFace 96.10 96.63
44 VGG-Face,OpenFace,ArcFace,SFace 95.63 95.88
45 Facenet,Facenet512,OpenFace,ArcFace 96.92 97.35
46 Facenet,Facenet512,OpenFace,SFace 96.78 97.43
47 Facenet,Facenet512,ArcFace,SFace 96.57 97.43
48 Facenet,OpenFace,ArcFace,SFace 94.93 95.88
49 Facenet512,OpenFace,ArcFace,SFace 96.80 97.10
50 VGG-Face,Facenet,Facenet512,OpenFace,ArcFace 96.12 96.72
51 VGG-Face,Facenet,Facenet512,OpenFace,SFace 95.98 96.53
52 VGG-Face,Facenet,Facenet512,ArcFace,SFace 96.13 96.75
53 VGG-Face,Facenet,OpenFace,ArcFace,SFace 95.73 96.07
54 VGG-Face,Facenet512,OpenFace,ArcFace,SFace 96.07 96.55
55 Facenet,Facenet512,OpenFace,ArcFace,SFace 96.95 97.35
56 VGG-Face,Facenet,Facenet512,OpenFace,ArcFace,SFace 96.08 96.67
• Strategic Composition of Clusters for Optimal Performance: The analysis further reveals that the most successful clusters often include a combination of the top two performing models along with a lower-performing one. This composition suggests that the diverse feature recognition capabilities of the combined models contribute to a more comprehensive analysis, thereby enhancing the overall system’s performance.
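For reference, the 57 clusters evaluated in Table 3 are all combinations of two or more of the six models. A minimal sketch of such an enumeration follows; the variable names are ours, and the correspondence between enumeration order and the table's Cluster IDs is our observation rather than something stated in the paper.

```python
from itertools import combinations

MODELS = ["VGG-Face", "Facenet", "Facenet512", "OpenFace", "ArcFace", "SFace"]

# All subsets of size 2..6: 15 pairs + 20 triples + 15 quadruples
# + 6 quintuples + 1 sextuple = 57 clusters.
clusters = [combo
            for size in range(2, len(MODELS) + 1)
            for combo in combinations(MODELS, size)]
```

Enumerated this way, the list starts with (VGG-Face, Facenet) and ends with the full six-model cluster, which matches the ordering of Cluster IDs 0 through 56 in Table 3.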
5.2. Challenges encountered

Throughout this research, we encountered several challenges that impacted both the implementation of our experiments and the analysis of results:

Input Data Normalization: For effective performance, each neural network model requires input data to be normalized according to the specific training data it was developed with. This normalization process involved adjusting the color space and scaling for each model to match its training conditions. We successfully applied model-specific normalization for most of the models, ensuring that the input data closely mirrored the conditions under which the models were originally trained.

Z-Score Normalization for Model Output Embeddings: Given the variance in scale and distribution of embeddings across different models, a significant challenge was standardizing these embeddings for consistent comparison. By implementing Z-Score normalization on each embedding vector, the embedding was adjusted to have a mean of 0 and a standard deviation of 1. This crucial step allowed us to mitigate the disparities across model outputs.

Heavy Computation Without Heavy Server Resources: The computation required for generating embeddings for 57 clusters, along with individual model evaluations, was significant. To manage this, a caching mechanism [31] was implemented for embeddings after system setup. This strategy enabled the reuse of embeddings across different cluster and singular-model experiments, saving dozens of hours of computational time.

Poor Accuracy of the OpenFace and SFace Models: The lower-than-expected accuracy of the OpenFace and SFace models raised concerns. It may have resulted from inaccurate normalization information or from deviations from the default training data used for these models’ weights. While this paper did not directly address enhancements to these models’ accuracy, identifying the potential causes paves the way for future improvements.

Average Size and Resolution of the Dataset: While the CFP dataset was sufficiently comprehensive for our experimental purposes, its size and the variability of its data presented limitations. A larger and more diverse dataset could potentially reveal insights and issues not observed with the CFP dataset used in this study. This acknowledgment serves as a recommendation for future research to explore more extensive datasets for deeper analysis.

6. Conclusions and future work

The findings from the experiments offer valuable insights into the performance synergy of employing concatenated model clusters for facial verification systems. While several clusters achieved incremental improvements in accuracy, the requisite increase in computational resources was significant. For applications prioritizing computational efficiency, singular models like Facenet or Facenet512, which provide high accuracy without substantial computational overhead, may be more advisable. Specifically, the cluster combining Facenet and Facenet512 (Cluster ID 5) presents a compelling option, marginally outperforming the Facenet singular model by 0.23% and achieving a 97.68% accuracy rate. This slight improvement might justify the additional computational resources in scenarios where maximizing accuracy is paramount.

In contexts where the verification system can accommodate extended inference times and has access to extended computational power, employing model clusters could be beneficial. For verification systems bound by computational and time constraints yet seeking to improve upon the accuracy of singular fast-inference models like OpenFace, forming clusters with other rapid-inference models offers a strategic solution. For example, pairing OpenFace with SFace led to a significant 4.33% accuracy increase over the singular OpenFace model, achieving an 86.13% accuracy rate. This strategy allows for a balanced enhancement in accuracy while maintaining the high-speed inference capabilities essential for applications where both efficiency and accuracy are valued.

The exploration of concatenated model clusters in facial verification creates numerous opportunities for future research. A promising direction involves analyzing the specific features within each model’s embeddings that most influence verification decisions. By identifying and prioritizing these impactful features, it may be possible to filter out less relevant or noisy features from model embeddings [32]. This approach holds potential not only for singular-model systems but could notably enhance the performance of clustered-model systems by focusing on the combination of the most determinant features for L2 distance calculations.

Future research could also explore the efficiency of alternative distance metrics such as Cosine [33] and L1 distances. These metrics may produce different distributions, thresholds, and ultimately accuracies for model clusters, offering new insights into the optimization of verification systems. Additionally, further investigations could evaluate how these systems scale and perform on larger, higher-quality datasets with more varied conditions, potentially uncovering benefits not observed with the current dataset.

Given that certain models are highly dependent on the alignment of facial images, integrating dynamic alignment techniques tailored to each model within a cluster could improve accuracy. This personalized approach to face alignment may optimize each model’s contribution to the cluster. The initial success of combining lower-performing models with fast inference rates suggests a valuable strategy for developing efficient verification systems suited to embedded environments. Future work could focus on identifying and testing combinations of efficient models to create a verification system that balances accuracy with the computational speed necessary for real-time applications in constrained environments.
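The alternative distance metrics discussed above can be dropped into the same pipeline by swapping only the comparison function; a brief sketch with placeholder vectors follows. Note that for L2-normalized embeddings, cosine distance and squared L2 distance are monotonically related (‖a − b‖² = 2(1 − cos(a, b))), so they would rank image pairs identically, whereas L1 distance can genuinely reorder pairs.

```python
import numpy as np

def l2_distance(a, b):
    # Euclidean distance, as used in the present study
    return np.linalg.norm(a - b)

def l1_distance(a, b):
    # Manhattan distance: sum of absolute coordinate differences
    return np.abs(a - b).sum()

def cosine_distance(a, b):
    # 1 minus cosine similarity
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# For unit-length vectors, ||a - b||^2 equals 2 * (1 - cos(a, b))
rng = np.random.default_rng(42)
a = rng.normal(size=256); a /= np.linalg.norm(a)
b = rng.normal(size=256); b /= np.linalg.norm(b)
assert np.isclose(l2_distance(a, b) ** 2, 2.0 * cosine_distance(a, b))
```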
In conclusion, the decision to employ singular models or concatenated clusters should be guided by the specific requirements and constraints of the facial verification system in question. The strategic composition of clusters, balancing computational efficiency against marginal gains in accuracy, remains a critical consideration for the deployment of robust and effective biometric authentication solutions. Additionally, the model of a decoy system based on dynamic attributes for cybercrime investigation proposed by Vasylyshyn et al. offers a novel approach to enhancing system security and can be integrated into future research to address emerging threats [34].

References

[1] G. Alfarsi, et al., Techniques for Face Verification: Literature Review, 2019 International Arab Conference on Information Technology (ACIT) (2019) 107–112. doi: 10.1109/ACIT47987.2019.8990975.
[2] S. Yevseiev, et al., Models of Socio-cyber-physical Systems Security: Monograph, PC TECHNOLOGY CENTER (2023).
[3] M. Zulfiqar, et al., Deep Face Recognition for Biometric Authentication, 2019 International Conference on Electrical, Communication, and Computer Engineering (ICECCE) (2019) 1–6. doi: 10.1109/ICECCE47252.2019.8940725.
[4] S. Sengupta, et al., Frontal to Profile Face Verification in the Wild, IEEE Conference on Applications of Computer Vision (2016).
[5] Y. LeCun, Y. Bengio, G. Hinton, Deep Learning, Nature 521 (2015) 436–444. doi: 10.1038/nature14539.
[6] C. Ding, D. Tao, Robust Face Recognition via Multimodal Deep Face Representation, IEEE Transactions on Multimedia 17(11) (2015) 2049–2058. doi: 10.1109/TMM.2015.2477042.
[7] M. Egmont-Petersen, D. de Ridder, H. Handels, Image Processing with Neural Networks—a Review, Pattern Recognit. 35(10) (2002) 2279–2301. doi: 10.1016/S0031-3203(01)00178-9.
[8] N. Polyzotis, et al., Data Lifecycle Challenges in Production Machine Learning: A Survey, SIGMOD Rec. 47(2) (2018) 17–28. doi: 10.1145/3299887.3299891.
[9] P. Hlushchenko, V. Dudykevych, Exploratory Survey of Access Control Paradigms and Policy Management Engines, in: Proceedings of the 7th International Workshop on Computer Modeling and Intelligent Systems, vol. 3702 (2024) 263–279.
[10] X. Wang, et al., Automated Concatenation of Embeddings for Structured Prediction, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing 1 (2021) 2643–2660. doi: 10.18653/v1/2021.acl-long.206.
[11] R. Tronci, G. Giacinto, F. Roli, Designing Multiple Biometric Systems: Measures of Ensemble Effectiveness, Eng. Appl. Artificial Intel. 22(1) (2009) 66–78. doi: 10.1016/j.engappai.2008.04.007.
[12] C. Ding, D. Tao, Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 40(4) (2018) 1002–1014. doi: 10.1109/TPAMI.2017.2700390.
[13] P. Nagrath, et al., SSDMNV2: A Real-time DNN-based Face Mask Detection System Using Single Shot Multibox Detector and MobileNetV2, Sustainable Cities and Society 66 (2021). doi: 10.1016/j.scs.2020.102692.
[14] Y. Li, et al., Improving Deep Learning Feature with Facial Texture Feature for Face Recognition, Wireless Personal Commun. 103 (2018) 1195–1206. doi: 10.1007/s11277-018-5377-2.
[15] H. Moon, C. Seo, S. Pan, A Face Recognition System Based on Convolution Neural Network Using Multiple Distance Face, Soft Comput. 21(17) (2016) 4995–5002. doi: 10.1007/s00500-016-2095-0.
[16] Q. Yang, et al., Federated Machine Learning: Concept and Applications, ACM Trans. Intel. Syst. Technol. 10(2) (2019) 1–19. doi: 10.1145/3298981.
[17] M. Bhuiyan, S. Khushbu, M. Islam, A Deep Learning Based Assistive System to Classify COVID-19 Face Mask for Human Safety with YOLOv3, International Conference on Computing and Networking Technology (2017). doi: 10.1109/ICCNT.2017.8211490.
[18] C.-Z. Gao, et al., Privacy-preserving Naive Bayes Classifiers Secure Against the Substitution-then-comparison Attack, Inf. Sci. 444 (2018) 72–88. doi: 10.1016/j.ins.2018.02.003.
[19] M. Hanmandlu, D. Gupta, S. Vasikarla, Face Recognition Using Elastic Bunch Graph Matching, Applied Imagery Pattern Recognition Workshop (AIPR): Sensing for Control and Augmentation, IEEE (2013) 1–7. doi: 10.1109/AIPR.2013.6749303.
[20] R. Ghiass, et al., Infrared Face Recognition: A Comprehensive Review of Methodologies and Datasets, Pattern Recognit. 47 (2014) 2807–2824. doi: 10.1016/j.patcog.2014.03.005.
[21] C. Kotropoulos, I. Pitas, Face Authentication Using Morphological Dynamic Link Architecture, Audio- and Video-based Biometric Person Authentication (1997) 169–176. doi: 10.1007/3-540-64473-6_22.
[22] V. Brydinskyi, et al., Comparison of Modern Deep Learning Models for Speaker Verification, Appl. Sci. 14(4) (2024). doi: 10.3390/app14041329.
[23] Q. Cao, et al., VGGFace2: A Dataset for Recognising Faces across Pose and Age, 13th IEEE International Conference on Automatic Face & Gesture Recognition (2018) 67–74. doi: 10.1109/FG.2018.00020.
[24] F. Schroff, D. Kalenichenko, J. Philbin, FaceNet: A Unified Embedding for Face Recognition and Clustering, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015) 815–823. doi: 10.1109/CVPR.2015.7298682.
[25] T. Baltrušaitis, P. Robinson, L.-P. Morency, OpenFace: An Open Source Facial Behavior Analysis Toolkit, IEEE Winter Conference on Applications of Computer Vision (WACV) (2016) 1–10. doi: 10.1109/WACV.2016.7477553.
[26] J. Deng, et al., ArcFace: Additive Angular Margin Loss for Deep Face Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 44(10) (2022) 5962–5979. doi: 10.1109/TPAMI.2021.3087709.
[27] J. Wang, et al., SFace: An Efficient Network for Face Detection in Large Scale Variations, ArXiv (2018).
[28] A. Jain, K. Nandakumar, A. Ross, Score Normalization in Multimodal Biometric Systems, Pattern Recognit. 38(12) (2005) 2270–2285. doi: 10.1016/j.patcog.2005.01.012.
[29] V. Perlibakas, Distance Measures for PCA-based Face Recognition, Pattern Recognit. Lett. 25 (2004) 711–724. doi: 10.1016/j.patrec.2004.01.011.
[30] F. Günther, S. Fritsch, Neuralnet: Training of Neural Networks, R. J. 2 (2010). doi: 10.32614/RJ-2010-006.
[31] L. Fasnacht, Mmappickle: Python 3 Module to Store Memory-mapped Numpy Array in Pickle Format, J. Open Source Softw. 3(651) (2018). doi: 10.21105/JOSS.00651.
[32] H. Ye, et al., Towards Robust Neural Graph Collaborative Filtering via Structure Denoising and Embedding Perturbation, ACM Transactions on Information Systems 41 (2022) 1–28. doi: 10.1145/3568396.
[33] H. Nguyen, L. Bai, Cosine Similarity Metric Learning for Face Verification (2010) 709–720. doi: 10.1007/978-3-642-19309-5_55.
[34] S. Vasylyshyn, et al., A Model of Decoy System Based on Dynamic Attributes for Cybercrime Investigation, Eastern-European J. Enterp. Technol. 1(9(121)) (2023) 6–20. doi: 10.15587/1729-4061.2023.273363.