<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>V. Sokurenko);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>analysis using artificial intelligence based tools⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerii Sokurenko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bohdan Rusyn</string-name>
          <email>b.rusyn.prof@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksandr Manzhai</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vitalii Nosov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Svitlana Luchyk</string-name>
          <email>luchiksvitlana@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasil Luchyk</string-name>
          <email>luchik-vasil@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Karpenko Physico-Mechanical Institute of the NAS of Ukraine</institution>
          ,
          <addr-line>Naukova Street 5 79601 Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kharkiv National University of Internal Affairs</institution>
          ,
          <addr-line>L. Landau Avenue 27 61080 Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>The negative demographic and socio-economic consequences of war have led to an intensification of criminal trends in Ukraine. Therefore, there is a constant need to accelerate the processes of integrating artificial intelligence technologies into modern criminal analysis to enhance the effectiveness of law enforcement agencies. This article examines the resolution of specific tasks in facial image processing and analysis using artificial intelligence-based tools. The authors conducted a comparative analysis of software products that provide facial recognition and localization services in images. They completed an in-depth study of facial recognition models, specifying their advantages and disadvantages, in order to utilize them most effectively in combination. A Python script was developed that implements an image processing pipeline which prioritizes the use of the Dlib detector and then, in case of its failure, switches to a backup MTCNN detector. The proposed hybrid approach aims to optimize both detection accuracy and efficiency, which is confirmed by successful processing of a wide range of images. In the article, the authors emphasize the presence of several limitations in applying this framework. Therefore, there is a need for research to be continued, particularly in the direction of optimizing the scaling coefficient and integrating additional quality metrics for automatic evaluation of cropped faces.</p>
      </abstract>
      <kwd-group>
        <kwd>facial recognition technologies</kwd>
        <kwd>criminal analysis</kwd>
        <kwd>CNN facial detector models</kwd>
        <kwd>hybrid approach</kwd>
        <kwd>two-stage pipeline</kwd>
        <kwd>efficiency</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        War, economic hardship, and mass migration have significantly reduced the population of
Ukraine. According to IMF estimates, Ukraine's population in 2025 amounts to 32.9 million people,
a reduction of 21.9% compared to the pre-war year of 2021 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The negative
demographic and socio-economic consequences of the war have intensified criminal
trends in Ukraine. According to data from the Prosecutor General's Office, the number of crimes
committed in Ukraine in 2022 exceeded the figures for both
2021 (321,443 crimes) and 2020 (360,662 crimes). Over the 12
months of 2024, law enforcement agencies registered 492,479 crimes with corresponding criminal
proceedings, of which 194,688 criminal proceedings involved charges against specific individuals.
In the first 6 months of 2025, law enforcement agencies had already registered 327,847 crimes with
corresponding criminal proceedings, of which 101,399 criminal proceedings involved charges
against specific individuals [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Due to the shortage of human resources, law enforcement agencies are compelled to maximize
the use of cutting-edge information technologies and artificial intelligence (AI) for processing and
analyzing large volumes of materials. For example, the Palantir system is used to review data
arrays and conduct searches and processing of large volumes of information according to specific
parameters already at the stage of pre-trial investigation. The Microsoft Azure program has
enabled Ukrainian prosecutors and investigators to investigate war crimes committed by Russian
military personnel. Clearview AI technology is actively employed for verifying individuals at
checkpoints, identifying deceased soldiers and prisoners of war, and searching for missing persons.
Facial recognition technology has become one of the most powerful applications of artificial
intelligence for law enforcement agencies. Extracting and normalizing facial images, that is,
bringing a large number of heterogeneous images (photographs in which specific individuals
appear either as the direct subject or incidentally, in photographic
documentation of objects) to a unified format, is critically important for the operational and
service tasks of the IT units of the National Police of Ukraine. It is therefore important for law
enforcement officers to understand the full capabilities of this technology, both to prevent
and investigate criminal acts against society and to hinder criminals' use of cutting-edge digital
technologies.</p>
      <p>The goal of this work is to improve the effectiveness of law enforcement agencies by developing
and experimentally testing a hybrid approach to processing and recognizing facial images based on
artificial intelligence tools.</p>
      <p>The main task of the research is to determine the optimal combination of face detection models
that improves the accuracy, speed, and reliability of identifying individuals in digital images,
particularly in conditions of low quality, varying lighting, or shooting angles.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        Over recent years, facial recognition technologies have undergone significant changes due to the
implementation of deep learning methods for neural networks. The main application areas of these
technologies include biometric authentication, video surveillance, forensic investigations, digital
identification, and others [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6 ref7">3-7</xref>
        ]. Publication [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] presents a systematic review of scientific
publications on current facial recognition algorithms. Deep learning of neural networks in facial
recognition tasks involves the construction and training of multi-level models capable of
automatically extracting characteristic facial features from raw pixel images. Common
architectures of deep convolutional neural networks (CNN) in facial recognition tasks include:
VGGNet [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], ResNet [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], InceptionNet [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] used as base frameworks; FaceNet [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] generates vector
representations of faces (embeddings); ArcFace [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], CosFace [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], SphereFace [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] introduce
modifications to the loss function to enhance discriminativeness. Before CNN training happens, the
following procedures are performed: face detection with extraction of the facial region (for
example, using the MTCNN [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and RetinaFace [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] methods); alignment with normalization of
the position of eyes, nose, and mouth; scaling and normalization of images; and data augmentation,
that is, artificially expanding the dataset by applying minor modifications such as rotation,
mirroring, and added noise to existing images in order to improve generalization.
      </p>
      <p>The CNN training process occurs on a large dataset with millions of images, where the CNN
automatically learns to extract features instead of manual descriptor design. This process utilizes a
loss function that quantitatively measures how “poorly” the model performs its task or how much
its predictions differ from the true values. The training objective consists precisely in minimizing
this loss function. At the final stage, the CNN transforms the facial image into a fixed-length vector
(for example, 128 or 512 elements) that preserves semantic similarity (e.g., faces of the same person
have similar vectors) and can be used for classification, search, and verification. The CNN is also
validated on a test dataset that was not used during training.</p>
      <p>
        At the same time, CNN-based facial recognition technologies face a number of challenges
in deployment, which several researchers have highlighted in their
publications. Chief among them is image variability (i.e., intra-class variation)
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] that arises due to changes in lighting, viewing angle, facial expressions, accessories (such as
glasses or masks), as well as image quality. Such factors significantly complicate person
identification, reducing the accuracy of even modern deep learning models.
      </p>
      <p>
        Another problem is model transfer to new domains (i.e., domain adaptation):
models trained on one type of data often demonstrate reduced effectiveness when applied in new
conditions (for example, changes in cameras, environment, culture, or usage context). This limits
the scalability and universality of recognition systems without prior fine-tuning [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        In facial recognition, particularly in applied tasks, the problem of limited training data volume
frequently arises. Many open datasets have imbalances in the representation of racial, gender, and
age groups, leading to algorithmic bias [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Additionally, deep learning models may demonstrate
higher accuracy for certain demographic groups compared to others, which is defined as model
bias. This model property can have critical consequences in the context of ensuring human rights.
For example, it is known that some models have significantly higher error rates when recognizing
individuals with dark skin color or atypical facial features [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        Finally, the unregulated deployment of facial recognition technologies raises significant societal
concerns regarding privacy rights and informed consent [
        <xref ref-type="bibr" rid="ref22 ref23">22-23</xref>
        ]. Consequently, the
challenges and limitations encountered by practitioners, particularly law enforcement personnel, in
facial recognition applications call for a comprehensive analysis of different CNN facial detector
models to identify their respective advantages and limitations.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and methods</title>
      <p>
        Pattern recognition systems typically operate in training and testing modes. During the training
phase, the feature space is partitioned into recognition classes for the purpose of constructing
decision rules. An important section of pattern recognition theory is automatic classification or
cluster analysis of input data [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. A cluster consists of a set of similar (analogous) pattern
realizations that can be separated from other objects according to specific criteria. Since cluster
analysis lacks an array of class identifiers for the realizations, this process is also termed
"unsupervised learning." The clustering process involves not only the search for similar realizations
but also the formation of decision rules for each cluster. A priori information about the number of
clusters and their distribution in the recognition feature space significantly simplifies this process.
The main approaches, distinguished by their field of knowledge and scientific direction for solving
pattern recognition problems, are:
1. Algebraic approach, whose main advantage is simple decision rules. The primary
disadvantage of this approach lies in low recognition reliability, as it does not account for
uncontrolled factors that influence the recognition process;
2. Geometric approach, characterized by universality, clarity, and simplicity of recognition
algorithm interpretation;
3. Statistical approach, which employs statistical characteristics for data analysis;
4. Biological approach, which includes artificial neural networks. Algorithms within this
approach model cognitive processes occurring in human brain nerve cells. The main
disadvantage of the biological approach is high sensitivity to the dimensionality of the
recognition feature space;
5. Network approach (semantic networks, frames, Petri nets, decision trees, etc.). The
advantages of this approach include model simplicity, possibility for extension and
complexity enhancement, while the main disadvantage is the complexity of constructing
decision rules;
6. Fuzzy approach, developed based on the algebraic approach and serving as a competitor to
the statistical approach. This approach allows modeling of pattern recognition processes
that a priori overlap in the recognition feature space. However, it is not adapted for
optimizing the parameters of recognition system functionality;
7. Game-theoretic approach, whose decision rules are characterized by high complexity and
low recognition reliability.
      </p>
      <p>Since all main approaches, except the algebraic one, intersect with the geometric approach, the
formation of a general decision-making theory is most justified within the framework of the
geometric approach. Within the geometric approach, pattern recognition theory is based on two
fundamental principles:
1. The maximum-distance principle, whereby decision rules are constructed by maximizing
the average inter-class distance.
2. The minimum-distance principle, whereby decision rules are constructed under the
condition of minimizing the average distance of pattern realization to its class center.</p>
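      <p>As an illustration of the two principles above, the following minimal sketch builds class centers from a few
realizations, measures the inter-class distance, and assigns a new realization to the nearest center. The
feature vectors, the class labels, and the function names are hypothetical, and the Euclidean distance is
assumed here only for simplicity; this is not the authors' implementation.</p>
      <preformat><![CDATA[
```python
import math


def class_center(realizations):
    """Coordinate-wise mean of a list of equal-length feature vectors."""
    n = len(realizations)
    return [sum(column) / n for column in zip(*realizations)]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(x, centers):
    """Minimum-distance rule: pick the class whose center is nearest to x."""
    return min(centers, key=lambda label: euclidean(x, centers[label]))


# Hypothetical two-feature realizations for two recognition classes.
training = {
    "A": [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]],
    "B": [[4.0, 4.0], [4.2, 3.8], [3.8, 4.2]],
}
centers = {label: class_center(points) for label, points in training.items()}

# The larger this value, the better separated the two classes are.
inter_class_distance = euclidean(centers["A"], centers["B"])
predicted = classify([1.1, 0.9], centers)
```
]]></preformat>
      <p>In this toy setting the realization [1.1, 0.9] falls to class A because its distance to the center of A is
smaller than its distance to the center of B.</p>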
      <p>Implementation of these principles constitutes a necessary condition for achieving maximum
recognition reliability, which is determined by the total probability of correct decision-making:
        <disp-formula><tex-math><![CDATA[P_t = p_1 D_1 + p_2 D_2,]]></tex-math></disp-formula>
where p₁, p₂ are unconditional probabilities, and D₁, D₂ are the first and second reliabilities, respectively.</p>
      <p>Suppose it is necessary to find pattern N, which is described by features Nᵢ, i = 1...n, each of
which possesses properties mⱼ, j = 1...m. Thus, the pattern can be described by a matrix of
dimension n×m:
        <disp-formula><label>(1)</label><tex-math><![CDATA[N = \begin{pmatrix} N_{11} & \cdots & N_{1m} \\ \vdots & \ddots & \vdots \\ N_{n1} & \cdots & N_{nm} \end{pmatrix}]]></tex-math></disp-formula>
      </p>
      <p>Suppose it is necessary to identify within an image array the specific image that corresponds to
pattern N. To accomplish this, we apply a known function f to pattern N:
        <disp-formula><label>(2)</label><tex-math><![CDATA[f(N) = \begin{pmatrix} f(N_{11}) & \cdots & f(N_{1m}) \\ \vdots & \ddots & \vdots \\ f(N_{n1}) & \cdots & f(N_{nm}) \end{pmatrix}]]></tex-math></disp-formula>
      </p>
      <p>Let the image array be denoted as K = {K₁, K₂, ..., Kₛ}. Each image is described by a matrix
        <disp-formula><label>(3)</label><tex-math><![CDATA[K^{p} = \begin{pmatrix} K^{p}_{11} & \cdots & K^{p}_{1m} \\ \vdots & \ddots & \vdots \\ K^{p}_{n1} & \cdots & K^{p}_{nm} \end{pmatrix}]]></tex-math></disp-formula>
and the action of function f on matrix (3) is described accordingly by the functional matrix
        <disp-formula><label>(4)</label><tex-math><![CDATA[f(K^{p}) = \begin{pmatrix} f(K^{p}_{11}) & \cdots & f(K^{p}_{1m}) \\ \vdots & \ddots & \vdots \\ f(K^{p}_{n1}) & \cdots & f(K^{p}_{nm}) \end{pmatrix}]]></tex-math></disp-formula>
      </p>
      <p>For the pattern to correspond to an image from the array, the following inequalities must be
satisfied:
        <disp-formula><label>(5)</label><tex-math><![CDATA[\left| f(N) - f(K^{p}) \right| < \epsilon,]]></tex-math></disp-formula>
        <disp-formula><label>(6)</label><tex-math><![CDATA[\left| f(N_{ij}) - f(K^{p}_{ij}) \right| < \epsilon_{p},]]></tex-math></disp-formula>
with the distance defined as
        <disp-formula><label>(7)</label><tex-math><![CDATA[d\left(f(N_{ij}), f(K^{p}_{ij})\right) = \sqrt[r]{\sum_{p=0}^{s} \left( f(N_{ij}) - f(K^{p}_{ij}) \right)^{h}},]]></tex-math></disp-formula>
where r is the parameter responsible for progressive weighting of large distances between objects;
h is the parameter responsible for gradual weighting of differences along individual coordinates.</p>
      <p>
        In practice, the fuzzy compactness hypothesis of pattern realizations applies, as classes
inherently overlap and exhibit indistinct boundaries. Consequently, the
aforementioned deterministic distance-based proximity criteria fail to
achieve clear partitioning of the feature space into distinct recognition classes. To address this
limitation in pattern recognition applications, the Mahalanobis distance has been adopted [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]:
        <disp-formula><label>(8)</label><tex-math><![CDATA[d\left(f(N), f(K^{p})\right) = \left(f(N) - f(K^{p})\right)^{T} W^{-1} \left(f(N) - f(K^{p})\right),]]></tex-math></disp-formula>
where f(N) and f(K^p) are treated as column vectors, T denotes the transpose, and W⁻¹ represents the inverse covariance
matrix.
      </p>
      <p>
        During the analysis and synthesis of learning-capable recognition systems, an information
measure in the form of (9) is widely employed as a general measure of pattern proximity
(similarity):
        <disp-formula><label>(9)</label><tex-math><![CDATA[CI = 1 - E,]]></tex-math></disp-formula>
where E is the normalized information measure that represents the measure of recognition class
diversity. In practical applications of information synthesis for learning-capable recognition
systems, the Shannon entropy measure and the Kullback information measure have gained the
most widespread adoption [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. The normalized Shannon entropy criterion of functional efficiency
has the form
        <disp-formula><tex-math><![CDATA[E = \frac{H_0 - H(\gamma)}{H_0},]]></tex-math></disp-formula>
where H₀ is the unconditional average entropy:
        <disp-formula><label>(12)</label><tex-math><![CDATA[H_0 = -\sum_{l=1}^{M} p(\gamma_l) \log_2 p(\gamma_l),]]></tex-math></disp-formula>
and H(γ) represents the a posteriori conditional entropy characterizing the residual uncertainty
after decision-making:
        <disp-formula><label>(13)</label><tex-math><![CDATA[H(\gamma) = -\sum_{l=1}^{M} p(\gamma_l) \sum_{m=1}^{M} p(\mu_m \mid \gamma_l) \log_2 p(\mu_m \mid \gamma_l).]]></tex-math></disp-formula>
      </p>
      <p>In expressions (12) and (13), the following notations are adopted: p(γₗ) represents the unconditional
(a priori) probability of accepting hypothesis γₗ; p(μₘ | γₗ) represents the a posteriori conditional
probability of accepting hypothesis μₘ given that hypothesis γₗ was a priori accepted; M denotes the
number of alternative hypotheses. In practical applications, the following assumptions are
commonly made:
1. Decisions are binary in nature (M = 2).
2. Given that the recognition system operates under a priori uncertainty conditions, the
assumption of equiprobable hypotheses is justified according to the Bernoulli-Laplace
principle.</p>
      <p>Under these assumptions p(γₗ) = 1/2 and H₀ = 1, so the criterion takes the form
        <disp-formula><tex-math><![CDATA[E = 1 + \frac{1}{2} \sum_{l=1}^{2} \sum_{m=1}^{2} p(\mu_m \mid \gamma_l) \log_2 p(\mu_m \mid \gamma_l).]]></tex-math></disp-formula>
      </p>
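      <p>The normalized Shannon entropy criterion described above, that is, the ratio of the difference between the
unconditional entropy H₀ and the a posteriori conditional entropy H(γ) to H₀, can be sketched numerically as
follows. This is a minimal illustration under the stated assumptions (binary decisions, equiprobable
hypotheses); the function names and the example probabilities are hypothetical and are not taken from the
article.</p>
      <preformat><![CDATA[
```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def normalized_entropy_criterion(priors, conditionals):
    """Return E = (H0 - H(gamma)) / H0.

    priors[l]          -- a priori probability of hypothesis l
    conditionals[l][m] -- probability of accepting hypothesis m given l
    """
    h0 = entropy(priors)
    h_gamma = sum(p_l * entropy(row) for p_l, row in zip(priors, conditionals))
    return (h0 - h_gamma) / h0


# Binary decisions with equiprobable hypotheses (M = 2, p = 1/2 each).
priors = [0.5, 0.5]

# Hypothetical decision probabilities for three recognition systems.
e_perfect = normalized_entropy_criterion(priors, [[1.0, 0.0], [0.0, 1.0]])
e_random = normalized_entropy_criterion(priors, [[0.5, 0.5], [0.5, 0.5]])
e_typical = normalized_entropy_criterion(priors, [[0.9, 0.1], [0.2, 0.8]])
```
]]></preformat>
      <p>An error-free system yields E = 1, a system whose decisions carry no information yields E = 0, and any
intermediate case falls between these bounds.</p>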
      <p>In law enforcement systems for facial recognition, minimal error (maximum accuracy) and
controlled error levels are required, which directly impacts mathematical modeling. Therefore,
mathematical models are refined through probabilistic components, specialized loss functions, and
multimodal architectures to ensure minimization of critical errors. The models must account for
the following aspects:</p>
      <p>Optimization of accuracy metrics (in modern machine learning and deep learning tasks,
optimization algorithms play a critical role, as they determine the efficiency and speed of
model training. Specifically, these algorithms aim to minimize the loss function by updating
model parameters based on gradient information).</p>
      <p>Bayesian interpretation and confidence levels (in such systems, it is important to obtain
probability estimates of an individual's membership in a particular class). The model often
incorporates a posteriori probabilities, that is, the probability of finding an image after the
occurrence of a specific event.</p>
      <p>Robustness to capture conditions (allowing facial recovery under low quality or partial
occlusion). Several robustness-related tasks are distinguished, among them robust stability,
that is, ensuring system stability under all admissible deviations of the image object model from the
nominal one.</p>
      <p>Multimodal models (combining features from different biometric channels, such as voice, gait, and
2D+3D data, improves recognition accuracy).</p>
      <p>Adaptive thresholds and calibration (discrimination thresholds are selected depending on
the task context).</p>
      <p>Therefore, in law enforcement activities, the mathematical modeling process of facial
recognition shifts from purely classical accuracy optimization to controlled risk minimization. This
means that explicit error cost components, a priori scenario probabilities, and calibrated a
posteriori estimates are incorporated into the formalism. This stimulates the integration of
multilevel representations and quality assessment mechanisms for individual data transformation
processes, which, in turn, allows for adaptation of thresholds and re-verification algorithms before
decision-making.</p>
      <p>We conducted a comparative analysis of software products that provide facial recognition and
localization services in images. Information regarding the key features of these tools and their
availability for free use is presented in Table 1.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>The findings in Table 1 highlight two critical constraints. First, effective processing of
large-scale image datasets necessitates access to premium subscription tiers, which may be financially
prohibitive for government agencies. Second, the majority of these platforms lack local deployment
options, precluding their use for processing sensitive materials containing confidential information,
particularly personal data, under current security protocols. To overcome these limitations, we
propose a hybrid methodology incorporating dual face detection algorithms for automated facial
detection, alignment, and cropping processes. We begin with a comprehensive examination of
facial recognition models to optimize their combined implementation.</p>
      <p>A typical facial recognition system pipeline consists of sequential stages, each performing a
specific function. The efficiency of each stage is critical, as errors introduced in early phases can
accumulate and significantly impact the final outcome.</p>
      <sec id="sec-4-1">
        <title>4.1. Face detection</title>
        <p>Face detection serves as the initial step, with the objective of localizing and isolating human faces
within an image or video frame. This stage produces bounding boxes around each detected face.
Various methods exist for face detection:</p>
        <p>Traditional methods, which include:
The Viola-Jones algorithm, which employs Haar-like features and cascade classifiers (Haar Cascades).
Histogram of Oriented Gradients (HOG) combined with a Support Vector Machine (SVM), as in Dlib HOG.
Deep learning-based methods, built on Convolutional Neural Networks (CNN), which include
Dlib CNN, Face Recognition (Dlib HOG + CNN), MediaPipe,
MTCNN, RetinaFace, YOLOv5-Face, and OpenVINO Face Detection.</p>
        <p>
          Analysis of recent research findings on these non-commercial face detection models [
          <xref ref-type="bibr" rid="ref26 ref27 ref28 ref29 ref30 ref31 ref32 ref33 ref34">26-34</xref>
          ],
most of which employ deep learning architectures and all of which are implemented as open-source libraries or
frameworks in Python (Haar Cascades, Dlib HOG, Dlib CNN, Face Recognition (Dlib
HOG + CNN), MediaPipe, MTCNN, RetinaFace, YOLOv5-Face, and OpenVINO Face Detection),
enabled their comparison across the following parameters:
        </p>
        <p>F1-score is a metric used to evaluate the accuracy of classification models, particularly when
classes in the dataset are imbalanced. This indicator is calculated as the harmonic mean (16)
of precision (17) and recall (18):
          <disp-formula><label>(16)</label><tex-math><![CDATA[F_1\text{-score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},]]></tex-math></disp-formula>
          <disp-formula><label>(17)</label><tex-math><![CDATA[\mathrm{Precision} = \frac{\text{Number of correctly detected faces}}{\text{All detected faces}} = \frac{TP}{TP + FP},]]></tex-math></disp-formula>
          <disp-formula><label>(18)</label><tex-math><![CDATA[\mathrm{Recall} = \frac{\text{Number of correctly detected faces}}{\text{Total number of actual faces}} = \frac{TP}{TP + FN},]]></tex-math></disp-formula>
where TP (True Positives) is the number of correctly detected faces; FP (False Positives) is the number of falsely detected
faces; FN (False Negatives) is the number of actual faces that were not detected.</p>
        <p>FPS (Frames Per Second) is the processing speed, that is, the number of images (frames) that can be
processed per second. Latency is the processing time per image.</p>
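        <p>The three metrics defined in (16) to (18) can be computed directly from detection counts. The sketch below is
only an illustration: the function name and the example counts are hypothetical, and the zero-denominator
handling is an added convention rather than something specified in the article.</p>
        <preformat><![CDATA[
```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 90 faces found correctly, 10 false detections, 30 faces missed.
precision, recall, f1 = detection_metrics(tp=90, fp=10, fn=30)
```
]]></preformat>
        <p>Because the harmonic mean is dominated by the smaller of its two arguments, a detector cannot reach a high
F1-score by maximizing precision at the expense of recall, or the reverse.</p>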
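        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr>
                <th rowspan="2">Face detection model</th>
                <th colspan="2">FPS</th>
                <th rowspan="2">Precision</th>
                <th rowspan="2">Recall</th>
                <th rowspan="2">F1-score</th>
                <th colspan="2">Latency (per image), ms</th>
                <th rowspan="2">Hardware</th>
              </tr>
              <tr>
                <th>min</th>
                <th>max</th>
                <th>min</th>
                <th>max</th>
              </tr>
            </thead>
            <tbody>
              <tr><td>Haar Cascades</td><td>25</td><td>30</td><td>0.70</td><td>0.60</td><td>0.65</td><td>30</td><td>40</td><td>CPU</td></tr>
              <tr><td>Dlib HOG</td><td>5</td><td>10</td><td>0.85</td><td>0.75</td><td>0.80</td><td>100</td><td>150</td><td>CPU</td></tr>
              <tr><td>Dlib CNN</td><td>10</td><td>15</td><td>0.98</td><td>0.94</td><td>0.96</td><td>250</td><td>400</td><td>CPU</td></tr>
              <tr><td>Face Recognition</td><td>5</td><td>12</td><td>0.95</td><td>0.92</td><td>0.93</td><td>150</td><td>300</td><td>CPU</td></tr>
              <tr><td>MTCNN</td><td>1</td><td>5</td><td>0.95</td><td>0.90</td><td>0.92</td><td>400</td><td>700</td><td>CPU</td></tr>
              <tr><td>OpenVINO</td><td>80</td><td>120</td><td>0.95</td><td>0.91</td><td>0.93</td><td>5</td><td>10</td><td>CPU</td></tr>
              <tr><td>MediaPipe</td><td>25</td><td>30</td><td>0.96</td><td>0.93</td><td>0.94</td><td>10</td><td>15</td><td>CPU</td></tr>
              <tr><td>RetinaFace</td><td>15</td><td>25</td><td>0.98</td><td>0.96</td><td>0.97</td><td>50</td><td>120</td><td>GPU</td></tr>
              <tr><td>YOLOv5-Face</td><td>20</td><td>45</td><td>0.97</td><td>0.94</td><td>0.95</td><td>10</td><td>20</td><td>GPU</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-4-1-2">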
          <p>
            The calculation results for the aforementioned parameters of face detection models employing
deep learning architectures, as described in [
            <xref ref-type="bibr" rid="ref27 ref28 ref29 ref30 ref31 ref32 ref33 ref34 ref35">27-35</xref>
            ], are presented in Table 2.
          </p>
          <p>Comprehensive analysis of the facial recognition model parameters defined in Table 2 allowed
the following conclusions to be drawn.</p>
          <p>Analysis of the models by processing speed (FPS), specifically the frame rate range from
minimum to maximum, enabled assessment of their suitability for real-time applications.
OpenVINO Face Detection emerges as the clear leader in
this metric, achieving 80-120 FPS on CPU, which makes it a strong candidate for surveillance systems
and other high-throughput applications. MTCNN and Dlib
HOG demonstrate the lowest speeds, at 1-5 and 5-10 FPS, respectively.</p>
          <p>Comparison of models based on primary accuracy metrics (Figures 1a, 1b) revealed that Dlib
CNN and RetinaFace demonstrate the highest accuracy, with F1-scores exceeding
0.95. The data in Figures 1a and 1b indicate the models' capability to reliably detect faces with
minimal false positives. The Haar Cascades model, as a classical approach,
significantly underperforms across all accuracy metrics.</p>
          <p>Comparison of the models' processing latency
in milliseconds (Table 2) revealed that latency results correlate with FPS speed. The
OpenVINO model exhibits the lowest latency (5-10 ms), confirming its high efficiency. At the
opposite end of the spectrum is MTCNN with latency up to 700 ms, rendering it unsuitable for
real-time applications but acceptable for offline analysis where accuracy takes priority.</p>
          <p>To determine the trade-off between model accuracy (F1-score) and speed (maximum and
minimum latency), we established a normalized accuracy-to-latency ratio by calculating
F1-score/Latency(ms,min)*100 and F1-score/Latency(ms,max)*100 metrics. The calculated data for
each facial recognition model are displayed in Figure 2.</p>
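          <p>The normalized accuracy-to-latency ratio described above can be reproduced with a few lines of code. The
sketch below uses the F1-score and latency values reported for each model in Table 2; the dictionary layout
and the function name are illustrative assumptions, not the authors' script.</p>
          <preformat><![CDATA[
```python
# (F1-score, minimum latency in ms, maximum latency in ms) per model, from Table 2.
MODELS = {
    "Haar Cascades": (0.65, 30, 40),
    "Dlib HOG": (0.80, 100, 150),
    "Dlib CNN": (0.96, 250, 400),
    "Face Recognition": (0.93, 150, 300),
    "MTCNN": (0.92, 400, 700),
    "OpenVINO": (0.93, 5, 10),
    "MediaPipe": (0.94, 10, 15),
    "RetinaFace": (0.97, 50, 120),
    "YOLOv5-Face": (0.95, 10, 20),
}


def accuracy_to_latency(f1, latency_ms):
    """Normalized ratio F1-score / latency (ms) * 100."""
    return f1 / latency_ms * 100


# For each model: (ratio at minimum latency, ratio at maximum latency).
ratios = {
    name: (accuracy_to_latency(f1, lat_min), accuracy_to_latency(f1, lat_max))
    for name, (f1, lat_min, lat_max) in MODELS.items()
}

# Model with the most favorable ratio at its minimum latency.
best = max(ratios, key=lambda name: ratios[name][0])
```
]]></preformat>
          <p>A higher ratio means more accuracy per millisecond of processing, so fast detectors with moderately lower
F1-scores can rank above slower, more accurate ones.</p>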
          <p>Figure 2 illustrates the fundamental challenge in face detection:
balancing accuracy and speed. Models offering the optimal trade-off are OpenVINO and MediaPipe,
which provide high accuracy (F1 &gt; 0.92) with very low latency. Models with high accuracy but
significant latency include Dlib CNN and MTCNN. RetinaFace represents the only GPU-based
model demonstrating high accuracy with moderate latency. Haar Cascades exhibits low accuracy
but relatively minimal latency.</p>
          <p>Real-time facial recognition models require optimization for rapid image processing. This may
include hardware acceleration such as GPU utilization, as well as quantization and pruning
techniques to reduce model size and increase processing speed. Figure 3 demonstrates that
GPU-accelerated models exhibit high performance while maintaining superior accuracy.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Face alignment</title>
        <p>Following face detection, the alignment stage aims to normalize the face to a standard position and
orientation. This involves identifying facial landmarks such as eye corners, nose, and mouth
corners, and transforming the image (e.g., rotation, scaling) so that these points are positioned
consistently. Alignment reduces variations caused by pose and improves the consistency of
features extracted in subsequent stages.</p>
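        <p>A common way to realize the rotation step is to level the line joining the two eye landmarks. The following
minimal sketch assumes image coordinates with the y axis pointing down, takes the "left" eye to be the one
with the smaller x coordinate, and uses hypothetical landmark positions and function names; a real pipeline
would apply the same rotation to the whole image rather than to individual points.</p>
        <preformat><![CDATA[
```python
import math


def eye_line_angle(left_eye, right_eye):
    """Angle in degrees between the eye line and the horizontal axis."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_about(point, center, angle_deg):
    """Rotate a 2D point about a center by the given angle in degrees."""
    theta = math.radians(angle_deg)
    x, y = point[0] - center[0], point[1] - center[1]
    return (
        center[0] + x * math.cos(theta) - y * math.sin(theta),
        center[1] + x * math.sin(theta) + y * math.cos(theta),
    )


# Hypothetical eye landmarks of a tilted face.
left_eye, right_eye = (100.0, 120.0), (160.0, 150.0)

angle = eye_line_angle(left_eye, right_eye)
center = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)

# Rotating by the negative angle brings both eyes onto the same horizontal line.
aligned_left = rotate_about(left_eye, center, -angle)
aligned_right = rotate_about(right_eye, center, -angle)
```
]]></preformat>
        <p>After the rotation both eye centers share the same vertical coordinate, while the distance between them is
unchanged, so a subsequent scaling step can bring it to a fixed value.</p>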
        <p>
          Reference [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] demonstrates that alignment enhances recognition accuracy by up to 6%. The
techniques employed span from 2D affine transformations to more sophisticated 3D alignment
methodologies. Within the DeepFace framework, alignment functionality is implemented by
default, while the RetinaFace detector achieves superior alignment precision through its robust
landmark detection capabilities. Preprocessing workflows incorporating facial alignment based on
detector-identified landmarks have become established standard practice in contemporary facial
recognition systems.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Face Representation and Feature Extraction</title>
        <p>At this stage, the aligned facial image is transformed into a compact, discriminative numerical
vector (embedding) that captures its essential characteristics. Two primary approaches exist for
facial feature extraction:</p>
        <p>Handcrafted features. These are based on manually designed algorithms for detecting
edges, textures, shapes, or key points. Examples include LBP (Local Binary Patterns), HOG
(Histogram of Oriented Gradients), and SIFT (Scale-Invariant Feature Transform).
Advantages include interpretability and functionality with limited datasets, while
disadvantages encompass the potential to miss the most discriminative information.</p>
        <p>Learned features through deep learning. Convolutional Neural Networks (CNN)
automatically learn hierarchical features from data. Initial layers extract edges and
textures, while subsequent layers combine them into complex shapes. Examples include
models such as DeepFace, FaceNet, VGG-Face, ArcFace, AdaFace, and MagFace.
Advantages include high discriminative capability and robustness to variations.
Disadvantages encompass requirements for large datasets, computational intensity, and
reduced interpretability.</p>
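        <p>To make the notion of a handcrafted descriptor concrete, the sketch below computes the basic 3x3 Local Binary Pattern code of a pixel and a 256-bin histogram of such codes. It is a simplified illustration: the neighbor ordering, the function names, and the toy image patch are assumptions, and practical LBP variants (circular neighborhoods, uniform patterns, per-cell histograms) are not covered.</p>
        <preformat><![CDATA[
```python
def lbp_code(img, r, c):
    """8-bit Local Binary Pattern code for the pixel at (r, c)."""
    center = img[r][c]
    # Neighbors visited clockwise starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbor at least as bright as the center sets its bit.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist

# Toy grayscale patch (hypothetical values).
patch = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
```
]]></preformat>
        <p>The histogram, rather than the individual codes, serves as the feature vector. Every step is easy to trace by hand, which is the interpretability advantage noted above.</p>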
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Face Matching and Classification (Verification/Identification)</title>
        <p>This constitutes the final stage, which can be implemented in two distinct forms:</p>
        <p>Verification (1:1). Comparison of two facial embeddings to determine whether they belong
to the same individual. A similarity score is calculated (e.g., using cosine similarity or
Euclidean distance) and compared against a threshold value.</p>
        <p>Identification (1:N). Comparison of a query face embedding against a database of known
face embeddings to find the closest match (or multiple matches).</p>
        <p>Facial classification may employ Support Vector Machine (SVM), K-Nearest Neighbor (KNN)
methods, or specialized similarity learning architectures such as Siamese Networks (SN), which are
utilized for face matching through cosine similarity between output vectors.</p>
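        <p>The two matching modes can be sketched in a few lines of Python. The example below is a minimal illustration that assumes embeddings are plain lists of floats and uses cosine similarity with a threshold of 0.5; the threshold, the function names, and the toy three-dimensional embeddings are illustrative assumptions, since in practice the threshold is tuned for each embedding model.</p>
        <preformat><![CDATA[
```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(emb_a, emb_b, threshold=0.5):
    """1:1 verification: same person if similarity reaches the threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold

def identify(query, gallery, threshold=0.5):
    """1:N identification: label of the best-matching gallery entry,
    or None if no entry reaches the threshold."""
    best_label, best_score = None, threshold
    for label, emb in gallery.items():
        score = cosine_similarity(query, emb)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label

# Toy 3-dimensional embeddings (hypothetical); real models output far more values.
gallery = {"person_a": [1.0, 0.0, 0.0], "person_b": [0.0, 1.0, 0.0]}
match = identify([0.9, 0.1, 0.0], gallery)
no_match = identify([0.0, 0.0, 1.0], gallery)
same = verify([1.0, 0.0, 0.0], [0.9, 0.1, 0.0])
```
]]></preformat>
        <p>Returning None for a query that matches nobody corresponds to open-set identification, where the person may be absent from the database.</p>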
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussions</title>
      <p>
        This investigation enabled us to comprehend the complexity, multifaceted nature, and critical
importance of the entire facial recognition process, particularly within law enforcement
applications. These technologies offer numerous advantages compared to other access control or
monitoring devices. However, our analytical findings revealed that the sequential nature of the
facial recognition pipeline results in error accumulation. Enhancement of any individual stage can
improve overall system accuracy. This is particularly relevant for the face detection stage.
Reference [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] indicates that improving face detection accuracy can enhance overall recognition
accuracy by up to 42%, while alignment contributes up to 6% improvement. Therefore, when
selecting face detector models, we recommend considering the analytical results obtained in this
study.
      </p>
      <p>1. Model selection depends on task-specific requirements. For applications where maximum
accuracy is paramount (e.g., biometric identification from photographs), RetinaFace or Dlib
CNN models are recommended, utilizing graphics processing units (GPU) for acceleration
when necessary. For real-time systems (e.g., surveillance, interactive applications),
OpenVINO Face Detection or MediaPipe represent optimal choices due to their high
processing speeds and low resource requirements.</p>
      <p>2. Classical methods are outperformed by neural network approaches. The Haar Cascades
algorithm significantly underperforms compared to modern models in both accuracy
(F1-score ~0.65) and recall (~0.6). Despite relatively high processing speeds (up to 30 FPS), these
models are unsuitable for systems where detection quality is critical.</p>
      <p>3. Modern optimized models offer balanced solutions. The OpenVINO Face Detection model
demonstrates exceptional CPU performance, achieving speeds up to 120 FPS while
maintaining high accuracy (F1-score ~0.93). Similarly, the MediaPipe framework provides
high accuracy (F1-score ~0.94) with low latency (10-15 ms), making these models optimal
for diverse applications, including mobile applications and embedded systems.</p>
      <p>4. A pronounced trade-off exists between accuracy and computational efficiency. Deep neural
network-based models such as Dlib CNN and RetinaFace provide the highest detection
accuracy (F1-score 0.95); however, their computational complexity results in significant
latencies (50 to 400 ms per image). This constrains their use in real-time systems without
specialized hardware acceleration.</p>
      <p>5. CPU-based models demonstrate a broad performance spectrum. Among CPU-operating
models, significant differentiation is observed. OpenVINO stands out as an extremely fast
solution (80-120 FPS) with competitive accuracy (F1-score 0.93), making it ideal for
embedded systems and Edge AI applications. Conversely, classical approaches like Haar
Cascades, while fast, substantially underperform in accuracy (F1-score 0.65), limiting their
applicability. Models such as MTCNN and Dlib CNN offer high accuracy but at the cost of
substantial computational complexity and consequently high latency.</p>
      <p>6. Hardware acceleration (GPU) impact is critical for high-performance systems. Models
optimized for graphics processors (RetinaFace, YOLOv5-Face) demonstrate significantly
better performance balance compared to CPU counterparts. They achieve high FPS rates
(up to 45) while maintaining accuracy at F1-score ~0.95 levels. This makes them suitable for
computer vision systems requiring real-time video stream processing.</p>
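      <p>The quantities used in the comparison above are related by simple formulas, sketched below in Python. The sketch assumes that frames are processed strictly one at a time (no batching or pipelining), so that throughput is the reciprocal of per-frame latency. The function names, the 30 FPS target, and the precision value of 0.71 are hypothetical and serve only to illustrate the arithmetic.</p>
      <preformat><![CDATA[
```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def fps_from_latency(latency_ms):
    """Throughput of a detector that processes frames one at a time."""
    return 1000.0 / latency_ms

def meets_frame_budget(latency_ms, target_fps):
    """True if the per-frame latency is short enough for the target rate."""
    return fps_from_latency(latency_ms) >= target_fps

# A latency of 10-15 ms corresponds to roughly 67-100 FPS,
# while 50-400 ms corresponds to 2.5-20 FPS.
fast_range = (fps_from_latency(15), fps_from_latency(10))
slow_range = (fps_from_latency(400), fps_from_latency(50))

# Hypothetical precision of 0.71 combined with a recall of 0.6.
example_f1 = f1_score(0.71, 0.6)
```
]]></preformat>
      <p>Under this assumption, a latency of 50 ms per image already falls below a 30 FPS video rate, which is consistent with the need for hardware acceleration noted for the heavier models.</p>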
      <p>Although the proposed framework may invite academic debate, such debate would itself indicate
that this research opens pathways for subsequent studies.</p>
      <p>We present a Python-based implementation that uses artificial intelligence tools for automated
face detection, alignment, and cropping in images. The system employs a hybrid methodology
built on two detection algorithms: Dlib's HOG (Histogram of Oriented Gradients) detector
combined with an SVM (Support Vector Machine), and MTCNN (Multi-task Cascaded Convolutional
Networks). The framework accepts input images in multiple formats and of varying quality.</p>
      <p>This automated facial cropping system operates through the integration and application of
several fundamental scientific research contributions in computer vision and machine learning. Its
effectiveness and robustness are ensured through the utilization of advanced algorithms for face
detection, landmark identification, image enhancement, and precise resampling (Figure 4).</p>
      <p>
        The primary detector (Dlib HOG+SVM) employs a Histogram of Oriented Gradients (HOG)
implementation combined with Support Vector Machine (SVM) from the Dlib library. This method
efficiently extracts local gradient descriptors, which are input to a linear SVM classifier for binary
classification—distinguishing faces from background. This methodology was comprehensively
described in [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ]. It should be noted that this approach is particularly effective for frontal and
near-frontal faces.
      </p>
      <p>To enhance the probability of face detection in images with varying orientations, the input
image is iteratively rotated (in 20-degree increments across a 0-360 degree range) before applying
the detector. In cases where faces are not detected on the original rotated image, preprocessing is
applied to enhance contrast and brightness, followed by repeated detection attempts.</p>
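      <p>The control flow of this rotation search can be sketched as follows. This is a minimal illustration rather than the script itself: the detector, the rotation routine, and the enhancement routine are passed in as callables, and the stand-ins used in the example are hypothetical stubs. Whether the original implementation stops at the first successful angle or collects detections from all angles is not stated here, so the sketch collects all of them.</p>
      <preformat><![CDATA[
```python
def detect_with_rotation(image, detect, rotate, enhance, step=20):
    """Run the detector at every rotation angle. At each angle, fall back
    to an enhanced copy of the rotated image if nothing is found."""
    hits = []
    for angle in range(0, 360, step):
        rotated = rotate(image, angle)
        faces = detect(rotated)
        if not faces:
            # The enhanced copy is used for detection only.
            faces = detect(enhance(rotated))
        if faces:
            hits.append((angle, faces))
    return hits

# Hypothetical stand-ins so that the control flow can be exercised.
def fake_rotate(image, angle):
    return (image, angle)

def fake_enhance(rotated):
    return ("enhanced", rotated[1])

def fake_detect(rotated):
    # Pretend a face is visible at 40 degrees, and at 100 degrees
    # only after enhancement.
    return ["face"] if rotated in (("raw", 40), ("enhanced", 100)) else []

hits = detect_with_rotation("raw", fake_detect, fake_rotate, fake_enhance)
```
]]></preformat>
      <p>With a 20-degree step the loop evaluates 18 orientations, so in the worst case the detector runs 36 times per image (once on each rotated image and once on each enhanced copy).</p>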
      <p>
        In the event of unsuccessful Dlib detector performance, the system transitions to MTCNN
(Multi-task Cascaded Convolutional Networks) [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]. MTCNN is a convolutional neural network
comprising three cascaded stages (P-Net, R-Net, O-Net), each performing tasks of facial region
proposal generation, refinement, and facial landmark localization. This detector demonstrates high
robustness to variations in pose, illumination, and facial scale. MTCNN initialization occurs
dynamically, only when needed, to optimize system resource utilization. Face detection using
MTCNN also incorporates iterative image rotations, analogous to the Dlib approach.
      </p>
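      <p>The fallback with on-demand initialization can be illustrated by a small wrapper class. The sketch below is an assumption about how such a cascade might be organized, not the authors' code: the class name, the factory argument, and the stub detectors are hypothetical. In a real script the factory would construct the MTCNN detector.</p>
      <preformat><![CDATA[
```python
class FallbackDetector:
    """Runs a primary detector and creates the backup one only on demand."""

    def __init__(self, primary, backup_factory):
        self.primary = primary
        self.backup_factory = backup_factory
        self._backup = None  # not built until the primary fails once

    def detect(self, image):
        faces = self.primary(image)
        if faces:
            return "primary", faces
        if self._backup is None:
            # Lazy initialization: pay the loading cost only when needed.
            self._backup = self.backup_factory()
        return "backup", self._backup(image)

# Hypothetical stubs that record how often the backup is constructed.
init_calls = []

def build_backup():
    init_calls.append("built")
    return lambda image: ["face"] if image == "hard" else []

detector = FallbackDetector(
    primary=lambda image: ["face"] if image == "easy" else [],
    backup_factory=build_backup,
)
easy_result = detector.detect("easy")
calls_after_easy = len(init_calls)
hard_result = detector.detect("hard")
detector.detect("hard")  # the backup is reused, not rebuilt
```
]]></preformat>
      <p>Returning the name of the detector that produced the result also makes it straightforward to record which stage succeeded for each image.</p>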
      <p>To enhance Dlib detection efficiency under challenging conditions, a function has been
implemented that applies adaptive histogram equalization using CLAHE (Contrast Limited
Adaptive Histogram Equalization) [39] in the LAB color space (on the luminance L channel) and
additional linear brightness transformation in the HSV color space (on the V channel). CLAHE
improves local contrast, enhancing facial visibility for the detector. It is important to note that
these enhancements are applied only to image copies used for detection, while subsequent
geometric transformations are performed on the original, unmodified rotated image to preserve
maximum quality.</p>
      <p>Following successful face detection, a Shape Predictor (the
shape_predictor_68_face_landmarks.dat model from Dlib) is utilized to localize 68 facial landmarks.
This model is based on an ensemble of regression trees, which was comprehensively
described in [40]. The facial inclination angle is calculated from the coordinates of the left and
right eye centers. An affine transformation is then applied to the image to align the face by
positioning the eye line horizontally. This transformation employs Lanczos interpolation, which
ensures high-quality pixel resampling. This resampling method, which uses a windowed sinc
function as a filter, is extensively discussed in digital image processing literature, particularly in [41].
Lanczos interpolation is recognized for its ability to minimize aliasing effects and preserve edge
sharpness during image scaling, which is critically important for obtaining high-quality final
photographs.</p>
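      <p>For reference, the Lanczos filter can be written down compactly. The sketch below implements the kernel and a one-dimensional resampling step in plain Python. The kernel order a = 3, the weight normalization, and the function names are assumptions chosen for illustration; library implementations differ in detail (OpenCV's INTER_LANCZOS4 flag, for instance, operates on an 8x8 pixel neighborhood).</p>
      <preformat><![CDATA[
```python
import math

def lanczos_kernel(x, a=3):
    """Lanczos window of order a: sinc(x) * sinc(x / a) for |x| < a."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def lanczos_resample_1d(samples, position, a=3):
    """Interpolate a 1D signal at a fractional position."""
    base = math.floor(position)
    total, weight_sum = 0.0, 0.0
    # Use the 2a samples nearest to the requested position.
    for i in range(base - a + 1, base + a + 1):
        if 0 <= i < len(samples):
            w = lanczos_kernel(position - i, a)
            total += w * samples[i]
            weight_sum += w
    # Normalizing keeps flat regions flat, including near the borders.
    return total / weight_sum

flat = lanczos_resample_1d([5.0] * 10, 4.3)
ramp = lanczos_resample_1d([float(i) for i in range(10)], 4.0)
```
]]></preformat>
      <p>Resampling an image applies the same weights separably along rows and columns. The negative lobes of the kernel are what preserve edge sharpness, at the cost of possible slight ringing near strong edges.</p>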
      <p>Following alignment, re-detection is performed on the aligned image to obtain precise
coordinates. Using the refined coordinates of the aligned face, the system crops a square region
around it. The side of this square is derived from the facial measurements and a scaling
coefficient defined in the script, which leaves additional space around the face. The cropped image is
then scaled to a standardized output size using the high-quality Lanczos interpolation method.
Final images are saved in JPEG format with minimal compression settings.</p>
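      <p>One possible way to compute such a crop region is sketched below. The exact rule used in the script is not given here, so the choice of the larger face dimension as the base size, the coefficient of 1.5, the clamping to the image borders, and the function name are all assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
def square_crop_box(face, image_size, scale=1.5):
    """Square crop centered on a face rectangle (left, top, right, bottom),
    enlarged by `scale` and shifted or shrunk to stay inside the image."""
    left, top, right, bottom = face
    width, height = image_size
    side = int(round(max(right - left, bottom - top) * scale))
    side = min(side, width, height)  # the square cannot exceed the image
    cx, cy = (left + right) // 2, (top + bottom) // 2
    # Clamp the top-left corner so the whole square stays in bounds.
    x0 = min(max(cx - side // 2, 0), width - side)
    y0 = min(max(cy - side // 2, 0), height - side)
    return x0, y0, x0 + side, y0 + side

# Hypothetical face rectangles in a 640x480 image.
centered = square_crop_box((100, 100, 200, 200), (640, 480))
near_corner = square_crop_box((0, 0, 100, 100), (640, 480))
oversized = square_crop_box((0, 0, 400, 400), (640, 480))
```
]]></preformat>
      <p>Shifting the square rather than padding it keeps the output free of artificial borders, although the face is then no longer exactly centered when it lies near the image edge.</p>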
      <p>The program also incorporates a critical stage that filters for unique faces. Following initial detection
with rotation, the system filters detected faces using a two-stage approach. The first stage employs
geometric filtering, where Intersection over Union (IoU) and Intersection over Area (IoA) are
applied to eliminate redundant rectangles belonging to the same face. The second stage implements
vector filtering, which utilizes the Dlib face recognition model to compute 128-dimensional vectors
(embeddings) for each face. Faces are considered unique if the distance between their vectors
exceeds an established threshold (embedding_threshold = 0.6). This enables the system to process
images containing multiple faces while preserving only one instance when faces are highly similar
(e.g., from different angles).</p>
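      <p>The two filtering criteria can be sketched as follows. Boxes are assumed to be non-degenerate (left, top, right, bottom) tuples and embeddings plain lists of floats. Because the definition of IoA is not spelled out here, the sketch assumes the intersection is divided by the area of the smaller box. The function names and the toy two-dimensional embeddings are hypothetical; only the threshold of 0.6 is taken from the text.</p>
      <preformat><![CDATA[
```python
import math

def area(box):
    """Area of a (left, top, right, bottom) box; zero if it is empty."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def intersection(a, b):
    """Area of the overlap between two boxes."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))

def iou(a, b):
    """Intersection over Union."""
    inter = intersection(a, b)
    return inter / (area(a) + area(b) - inter)

def ioa(a, b):
    """Intersection divided by the area of the smaller box (assumed definition)."""
    return intersection(a, b) / min(area(a), area(b))

def unique_by_embedding(embeddings, threshold=0.6):
    """Keep an embedding only if its Euclidean distance to every
    embedding kept so far exceeds the threshold."""
    kept = []
    for emb in embeddings:
        if all(math.dist(emb, k) > threshold for k in kept):
            kept.append(emb)
    return kept

overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))
nested = ioa((0, 0, 10, 10), (2, 2, 6, 6))
# Toy 2-dimensional embeddings; the Dlib model produces 128 values per face.
unique = unique_by_embedding([[0.0, 0.0], [0.3, 0.0], [1.0, 0.0]])
```
]]></preformat>
      <p>IoA complements IoU for nested rectangles: a small box fully inside a large one has a low IoU but an IoA of 1, so it is still recognized as redundant.</p>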
      <p>For process monitoring and diagnostics, the system maintains detailed log files that record
detection successes/failures, the detector used, and rotation angles. A separate file contains a list of
images that were not processed by the Dlib detector and were passed to MTCNN.</p>
      <p>Thus, the script implements an image processing pipeline that prioritizes the use of the Dlib
detector and then, in case of its failure, switches to the backup MTCNN detector. This cascaded
approach aims to optimize both detection accuracy and efficiency, since the Dlib HOG detector is
typically faster for simple cases, while MTCNN provides higher robustness in complex scenarios
(e.g., poor lighting conditions, face rotations, or presence of occlusions). Through the two-stage
pipeline and unique face filtering logic, the script can efficiently process images containing single
or multiple faces, ensuring that each unique face is identified and processed separately.</p>
      <p>A set of images of varying quality, orientation, and scale was used for testing. Processing was
performed in Python using the os, cv2, dlib, numpy, shutil, datetime, and MTCNN libraries. The
comparative effectiveness of the models was determined by accuracy, precision, processing time,
and the success rate of face detection under changes in lighting, background, and the number of
faces in the image.</p>
      <p>As input data for testing the program's performance, two sets of photographic images were
selected: the Labelled Faces in the Wild (LFW) Dataset [42] and WIDER Face Testing Images [43].</p>
      <p>The results of processing the first dataset (13,234 photos) demonstrated the effectiveness of the
tested tool: the results contained only faces, although a small portion of them were duplicated in
flipped form. Several faces were partially identified on single images containing multiple faces.
Among the faces that were not detected, the majority were located adjacent to other faces.</p>
      <p>The testing results on the second dataset (16,097 photos) showed poorer performance. Most
problems occurred with photographs containing groups of people positioned together but in
different planes (not all faces were detected). Additionally, in photographs where perforated
ribbons with text were present, as shown in Figure 5, some ribbon fragments were identified as
faces.</p>
      <p>Overall, it should be noted that the tool we developed has certain limitations regarding its
application. For example, on images containing multiple faces, the application may not detect all of
them since the system was designed for photographs containing a single central figure. Also,
images containing multiple faces require substantially longer processing times, with performance
scaling according to available computational resources.</p>
      <p>The experiment confirmed that the hybrid approach of Dlib + MTCNN provides satisfactory
results when working with heterogeneous images and reduces the number of missed faces.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>Therefore, under conditions of full-scale and prolonged war in Ukraine, the CNN face recognition
models we analyzed can be effectively utilized by law enforcement agencies to solve a wide range
of tasks. These include searching for missing persons, kidnapped children, identification of
deceased military personnel and civilians, detection of enemy saboteurs, collaborators and spies,
war criminals and Russian military personnel, integration with video surveillance systems and
drones, among others. The use of different face detector models enables deep analysis of large data
volumes, particularly video materials from surveillance systems.</p>
      <p>Within the framework of this research, recommendations have been formulated regarding the
implementation of face recognition procedures and the selection of CNN models at the face
detection stage to enhance the efficiency of law enforcement agencies. The developed automated
system effectively solves the tasks of face detection, alignment, and cropping in images using a
hybrid detector and high-quality image processing algorithms. The hybrid approach implemented
in the system allows combining the speed advantages of the Dlib detector with the enhanced
robustness of MTCNN. The application of iterative rotations significantly increases the chances of
face detection under non-ideal conditions. The use of Lanczos interpolation for all scaling and
rotation operations minimizes image quality degradation, particularly reducing the blurriness
problem. Brightness and contrast parameters have been adapted to balance between improving
visibility for the detector and preserving original image details. The system's efficiency is
confirmed by successful processing of a wide spectrum of images, ensuring a standardized output
format.</p>
      <p>The implemented methodologies ensure the necessary accuracy and quality of output images,
making this system a valuable tool for various applied tasks, particularly in law enforcement
agencies. Further research may include optimization of the scaling coefficient and integration of
additional quality metrics for automatic evaluation of cropped faces. Additionally, we note that
deep learning models for face recognition typically require large, suitable, labeled datasets for
optimal training, which can present difficulties. Therefore, it is advisable to attempt applying the
so-called ensemble method for face recognition based on deeply trained CNNs. Ensemble deep
learning represents a machine learning paradigm in which several individual CNN models
(learning algorithms) are combined to create a single, more effective and predictive model.
Ensemble systems in face recognition for solving various tasks that law enforcement officers face
in their activities today constitute the direction of our further scientific research.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used OpenAI GPT-5 and Gemini for
grammar and spelling checks. After using these tools/services, the authors reviewed and edited the
content as needed and take full responsibility for the publication’s content.</p>
      <p>https://doi.org/10.1109/LSP.2016.2603342
[39] K. J. Zuiderveld, CLAHE, in: Graphics Gems IV, 1994, 474–485.
https://doi.org/10.1016/B978-0-12-336156-1.50061-6
[40] V. Kazemi, J. Sullivan, One millisecond face alignment, in: CVPR, 2014, 1867–1874.</p>
      <p>https://doi.org/10.1109/CVPR.2014.241
[41] R. C. Gonzalez, R. E. Woods, Digital Image Processing, 4th ed., Pearson, 2018.
[42] LFW Dataset, Kaggle, 2025. URL: https://www.kaggle.com/datasets/jessicali9530/lfw-dataset
[43] WIDER Face Dataset, 2025. URL: http://shuoyang1213.me/WIDERFACE/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>A Critical Juncture amid Policy Shifts</article-title>
          ,
          <source>World Economic Outlook</source>
          , April
          <year>2025</year>
          . URL: https://www.imf.org/en/Publications/WEO/Issues/2025/04/22/world-economic-outlook-april-2025
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>International Monetary Fund</collab>
          ,
          <source>World Economic Outlook: A Critical Juncture amid Policy Shifts</source>
          ,
          <year>2025</year>
          . URL: https://www.imf.org/en/Publications/WEO/Issues/2025/04/22/world-economic-outlook-april-2025
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <collab>Office of the Prosecutor General of Ukraine</collab>
          ,
          <source>Statistical reports</source>
          ,
          <year>2025</year>
          . URL: https://gp.gov.ua/ua/posts/statistika
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Amirgaliyev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mussabek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rakhimzhanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhumadillayeva</surname>
          </string-name>
          ,
          <article-title>Review of ML/DL methods for person detection and face recognition</article-title>
          ,
          <source>Sensors</source>
          <volume>25</volume>
          (
          <year>2025</year>
          )
          <elocation-id>1410</elocation-id>
          . https://doi.org/10.3390/s25051410
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <article-title>Review of face recognition based on deep learning</article-title>
          ,
          <source>Applied and Computational Engineering</source>
          <volume>46</volume>
          (
          <year>2024</year>
          )
          <fpage>297</fpage>
          -
          <lpage>303</lpage>
          . https://doi.org/10.54254/2755-2721/46/20241638
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Survey of face recognition</article-title>
          ,
          <source>arXiv 2212.13038</source>
          (
          <year>2022</year>
          ). https://doi.org/10.48550/arXiv.2212.13038
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Deep learning face recognition survey</article-title>
          ,
          <source>Computer Vision and Image Understanding</source>
          <volume>189</volume>
          (
          <year>2019</year>
          )
          <elocation-id>102805</elocation-id>
          . https://doi.org/10.1016/j.cviu.2019.102805
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>El Fadel</surname>
          </string-name>
          ,
          <article-title>Facial recognition algorithms: systematic review</article-title>
          ,
          <source>Journal of Imaging</source>
          <volume>11</volume>
          (
          <year>2025</year>
          )
          <elocation-id>58</elocation-id>
          . https://doi.org/10.3390/jimaging11020058
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Very deep convolutional networks</article-title>
          ,
          <source>arXiv 1409.1556</source>
          (
          <year>2014</year>
          ). URL: https://arxiv.org/abs/1409.1556
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Ren,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>arXiv 1512.03385</source>
          (
          <year>2015</year>
          ). https://doi.org/10.48550/arXiv.1512.03385
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          , W. Liu,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sermanet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Reed</surname>
          </string-name>
          , et al.,
          <article-title>Going deeper with convolutions</article-title>
          ,
          <source>arXiv 1409.4842</source>
          (
          <year>2014</year>
          ). https://doi.org/10.48550/arXiv.1409.4842
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Schroff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kalenichenko</surname>
          </string-name>
          , J. Philbin, FaceNet: unified embedding,
          <source>in: CVPR</source>
          ,
          <year>2015</year>
          . https://doi.org/10.48550/arXiv.1503.03832
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xue</surname>
          </string-name>
          , S. Zafeiriou, ArcFace, in: CVPR,
          <year>2019</year>
          . https://doi.org/10.48550/arXiv.1801.07698
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gong</surname>
          </string-name>
          , et al.,
          <article-title>CosFace</article-title>
          ,
          <source>arXiv 1801.09414</source>
          (
          <year>2018</year>
          ). https://doi.org/10.48550/arXiv.1801.09414
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Raj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Weller</surname>
          </string-name>
          ,
          <article-title>SphereFace revived</article-title>
          ,
          <source>arXiv 2109.05565</source>
          (
          <year>2018</year>
          ). https://doi.org/10.48550/arXiv.2109.05565
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>MTCNN face detection</article-title>
          ,
          <source>in: ICCNEA</source>
          ,
          <year>2020</year>
          ,
          <fpage>154</fpage>
          -
          <lpage>158</lpage>
          . https://doi.org/10.1109/ICCNEA50255.2020.00040
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ververas</surname>
          </string-name>
          , I. Kotsia, S. Zafeiriou, RetinaFace, in: CVPR,
          <year>2020</year>
          ,
          <fpage>5203</fpage>
          -
          <lpage>5212</lpage>
          . URL: https://openaccess.thecvf.com/content_CVPR_2020/papers/Deng_RetinaFace_SingleShot_Multi-Level_Face_Localisation_in_the_Wild_CVPR_2020_paper.pdf
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Dakhil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Abdulazeez</surname>
          </string-name>
          ,
          <article-title>Face recognition based on DL: review</article-title>
          ,
          <source>Indonesian Journal of Computer Science</source>
          <volume>13</volume>
          (
          <year>2024</year>
          ). https://doi.org/10.33022/ijcs.v13i3.4037
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Survey of face recognition</article-title>
          ,
          <source>arXiv 2212.13038</source>
          (
          <year>2022</year>
          ). https://doi.org/10.48550/arXiv.2212.13038
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>DCFace</given-names>
            <surname>Authors</surname>
          </string-name>
          ,
          <article-title>Balanced face generation for fair verification</article-title>
          ,
          <source>arXiv 2412.03349</source>
          (
          <year>2024</year>
          ). URL: https://arxiv.org/abs/2412.03349
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>I. D.</given-names>
            <surname>Raji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Buolamwini</surname>
          </string-name>
          ,
          <article-title>Actionable auditing of biased AI</article-title>
          , in:
          <source>AIES</source>
          ,
          <year>2019</year>
          ,
          <fpage>429</fpage>
          -
          <lpage>435</lpage>
          . https://doi.org/10.1145/3306618.3314244
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stehouwer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <article-title>Face presentation attack detection survey</article-title>
          ,
          <source>IEEE TPAMI</source>
          <volume>43</volume>
          (
          <year>2020</year>
          )
          <fpage>3538</fpage>
          -
          <lpage>3559</lpage>
          . https://doi.org/10.1109/TPAMI.2020.2977021
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wienroth</surname>
          </string-name>
          ,
          <article-title>Socio-technical disagreements in forensic DNA</article-title>
          ,
          <source>BioSocieties</source>
          <volume>15</volume>
          (
          <year>2020</year>
          )
          <fpage>28</fpage>
          -
          <lpage>45</lpage>
          . https://doi.org/10.1057/s41292-018-0138-8
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mordvyntsev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pashniev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Nakonechnyi</surname>
          </string-name>
          ,
          <article-title>Video analytics in criminal analysis</article-title>
          ,
          <source>Law and Safety</source>
          <volume>96</volume>
          (
          <year>2025</year>
          )
          <fpage>90</fpage>
          -
          <lpage>103</lpage>
          . https://doi.org/10.32631/pb.2025.1.08
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Y. P.</given-names>
            <surname>Zaichenko</surname>
          </string-name>
          ,
          <source>Fundamentals of Intelligent Systems Design</source>
          ,
          <publisher-name>Slovo</publisher-name>
          ,
          <publisher-loc>Kyiv</publisher-loc>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Dovbysh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. V.</given-names>
            <surname>Shelekhov</surname>
          </string-name>
          ,
          <source>Pattern Recognition Theory</source>
          ,
          <publisher-name>Sumy State University</publisher-name>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Dovbysh</surname>
          </string-name>
          ,
          <source>Intelligent Systems Design</source>
          ,
          <publisher-name>Sumy State University</publisher-name>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>OpenCV</surname>
          </string-name>
          , Cascade Classifier,
          <year>2025</year>
          . URL: https://docs.opencv.org/4.x/db/d28/tutorial_cascade_classifier.html
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>C.</given-names>
            <surname>Antipona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Magsino</surname>
          </string-name>
          ,
          <article-title>Haar cascade enhancement for face recognition</article-title>
          ,
          <year>2024</year>
          . https://doi.org/10.13140/RG.2.2.34675.75045
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>H. G.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. B.</given-names>
            <surname>Suthar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Thakkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. M.</given-names>
            <surname>Thumar</surname>
          </string-name>
          ,
          <article-title>Face detection on Raspberry Pi</article-title>
          ,
          <source>IRJAEH</source>
          <volume>2</volume>
          (
          <year>2024</year>
          )
          <fpage>2440</fpage>
          -
          <lpage>2445</lpage>
          . https://doi.org/10.47392/IRJAEH.2024.0334
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>C.-L.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Adaptive facial attendance systems</article-title>
          ,
          <source>Electronics</source>
          <volume>11</volume>
          (
          <year>2022</year>
          ). https://doi.org/10.3390/electronics11142278
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zamir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Naseem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Frasteen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zafar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Assam</surname>
          </string-name>
          ,
          <article-title>Face recognition on Raspberry Pi</article-title>
          ,
          <source>Computation</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <elocation-id>148</elocation-id>
          . https://doi.org/10.3390/computation10090148
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          et al.,
          <article-title>RetinaFace: dense face localisation</article-title>
          ,
          <source>arXiv 1905.00641</source>
          (
          <year>2019</year>
          ). https://doi.org/10.48550/arXiv.1905.00641
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>D.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yao</surname>
          </string-name>
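          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>YOLO5Face</article-title>
          ,
          <source>arXiv 2105.12931</source>
          (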
          <year>2021</year>
          ). https://doi.org/10.48550/arXiv.2105.12931
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>D.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <article-title>Mobile attendance using OpenVINO</article-title>
          , in:
          <source>ICAIS</source>
          ,
          <year>2021</year>
          ,
          <fpage>1152</fpage>
          -
          <lpage>1157</lpage>
          . https://doi.org/10.1109/ICAIS50930.2021.9395836
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>S.</given-names>
            <surname>Demirkol</surname>
          </string-name>
          (serengil),
          <source>DeepFace</source>
          , GitHub repository,
          <year>2025</year>
          . URL: https://github.com/serengil/deepface
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>N.</given-names>
            <surname>Dalal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Triggs</surname>
          </string-name>
          ,
          <article-title>HOG for human detection</article-title>
          ,
          in:
          <source>CVPR</source>
          ,
          <year>2005</year>
          ,
          <fpage>886</fpage>
          -
          <lpage>893</lpage>
          . https://doi.org/10.1109/CVPR.2005.177
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qiao</surname>
          </string-name>
          ,
          <article-title>MTCNN</article-title>
          ,
          <source>IEEE SPL</source>
          <volume>23</volume>
          (
          <year>2016</year>
          )
          <fpage>1499</fpage>
          -
          <lpage>1503</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>