<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Methods of Face Recognition in Video Sequences and Performance Studies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mariia Nazarkevych</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vitaly Lutsyshyn</string-name>
          <email>vitalylutsyshyn@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hanna Nazarkevych</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liubomyr Parkhuts</string-name>
          <email>liubomyr.t.parkhuts@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maryna Kostiak</string-name>
          <email>Kostiak.maryna@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Ivan Franko National University</institution>
          ,
          <addr-line>1 Universytetska str., Lviv, 79000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>12 Stepan Bandera str., Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>246</fpage>
      <lpage>253</lpage>
      <abstract>
<p>A method of capturing a person’s face in a video stream has been developed, and tracking methods used in video surveillance are reviewed. Methods of video stream capture, image frame extraction, and face recognition are considered, including elastic graph matching, the principal component method, the Viola-Jones method, local binary patterns, and hidden Markov models. The Python library DeepFace was studied, and face recognition experiments were conducted on faces photographed in the selfie, portrait, and documentary genres. Recognition proved best for portrait photography, somewhat worse for selfies, and worst for documentary photography. Detection was based on the MediaPipe Face Detection library, and recognition took from 10 to 22 ms.</p>
      </abstract>
      <kwd-group>
<kwd>Face recognition</kwd>
        <kwd>object tracking</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Tracking objects in surveillance camera
footage is a challenging task, and tracking objects
across video sequences to improve their recognition
is more difficult still. Many object-tracking
methods exist, but all have drawbacks; some existing
object-tracking models are region-based contour
models [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Tracking means following an object through
a video sequence, while detection means locating an
object within a frame. Tracking-by-detection
trackers first run a detector on each frame, and the
tracking algorithm then associates these detections
to determine the movement of individual objects and
assign them unique identification numbers [2].
      </p>
      <p>Tracking objects is a complex problem.
Difficulties can arise from abrupt object movement,
changing appearance patterns of both the object and
the scene, non-rigid object structures,
object-object and object-scene occlusions, and
camera movement. Tracking is usually performed in
the context of higher-level applications that
require the location and/or shape of an object in
each frame. Typically, assumptions are made to
constrain the tracking problem for a particular
application. In this review, we classify tracking
methods based on the object and motion
representations used. Object tracking consists of
using appropriate image features, selecting motion
models, and detecting objects [3]:
• target representation;
• target localization.</p>
      <p>Difficulties arise when objects move fast
relative to the frame rate or when the tracked
object changes direction over time [4–6]. The
sequential flow of object detection, object
tracking, object identification, and object
behavior analysis completes the tracking process
[7]. Video processing consists of the following
steps: video upload [8], preprocessing, the
proposed algorithm, and then the object capture
step (Fig. 1).</p>
      <p>Figure 1: Video processing pipeline: video frames → preprocessing → proposed algorithm → moving object detection and tracking.</p>
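      <p>As a minimal illustration of the capture step, the following Python sketch (our own; the file name video.mp4 and the grayscale preprocessing are illustrative choices, not the paper’s) reads a stream frame by frame and hands each frame to a placeholder processing stage:</p>
      <preformat>
import cv2

def process_frame(frame):
    # Placeholder for the proposed algorithm: moving-object
    # detection and tracking would run here.
    return frame

cap = cv2.VideoCapture("video.mp4")   # or 0 for a live camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:                        # end of stream
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # simple preprocessing
    process_frame(gray)
cap.release()
      </preformat>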
    </sec>
    <sec id="sec-2">
      <title>2. Object Recognition</title>
      <p>The capture and encoding of digital images
have led to the creation and rapid dissemination of
a huge amount of visual information, so efficient
tools for searching and retrieving visual
information are essential. Although there are
effective search engines for text documents today,
there are no comparably satisfactory systems for
retrieving visual information.</p>
      <p>Due to the growth of visual data both online
and offline [9] and the phenomenal success of web
search, expectations for image and video search
technologies are increasing.</p>
      <p>
        However, with the evolution of video cameras
that can record at high frame rates in good
quality, and with advances in detection such as new
approaches based on Convolutional Neural Networks
(CNNs), the basis for tracking-by-detection
trackers [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] has become more robust. The requirements
for a tracker in a tracking system have changed
dramatically, allowing much simpler tracking
algorithms to compete with more complex systems
that require significant computational costs.
      </p>
      <p>Let’s analyze three ranking algorithms that
take into account the spatial, temporal, and
spatiotemporal properties of geo-referenced video
clips.</p>
      <p>Object detection requires training machine
learning models, such as Recurrent Neural
Networks (RNNs) and CNNs, on images where
objects have been manually annotated and
associated with a high-level concept.</p>
      <p>A video stream is received in which a face
needs to be recognized [11]. We determine the
coordinates and size of the face, align the face
contour, and determine its basic parameters
(Fig. 2). From these parameters a parametric
vector is built and compared against stored
vectors, which yields the recognition result.</p>
      <p>Figure 2: Face recognition pipeline: face frame alignment → determination of basic parameters → comparison of parameters → result.</p>
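      <p>Since the abstract names the Python DeepFace library, a hedged sketch of this final comparison step might use DeepFace.verify, which builds an embedding (a parametric vector) for each of two face images and compares them; the file names are illustrative:</p>
      <preformat>
from deepface import DeepFace

# Embeds both faces and compares the resulting vectors;
# the two image paths are illustrative placeholders.
result = DeepFace.verify(img1_path="frame_face.jpg",
                         img2_path="reference_face.jpg")
print(result["verified"], result["distance"])
      </preformat>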
      <p>Object detection in offline video. This
approach estimates the behavior of perceived
objects and works best as a complement to other
offline video-based object detection systems [12].
In recent years, various other video object
detection systems have emerged that have tried to
use 3D convolutional networks that analyze many
images simultaneously.</p>
      <p>Knowledge-based methods use information
about the face, its features, shape, texture, or skin
color. In these methods, a certain set of rules is
distinguished that a frame fragment must meet to
be considered a human face. It is quite easy to
define such a set of rules (Fig. 3). The rules are
formalized knowledge that a person uses to judge
whether an image region contains a face.</p>
      <p>For example, the basic rules are: the areas of
the eyes, nose, and mouth differ in brightness
from the rest of the face; the eyes on the face are
always symmetrically positioned relative to each
other. Based on these and other similar properties,
algorithms are built that check whether these rules
are fulfilled in the image during execution. The
same group of methods includes a more general
method—the pattern-matching method. In this
method, a face standard (template) is determined
by describing the properties of individual face
areas and their specified relative position, with
which the input image is subsequently compared.</p>
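      <p>A toy Python sketch of such rule checks (the band proportions and thresholds below are our illustrative choices, not values from the literature):</p>
      <preformat>
import numpy as np

def plausible_face(gray_patch):
    """Toy knowledge-based test on a grayscale face candidate."""
    h, w = gray_patch.shape
    eyes   = gray_patch[h // 5 : 2 * h // 5, :]      # eye band
    cheeks = gray_patch[3 * h // 5 : 4 * h // 5, :]  # cheek band
    # Rule 1: the eye band is darker than the cheek band.
    darker = cheeks.mean() - eyes.mean() > 10
    # Rule 2: the left and right halves are roughly symmetric.
    left, right = gray_patch[:, : w // 2], gray_patch[:, w - w // 2 :]
    asymmetry = abs(left.mean() - np.fliplr(right).mean())
    if asymmetry > 15:          # halves differ too much: reject
        return False
    return darker
      </preformat>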
      <p>Figure 3: Taxonomy of the computing methodology: artificial intelligence; computer vision (object recognition, detection); computer graphics (image manipulation); face detection.</p>
      <p>Face detection using such methods is
performed [13] by scanning all rectangular
fragments of the image and determining which class
each fragment belongs to.</p>
      <p>Viola-Jones object detection [14]. The method
was proposed by Paul Viola and Michael Jones
and became the first method to demonstrate high
results in real-time image processing. The method
has many implementations, including as part of
the OpenCV computer vision library
(cvHaarDetectObjects function). The advantages
of this method are high speed (due to the use of a
cascade classifier); high accuracy in detecting
turned faces at an angle of up to 30 degrees. The
disadvantages include a long training time: the
algorithm must analyze a large number of training
images.</p>
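      <p>The cvHaarDetectObjects function named above belongs to OpenCV’s legacy C API; in current OpenCV the same cascade detector is exposed through cv2.CascadeClassifier. A minimal sketch (the image path is illustrative):</p>
      <preformat>
import cv2

# Haar cascade bundled with OpenCV, the modern counterpart of
# the legacy cvHaarDetectObjects interface mentioned above.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("photo.jpg")                 # illustrative file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:                    # one box per detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
      </preformat>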
      <p>The method of comparison on graphs (Elastic
graph matching) [15]. This method is related to
2D modeling. Its essence lies in the comparison of
graphs describing faces (a face is represented as a
grid with an individual location of vertices and
edges). Faces are represented as graphs with
weighted vertices and edges. At the recognition
stage, one of the graphs, the reference graph,
remains unchanged, while the other is deformed
to best match the first graph. In such recognition
systems, graphs can have a rectangular lattice and
a structure formed by characteristic
(anthropometric) points of faces.</p>
      <p>Graph edges are weighted by the distances [16]
between adjacent vertices. The difference
(distance, discriminative characteristic) between
two graphs is calculated using a certain
deformation cost function that takes into account
both the difference between the feature values
calculated in the vertices and the degree of
deformation of the graph edges.</p>
      <p>The graph is deformed by shifting each of its
vertices by a certain distance in certain directions
relative to its original location and choosing such
a position at which the difference between the
feature values in the vertex of the deformed graph
and the corresponding vertex of the reference
graph is minimal. This operation is performed in
turn for all graph vertices until the smallest total
difference between the features of the deformed
and reference graphs is achieved. The value of the
deformation cost function at this position of the
graph will be the measure of the difference
between the input face image and the reference
graph. This “relaxation” deformation procedure
should be performed for all reference faces in the
system database. The result of the system’s
recognition is the reference with the best value of
the deformation cost function.</p>
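      <p>A compact sketch of the deformation cost just described, combining the per-vertex feature differences with an edge-stretch penalty (the weight lam is an illustrative parameter of ours, not from the paper):</p>
      <preformat>
import numpy as np

def deformation_cost(feats_ref, feats_def, pos_ref, pos_def, edges, lam=1.0):
    """Cost of matching a deformed face graph to a reference graph.

    feats_*: per-vertex feature vectors; pos_*: per-vertex 2D positions;
    edges: (i, j) vertex-index pairs; lam: edge-penalty weight
    (an illustrative value).
    """
    # Difference between feature values at corresponding vertices.
    feature_term = sum(np.linalg.norm(fr - fd)
                       for fr, fd in zip(feats_ref, feats_def))
    # Degree of deformation of the graph edges.
    edge_term = sum(abs(np.linalg.norm(pos_def[i] - pos_def[j]) -
                        np.linalg.norm(pos_ref[i] - pos_ref[j]))
                    for i, j in edges)
    return feature_term + lam * edge_term
      </preformat>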
      <p>The disadvantages of the method include the
complexity of the recognition algorithm and the
complicated procedure for entering new templates
into the database.</p>
      <p>The best results in face recognition have
been shown by Convolutional Neural Networks
(CNNs). Their success is due to the ability to
exploit the two-dimensional topology of the image,
unlike the multilayer perceptron.</p>
      <p>The distinctive features of CNNs are local
receptive fields (providing local two-dimensional
connectivity of neurons), shared weights (allowing
features to be detected anywhere in the image), and
hierarchical organization with spatial
subsampling. Thanks to these properties, a CNN
provides partial invariance to scale changes,
shifts, rotations, changes in angle, and other
distortions.</p>
      <p>A CNN of this kind underlies DeepFace,
developed at Facebook to recognize the faces of its
social network’s users.</p>
      <p>The geometric face recognition method [17]
is one of the earliest face recognition methods.
Methods of this type select a set of key points (or
areas) of the face and then form a feature set from
them. Key points can include the corners of the
eyes and lips, the tip of the nose, the centers of
the eyes, etc. An advantage of this method is that
it works with inexpensive equipment. Its
disadvantages are low statistical reliability, high
lighting requirements, and the need for a nearly
frontal image of the face with only small
deviations; it also does not account for possible
changes in facial expression.</p>
      <p>The method of flexible comparison on graphs
[18] compares graphs describing the image of a
person’s face. Some publications report 95–97%
recognition efficiency even in the presence of
different emotional expressions and with face
angles of up to 15 degrees. However, comparing an
input face image with 87 reference images takes
about 25 seconds. Another disadvantage of this
approach is the inefficiency of adding new
reference images, which generally leads to a
non-linear dependence of running time on the size
of the face database. The main advantage is low
sensitivity to face illumination and to changes in
face angle, but the approach itself has lower
recognition accuracy than methods built on neural
networks.</p>
      <p>The Principal Component Method (PCM) [19]
reduces the recognition or classification process
to constructing a certain number of principal
components of images for an input image. However,
when there are significant changes in illumination
or facial expression in the face image, the
method’s effectiveness drops significantly.</p>
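      <p>A hedged sketch of this pipeline using principal component analysis from scikit-learn (the data here are random placeholders; the component count is an illustrative choice):</p>
      <preformat>
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.random((40, 64 * 64))  # placeholder: 40 flattened 64x64 faces
y_train = np.arange(40)              # placeholder identity labels

pca = PCA(n_components=20, whiten=True)   # 20 components: illustrative
train_proj = pca.fit_transform(X_train)   # principal components of the set

def recognize(face_flat):
    """Project a query face and return the nearest training identity."""
    q = pca.transform(face_flat.reshape(1, -1))
    return y_train[np.argmin(np.linalg.norm(train_proj - q, axis=1))]

print(recognize(X_train[3]))  # sanity check: prints 3
      </preformat>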
      <p>The Viola-Jones method [14] detects objects
in images in real time. The method works well when
the object is observed at a small angle, up to
about 30°, where recognition accuracy can exceed
90%, which is a good result. At deviation angles of
more than 30°, however, the detection probability
drops sharply, making it impossible to detect a
face at an arbitrary angle.</p>
      <p>Some of the best results in face recognition
are achieved by CNNs, which are a logical
development of such architectures as the cognitron
and neocognitron. Their success is due to the
ability to take into account the two-dimensional
topology of the image, unlike the multilayer
perceptron. Thanks to these innovations, the
network provides partial resistance to scale
changes, shifts, rotations, changes in perspective,
and other distortions. Testing on the ORL database,
which contains images of faces with slight changes
in lighting, scale, spatial rotation, position, and
emotion, showed 96% recognition accuracy. A
disadvantage of neural-network methods is that
adding a new reference face to the database
requires complete retraining of the network on the
entire available set, a lengthy procedure that,
depending on the sample size, takes hours or even
several days.</p>
      <p>Local Binary Patterns (LBPs) [15] were first
proposed in 1996 for analyzing the texture of
grayscale images. Studies have shown that LBPs are
invariant to small changes in lighting conditions
and to small image rotations. LBP-based methods
work well on images of faces with different facial
expressions, different lighting, and head turns. A
disadvantage is the need for high-quality image
preprocessing: LBPs are highly sensitive to noise,
which increases the number of false binary
codes.</p>
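      <p>For illustration, the 8-neighbor binary code at a single pixel can be sketched as follows (thresholding each neighbor against the center and packing the bits clockwise; this ordering is a common convention, not specific to the cited work):</p>
      <preformat>
import numpy as np

def lbp_code(gray, r, c):
    """8-neighbor local binary pattern code for pixel (r, c)."""
    center = gray[r, c]
    neighbors = [gray[r - 1, c - 1], gray[r - 1, c], gray[r - 1, c + 1],
                 gray[r, c + 1], gray[r + 1, c + 1], gray[r + 1, c],
                 gray[r + 1, c - 1], gray[r, c - 1]]
    # Each neighbor at least as bright as the center contributes one bit.
    return sum(int(n >= center) * 2 ** i for i, n in enumerate(neighbors))
      </preformat>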
      <p>Hidden Markov models [16]. A hidden
Markov model is a statistical model that simulates
the operation of a process similar to a Markov
process with unknown parameters. According to
the model, the task is to find unknown parameters
based on other observed parameters. The obtained
parameters can be used in further analysis for face
recognition. From the point of view of
recognition, an image is a two-dimensional
discrete signal. The observation vector plays an
important role in building an image model. To
avoid discrepancies in descriptions, a rectangular
window is usually used for recognition. To avoid
losing data areas, rectangular windows should
overlap each other. The values for overlap, as well
as the recognition areas, are selected
experimentally. Before use, the model must be
trained on a set of pre-labeled images. Each label
has its number and defines a characteristic point
that the model will have to find when adapting to
a new image.</p>
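      <p>A sketch of forming the observation sequence from overlapping rectangular windows (the window and step sizes below are illustrative; as noted above, in practice they are chosen experimentally):</p>
      <preformat>
import numpy as np

def observation_windows(gray, win=(16, 16), step=(8, 8)):
    """Slide an overlapping rectangular window over the image and
    flatten each patch into one observation vector."""
    H, W = gray.shape
    wh, ww = win
    sh, sw = step
    return np.array([gray[r:r + wh, c:c + ww].ravel()
                     for r in range(0, H - wh + 1, sh)
                     for c in range(0, W - ww + 1, sw)])
      </preformat>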
    </sec>
    <sec id="sec-3">
      <title>3. Face Detection</title>
      <p>MediaPipe Face Detection is a face detection
solution that provides 6 landmarks and supports
multiple faces. It is based on BlazeFace [17], a
lightweight, high-performance face detector
designed specifically for mobile GPUs. The
detector’s real-time performance allows it to be
applied to any real-time video stream that requires
an accurate face region as input to other
task-specific models, such as 3D face keypoint
estimation (e.g., MediaPipe Face Mesh), facial
feature or expression classification, and face
region segmentation. BlazeFace uses a lightweight
feature extraction network inspired by
MobileNetV1/V2 and, distinct from it, a
GPU-friendly anchor scheme modified from the Single
Shot MultiBox Detector (SSD).</p>
      <p>The output is a collection of detected
faces, where each face is represented as a
proto-message containing a bounding box and 6 key
points (right eye, left eye, nose tip, mouth
center, right ear tragion, and left ear tragion).
The bounding box consists of xmin and width (both
normalized to [0.0, 1.0] by the image width) and
ymin and height (both normalized to [0.0, 1.0] by
the image height). Each key point consists of x and
y, normalized to [0.0, 1.0] by the image width and
height, respectively (Fig. 4).</p>
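      <p>A minimal usage sketch of this detector through the MediaPipe Python Solutions API (the image path is illustrative):</p>
      <preformat>
import cv2
import mediapipe as mp

mp_fd = mp.solutions.face_detection
image = cv2.imread("photo.jpg")          # illustrative path
with mp_fd.FaceDetection(model_selection=0,
                         min_detection_confidence=0.5) as detector:
    results = detector.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

for det in results.detections or []:     # None when no face is found
    box = det.location_data.relative_bounding_box
    # xmin/ymin/width/height are normalized to [0.0, 1.0] as described above.
    print(box.xmin, box.ymin, box.width, box.height)
      </preformat>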
    </sec>
    <sec id="sec-4">
      <title>4. Face Mesh</title>
      <p>MediaPipe Face Mesh is a solution that
estimates 468 3D facial landmarks in real-time,
even on mobile devices [18, 19]. It uses machine
learning to determine the 3D surface of the face,
requiring only a single camera input, without a
dedicated depth sensor. Using a lightweight model
architecture together with GPU acceleration
throughout the pipeline, the solution delivers the
real-time performance critical for live
applications.</p>
      <p>Additionally, the solution comes with a face
transformation module that bridges the gap
between facial landmark estimation and useful
real-time Augmented Reality (AR) applications
[20]. It establishes a metric 3D space and uses the
positions of facial landmarks on the screen to
estimate facial transformations in that space. The
face transformation data consists of conventional
3D primitives, including a face pose
transformation matrix and a triangular face mesh
[21]. A lightweight statistical analysis method
called Procrustes Analysis is used to drive robust,
efficient, and portable logic. The analysis is
performed on the CPU and has a minimal speed
footprint.</p>
      <p>
        The machine learning pipeline consists of two
real-time deep neural network models that work
together [
        <xref ref-type="bibr" rid="ref13">22</xref>
        ]: a detector that works on the full image
and calculates the location of the face, and a 3D
facial landmark model that works on these locations
and predicts an approximate 3D surface using
regression. Accurate face cropping significantly
reduces the need for conventional data
augmentation.
      </p>
      <p>The pipeline is implemented as a MediaPipe
graph that uses a face landmark subgraph from the
face landmark module and visualizes using a special
face renderer subgraph. The face landmark subgraph
internally uses the face_detection_subgraph from
the face detection module.</p>
      <p>The face detector is the same BlazeFace model
used in MediaPipe Face Detection.</p>
      <p>For 3D facial landmarks, we applied transfer
learning and trained the network with multiple
objectives: the network simultaneously predicts 3D
landmark coordinates on synthetic visualized data
and 2D semantic contours on annotated real-world
data. The resulting network provided us with
reasonable predictions of 3D landmarks not only on
synthetic but also on real-world data [23, 24].</p>
      <p>
        The 3D landmark network receives a cropped
video frame as input without additional depth
input. The model outputs the positions of the 3D
points, as well as the probability of the presence
and proper alignment of a face in the input data
[
        <xref ref-type="bibr" rid="ref12">25, 26</xref>
        ]. A common alternative approach is to
predict a 2D heat map for each landmark, but it
does not lend itself to depth prediction and has
high computational costs for so many points. We
further improve the accuracy and reliability of our
model by iterative loading and refining the
predictions. In this way, we can increase our
dataset to increasingly complex cases such as
grimaces, obliques, and occlusions.
      </p>
      <p>This method can be used for a variety of face
masking applications (Fig. 5).</p>
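      <p>A minimal sketch of querying the 468 landmarks through the MediaPipe Python Solutions API (the image path is illustrative; treating landmark index 1 as the nose tip follows common usage, not the paper):</p>
      <preformat>
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh
image = cv2.imread("photo.jpg")          # illustrative path
with mp_face_mesh.FaceMesh(static_image_mode=True,
                           max_num_faces=1,
                           min_detection_confidence=0.5) as face_mesh:
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.multi_face_landmarks:
    landmarks = results.multi_face_landmarks[0].landmark  # 468 3D points
    tip = landmarks[1]                   # nose tip by common convention
    print(tip.x, tip.y, tip.z)           # x, y normalized; z relative depth
      </preformat>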
    </sec>
    <sec id="sec-5">
      <title>5. Model Development</title>
      <p>There are two models in this solution: general
and landscape. Both models are based on
MobileNetV3 with modifications to make them
more efficient. The general model works with a
256×256×3 (HWC) tensor and outputs a
256×256×1 tensor representing the segmentation
mask. The landscape model is similar to the
general model but works on a 144×256×3 (HWC)
tensor. It has fewer FLOPs than the regular model
and is therefore faster. MediaPipe Selfie
Segmentation automatically resizes the input
image to the right tensor size before feeding it to
the ML model [27].</p>
      <p>The general model also powers ML Kit, and a
variant of the landscape model powers Google Meet
(Fig. 6).</p>
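      <p>A hedged sketch of selecting between the two models through the MediaPipe Python Solutions API (model_selection=0 picks the general 256×256 model, 1 the landscape 144×256 model; the image path and the 0.5 threshold are illustrative choices):</p>
      <preformat>
import cv2
import mediapipe as mp

mp_selfie = mp.solutions.selfie_segmentation
image = cv2.imread("photo.jpg")                  # illustrative path
with mp_selfie.SelfieSegmentation(model_selection=0) as segmenter:
    results = segmenter.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

mask = results.segmentation_mask                 # float mask, input-sized
person = mask > 0.5                              # illustrative threshold
      </preformat>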
      <p>During this experiment, the problem of
recognizing objects in a video stream was
considered. The main Python libraries that can be
used to recognize and classify objects from video
were highlighted, and the MediaPipe methods for
achieving particular recognition results were
described (Fig. 7).</p>
    </sec>
    <sec id="sec-6">
      <title>6. Experimental Results</title>
      <p>About 50 images of graduating master’s
students were used. The images were taken with
mobile phones; after taking the photos, the
students recorded videos in MPEG7 format. The
experiment took into account the orientation of the
face relative to the plane of the photograph. The
photos were taken in the selfie, documentary, and
portrait genres. In addition, these images varied
in quality and contained several sources of
variation in color, position, scale, rotation,
pose, and facial expression. We present the
detection results in Tables 1 and 2 for the HHI
MPEG7 image set. The face was captured by
tracking.</p>
      <table-wrap id="tab1">
        <caption>
          <p>Tables 1 and 2: Face detection results on the HHI MPEG7 image set. FP: false positives; DR: detection rate. Head poses considered: frontal, close to the frontal (10 images), semi-profile, and profile; paired values correspond to different poses.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Genre</th>
              <th>Number of images</th>
              <th>FP</th>
              <th>DR</th>
              <th>Time</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Selfie</td>
              <td>12</td>
              <td>6204 / 5205</td>
              <td>87% / 85%</td>
              <td>10 ms</td>
            </tr>
            <tr>
              <td>Portrait</td>
              <td/>
              <td>5290 / 5005</td>
              <td>93% / 92%</td>
              <td>18 ms</td>
            </tr>
            <tr>
              <td>Documentary photography</td>
              <td>7</td>
              <td>3458</td>
              <td>85%</td>
              <td>22 ms</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments</title>
    </sec>
    <sec id="sec-8">
      <title>8. References</title>
      <p>
        Research on face detection in a video stream
has been conducted. This is done using the
MediaPipe Face Detection library. The results of
face detection are shown in Fig. 4. Frames in the
form of photos are recorded in the video stream.
The DeepFace library is used to capture faces. If
there are several faces in the frame, then
DeepFace captures several faces. Experiments
were carried out when taking pictures in the genre
of the selfie, portrait photography, and
documentary photography. The photo was taken
from the frontal, close to the frontal and profile.
The performed recognition showed a high
percentage of face recognition.
[
        <xref ref-type="bibr" rid="ref21 ref28">1</xref>
        ]
[2]
[3]
[4]
[5]
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref1">
        <mixed-citation>A. Yilmaz, O. Javed, M. Shah, Object Tracking: A Survey, ACM Computing Surveys (CSUR), 38(4) (2006) 13-es.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>T. Huang, Computer Vision: Evolution and Promise, CERN School of Computing, 1996, 21–25. doi:10.5170/CERN-1996-008.21</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>Z. Pang, Z. Li, N. Wang, SimpleTrack: Understanding and Rethinking 3D Multi-Object Tracking, in: ECCV 2022 Workshops, Tel Aviv, Israel, October 2022, 680–696. doi:10.1007/978-3-031-25056-9_43</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>O. Iosifova, et al., Analysis of Automatic Speech Recognition Methods, in: Workshop on Cybersecurity Providing in Information and Telecommunication Systems, vol. 2923 (2021) 252–257.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>K. Khorolska, et al., Application of a Convolutional Neural Network with a Module of Elementary Graphic Primitives … Transformation of 2D to 3D Models, Information Technology, 100(24) (2022) 7426–7437.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>V. Sokolov, P. Skladannyi, A. Platonenko, … Unmanned Aerial Vehicles, in: IEEE 41st International Conference on Electronics and Nanotechnology (2022) 473–477. doi:10.1109/ELNANO54667.2022.9927105</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>I. Delibaşoğlu, Moving Object Detection …, Signal, Image and Video Processing (2023) 1–9. doi:10.1007/s11760-022-02458-y</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>X. Yu, Evaluation of Training Efficiency of … Video Processing Technology, Optik, 273 (2023) 170404.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>L. Nixon, How Do Destinations Relate to …, in: eTourism Conference, 2023, 204–216.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>C. Xiao, Z. Luo, Improving Multiple …, 25(2) (2023) 380.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>S. Garcia, et al., Face-to-Face and Online …, Nursing, 22(1) (2023) 1–10.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>M. Lee, Y. Chen, Artificial Intelligence … Small Underwater Robot, Processes, 11(2) (2023) 312.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>A. Boyd, et al., CYBORG: Blending …, in: Conference on Applications of Computer Vision, 2023, 6108–6117.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>B. Hassan, F. Dawood, Facial Image …, Nonlinear Analysis Appls., 14(1) (2023) 1593–1599.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>E. Hartman, et al., Elastic Shape Analysis … Framework, Int. J. Comput. Vision, 2023, 1–27.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>E. Rica, S. Álvarez, F. Serratosa, Learning …, in: S+SSPR 2022 Workshops, Montreal, QC, Canada, August 2022, 103–112.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>X. Qi, et al., A Convolutional Neural …, Neuroscience (2023).</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>Y. Yasuda, et al., Flexibility Chart 2.0: An …, Reviews, 174 (2023) 113116.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>G. Ramadan, et al., Impact of PCM Type on …, J. Energy Systs., 7(1) (2023) 67–88.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>S. Sut, et al., Automated Adrenal Gland … with CT Images, J. Digital Imaging (2023) 1–14.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>R. Glennie, et al., Hidden Markov Models …, Methods in Ecology and Evolution, 14(1) (2023) 43–56.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>N. Bansal, et al., Real-Time Advanced … Video Detection, Appl. Sci., 13(5) (2023) 3095.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>B. Deori, D. Thounaojam, A Survey on …, Inf. Theor. (IJIT), 3(3) (2014) 31–46.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>M. Medykovskyy, et al., Methods of … Information Technologies, 2015, 70–72.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>M. Logoyda, et al., Identification of …, CEUR Workshop Proceedings, 2019.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>M. Nazarkevych, B. Yavourivskiy, … Systems in Microelectronics, 2015, 439–441.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>V. Hrytsyk, A. Grondzal, A. Bilenkyj, … Technologies (2015) 188–191.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>