<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Methods of Capturing and Tracking Objects in Video Sequences with Subsequent Identification by Artificial Intelligence</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mariia Nazarkevych</string-name>
          <email>mariia.a.nazarkevych@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vitaly Lutsyshyn</string-name>
          <email>vitalylutsyshyn@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasyl Lytvyn</string-name>
          <email>vasyl.v.lytvyn@lpnu.ua</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maryna Kostiak</string-name>
          <email>kostiak.maryna@lpnu.ua</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yaroslav Kis</string-name>
          <email>yaroslav.p.kis@lpnu.ua</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Ivan Franko National University</institution>
          ,
          <addr-line>1 Universytetska str., Lviv, 79000</addr-line>
          ,
          <country country="UA">Ukraine</country>
          ;
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>12 Stepan Bandera str., Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>237</fpage>
      <lpage>245</lpage>
      <abstract>
        <p>In video surveillance systems, the task of capturing an object and following it is highly relevant. An even more pressing task is identification with subsequent authentication of the person. This makes it possible to organize appropriate security for a private person or property, as well as for particularly important facilities. A method of capturing the object has been developed. Using the best of the considered methods in automated video surveillance systems will increase the efficiency of intelligent video surveillance with a sufficient level of reliability. Footage from capturing the object is recorded in files. Object capture is implemented using the MediaPipe Face Mesh library. In the future, it is planned to identify the person using machine learning.</p>
      </abstract>
      <kwd-group>
        <kwd>Tracking objects</kwd>
        <kwd>identification</kwd>
        <kwd>artificial intelligence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Tracking technology [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is used in practice in
automated video surveillance systems, traffic
monitoring, motion-based recognition, such as
gait-based identification, video indexing in large
data catalogs, navigation of unmanned vehicles,
Augmented Reality (AR), the film and advertising
industry, for creating computer graphics and
visual effects, in research in medicine and sports
[2], medical imaging, human-computer
interaction (gesture recognition, virtual keyboard,
mouse, pupil tracking), crowd behavior analysis,
video compression.
      </p>
      <p>The rapid introduction and deployment of
ubiquitous video sensors have resulted in the
collection of massive amounts of data. However,
indexing and searching large video databases
remains a very difficult task. Augmenting media
clips [3] with metadata descriptions are very
useful for engineering. In our previous work, we
proposed the notion of a visible scene model
obtained by combining location and direction
sensor information with a video stream. Such
georeferenced media streams are useful in many
applications and, most importantly, they can be
searched efficiently. The result of a
georeferenced video query will usually consist of the
number of video segments that satisfy the query
conditions but with more or less relevance.</p>
      <p>The widespread availability of digital video
sensors has led to a variety of applications,
ranging from casual video recording to
professional multi-camera surveillance systems
[4]. As a result, a large number of video clips are
collected, which creates complex data processing
problems [5–7].</p>
      <p>Many institutions deploy video surveillance
systems with the primary function of protecting
property rather than monitoring public traffic.
Nowadays, there is cloud-based video
surveillance that, in addition to security issues,
solves the role of property protection. In addition,
the main advantage of a cloud-based video
surveillance system is that it is located outside the
object of observation and has a flexible storage
capacity.</p>
      <p>
        A visual surveillance system [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] consists of
two main parts:
• Target representation and localization.
• Filtering and data association.
      </p>
      <p>Target object representation and localization is
mostly a bottom-up process, it is sequential and its
subsequent steps do not affect the previous ones.
The typical computational complexity of these
algorithms is quite low.</p>
      <p>
        Standard algorithms for finding and localizing
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] a target object include:
• Blob tracking: segmentation of the object’s interior, including blob detection, block-based correlation, and optical flow.
• Kernel-based tracking (mean-shift tracking): an iterative localization procedure based on maximizing a similarity criterion (a minimal sketch is given after this list).
• Contour tracking: search for the boundary of an object.
• Feature matching: image registration.
• Point feature tracking: given a sequence of images of a scene captured by a moving or stationary camera, obtain the most accurate sequences of projected coordinates of selected scene points in each frame.
      </p>
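      <p>As an illustration of the kernel-based (mean-shift) tracking listed above, a minimal OpenCV sketch is given below: the hue histogram of an initial region of interest is back-projected onto each new frame, and cv2.meanShift iteratively relocates the search window. The camera index and the initial window coordinates are illustrative assumptions, not values from this work.</p>
      <p>import cv2

# Open the camera and read the first frame (camera index 0 is an assumption).
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
x, y, w, h = 300, 200, 100, 100  # initial search window (illustrative values)
roi = frame[y:y + h, x:x + w]

# Hue histogram of the region of interest: the target model.
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([hsv_roi], [0], None, [16], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
track_window = (x, y, w, h)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    # Mean-shift moves the window toward the mode of the back-projection.
    ret, track_window = cv2.meanShift(backproj, track_window, term_crit)
    x, y, w, h = track_window
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('Mean-shift tracking', frame)
    if cv2.waitKey(30) == 27:  # Esc
        break
cap.release()
cv2.destroyAllWindows()</p>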
      <p>Capturing and encoding digital images should
lead to the creation and rapid dissemination of a
huge amount of visual information. Therefore,
efficient tools for searching and retrieving visual
information are essential. Although effective
search engines for text documents exist today,
there are no satisfactory systems for retrieving
visual information.</p>
      <p>Due to the growth of visual data both online
and offline and the phenomenal success of web
search, expectations for image and video search
technologies are increasing.</p>
      <p>
        Hidden Markov Models (HMM) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The
method is based on a statistical comparison of an
object with a database of templates. Hidden
Markov models use the statistical properties of
signals and take into account their spatial
characteristics. The model elements are the initial probabilities of states, the set of observed states, the set of hidden states, and the transition probability matrix. During recognition of a person, all generated Markov models are checked, and the model giving the highest probability of generating the observation sequence for the object is sought.
      </p>
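      <p>As a minimal sketch of the recognition step just described, the forward algorithm below (plain NumPy) computes the probability that a discretized observation sequence was generated by each stored model, and the model with the highest likelihood is selected. The two-state models, observation symbols, and person labels are illustrative placeholders, not values from this paper.</p>
      <p>import numpy as np

def forward_likelihood(obs, pi, A, B):
    # Forward algorithm: P(observation sequence | HMM defined by pi, A, B).
    alpha = pi * B[:, obs[0]]          # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction step
    return alpha.sum()                 # termination

# Illustrative two-state models with three possible observation symbols.
models = {
    'person_A': (np.array([0.6, 0.4]),
                 np.array([[0.7, 0.3], [0.2, 0.8]]),
                 np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])),
    'person_B': (np.array([0.5, 0.5]),
                 np.array([[0.9, 0.1], [0.3, 0.7]]),
                 np.array([[0.2, 0.2, 0.6], [0.6, 0.3, 0.1]])),
}

observations = [0, 1, 1, 2, 2]  # discretized feature sequence
scores = {name: forward_likelihood(observations, pi, A, B)
          for name, (pi, A, B) in models.items()}
print(max(scores, key=scores.get), scores)  # model with the highest probability</p>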
      <p>To complete this set of experiments, PRNU SCI should be tried over video streams. Basically, PRNU is equally valid for all frames captured in a video. Normally, a video is made of 25 or 30 frames per second. Cameras usually reduce the resolution of individual frames when capturing video: 720×576 is called standard definition, 1280×720 is called enhanced definition, and 1920×1080 is high definition (sometimes Full HD). Nowadays there exist higher definitions such as 3840×2160 (4K) or even 7680×4320 (8K). Note that even 4K is less than 8 Mpixels per frame.</p>
      <p>The disadvantages include low processing
speed and low resolution.</p>
      <p>The system can only optimize the data
processing time and response time of its model,
but cannot minimize the time of searching other
models.</p>
      <p>
        Principal component analysis [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. One of the
most well-known and developed methods is the
principal component analysis (PCA) based on the
Karhunen-Loève transform [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Initially, the
principal component method was used in statistics
to reduce the feature space without significant loss
of information. In face recognition, it is used
mainly to represent a face image with a
low-dimensional vector, which is then compared with
reference vectors stored in a database.
      </p>
      <p>The main goal of the principal component
method is to significantly reduce the
dimensionality of the feature space so that it
describes “typical” images belonging to a large
number of faces as well as possible. Using this
method, it is possible to detect various variations
in the training set of face images and describe this
variability based on several orthogonal vectors.</p>
      <p>
        The set of eigenvectors obtained once on the
training set of face images [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is used to encode
all other face images, which are represented in a
weighted combination of eigenvectors.
      </p>
      <p>
        Using a limited number of eigenvectors, a
compressed approximation of the input face
image [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] can be obtained, which can then be
stored in the database as a vector of coefficients,
which is also a search key in the face database.
      </p>
      <p>Also, the purpose of the PCA method is to
reduce the feature space without significant loss
of information so that it best describes the
“typical” images belonging to a set of faces. In
face recognition, it is used mainly to represent a
face as a low-dimensional vector, which is then
compared with reference vectors from the
database.</p>
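      <p>A minimal scikit-learn sketch of the encoding just described: flattened face images are projected onto a small number of principal components, and a probe image is matched to the nearest stored reference vector. The random data, image size, and number of components are assumptions for illustration only.</p>
      <p>import numpy as np
from sklearn.decomposition import PCA

# Assumed training data: 200 flattened 64x64 grayscale face images.
rng = np.random.default_rng(0)
faces = rng.random((200, 64 * 64))

# Keep a low-dimensional subspace spanned by the first 50 eigenvectors.
pca = PCA(n_components=50)
codes = pca.fit_transform(faces)  # reference vectors stored in the database

def identify(probe_image):
    # Project a probe face and return the index of the nearest reference vector.
    probe_code = pca.transform(probe_image.reshape(1, -1))
    distances = np.linalg.norm(codes - probe_code, axis=1)
    return int(np.argmin(distances))

print(identify(faces[17]))  # returns 17 for a training image</p>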
      <p>PCA has proven itself in many applications.
However, when there are significant changes in
facial expressions or lighting in the face image,
the effectiveness of the method drops
significantly. This is because the principal
component method selects a subspace to
maximize the approximation of the input data set,
rather than to discriminate between classes of
faces.</p>
      <p>
        The Support Vector Machine (SVM) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
method is a set of similar learning algorithms used
for classification and regression analysis tasks.
The essence of the support vector method is to
find a hyperplane in the feature space that
separates a class of face images from other images
(the class of “no faces” images). In this case, out of all possible hyperplanes that separate the two classes, it is necessary to choose the hyperplane whose distance from each class is maximal.
      </p>
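      <p>A minimal scikit-learn sketch of the separating-hyperplane idea: a linear SVM is trained on labeled feature vectors for the “face” and “no faces” classes. The random feature vectors below stand in for real descriptors (e.g., HOG features) and are assumptions for illustration.</p>
      <p>import numpy as np
from sklearn.svm import LinearSVC

# Placeholder features: 300 descriptors of dimension 128, half faces, half non-faces.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1.0, 1.0, (150, 128)),    # "face" class
               rng.normal(-1.0, 1.0, (150, 128))])  # "no faces" class
y = np.array([1] * 150 + [0] * 150)

# The SVM selects the hyperplane with the maximal margin between the two classes.
clf = LinearSVC(C=1.0)
clf.fit(X, y)
print(clf.predict(X[:5]))            # predicted class labels
print(clf.decision_function(X[:5]))  # signed distances to the hyperplane</p>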
      <p>The advantages of this method are high resistance to overfitting, high speed compared to neural networks, and the ability to reduce sensitivity to noise at the cost of some accuracy.</p>
      <p>The disadvantages include the fact that the
accuracy of the method is inferior to many other
methods.</p>
      <p>
        Neural network methods [
        <xref ref-type="bibr" rid="ref15">15</xref>
]. Quite common methods include about a dozen different algorithms. The main feature of such networks is their ability to learn from a set of ready-made examples entered into the database in advance. During the training of neural networks, the network automatically extracts key features and builds relationships between them. After that, the trained neural network applies the experience gained to recognize a previously unknown object. Neural network methods show some of the best results in the field of recognition but are considered the most difficult to implement.
      </p>
      <p>
        There are about a dozen different Neural
Networks (NNs) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. One of the most widely
used and popular instances is a network built on a
multilayer perceptron, which allows you to classify an image or signal given as input
according to the preliminary settings and training
experience of the neural network.
      </p>
      <p>Neural networks are trained on a set of training
examples. The essence of training is to adjust the
weights of inter-neuronal connections in the
process of solving an optimization problem using
the gradient descent method. During the training
process, the neural network automatically extracts
key features, determines their importance, and
builds relationships between them.</p>
      <p>It is assumed that the trained network will be
able to apply the experience gained
during
training to unknown images using generalizing
abilities.</p>
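      <p>As a sketch of the training procedure just described (adjusting the weights of inter-neuron connections by gradient descent), a small multilayer perceptron can be trained with scikit-learn; the synthetic data and layer sizes below are assumptions for illustration.</p>
      <p>import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic two-class data standing in for image feature vectors.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 32))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# A multilayer perceptron trained by stochastic gradient descent.
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), solver='sgd',
                    learning_rate_init=0.01, max_iter=500, random_state=0)
mlp.fit(X, y)
print(mlp.score(X, y))  # accuracy on the training sample</p>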
    </sec>
    <sec id="sec-2">
      <title>2. Tracking Algorithm</title>
      <p>
        Let I(x,t) be the brightness of the image frame
with time t at point x, where x is a vector. The
movement of the image far from the limits of
visibility is described using an equation of the
form: I(x, t) = I(delta(x), t + 1) (*), where delta(x) is the movement of point x when moving from frame t to frame t + 1. The movement of features from frame to frame is described by balancing (*) over all points x in the feature neighborhood W [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>In this case, the lighting of the points of the
scene corresponding to the features remains
constant.</p>
      <p>With small changes in the image from frame to frame, one can assume that the feature window is simply translated, and the movement delta(x) takes the form delta(x) = x + d. However, as the duration of tracking increases, the image of a scene point becomes progressively distorted. This distortion can be described by an affine transformation, therefore the movement of points is described by an affine transformation delta(x) = Ax + d, where A is a matrix of dimension 2×2.</p>
      <p>The task of the tracker is to track the delta(x) movement values for all points of the feature window W. Since (*) is never satisfied exactly in real conditions, the search is for the movement that minimizes the difference between the windows at the current and next position on the frame, i.e. the delta(x) at which the minimum is reached:
ε = Σ_{x ∈ W} [ I(delta(x), t + 1) − I(x, t) ]²   (1)
that is, the L2 norm of the image difference.</p>
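      <p>The minimization in (1) is what the Lucas-Kanade (KLT) tracker solves in practice; the OpenCV sketch below detects corner features in the first frame and estimates their displacement d from frame to frame. The input file name is an assumption.</p>
      <p>import cv2

cap = cv2.VideoCapture('input.mp4')  # assumed input file
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Feature windows W: corner points selected in the first frame.
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                                 qualityLevel=0.3, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # For every point, find the displacement that minimizes the windowed
    # difference between frame t and frame t + 1, i.e. equation (1).
    new_points, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, points, None, winSize=(21, 21), maxLevel=2)
    points = new_points[status.flatten() == 1].reshape(-1, 1, 2)
    prev_gray = gray
    for x, y in points.reshape(-1, 2):
        cv2.circle(frame, (int(x), int(y)), 3, (0, 255, 0), -1)
    cv2.imshow('KLT tracking', frame)
    if cv2.waitKey(30) == 27:  # Esc
        break
cap.release()
cv2.destroyAllWindows()</p>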
    </sec>
    <sec id="sec-3">
      <title>3. Detect Video Objects in Real-Time</title>
      <p>
        Recurrent neural networks are based on
sequential data and are well-suited for video
object detection. A collaboration between the
Georgia Institute of Technology (GIT) and
Google Research proposed an RNN-based video
object
detection
method
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] that achieved
recognition rates of up to 15 frames per second,
even running on a mobile processor.
      </p>
      <p>One approach to detecting video objects is to
split the video into its component frames and
apply an image recognition algorithm to each
frame. This rejects all the potential benefits of
extracting temporal information from the video.</p>
      <p>Consider, for example, the problem of occlusion
and recapture: if a video object recognition system
identifies a person who is then briefly covered by
a passing pedestrian, it will take time for the
system to realize that it has “lost” the object. An
object can be lost not only due to occlusion, but
also due to motion blur, in cases where camera
movement or object movement (or both) causes
enough disruption to the frame that elements become streaky or out of focus, making them impossible for the recognition framework to identify.</p>
      <p>If the system analyzes frames in isolation, this understanding is not possible because each frame is treated as a complete and closed episode. The computational cost of a video object detection system that identifies and tracks objects is lower than that of re-registering the same object on a frame-by-frame basis.</p>
      <p>Optical flow estimation generates a two-dimensional vector field that represents the pixel
displacement between one frame and the
neighboring frame (previous or next).</p>
      <p>The optical flow can show the progress of
groupings of individual pixels along the entire
length of the video footage, providing guidance for performing useful operations. It has been
used for a long time in traditional data-intensive
video environments, such as video editing suites,
to provide motion vectors along which filters can
be applied, and for animation purposes. From the
perspective of video object recognition, optical
flow enables the computation of discontinuous
object trajectories because it can compute mean
trajectories in a way that is not possible with older
methods such as the Lucas-Kanade approach. This
approach only considers the constant flow
between grouped frames and cannot form a
comprehensive relationship between multiple
groups of actions, even though these groups may
represent the same event interrupted by factors
such as occlusion and motion blur.</p>
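      <p>A minimal sketch of dense optical flow estimation with OpenCV's Farneback method, which produces the two-dimensional displacement field between neighboring frames described above; the input file name is an assumption.</p>
      <p>import cv2

cap = cv2.VideoCapture('input.mp4')  # assumed input file
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense flow: one (dx, dy) vector per pixel between the previous and current frame.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    print('mean displacement per pixel:', float(magnitude.mean()))
    prev_gray = gray
cap.release()</p>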
    </sec>
    <sec id="sec-4">
      <title>4. Research of Software Products for</title>
    </sec>
    <sec id="sec-5">
      <title>Object Recognition</title>
    </sec>
    <sec id="sec-6">
      <title>4.1. Google Video Intelligence</title>
      <p>
As one of the FAANG companies (Facebook, Amazon, Apple, Netflix, Google) leading investment in computer vision research, Google offers its Video Intelligence system with out-of-the-box features including object recognition in video, real-time OCR (optical character recognition), logo detection, and face detection. Cloud Video Intelligence comes with a huge offering of pretrained models, although customized training is
available [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>4.2. Hand Detection</title>
      <p>The ability to perceive the shape and
movement of hands can be a vital component in
enhancing the user experience across a variety of
technology domains and platforms. For example,
it can form the basis for understanding sign
language and controlling hand gestures, and it can
enable the overlay of digital content and
information on top of the physical world in
augmented reality. Although it is natural for
humans, reliable real-time hand perception is an
extremely challenging task for computer vision
because hands often cover themselves or each
other (e.g., finger/palm occlusions and hand
tremors) and lack high-contrast patterns.
MediaPipe Hands is a highly accurate hand and
finger tracking solution. It uses Machine Learning
(ML) to identify 21 3D hand landmarks from just one frame, whereas current state-of-the-art approaches rely primarily on powerful desktop environments.</p>
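      <p>A minimal sketch of the MediaPipe Hands Python API mentioned above, which returns 21 3D landmarks per detected hand from a single frame; the camera index and confidence thresholds are assumptions.</p>
      <p>import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=2,
                                 min_detection_confidence=0.5,
                                 min_tracking_confidence=0.5)
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            # 21 normalized 3D landmarks per detected hand.
            mp.solutions.drawing_utils.draw_landmarks(
                frame, hand, mp.solutions.hands.HAND_CONNECTIONS)
    cv2.imshow('MediaPipe Hands', frame)
    if cv2.waitKey(5) == 27:  # Esc
        break
hands.close()
cap.release()
cv2.destroyAllWindows()</p>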
    </sec>
    <sec id="sec-8">
      <title>4.3. Palm Detection Model</title>
      <p>To detect the initial location of the hand, there
is a detector model optimized for real-time mobile
use, similar to the face detection model in
MediaPipe Face Mesh. Hand detection is an
extremely challenging task: The simplified model
and the full model have to work with different
hand sizes with a large scale range (~20×) relative
to the image frame and be able to detect closed
and self-closed hands. While faces have
highcontrast patterns, such as in the eye and mouth
area, the lack of such features on hands makes it
difficult to detect them reliably based on visual
characteristics alone. Instead, providing
additional contexts, such as hand, body, or facial
features, helps to accurately localize the hand.</p>
      <p>The method solves the above problems using
different strategies. First, we train a palm detector
instead of a hand detector because estimating the
bounding boxes of solid objects such as palms and
fists is much easier than detecting hands with
articulated fingers. Furthermore, since palms are
smaller objects, the non-maximal suppression
algorithm works well even in cases of two-handed
self-occlusion, such as handshakes. Furthermore,
palms can be modeled using square bounding
rectangles (anchors in ML terminology), ignoring
other aspect ratios, and thus reducing the number
of anchors by a factor of 3–5. Secondly, an
encoder-decoder feature extractor is used to
increase the awareness of the scene context even
for small objects. Finally, we minimize the focal
loss during training to support a large number of
anchors resulting from the high-scale dispersion.</p>
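      <p>A minimal NumPy sketch of the focal loss mentioned above, which down-weights easy background anchors so that the large number of anchors does not dominate training; the gamma and alpha values are the commonly used defaults and are assumptions here, not values reported in this paper.</p>
      <p>import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # p: predicted probability of the positive class for each anchor
    # y: ground-truth label (1 = object anchor, 0 = background anchor)
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)              # probability of the true class
    weight = np.where(y == 1, alpha, 1 - alpha)
    # (1 - pt) ** gamma shrinks the loss of well-classified (easy) anchors.
    return float(np.mean(-weight * (1 - pt) ** gamma * np.log(pt)))

# Mostly-background anchors, as produced by a dense anchor grid.
y = np.array([1, 0, 0, 0, 0, 0, 0, 0])
p = np.array([0.7, 0.1, 0.2, 0.05, 0.9, 0.1, 0.3, 0.02])
print(focal_loss(p, y))</p>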
      <p>With the above methods, an average accuracy
of 95.7% in palm detection can be achieved.
Using the conventional cross-entropy loss without
a decoder gives a baseline of only 86.22%.</p>
    </sec>
    <sec id="sec-9">
      <title>4.4. Hand Reference Model</title>
      <p>After detecting the palm in the entire image,
the subsequent reference hand model accurately
localizes the 21 3D key points of the hand and fingers within the detected hand region
using regression, i.e. direct coordinate prediction.
The model learns a consistent internal
representation of the hand pose and is robust even
to partially visible hands.</p>
      <p>To obtain the data, the MediaPipe authors
manually selected 21 coordinates (Figs. 1 and 2).</p>
      <p>BlazeFace provides two additional virtual key
points that clearly describe the center of the
human body, rotation, and scale as a circle. Using
Leonardo’s Vitruvian Man, the authors predicted the middle of the person’s hips, the radius of the
circle surrounding the person, and the angle of the
line connecting the middle of the shoulders and
hips.</p>
      <p>
        The landmark model in MediaPipe Pose
provides the location of 33 pose landmarks (see
Figs. 3 and 4) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
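      <p>A minimal sketch of the MediaPipe Pose Python API, which returns the 33 pose landmarks referred to above; the input image path is an assumption.</p>
      <p>import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose(static_image_mode=True)
image = cv2.imread('person.jpg')  # assumed input image
results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    # 33 normalized landmarks (x, y, z, visibility) describing the body pose.
    for idx, lm in enumerate(results.pose_landmarks.landmark):
        print(idx, round(lm.x, 3), round(lm.y, 3), round(lm.z, 3), round(lm.visibility, 3))
pose.close()</p>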
      <p>MediaPipe Objectron is a mobile solution for
real-time 3D object detection. It detects objects in
2D images and estimates their position using an
ML model trained on the Objectron dataset.</p>
      <p>Object detection is a widely studied problem in
computer vision, but most research has focused on
2D object prediction. While 2D prediction only
provides 2D bounding boxes, extending the
prediction to 3D can capture the size, position, and
orientation of an object in the world, leading to a
variety of applications in robotics, unmanned
vehicles, image retrieval, and augmented reality.
Although 2D object detection is relatively mature
and widely used in industry, 3D object detection
from 2D images is a challenging problem due to
the lack of data and the variety of object
appearances and shapes within a category.</p>
      <p>
        When the model is applied to each frame
captured by a mobile device, it may suffer from
jitter due to the ambiguity of the 3D bounding box
estimated at each frame [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. To mitigate this, the
same detection+tracking strategy is used in the 2D
object detection and tracking pipeline in
MediaPipe Box Tracking. This reduces the need
to run the mesh on each frame, allowing for
heavier and therefore more accurate models while
keeping the pipeline real-time on mobile devices.
It also preserves subject identity in frames and
ensures that predictions are consistent over time,
reducing jitter.
      </p>
      <p>
The Objectron 3D object detection and tracking pipeline is implemented as a MediaPipe
graph that internally uses a detection subgraph and
a tracking subgraph [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. The detection subgraph
performs logical inference only once every few
frames to reduce the computational burden and
decodes the output tensor into a FrameAnnotation
that contains nine key points: the center of the 3D
bounding box and its eight vertices. The tracking
subgraph runs each frame using the window
tracking tool in MediaPipe Box Tracking to track
a 2D box that tightly spans the projection of the
3D bounding box and upscales the tracked 2D key
points to 3D using EPnP. When a new detection
becomes available from the detection subgraph,
the tracking subgraph is also responsible for
consolidation between detection and tracking
results based on the overlap region [
        <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
        ].
      </p>
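      <p>A minimal sketch of the MediaPipe Objectron Python API described above; for each detected object it returns the projected key points of the 3D bounding box (center plus eight vertices) together with rotation and translation. The object category and the image path are assumptions.</p>
      <p>import cv2
import mediapipe as mp

objectron = mp.solutions.objectron.Objectron(static_image_mode=True,
                                             max_num_objects=3,
                                             model_name='Cup')  # assumed category
image = cv2.imread('table.jpg')  # assumed input image
results = objectron.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.detected_objects:
    for obj in results.detected_objects:
        # 2D projections of the 3D box: center plus eight vertices.
        mp.solutions.drawing_utils.draw_landmarks(
            image, obj.landmarks_2d, mp.solutions.objectron.BOX_CONNECTIONS)
        print('rotation:', obj.rotation, 'translation:', obj.translation)
cv2.imwrite('table_boxes.jpg', image)
objectron.close()</p>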
    </sec>
    <sec id="sec-10">
      <title>5. Experimental Results</title>
      <p>MediaPipe Selfie Segmentation can work in
real-time on both smartphones and laptops.</p>
      <p>Capturing an object in a video stream starts with the initialization of the ObjectTracker class, implemented using the cv2 (OpenCV) library. The video camera is turned on, the frames are scaled, and the interpolation settings are chosen. A window is created for the frame; video recording can be enabled and the captured track stored. A tracking-window selection function is written and the window size is set; the object is then captured, and the algorithm analyzes its location. Fig. 5 shows a fragment of the object tracking program.</p>
      <p>hist = cv2.calcHist([hsv_roi], [0], mask_roi, [16], [0, 180])
# Normalize and reshape the histogram
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
self.hist = hist.reshape(-1)
# Extract the region of interest from the frame
vis_roi = vis[y0:y1, x0:x1]
# Compute the image negative (for display only)
cv2.bitwise_not(vis_roi, vis_roi)
vis[mask == 0] = 0
# Check whether the system is in "tracking" mode
if self.tracking_state == 1:
    # Reset the selection variable
    self.selection = None
    # Compute the histogram back projection
    hsv_backproj = cv2.calcBackProject([hsv], [0], self.hist, [0, 180], 1)
    # Compute bitwise AND between the histogram back projection and the mask
    hsv_backproj &amp;= mask
    # Define termination criteria for the tracker
    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    # Apply CAMShift to 'hsv_backproj'
    track_box, self.track_window = cv2.CamShift(hsv_backproj, self.track_window, term_crit)
    # Draw an ellipse around the object
    cv2.ellipse(vis, track_box, (0, 255, 0), 2)
# Show the output live video
cv2.imshow('Object Tracker', vis)
# Stop if the user hits the 'Esc' key
c = cv2.waitKey(5)
if c == 27:
    break
# Close all the windows
cv2.destroyAllWindows()

if __name__ == '__main__':
    # Start the tracker
    ObjectTracker().start_tracking()</p>
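      <p>Since object capture in this work is implemented with MediaPipe Face Mesh, a minimal sketch of that API is included below: detected face landmarks are drawn on each frame and the captured frames are recorded to files for later identification. The camera index and the output file naming are assumptions.</p>
      <p>import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1,
                                            min_detection_confidence=0.5,
                                            min_tracking_confidence=0.5)
cap = cv2.VideoCapture(0)
frame_id = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        for landmarks in results.multi_face_landmarks:
            mp.solutions.drawing_utils.draw_landmarks(
                frame, landmarks, mp.solutions.face_mesh.FACEMESH_TESSELATION)
        # Record captured frames to files for later identification.
        cv2.imwrite('capture_%05d.png' % frame_id, frame)
        frame_id += 1
    cv2.imshow('MediaPipe Face Mesh', frame)
    if cv2.waitKey(5) == 27:  # Esc
        break
face_mesh.close()
cap.release()
cv2.destroyAllWindows()</p>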
      <p>During the implementation of this experiment, the issue of object recognition in the video stream was considered. The main libraries of the Python language that can be used for the recognition and classification of objects from videos are covered. MediaPipe methods for achieving a particular result in recognition are clearly described.</p>
      <p>Since this experiment requires a significant number of images of objects from different viewing angles, a sample obtained using computer graphics and face generation by the MediaPipe program was used for its implementation. Face recognition testing was performed by matching the Viola-Jones method, an SVM classifier combined with histograms of oriented gradients, and convolutional networks trained on the ImageNet sample. The trained models were provided by the Caffe libraries with OpenCV.</p>
    </sec>
    <sec id="sec-11">
      <title>6. Conclusions</title>
      <p>The result of the study was an investigation of the recognition algorithm. Experiments on capturing objects were conducted, and the captured frames of objects were recorded in files. In the future, it is planned to identify these frames according to whether the face belongs to a given person or not. The MediaPipe library performed best.</p>
      <p>A neural network for object recognition in a video stream was created and trained. An experiment was conducted in which the efficiency of the developed system was compared with the indicators of alternative known recognition methods. Recognition accuracy increases when using the proposed method. The developed recognition system is more resistant to local noise: for images subject to blurring and occlusion, the recognition accuracy of the developed system drops. The research results are of practical interest in the design of management and information processing systems in the field of computer vision and image recognition, for tasks where there is a need to determine the spatial parameters of the depicted objects. To use the system for real-time monitoring, it is necessary to analyze and design a system based on distributed computing for parallel analysis of frames, because the results of local experiments show difficulties with many frames per second: activating detectors at many levels requires significant computing resources.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Moriggl</surname>
          </string-name>
          , et. al.,
          <source>Touching Space: Distributed Ledger Technology for Tracking and Tracing Certificates, 56th Hawaii International Conference on System Sciences, January</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Mashxura</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Siddiqov,</surname>
          </string-name>
          <article-title>Effects of the Flipped Classroom in Teaching Computer Graphics</article-title>
          , Eurasian Research Bulletin,
          <volume>16</volume>
          (
          <year>2023</year>
          )
          <fpage>119</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Najmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Alhalafawy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaki</surname>
          </string-name>
          ,
          <article-title>Developing a Sustainable Environment Based on Augmented Reality to Educate Adolescents about the Dangers of Electronic Gaming Addiction</article-title>
          , Sustainability,
          <volume>15</volume>
          (
          <issue>4</issue>
          ) (
          <year>2023</year>
          )
          <fpage>3185</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Hyodo</surname>
          </string-name>
          , et. al.,
          <source>Video Surveillance System Incorporating Expert Decisionmaking Process: A Case Study on Detecting Calving Signs in Cattle</source>
          , arXiv,
          <year>2023</year>
          , preprint.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>O.</given-names>
            <surname>Iosifova</surname>
          </string-name>
          , et al.,
          <source>Analysis of Automatic Speech Recognition Methods, in: Workshop on Cybersecurity Providing in Information and Telecommunication Systems</source>
          , vol.
          <volume>2923</volume>
          (
          <year>2021</year>
          )
          <fpage>252</fpage>
          -
          <lpage>257</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>K.</given-names>
            <surname>Khorolska</surname>
          </string-name>
          , et al.,
          <article-title>Application of a Convolutional Neural Network with a Module of Elementary Graphic Primitive Classifiers in the Problems of Recognition of Drawing Documentation and Transformation of 2D to 3D Models</article-title>
          ,
          <source>Journal of Theoretical and Applied Information Technology</source>
          <volume>100</volume>
          (
          <issue>24</issue>
          ) (
          <year>2022</year>
          )
          <fpage>7426</fpage>
          -
          <lpage>7437</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>V.</given-names>
            <surname>Sokolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Skladannyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Platonenko</surname>
          </string-name>
          ,
          <article-title>Video Channel Suppression Method of Unmanned Aerial Vehicles</article-title>
          ,
          <source>in: IEEE 41st International Conference on Electronics and Nanotech-nology</source>
          (
          <year>2022</year>
          )
          <fpage>473</fpage>
          -
          <lpage>477</lpage>
          . doi:
          <volume>10</volume>
          .1109/ELNANO54667.
          <year>2022</year>
          .9927105
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          , et. al.,
          <article-title>Blind Surveillance Image Quality Assessment via Deep Neural Network Combined with the Visual Saliency</article-title>
          , Artificial Intelligence: Second CAAI International Conference,
          CICAI
          <year>2022</year>
          , Beijing, China,
          <year>August 2022</year>
          ,
          <fpage>136</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <article-title>A Quantum Annealing Bat Algorithm for Node Localization in Wireless Sensor Networks</article-title>
          , Sensors,
          <volume>23</volume>
          (
          <issue>2</issue>
          ) (
          <year>2023</year>
          )
          <fpage>782</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Glennie</surname>
          </string-name>
          , et al.,
          <source>Hidden Markov Models: Pitfalls and Opportunities in Ecology, Method. Ecol. Evol</source>
          .
          <volume>14</volume>
          (
          <issue>1</issue>
          ) (
          <year>2023</year>
          )
          <fpage>43</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Maćkiewicz</surname>
          </string-name>
          , W. Ratajczak,
          <source>Principal Components Analysis (PCA)</source>
          ,
          <source>Comput. Geosci</source>
          .
          <volume>19</volume>
          (
          <issue>3</issue>
          ) (
          <year>1993</year>
          )
          <fpage>303</fpage>
          -
          <lpage>342</lpage>
          . doi:
          <volume>10</volume>
          . 1016/
          <fpage>0098</fpage>
          -
          <lpage>3004</lpage>
          (
          <issue>93</issue>
          )
          <fpage>90090</fpage>
          -
          <lpage>R</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Adlersberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cuperman</surname>
          </string-name>
          ,
          <article-title>Transform Domain Vector Quantization for Speech Signals</article-title>
          ,
          <source>ICASSP'87. IEEE International Conference on Acoustics, Speech, and Signal Processing</source>
          ,
          <volume>12</volume>
          (
          <year>1987</year>
          )
          <fpage>1938</fpage>
          -
          <lpage>1941</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Boyd</surname>
          </string-name>
          , et. al.,
          <source>CYBORG: Blending Human Saliency Into the Loss Improves Deep Learning-Based Synthetic Face Detection, IEEE/CVF Winter Conference on Applications of Computer Vision</source>
          ,
          <year>2023</year>
          ,
          <fpage>6108</fpage>
          -
          <lpage>6117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kurani</surname>
          </string-name>
          , et. al.,
          <source>Comprehensive Comparative Study of Artificial Neural Network (ANN)</source>
          and
          <article-title>Support Vector Machines (SVM) on Stock Forecasting</article-title>
          ,
          <source>Annals. of Data Sci</source>
          .
          <volume>10</volume>
          (
          <issue>1</issue>
          ) (
          <year>2023</year>
          )
          <fpage>183</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Tanantong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yongwattana</surname>
          </string-name>
          ,
          <article-title>A Convolutional Neural Network Framework for Classifying Inappropriate Online Video Contents</article-title>
          ,
          <source>IAES Int. J. Artificial Intell</source>
          .
          <volume>12</volume>
          (
          <issue>1</issue>
          ) (
          <year>2023</year>
          )
          <fpage>124</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Wen</surname>
          </string-name>
          , et. al.,
          <article-title>Fusing Models for Prognostics and Health Management of Lithium-Ion Batteries Based on PhysicsInformed Neural Networks</article-title>
          , arXiv,
          <year>2023</year>
          , preprint.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , D. Li,
          <article-title>Research on Path Tracking Algorithm of Green Agricultural Machinery for Sustainable Development</article-title>
          ,
          <source>Sustainable Energy Technologies and Assessments</source>
          ,
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>102917</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Azimi</surname>
          </string-name>
          , et. al.,
          <article-title>Rethinking RNN-Based Video Object Segmentation</article-title>
          ,
          <source>Computer Vision, Imaging and Computer Graphics Theory and Applications</source>
          , CCIS,
          <volume>1691</volume>
          (
          <year>2021</year>
          )
          <fpage>348</fpage>
          -
          <lpage>365</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          - 25477-2_
          <fpage>16</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Teece</surname>
          </string-name>
          ,
          <source>Big Tech and Strategic Management: How Management Scholars Can Inform Competition Policy, Academy of Management Perspectives</source>
          ,
          <volume>37</volume>
          (
          <issue>1</issue>
          ) (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . doi:
          <volume>10</volume>
          .5465/amp.
          <year>2022</year>
          .0013
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaikh</surname>
          </string-name>
          , et. al.,
          <article-title>Kinematic Pose Tracking for Workout App Using Computer Vision</article-title>
          ,
          <source>Int. Res. J. of Modernization in Eng. Technol. Sci. doi:10.56726/irjmets33856</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Medykovskyy</surname>
          </string-name>
          , et. al.,
          <source>Methods of Protection Document Formed from Latent Element Located by Fractals</source>
          ,
          <source>2015 Xth International Scientific and Technical Conference “Computer Sciences and Information Technologies” (CSIT)</source>
          ,
          <year>2015</year>
          ,
          <fpage>70</fpage>
          -
          <lpage>72</lpage>
          . doi:
          <volume>10</volume>
          .1109/STC-CSIT.
          <year>2015</year>
          .7325434
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Logoyda</surname>
          </string-name>
          , et. al.,
          <source>Identification of Biometric Images using Latent Elements, CEUR Workshop Proceedings</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nazarkevych</surname>
          </string-name>
          , et. al.,
          <source>The Ateb-Gabor Filter for Fingerprinting, Conference on Computer Science and Information Technologies</source>
          ,
          <year>2019</year>
          ,
          <fpage>247</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>V.</given-names>
            <surname>Hrytsyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Grondzal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bilenkyj</surname>
          </string-name>
          ,
          <article-title>Augmented Reality for People with Disabilities, 2015 Xth International Scientific</article-title>
          and Technical Conference “
          <source>Computer Sciences and Information Technologies (CSIT)</source>
          ,
          <year>2015</year>
          <fpage>188</fpage>
          -
          <lpage>191</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>