<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Method of Dataset Filling and Recognition of Moving Objects in Video Sequences based on YOLO</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mariia Nazarkevych</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasyl Lytvyn</string-name>
          <email>vasyl.v.lytvyn@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maryna Kostiak</string-name>
          <email>maryna.y.kostiak@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nazar Oleksiv</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nazar Naconechnyi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>12 S. Bandera str., Lviv, 79000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>265</fpage>
      <lpage>276</lpage>
      <abstract>
        <p>This paper studies a method of filling a dataset of mobile military objects. In future work, objects captured by a video camera mounted on a moving platform will be investigated. The software used is YOLOv8, which makes it possible to track moving objects that appear in the camera video. Artificial intelligence methods are used to recognize military equipment. Object recognition can be improved by contour analysis, template matching, and key-point matching; these approaches will be developed in future work. The metrics used to evaluate object recognition are presented. A method of filling the dataset and building a classifier is proposed. Graphs and results of moving object recognition with YOLOv8x are shown.</p>
      </abstract>
      <kwd-group>
        <kwd>AI</kwd>
        <kwd>machine learning</kwd>
        <kwd>YOLO</kwd>
        <kwd>computer vision</kwd>
        <kwd>military</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>
          Today, with the rapid development of artificial
intelligence based on deep learning
technology, convolutional neural networks [
          <xref ref-type="bibr" rid="ref1">1–4</xref>
          ] have made a breakthrough in the field of
computer vision, greatly increasing the
flexibility and automation of production [
          <xref ref-type="bibr" rid="ref7">5–9</xref>
          ].
        </p>
        <p>
          In the field of computer vision, machine
vision is involved in the automatic recognition
of moving objects, and the detection of
obstacles during the movement of objects [
          <xref ref-type="bibr" rid="ref8">10</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Review of Literature</title>
      <p>
        The YOLO series has been updated to YOLOv8
[8]. Many researchers have studied ways to further
improve the performance of existing object
detection algorithms. In
particular, [
        <xref ref-type="bibr" rid="ref7">9</xref>
        ] proposed an object detector based
on deep features. It strengthens the training of
the network through multi-scale training
methods without increasing the network size
and integrates multiple tracking features of
observation models for accurate object localization.
However, this method suffers from missed detections and false
detections in complex scenes with many objects.
      </p>
      <p>In an era marked by rapid advancements in
Artificial Intelligence (AI) and computer vision,
the convergence of these technologies has
ignited revolutionary progressions with
profound implications. The domain of military
applications stands at the forefront of harnessing
AI’s potential, particularly within the sphere of
computer vision. This article embarks on a
critical exploration of this aspect, delving into the
importance of training AI models, specifically
focusing on the You Only Look Once (YOLO)
model, for the recognition of military objects,
with a specific emphasis on tanks.</p>
      <p>As the velocity of technological innovation
intensifies, AI and computer vision have
evolved into crucial instruments that reshape
our perceptions and interactions with the
environment. In military contexts, the
incorporation of computer vision capabilities
holds paramount significance, providing
heightened situational awareness, facilitating
strategic decision-making, and optimizing
overall operational efficiency. The rapid and
accurate identification of military objects, such
as tanks, is imperative in environments
characterized by dynamism and
unpredictability.</p>
      <p>Amidst these technological advancements,
recent geopolitical events, including the war in
Ukraine, accentuate the urgent requirement
for advanced AI models capable of detecting
military assets like tanks. The deployment of AI
in conflict zones confers a strategic advantage
by enabling real-time object recognition,
aiding in the precise allocation of resources,
and ultimately enhancing the safety and
effectiveness of military operations.</p>
      <p>This article explores how a YOLO model is
trained to recognize military objects. It
describes the methods used, the difficulties
encountered, and the possible outcomes. The
goal is to show the importance of AI and
computer vision in changing how military
technology works, with significant
consequences for security and national
defense.</p>
      <p>Various methods are used to search for
objects in the image when applying computer
vision methods.</p>
      <p>
        There are three main methods of object
recognition in the image:
• Contour analysis [
        <xref ref-type="bibr" rid="ref8">10</xref>
        ].
• Template matching [
        <xref ref-type="bibr" rid="ref9">11</xref>
        ].
• Feature detection, description &amp; matching
[
        <xref ref-type="bibr" rid="ref10">12</xref>
        ].
      </p>
      <p>One way to recognize objects in a video
stream is pattern search. This method uses
information about what the required object
looks like, what kind of background it can have,
how certain contours of the object look, and at
what positions they can appear, so the possible
location of the object is taken into account
immediately. It achieves high recognition
quality at good speed. However,
when the video camera captures several
objects that are similar to each other, several
patterns are matched at once and recognition
quality decreases. Therefore, a family of models
is applied to estimate functions.</p>
      <p>
        Google Colaboratory [
        <xref ref-type="bibr" rid="ref11">13</xref>
        ] (“Colab”, a cloud
version of Jupyter Notebook) will be used to
train neural networks. Using Colab does not
require installing software locally or upgrading
your computer hardware to meet the
CPU/GPU-intensive requirements of training in Python. In addition,
Colab provides free access to computing
infrastructure such as storage, RAM,
computing power, graphics processing
units (GPUs) [14], and tensor
processing units (TPUs) [15].
      </p>
      <p>A workspace will be organized before the
project is developed. To identify enemy
targets, the corresponding dataset will be
filled. This process will
include searching for images and videos of the
above-mentioned objects and labeling the
corresponding objects. The dataset will
consist of tens of thousands of unique
images, split approximately equally between classes.</p>
      <sec id="sec-2-1">
        <title>2.1. Contour Analysis</title>
        <p>One of the methods used to determine moving
objects is contour analysis [16], which is a
method of describing, storing, recognizing,
comparing, and searching for graphic images
(objects) based on their contours. The contour
completely defines the shape of the image and
contains all the necessary information for
recognizing images by their shape. This
approach makes it unnecessary to consider the
internal points of the image and thereby
significantly reduces the amount of information
to be processed. A contour is a curve that
describes the boundary of the object in the
image. Working only with the
contours of objects therefore reduces the
complexity of algorithms. The main advantage
of contour analysis is its invariance
with respect to the rotation, scale, and shift of the
contour in the image.</p>
        <p>It is well suited to searching for an object
of a given shape, and it is often fast enough
for the system to work in real time.
However, the method has significant
disadvantages. Extracted outlines often contain
breaks. Contour analysis therefore has
rather weak resistance to interference, and any
violation of the integrity of the contour or poor
visibility of the object leads either to a
missed detection or to false positives.
The simplicity and speed of contour analysis make
the approach successful
provided there is a well-defined object on a
contrasting background.</p>
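        <p>The idea of keeping only boundary points and ignoring internal points can be illustrated with a minimal sketch. It assumes a binary mask (1 for object, 0 for background) and treats a pixel as part of the contour when at least one of its four neighbours is background or lies outside the image. The function name <monospace>contour_pixels</monospace> and the sample mask are illustrative and are not taken from the original work.</p>
        <preformat><![CDATA[
```python
def contour_pixels(mask):
    """Return (row, col) of object pixels that touch the background.

    mask is a list of rows containing 0 (background) or 1 (object).
    """
    height, width = len(mask), len(mask[0])
    contour = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue  # background pixels are never contour points
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                outside = nr < 0 or nr >= height or nc < 0 or nc >= width
                if outside or not mask[nr][nc]:
                    contour.append((r, c))
                    break  # one background neighbour is enough
    return contour


# A 3x3 filled square inside a 5x5 image: only the centre pixel is internal.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
contour = contour_pixels(mask)
```
]]></preformat>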
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Template Matching</title>
        <p>The template matching method [17] selects
a match according to the
minimum (or maximum) of some function
comparing the object and its template. This
method is one of the simplest. Its inputs
are the image to be searched and the image
of the object to be found (the template);
the template must be
smaller than the searched image. The
purpose of the algorithm is to find the region of
the searched image that matches the template.
The search is done by moving the template
across the image one pixel at a time and
evaluating the similarity of each new region to the
template. The region with the highest match
score is selected. In essence, this score is the
percentage of
overlap between the image region and the
template. Matching
with templates is simple, but there is a certain
complexity in the process of creating templates,
that is, in learning.</p>
        <p>Template matching cannot say with certainty
whether the original object was found, because the
match score is a probabilistic characteristic that depends on
the scale, viewing angle, and rotation of the image
and on the presence of physical obstacles [18].
False positives are also possible
when the searched object is absent
but the template and a region of the searched
image share some general details. This
situation can be mitigated by requiring the match
score to be no less than some
threshold, but this will not always work
properly.</p>
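        <p>The sliding search described above can be sketched in plain Python. This is an illustration only: it assumes grayscale images stored as nested lists and uses the sum of squared differences as the comparison function, so the best match is the minimum. The function name <monospace>match_template</monospace> and the sample data are hypothetical.</p>
        <preformat><![CDATA[
```python
def match_template(image, template):
    """Slide the template over the image one pixel at a time.

    Returns (row, col, score) of the best position, where score is the
    sum of squared differences (lower means more similar).
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    if tpl_h > img_h or tpl_w > img_w:
        raise ValueError("template must not be larger than the image")

    best = None
    for r in range(img_h - tpl_h + 1):
        for c in range(img_w - tpl_w + 1):
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(tpl_h)
                for j in range(tpl_w)
            )
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best


image = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 9]]
row, col, score = match_template(image, template)
```
]]></preformat>
        <p>A threshold on the returned score plays the role of the match factor check mentioned above.</p>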
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Search by Special Points</title>
        <p>Often algorithms use key points [19] (feature
points) of the image for their work. Key points
are understood as some areas of the picture that
are distinctive for this image. There are a large
number of methods for detecting such “special
points”, all of them differ in the speed of
operation, the number of selected points, as well
as resistance to image transformations: rotation,
changes in viewing angles, changes in scale.</p>
        <p>Three components are used to find key points
in images and subsequently compare them:
• Feature detector [20]: searches for key
points in the image.
• Descriptor extractor:
produces a description of the found key
points, characterizing their positions through
the description of the surrounding areas.
• Matcher: builds
correspondences between two sets of image
points.</p>
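        <p>The matcher component can be illustrated with a brute-force sketch. It assumes binary descriptors represented as Python integers and compares them by Hamming distance; the descriptor values, the function names, and the distance limit of 2 are assumptions made for illustration, not parameters from the original work.</p>
        <preformat><![CDATA[
```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_descriptors(query, train, max_distance=2):
    """For each query descriptor find the closest train descriptor.

    Returns a list of (query_index, train_index, distance) for pairs
    whose distance does not exceed max_distance.
    """
    matches = []
    for qi, q in enumerate(query):
        ti, dist = min(
            ((i, hamming(q, t)) for i, t in enumerate(train)),
            key=lambda pair: pair[1],
        )
        if dist <= max_distance:
            matches.append((qi, ti, dist))
    return matches


# Hypothetical 8-bit descriptors for key points in two images.
query = [0b10110010, 0b01001101, 0b11110000]
train = [0b01001111, 0b10110011, 0b00001111]
matches = match_descriptors(query, train)
```
]]></preformat>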
        <p>Unlike template matching and contour
analysis, key-point algorithms are
more resistant to interference and image
transformations, and they can find objects even
in the presence of physical obstacles.</p>
        <p>To achieve the highest possible level of
tracking of an object (marker), it must have a
significant number of unique (stable) key
points, which the augmented reality library
can quickly highlight in the video stream and
compare with the existing template set. For
this, it is necessary to use the fastest possible
detector and descriptor, as well as to develop
an algorithm that could confidently say that the
object has been found.</p>
        <p>This algorithm allows you to recognize
images at different angles, at different
distances from the camera, under different
lighting, and when the image is partially
overlapped [21].</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3. Architecture of the YOLO Algorithm</title>
      <sec id="sec-5-1">
        <p>The YOLOv8 [22] model is widely employed for
object detection purposes. YOLOv8 is available in
four primary versions: small (s), medium (m),
large (l), and extra large (x), with each version
providing increasing levels of accuracy.</p>
        <p>Additionally, each variant requires a distinct
amount of time for the training process.</p>
        <p>The comparison graph plots
performance on the Y-axis against
inference time on the X-axis; the objective is an
object detector that is both accurate and fast. Initial findings
indicate that YOLOv8 performs exceptionally
well in achieving this goal compared to other
cutting-edge techniques.</p>
        <p>Examining the chart, it is evident that all
versions of YOLOv8 exhibit quicker training
times than EfficientDet. Specifically, the most
accurate YOLOv8 model, YOLOv8x,
processes images
at a significantly faster rate while maintaining
a level of accuracy comparable to
the EfficientDet D4 [23] model.</p>
        <p>The architecture of YOLO involves dividing the
input image into a grid and predicting bounding boxes
and class probabilities.</p>
        <p>Here is a high-level overview of the YOLO
architecture:</p>
        <p>Input Layer: YOLO takes an input image,
typically of fixed size.</p>
        <p>Dividing into Grid: The image is divided
into an S×S grid. Each grid cell is responsible
for predicting bounding boxes and class
probabilities.</p>
        <p>Bounding Box Prediction: Each grid cell
predicts multiple bounding boxes (usually B
bounding boxes). These bounding boxes
include information about the coordinates (x,
y) of the bounding box, width (w), height (h),
and confidence scores.</p>
        <p>Class Prediction: For each bounding box,
the model predicts class probabilities for
different object categories. This is done using
softmax activation.</p>
        <p>Confidence Score: Each bounding box
prediction is associated with a confidence
score, indicating how likely it is that the
bounding box contains an object. This score
takes into account both the probability of
object presence and the accuracy of the
bounding box.</p>
        <p>Final Prediction: The final predictions are
obtained by combining the bounding box
coordinates, class probabilities, and confidence
scores. Non-maximum suppression is then
applied to filter out redundant and
low-confidence predictions.</p>
        <p>Output: The final output is a set of bounding
boxes, each associated with a class label and a
confidence score.</p>
        <p>Loss Function: YOLO uses a multi-part loss
function that includes terms for object
presence/absence, bounding box coordinates,
and class probabilities. This allows the model
to be trained for accurate detection.</p>
        <p>The YOLO algorithm takes an image as input
and uses a deep convolutional neural network
to detect objects in the image.</p>
        <p>The first 20 convolutional layers of the
model are pre-trained using ImageNet [24],
followed by an average pooling layer and a fully
connected layer. After that, the pre-trained model
is converted to perform the object detection
task, since adding layers to the trained network improves its
performance. The last fully connected layer of
YOLO predicts both class
probabilities and bounding box coordinates.</p>
        <p>YOLO predicts multiple bounding boxes for
each grid cell. During training, the aim is to
ensure that each bounding box predictor is
“responsible” for predicting an object, based
on which prediction has the highest
overlap (intersection over union) with the ground truth.</p>
        <p>To improve object detection accuracy,
YOLO uses the Non-Maximum Suppression
(NMS) method [25]. This technique
identifies and removes redundant or incorrect
bounding boxes that may be created for a
single object. As a result of applying NMS, only
one bounding box is selected for each object in
the image, which improves the quality and
efficiency of the detection process.</p>
      </sec>
      <sec id="sec-5-2">
        <title>3.1. The Difference Between YOLO and Other Deep Learning Algorithms for Object Detection</title>
        <p>The main difference between YOLO and other algorithms
used for object detection is that it recognizes
objects quickly, in real time. YOLO takes the entire
image as input at once and passes it through the
convolutional neural network only once [26].</p>
        <p>The performance of the YOLO algorithm is
evaluated using the COCO dataset [27] (Common
Objects in Context). The COCO dataset is a large
dataset consisting of 80 object classes. YOLOv1,
the baseline version, can recognize 24 object
classes and has a 21.6% mAP (mean average precision)
measured using the COCO dataset. YOLOv2 can
recognize 90 object classes with a COCO
mAP of 30.2%. YOLOv3 can recognize 1000
object classes with an mAP of 57.9%. YOLOv4
performs better than YOLOv3
but can recognize 80 object classes, with an mAP of
60.0%. YOLOv5 achieves significant
performance improvements over YOLOv4 and
achieves an mAP of 83.5% on the COCO dataset.
YOLOv6 achieved a COCO mAP of 84.4%.
YOLOv7 achieved an mAP of 85.4%.
YOLOv8 provides additional performance
improvements over YOLOv7 and achieves an
mAP of 86.4%.</p>
        <p>However, the YOLO algorithm has its
drawbacks. First, spatial limitations allow only
8 bounding boxes (Bbox) to be predicted per grid cell, making it
difficult to distinguish objects that are close
together. Second, the features pass through multiple
sampling stages, so a lack
of detail is often apparent. The third problem is
imprecise localization. Finally, since Bbox
prediction is learned from data, it is
difficult for the algorithm to detect objects in test
data that are not represented in the training data.</p>
        <p>However, the most difficult aspect of deep
learning is the preparation of training data, and
the data applicable to each application domain
is very limited.</p>
        <p>Despite all the limitations, the YOLO model
still has a significantly higher processing speed
than other models and continues to be widely
used.</p>
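        <p>The non-maximum suppression step described in the architecture overview can be sketched as follows. This is a minimal greedy variant in plain Python, assuming boxes in (x1, y1, x2, y2) corner format and an IoU threshold of 0.5. The function names, the threshold, and the sample detections are illustrative and are not taken from the YOLOv8 implementation.</p>
        <preformat><![CDATA[
```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    detections is a list of (box, score). The highest-scoring box is
    kept, and every remaining box that overlaps it by more than
    iou_threshold is discarded; the process repeats on what is left.
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[0], d[0]) <= iou_threshold]
    return kept


detections = [
    ((10, 10, 50, 50), 0.9),      # strongest box on the first object
    ((12, 12, 52, 52), 0.8),      # near-duplicate of the first box
    ((100, 100, 140, 140), 0.7),  # separate object
]
kept = nms(detections)
```
]]></preformat>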
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4. Construction of the Model</title>
      <sec id="sec-6-1">
        <p>Artificial neural networks are complex
computing systems loosely modeled on the
corresponding systems of the human brain.
They consist of artificial neurons that receive
input signals through a certain number of
connections from the neurons of the previous
layer or from the input of the network.
The scheme of a simple neural network with
one hidden layer is shown in Fig. 1. Training
a neural network consists of
adjusting the parameters of the network to
obtain better results. For most problems,
supervised learning is used: the
system is first fed data with known results so that
the trained network can then be applied to real data,
where the correct answer is not known in
advance. The results and accuracy of
the network are improved by minimizing a certain
error, the difference between the predicted
and actual outputs.</p>
        <p>Accuracy measures how well the model
predicts the correct outcome compared to the
actual outcome. It is calculated by dividing the
number of correct predictions by the total
number of predictions. Although accuracy is a
useful metric, it may not be sufficient to
evaluate the performance of complex AI
models. Therefore, it is important to consider
other metrics such as precision, recall,
and F1-score [28].</p>
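        <p>Classification errors are summarized in a confusion matrix
[29] (error matrix). If there are two classes and
an algorithm that assigns each object to one of
the classes, then the classification error matrix
looks like this:</p>
        <table-wrap>
          <table>
            <thead>
              <tr><th /><th>y = 1</th><th>y = 0</th></tr>
            </thead>
            <tbody>
              <tr><td>ŷ = 1</td><td>True Positive (TP)</td><td>False Positive (FP)</td></tr>
              <tr><td>ŷ = 0</td><td>False Negative (FN)</td><td>True Negative (TN)</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>where ŷ is the answer of the algorithm on the
object, and y is the true label of the class on this
object.</p>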
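        <p>Precision measures the ratio of correct
positive predictions to all positive predictions
made by the model. It shows how well the model
avoids false positives. Recall, on the
other hand, measures the ratio of correct positive
predictions to all actual positive cases. It shows
how well the model avoids false negatives.</p>
        <p>Two kinds of classification error are therefore used:
False Negative (FN) and False Positive
(FP) [29]. With TP denoting the number of true positives:</p>
        <disp-formula><label>(1)</label><tex-math>\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\%,</tex-math></disp-formula>
        <disp-formula><label>(2)</label><tex-math>\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\%.</tex-math></disp-formula>
        <p>The F1-score is a combination of precision
and recall, providing a balanced
estimate of model performance:</p>
        <disp-formula><label>(3)</label><tex-math>F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \times 100\%.</tex-math></disp-formula>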
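        <p>As a worked illustration of these metrics, the sketch below computes accuracy, precision, recall, and the F1-score from the four cells of a confusion matrix. Values are returned as fractions between 0 and 1 rather than percentages; the function name and the example counts are hypothetical.</p>
        <preformat><![CDATA[
```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: 8 tanks found, 2 false alarms, 2 tanks missed,
# 88 background samples correctly rejected.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
```
]]></preformat>
        <p>In this example accuracy is much higher than precision and recall because the negative class dominates, which is why accuracy alone can be misleading.</p>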
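        <p>Reference datasets are traditionally used
for preliminary comparison of models. The
COCO dataset uses a special averaged
precision metric, mAP. It is built on the average precision AP, the area under the precision-recall curve p(r):</p>
        <disp-formula><label>(4)</label><tex-math>AP = \int_{0}^{1} p(r)\,dr,</tex-math></disp-formula>
        <disp-formula><label>(5)</label><tex-math>mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i,</tex-math></disp-formula>
        <p>where AP<sub>i</sub> is the i-th average precision value and N is the number of values being averaged.</p>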
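        <p>The average precision integral can be approximated numerically. The sketch below is a simplified illustration that uses the trapezoidal rule on a handful of (recall, precision) points; the official COCO evaluation uses an interpolated curve, which is not reproduced here. The function names and sample points are assumptions made for the example.</p>
        <preformat><![CDATA[
```python
def average_precision(recalls, precisions):
    """Approximate the area under the precision-recall curve.

    Uses the trapezoidal rule over points sorted by recall.
    """
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


def mean_average_precision(ap_values):
    """Mean of several AP values (for example, one per IoU threshold)."""
    return sum(ap_values) / len(ap_values)


# Precision stays at 1.0 up to recall 0.5, then falls linearly to 0.
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.0])
m_ap = mean_average_precision([ap, 0.25])
```
]]></preformat>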
      </sec>
      <sec id="sec-6-2">
        <p>This is an averaged precision measure, taken over different values of the intersection over union (IoU).</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. Construction of the Classifier</title>
      <sec id="sec-7-1">
        <p>The construction of the classifier makes it possible to evaluate choice preferences based on an analysis of information about the preferences of users.</p>
        <p>We will use an artificial neural network to
build a classifier.</p>
        <p>A neural network based on a multilayer
perceptron is a system of interconnected
layers of neurons. Each neuron is
characterized by an activation function that
converts the neuron’s input signal into an
output signal. Connections of neurons with
other neurons are characterized by connection
coefficients, or weights. An important factor in
training a neural network is the type of input
data. To achieve the best results, it is necessary
to preprocess the data using
centering and scaling operations (1):</p>
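        <p>One common reading of centering and scaling is standardization: subtracting the feature mean and dividing by the standard deviation. The sketch below assumes that reading; the original formula is not reproduced here, and the function name <monospace>standardize</monospace> and the sample values are illustrative.</p>
        <preformat><![CDATA[
```python
from statistics import mean, pstdev


def standardize(values):
    """Center values on zero mean and scale them to unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


scaled = standardize([2.0, 4.0, 6.0])
```
]]></preformat>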
      </sec>
      <sec id="sec-7-2">
        <p>The learning process is an iterative sequence
of operations for calculating the output signal of
the network and subsequently changing the
weights of the connections. The
error backpropagation algorithm is usually used
for weight adjustment in MLP-based networks.
It is a supervised learning method, so
it requires that target values be set in the training
examples. This algorithm belongs to the class of
gradient algorithms, that is, the
weights of connections are changed in the direction
opposite to the gradient of the error, so as to minimize the error. The
prediction error during training is equal to the
difference between the signal at the network
output and the output reference value
corresponding to the input data (7):</p>
        <disp-formula><label>(7)</label><tex-math>e = (y' - y),</tex-math></disp-formula>
        <p>where y′ is the actual output value and y is the target value.</p>
      </sec>
      <sec id="sec-7-3">
        <p>The training of the network must be carried
out for as long as the average value of the error over
one training epoch keeps decreasing. Further training
usually leads to a deterioration in the
analytical capabilities of the neural network.</p>
        <p>The network was trained using the error
backpropagation algorithm and the gradient
descent algorithm. The basis of the idea of the
algorithm is the use of the initial error of the
neural network (7) to calculate the correction
values of the neuron weights in its hidden
layers:</p>
        <disp-formula><label>(8)</label><tex-math>E = \frac{1}{2} \sum_{i=1}^{k} (y_i - y'_i)^2,</tex-math></disp-formula>
        <p>where k is the number of output neurons of the
network, y is the target value, and y′ is the actual
output value.</p>
        <p>This algorithm is iterative and uses
step-by-step learning. It works as follows: one training
example is fed to the network input, and then the
weight values are adjusted. At each iteration,
forward and backward passes of the network take
place. On the forward pass, the input vector propagates
from the inputs to the outputs of the network. In
this way, a certain output vector is formed, which
corresponds to the current (actual) state of the
weights. The error of the neural network is
calculated as the difference between the actual
and target values.</p>
        <p>On the backward pass, this error is propagated
from the output of the network to its inputs,
and the neuron weights are corrected
according to formula (9):</p>
        <disp-formula><label>(9)</label><tex-math>\Delta w_{ji}(n) = -\eta \frac{\partial E_{av}}{\partial w_{ji}},</tex-math></disp-formula>
        <p>where w<sub>ji</sub> is the weight of the ith connection of
the jth neuron, and η is a learning rate parameter
that additionally controls the size
of the correction step Δw<sub>ji</sub> for more accurate
convergence to the minimum error; it is
selected experimentally in the learning process
(in the interval from 0 to 1).</p>
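        <p>The weight correction rule can be illustrated on the simplest possible case. The sketch below assumes a single linear neuron with output y' = Σ w·x and the squared error E = ½(y − y')², for which the derivative of E with respect to each weight is −(y − y')·x. The function name, the learning rate of 0.1, and the sample values are illustrative; a full backpropagation implementation for hidden layers is not shown.</p>
        <preformat><![CDATA[
```python
def gradient_step(weights, inputs, target, eta=0.1):
    """One forward and backward pass for a single linear neuron.

    Returns the corrected weights and the error measured before the update.
    """
    # Forward pass: actual output for the current weights.
    actual = sum(w * x for w, x in zip(weights, inputs))
    error = 0.5 * (target - actual) ** 2
    # Backward pass: dE/dw_i = -(target - actual) * x_i.
    gradients = [-(target - actual) * x for x in inputs]
    # Weight correction: delta_w = -eta * dE/dw.
    new_weights = [w - eta * g for w, g in zip(weights, gradients)]
    return new_weights, error


weights = [0.0, 0.0]
weights, error_before = gradient_step(weights, [1.0, 2.0], target=1.0)
first_update = list(weights)
weights, error_after = gradient_step(weights, [1.0, 2.0], target=1.0)
```
]]></preformat>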
      </sec>
      <sec id="sec-7-4">
        <title>5.1. The Method of Stochastic Gradient Descent</title>
        <p>The stochastic gradient descent method belongs
to optimization algorithms and is used to adjust
the parameters of a machine learning model.
In standard gradient descent, the gradient is the sum of the
gradients contributed by each training element, and the
parameter vector changes in the direction of the
anti-gradient with a given step. Therefore,
standard gradient descent requires a full pass
over the training data before it can change the
parameters. In stochastic (or “online”)
gradient descent, the value of the gradient is
approximated by the gradient of the cost
function calculated on only one training element.
The parameters then change in proportion to the
approximate gradient. Thus, the parameters of
the model change after each training object. For
large datasets, stochastic gradient descent can
provide a significant speed advantage over
standard gradient descent.</p>
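        <p>The per-sample update that distinguishes stochastic gradient descent can be sketched on a toy problem. The example assumes a one-parameter model y = w·x with squared error and updates w after every training object. The function name, the learning rate, the epoch count, and the data are hypothetical choices made so that the example converges quickly.</p>
        <preformat><![CDATA[
```python
import random


def sgd_fit(samples, eta=0.05, epochs=200, seed=0):
    """Fit y = w * x by stochastic gradient descent.

    The weight is updated after each individual (x, y) sample rather
    than after a full pass over the data.
    """
    rng = random.Random(seed)
    data = list(samples)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a new random order
        for x, y in data:
            gradient = -(y - w * x) * x  # d/dw of 0.5 * (y - w*x)^2
            w -= eta * gradient
    return w


# Data generated by y = 2x, so the fitted weight should approach 2.
w = sgd_fit([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```
]]></preformat>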
        <p>We will use the Python language to build the
classifier. One of the main reasons why Python is
used for machine learning is that it has many
frameworks that simplify the process of writing
code and reduce development time.</p>
      </sec>
      <sec id="sec-7-5">
        <title>5.2. Save the Video</title>
        <p>We capture a video and process it frame by
frame [30], and we want to save this video. We
pre-process the images [31].</p>
        <p>We create a VideoWriter object and specify the
name of the output file (output.avi). Then we
specify the FourCC code and pass the
number of frames per second (fps) and the
frame size. The last argument is the color flag: if True, the
encoder expects color frames; otherwise it
works with grayscale frames. FourCC is a
4-byte code used to identify the video codec.
The XVID codec is preferable. MJPG produces large
video files, and X264 produces small ones. On
Windows, DIVX can be used.</p>
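        <p>To make the notion of a 4-byte codec code concrete, the sketch below packs four ASCII characters into a single 32-bit integer in little-endian order, which is understood to be the layout OpenCV uses for FourCC values. It deliberately avoids any video library, so the helper names <monospace>fourcc</monospace> and <monospace>fourcc_to_str</monospace> are illustrative stand-ins rather than library calls.</p>
        <preformat><![CDATA[
```python
def fourcc(code):
    """Pack a four-character codec name into a 32-bit integer."""
    if len(code) != 4:
        raise ValueError("a FourCC code has exactly four characters")
    # The first character ends up in the least significant byte.
    return int.from_bytes(code.encode("ascii"), "little")


def fourcc_to_str(value):
    """Recover the four-character codec name from its integer form."""
    return value.to_bytes(4, "little").decode("ascii")


xvid = fourcc("XVID")
mjpg = fourcc("MJPG")
```
]]></preformat>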
      </sec>
    </sec>
    <sec id="sec-8">
      <title>6. Experiment Environment</title>
      <p>As part of the research, a proprietary dataset
was utilized, comprising over a thousand
images of tanks from various models and
perspectives. These images were sourced from
electronic books and websites, rendering them
diverse and realistic [32]. Accordingly, it was
necessary to manually add labels to the images
for subsequent model training. This process
ensured proper classification and recognition
of tanks in the images.</p>
      <p>The decision was made to employ the
Roboflow platform for efficient uploading and
processing of images.</p>
      <sec id="sec-8-1">
        <p>This not only facilitated a swift resolution of
the task but also accurately assigned labels to
each object in the images, providing the model
with sufficient information for tank recognition
and classification.</p>
      </sec>
      <sec id="sec-8-2">
        <p>These steps in utilizing a proprietary dataset
and the Roboflow platform hold significant
importance in advancing the tank recognition
model, particularly in enhancing its accuracy
and efficiency during training.</p>
        <p>
          As a result of training, the model
demonstrated impressive accuracy, achieving
a high level of correct classification of objects
in the images [33]. The graphs attached to the
article illustrate the model's high stability and
effectiveness during training, confirming its
ability to confidently recognize tanks in
various scenarios and conditions [34].
Furthermore, the trained model exhibited
minimal losses during training, indicating high
efficiency and optimal adaptation to the
training data. The training and loss graphs
depict a stable learning process and the
absence of significant fluctuations in losses
during model optimization [
          <xref ref-type="bibr" rid="ref12 ref13">35, 36</xref>
          ].
        </p>
        <p>Overall, the obtained results affirm the high
potential and efficiency of the developed
model for tank recognition, making it a
significant contribution to the field of military
applications and security objects.</p>
      </sec>
      <sec id="sec-8-3">
        <title>6.1.1. Model Testing</title>
        <p>We conducted model testing under various conditions, including image analysis and video streaming:</p>
        <sec id="sec-8-3-1">
          <title>6.1.2. Image Testing</title>
          <p>
            The model was evaluated using diverse images of
tanks, encompassing various models sourced
from different outlets [
            <xref ref-type="bibr" rid="ref14 ref15">37, 38</xref>
            ]. Under normal
lighting conditions and various viewing angles,
the model successfully recognized tanks,
demonstrating a high level of accuracy.
          </p>
        </sec>
        <sec id="sec-8-3-2">
          <title>6.2. Video Stream Testing</title>
          <p>We also conducted testing using a video stream
to assess the model’s real-time tank recognition
capabilities. In this mode, the model proved to be
quite effective, adapting quickly to changes in the
images and confidently recognizing tanks in
different scenarios.
Despite the overall success, it is worth noting
that the model struggled to recognize tanks in
low-light conditions, such as during nighttime
without thermal imaging illumination or in
snowy conditions. This aspect should be
considered as a potential area for further
improvements and optimizations to the model.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>7. Conclusions</title>
      <sec id="sec-9-1">
        <p>Our research project encompassed several key
stages aimed at creating and refining a tank
recognition model using YOLO (You Only Look
Once). We built our own dataset of diverse tank
images from various sources, which provided the
material for training the model to recognize
different tank models. Using this dataset, we
successfully trained the YOLO model to recognize tanks.</p>
        <p>The training methodology involved annotation
and systematic training to achieve high
accuracy. We assessed the model’s
effectiveness on both photos and videos. The
model demonstrated a high level of tank
recognition in real-time video and in still
images, and testing showed accurate and
efficient recognition under different conditions
and from different perspectives. The next step
could be to refine the model for low-light
conditions, such as nighttime without thermal
illumination, and for adverse weather such as
snowfall. In conclusion, the developed model
holds significant potential for military and
security applications, and further enhancements
could make it more effective in real-world conditions.
        </p>
        <p>[13] S. Shekhar, N. Thakur, Deep Learning Framework for Forecasting Diabetic Retinopathy: An Innovative Approach, Int. J. Innov. Res. Comput. Sci. Technol. 12(1) (2024) 17–20. doi: 10.55524/ijircst.2024.12.1.4.</p>
        <p>[14] U. Utkarsh, et al., Automated Translation and Accelerated Solving of Differential Equations on Multiple GPU Platforms, Comput. Methods Appl. Mech. Eng. 419 (2024) 116591. doi: 10.48550/arXiv.2304.06835.</p>
        <p>[15] P. Feng, et al., Co-Continuous Structure Enhanced Magnetic Responsive Shape Memory PLLA/TPU Blend Fabricated by 4D Printing, Virtual Phys. Prototyp. 19(1) (2024) e2290186. doi: 10.1080/17452759.2023.2290186.</p>
        <p>[16] J. Malik, et al., Contour and Texture Analysis for Image Segmentation, Int. J. Comput. Vision 43 (2001) 7–27. doi: 10.1023/A:1011174803800.</p>
        <p>[17] N. Hashemi, et al., Template Matching Advances and Applications in Image Analysis, arXiv preprint (2016).</p>
        <p>[18] G. Cox, Template Matching and Measures of Match in Image Processing, University of Cape Town (1995).</p>
        <p>[19] D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, Int. J. Comput. Vision 60 (2004) 91–110.</p>
        <p>[20] D. Mukherjee, M. Wu, G. Wang, A Comparative Experimental Study of Image Feature Detectors and Descriptors, Mach. Vision Appl. 26 (2015) 443–466. doi: 10.1007/s00138-015-0679-9.</p>
        <p>[21] L. Liu, et al., CLFR-Det: Cross-Level Feature Refinement Detector for Tiny-Ship Detection in SAR Images, Knowledge-Based Syst. 284 (2024). doi: 10.1016/j.knosys.2023.111284.</p>
        <p>[22] Q. Liu, et al., YOLOv8-CB: Dense Pedestrian Detection Algorithm Based on In-Vehicle Camera, Electronics 13(1) (2024) 236. doi: 10.3390/electronics13010236.</p>
        <p>[23] F. Pan, et al., Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models, IEEE/CVF Winter Conference on Applications of Computer Vision (2024) 8647–8656.</p>
        <p>[24] H. Li, C. Wang, Y. Liu, YOLO-FDD: Efficient Defect Detection Network of Aircraft Skin Fastener, Signal, Image and Video Processing (2024) 1–15.</p>
        <p>[25] S. Koga, et al., Optimizing Food Sample Handling and Placement Pattern Recognition with YOLO: Advanced Techniques in Robotic Object Detection, Cognitive Robotics (2024). doi: 10.1016/j.cogr.2024.01.001.</p>
        <p>[26] Y. Wang, et al., GT-YOLO: Nearshore Infrared Ship Detection Based on Infrared Images, J. Marine Sci. Eng. 12(2) (2024) 213. doi: 10.3390/jmse12020213.</p>
        <p>[27] K. Gupta, A. Asthana, Reducing the Effects of Oscillations in Training of Quantized YOLO Networks, IEEE/CVF Winter Conference on Applications of Computer Vision (2024) 2452–2461.</p>
        <p>[28] P. Giudici, M. Centurelli, S. Turchetta, Artificial Intelligence Risk Measurement, Expert Systems with Applications 235 (2024) 121220.</p>
        <p>[29] S. Shinde, et al., Artificial Intelligence Approach for Terror Attacks Prediction Through Machine Learning, Multidiscip. Sci. J. 6(1) (2024) 2024011–2024011.</p>
        <p>[30] M. Nazarkevych, et al., Evaluation of the Effectiveness of Different Image Skeletonization Methods in Biometric Security Systems, Int. J. Sens. Wirel. Commun. Control 11(5) (2021) 542–552. doi: 10.2174/2210327910666201210151809.</p>
        <p>[31] M. Nazarkevych, et al., The Ateb-Gabor Filter for Fingerprinting, International Conference on Computer Science and Information Technology (2019) 247–255. doi: 10.1007/978-3-030-33695-0_18.</p>
        <p>[32] M. Nazarkevych, et al., Data Protection Based on Encryption Using Ateb-Functions, 9th International Scientific and Technical Conference Computer Sciences and Information Technologies (2016) 30–32.</p>
        <p>[33] M. Medykovskyy, et al., Methods of Protection Document Formed from Latent Element Located by Fractals, Xth International Scientific and Technical Conference Computer Sciences and Information Technologies (2015) 70–72.</p>
        <p>[34] V. Sheketa, et al., Formal Methods for Solving Technological Problems in the Infocommunications Routines of Intelligent Decisions Making for Drilling Control, IEEE International Scientific-Practical Conference Problems of Infocommunications, Science and Technology (2019) 29–34.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          , et al.,
          <article-title>A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          <volume>33</volume>
          (
          <issue>12</issue>
          ) (
          <year>2021</year>
          )
          <fpage>6999</fpage>
          -
          <lpage>7019</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/TNNLS.2021.3084827</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Grechaninov</surname>
          </string-name>
          , et al.,
          <article-title>Decentralized Access Demarcation System Construction in Situational Center Network</article-title>
          ,
          <source>in: Workshop on Cybersecurity Providing in Information and Telecommunication Systems II</source>
          , vol.
          <volume>3188</volume>
          , no.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Anakhov</surname>
          </string-name>
          , et al.,
          <article-title>Protecting Objects of Critical Information Infrastructure from Wartime Cyber Attacks by Decentralizing the Telecommunications Network</article-title>
          ,
          <source>in: Workshop on Cybersecurity Providing in Information and Telecommunication Systems</source>
          , vol.
          <volume>3550</volume>
          (
          <year>2023</year>
          )
          <fpage>240</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hulak</surname>
          </string-name>
          , et al.,
          <article-title>Dynamic Model of Guarantee Capacity and Cyber Security Management in the Critical Automated System</article-title>
          ,
          <source>in: 2nd International Conference on Conflict Management in Global Information Networks</source>
          , vol.
          <volume>3530</volume>
          (
          <year>2023</year>
          )
          <fpage>102</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Buriachok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sokolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Skladannyi</surname>
          </string-name>
          ,
          <article-title>Security Rating Metrics for Distributed Wireless Systems</article-title>
          ,
          <source>in: Workshop of the 8th International Conference on “Mathematics. Information Technologies. Education:” Modern Machine Learning Technologies and Data Science</source>
          , vol.
          <volume>2386</volume>
          (
          <year>2019</year>
          )
          <fpage>222</fpage>
          -
          <lpage>233</lpage>
          .
        </mixed-citation>
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vladymyrenko</surname>
          </string-name>
          , et al.,
          <article-title>Analysis of Implementation Results of the Distributed Access Control System</article-title>
          ,
          <source>in: IEEE International Scientific-Practical Conference Problems of Infocommunications, Science and Technology</source>
          (
          <year>2019</year>
          ). doi:
          <pub-id pub-id-type="doi">10.1109/picst47496.2019.9061376</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qiao</surname>
          </string-name>
          , et al.,
          <article-title>Automatic Recognition of Static Phenomena in Retouched Images: A Novel Approach</article-title>
          ,
          <source>Advanced Technologies for the Implementation of New Ideas</source>
          (
          <year>2024</year>
          )
          <fpage>287</fpage>
          -
          <lpage>291</lpage>
          .
        </mixed-citation>
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          , et al.,
          <article-title>E-YOLO: Recognition of Estrus Cow Based on Improved YOLOv8n Model</article-title>
          ,
          <source>Expert Syst. Appl.</source>
          <volume>238</volume>
          (
          <year>2024</year>
          )
          <elocation-id>122212</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.eswa.2023.122212</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>DsP-YOLO: An Anchor-Free Network with DsPAN for Small Object Detection of Multiscale Defects</article-title>
          ,
          <source>Expert Syst. Appl.</source>
          <volume>241</volume>
          (
          <year>2024</year>
          )
          <elocation-id>122669</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.eswa.2023.122669</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Patel</surname>
          </string-name>
          , et al.,
          <article-title>3D Back Contour Metrics in Predicting Idiopathic Scoliosis Progression: Retrospective Cohort Analysis, Case Series Report and Proof of Concept</article-title>
          ,
          <source>Children</source>
          <volume>11</volume>
          (
          <issue>2</issue>
          ) (
          <year>2024</year>
          )
          <elocation-id>159</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.3390/children11020159</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          , et al.,
          <article-title>Transformer-Based Multiple-Object Tracking via Anchor-Based-Query and Template Matching</article-title>
          ,
          <source>Sensors</source>
          <volume>24</volume>
          (
          <issue>1</issue>
          ) (
          <year>2024</year>
          )
          <elocation-id>229</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.3390/s24010229</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <article-title>Coarse Registration of Point Cloud Base on Deep Local Extremum Detection and Attentive Description</article-title>
          ,
          <source>Multimedia Syst.</source>
          <volume>30</volume>
          (
          <issue>1</issue>
          ) (
          <year>2024</year>
          )
          <elocation-id>23</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1007/s00530-023-01203-w</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shekhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Thakur</surname>
          </string-name>
          ,
          <article-title>Deep Learning Framework for Forecasting Diabetic Retinopathy: An Innovative Approach</article-title>
          ,
          <source>Int. J. Innov. Res. Comput. Sci. Technol.</source>
          <volume>12</volume>
          (
          <issue>1</issue>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sheketa</surname>
          </string-name>
          , et al.,
          <article-title>Empirical Method of Evaluating the Numerical Values of Metrics in the Process of Medical Software Quality Determination</article-title>
          ,
          <source>International Conference on Decision Aid Sciences and Application</source>
          (
          <year>2020</year>
          )
          <fpage>22</fpage>
          -
          <lpage>26</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/DASA51403.2020.9317218</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>N.</given-names>
            <surname>Boyko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tkachuk</surname>
          </string-name>
          ,
          <article-title>Processing of Medical Different Types of Data Using Hadoop and Java MapReduce</article-title>
          ,
          <source>in: 3rd International Conference on Informatics &amp; Data-Driven Medicine</source>
          , vol.
          <volume>2753</volume>
          (
          <year>2010</year>
          )
          <fpage>405</fpage>
          -
          <lpage>414</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>N.</given-names>
            <surname>Boyko</surname>
          </string-name>
          , et al.,
          <article-title>Fractal Distribution of Medical Data in Neural Network</article-title>
          ,
          <source>IDDM</source>
          (
          <year>2019</year>
          )
          <fpage>307</fpage>
          -
          <lpage>318</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>I.</given-names>
            <surname>Tsmots</surname>
          </string-name>
          , et al.,
          <article-title>The Method and Simulation Model of Element Base Selection for Protection System Synthesis and Data Transmission</article-title>
          ,
          <source>Int. J. Sens. Wirel. Commun. Control</source>
          <volume>11</volume>
          (
          <issue>5</issue>
          ) (
          <year>2021</year>
          )
          <fpage>518</fpage>
          -
          <lpage>530</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>