<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring Yolo architecture for detection of Ukrainian car license plates in images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleh Petryliak</string-name>
          <email>oleh.petryliak@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Svitlana Kostenko</string-name>
          <email>svitlana.kostenko@lnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lesia Dobuliak</string-name>
          <email>lesia.dobuliak@lnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lyubomyr Chyrun</string-name>
          <email>lyubomyr.chyrun@lnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ivan Franko National University of Lviv</institution>
          ,
          <addr-line>Universytetska 1, 79000, Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MoMLeT-2025: 7th International Workshop on Modern Machine Learning Technologies</institution>
          ,
          <addr-line>June, 14, 2025, Lviv-Shatsk</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The rapid advancement of deep learning has significantly improved object detection tasks, including license plate detection. This paper explores the application of deep learning for the detection of vehicle license plates in images. The study evaluates and compares various generations of the YOLO (You Only Look Once) model to determine the most effective architecture in terms of both accuracy and speed. YOLO models are widely used for real-time object detection due to their efficiency in processing images in a single forward pass through the neural network. A systematic approach was applied to train and compare YOLOv5, YOLOv6, YOLOv8, YOLOv9, YOLOv10, and YOLOv11. Each model was trained on a dataset of vehicle images containing license plates, with annotations marking the exact location of the plates. The training process lasted 100 epochs, allowing the models to stabilize and reach optimal performance. The evaluation metrics included mean Average Precision (mAP) at different IoU thresholds, detection speed, and generalization capability on unseen data. Based on the analysis, YOLOv9 was selected as the most efficient model, balancing accuracy and processing speed. The selected models were tested on real-world images of vehicles captured under varying conditions, including different angles, lighting, and license plate formats such as Cyrillic characters, green electric vehicle plates, and American-style layouts. The results confirmed the robustness of most YOLO models, although YOLOv11 showed lower stability in more complex scenarios. The findings of this research highlight that careful evaluation of YOLO architectures enables the selection of an effective model for automatic license plate detection. The chosen YOLOv9 model demonstrates strong performance and is well suited for real-time applications such as traffic monitoring, automated access control, and law enforcement.</p>
      </abstract>
      <kwd-group>
        <kwd>license plate detection</kwd>
        <kwd>deep learning</kwd>
        <kwd>computer vision</kwd>
        <kwd>model comparison</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Ukrainian license plates come in several formats, which are characterized by different sizes, fonts,
and character arrangements. Additional challenges are posed by Cyrillic characters, where some
characters are visually similar to Latin characters, which can cause confusion during detection.</p>
      <p>Particular attention is paid to the detection of number plates with non-standard color schemes,
such as green number plates used for electric vehicles and red number plates typical of transit
vehicles. These color variations require the model to adapt to different levels of contrast and
illumination, which increases the complexity of the number plate detection task.</p>
      <p>YOLO (You Only Look Once) is one of the most popular deep learning architectures for detecting
objects in images and classifying them. The main feature of this architecture is the ability to search
for objects in real time, processing images in one pass through the neural network, which ensures
high speed of the model while maintaining accuracy.</p>
      <p>The YOLO prediction principle is based on dividing the input image into a grid of
size S × S, where S is the number of horizontal and vertical image divisions, selected
depending on the size of the input image and the model architecture (for example, 7×7 or 13×13).
Each cell of this grid is used to detect objects whose center falls within its boundaries, so the model
makes three predictions for each cell:</p>
      <p>1. The parameters of the bounding box (a rectangular area that outlines the detected object in
the image): the coordinates of the upper left corner of the box (x0, y0), its width w, and its height h.</p>
      <p>2. The confidence score: a numerical indicator of the probability that the bounding
box really contains an object, as well as the accuracy of the predicted box coordinates. If the
confidence value is below a certain threshold, the prediction can be rejected as unreliable.</p>
      <p>3. The object class: the category to which the detected object belongs.</p>
      <p>The final output of the model is a tensor of size S × S × (B × 5 + C), where B is the number of
predicted boxes for each cell, 5 is the number of parameters of each box (coordinates of the upper
left corner of the box, width, height, and confidence), and C is the number of possible classes.</p>
      <p>Next, the Non-Maximum Suppression (NMS) method [2] is used to remove unnecessary frames.</p>
      <p>The results of the model are the parameters of the bounding boxes that most closely match the
objects in the image and the classification of these objects.</p>
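The NMS step described above can be sketched as follows. This is a minimal greedy implementation for illustration, assuming boxes in (x0, y0, w, h) format; it is not the exact vectorized routine used inside the YOLO frameworks.

```python
def iou(a, b):
    """Intersection over Union of two (x0, y0, w, h) boxes."""
    ax0, ay0, aw, ah = a
    bx0, by0, bw, bh = b
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax0 + aw, bx0 + bw), min(ay0 + ah, by0 + bh)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

Applied to two heavily overlapping candidate boxes and one distant box, only the strongest of the overlapping pair and the distant box survive.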
      <p>The described approach is implemented by a neural network whose architecture consists of three
main parts (Fig. 1): the Backbone, which extracts features from the input image; the Neck, which combines
features to improve the quality of detecting objects of different scales; and the Head, which performs
the final prediction of boxes and classes.</p>
      <p>Before training the neural network and setting up the YOLO model, a training set must be
prepared in a special way. This sample should consist of a set of image files with objects to be
recognized and their annotations in a format supported by YOLO models. This format includes, for
each image, the class identifier of the existing object, the coordinates of the center of its bounding
box, and the width and height of the box normalized to the image size. This dataset is used by the
model for training, allowing it to identify license plate features such as shape, proportions, and
contrast relative to the background.</p>
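The label format described above can be illustrated with a small helper that converts a pixel-space box to a YOLO annotation line. The image size and box values below are made-up illustration numbers, not data from the study.

```python
def to_yolo_line(class_id, x0, y0, w, h, img_w, img_h):
    """Convert a pixel box (top-left corner x0, y0, width w, height h) to the
    YOLO label line "class x_center y_center width height", normalized to image size."""
    xc = (x0 + w / 2) / img_w
    yc = (y0 + h / 2) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# A license plate occupying pixels (300, 400)-(460, 440) in a 1280x720 image,
# annotated with class identifier 0 (the only class in this task):
line = to_yolo_line(0, 300, 400, 160, 40, 1280, 720)
```

Each image gets one such text file, with one line per annotated plate.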
      <p>YOLO training is based on minimizing the loss function, which is the sum of three components:
Localization Loss (mean squared error (MSE) for the predicted coordinates of the bounding
box), Classification Loss (cross-entropy or MSE for the object class prediction), and Objectness Loss
(the probability that the box contains an object).</p>
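The three loss components named above can be sketched on toy values. This mirrors only the structure of the loss (MSE for localization, cross-entropy for classification, binary cross-entropy for objectness); real YOLO losses add anchor matching and IoU-based box terms, and the weights here simply echo the 7.5/0.5 settings used later in this paper.

```python
import math

def mse(pred, true):
    """Mean squared error over the box coordinates (x0, y0, w, h)."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)

def yolo_style_loss(pred_box, true_box, p_true_class, p_obj, obj_present,
                    w_box=7.5, w_cls=0.5, w_obj=1.0):
    loc = mse(pred_box, true_box)          # Localization Loss: MSE on coordinates
    cls = -math.log(p_true_class)          # Classification Loss: cross-entropy on the true class
    if obj_present:                        # Objectness Loss: binary cross-entropy
        obj = -math.log(p_obj)
    else:
        obj = -math.log(1.0 - p_obj)
    return w_box * loc + w_cls * cls + w_obj * obj
```

With a perfect box and class prediction, the remaining loss comes entirely from the objectness term.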
    </sec>
    <sec id="sec-2">
      <title>2. YOLO generations</title>
      <p>During the development of the YOLO architecture, eleven generations from YOLOv1 to YOLOv11
have appeared, each of which takes into account the shortcomings of the previous one and offers
certain improvements to increase the accuracy, speed and efficiency of work, as well as adaptation
to new tasks:
1. YOLOv1 (2015): Introduced the concept of “You Only Look Once” by offering a one-step
approach to object detection. For the first time, the image was divided into a grid, predicting
the coordinates of the frames and classes of objects in each cell.
2. YOLOv2 (2016): Batch Normalization was added, Anchor Boxes and Dimension Clusters were
introduced, which improved the accuracy and stability of training.
3. YOLOv3 (2018): Used a multi-scale architecture to handle objects of different sizes,
as well as an improved backbone (Darknet-53), which improved speed and accuracy.
4. YOLOv4 (2020): Introduced Mosaic data augmentation, the CSPDarknet53 backbone,
and an optimized (CIoU) loss function. The model became more efficient on large datasets.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Research and publication analysis</title>
      <p>The effectiveness of using different generations of YOLO for vehicle, road sign, and license plate
detection has been considered in many scientific papers. In particular, [4] analyzes the results of
using YOLO for automated license plate detection. The study confirms that YOLO is an effective
approach, but the main challenges remain the variability of lighting, shooting angles, and partial
overlap of license plates.</p>
      <p>Study [5] provides an overview of methods for improving YOLO performance for real-world
applications, including various options for post-processing the detection results. The authors
note that to improve accuracy, it is necessary to adapt the model to specific environmental conditions
and improve the balance between speed and accuracy.</p>
      <p>Paper [6] investigates the impact of using pre-trained models and pre-training methods when
adapting YOLO to new datasets. The authors emphasize that adapting models to specific scenarios significantly
increases their efficiency, especially in conditions of variable lighting and low image quality.</p>
      <p>Also worth mentioning is the VEZHA LPR system from Incoresoft [7], which is actively used for
automatic recognition of Ukrainian license plates in video surveillance and traffic speed control
systems. VEZHA LPR demonstrates high accuracy in difficult real-world conditions, such as fast
traffic, poor lighting, difficult angles, and partial overlap of license plates.</p>
      <p>While such systems are already in operation, there is a need for systematic academic evaluation
and comparison of different deep learning architectures tailored to Ukrainian license plate formats.
This research aims to fill that gap by analyzing and benchmarking various YOLO model generations,
which can serve as a basis for developing flexible, open, and easily deployable solutions or for
enhancing existing systems with empirically validated architectures.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Approaches to assessing model performance</title>
      <p>To evaluate the performance (efficiency) of a model, YOLO model training uses a function called
fitness, which is a weighted sum of key machine learning metrics calculated based on the accuracy
of the predictions made for the objects in the training set [8]:</p>
      <p>fitness = wP · P + wR · R + wmAP50 · mAP50 + wmAP50-95 · mAP50-95, (1)</p>
      <p>where P is precision, R is recall, mAP50 is the mean average precision calculated at an intersection
over union (IoU) threshold of 0.50, and mAP50-95 is the average of the mean average precision calculated
at varying IoU thresholds, ranging from 0.50 to 0.95. The weighting coefficients w are chosen
heuristically, based on practical experience. By default, wP = wR = 0, wmAP50 = 0.1, and wmAP50-95 = 0.9. In this
study, different weights were applied, specifically wP = wR = 0.1, wmAP50 = 0.3, and wmAP50-95 = 0.5.</p>
      <p>Precision and Recall are calculated as follows:</p>
      <p>P = TP / (TP + FP), (2)</p>
      <p>R = TP / (TP + FN), (3)</p>
      <p>where TP (True Positive) is the number of correctly detected objects, FP (False Positive) is the
number of incorrectly detected objects, and FN (False Negative) is the number of actual objects that were
missed.</p>
      <p>A key metric for assessing prediction accuracy in object detection tasks is the Jaccard index (IoU),
which is defined as the ratio of the area of intersection to the area of union between the ground truth
bounding box and the predicted bounding box:</p>
      <p>IoU = area(Bp ∩ Bgt) / area(Bp ∪ Bgt), (4)</p>
      <p>where Bp is the bounding box predicted by the model and Bgt is the ground truth bounding box.</p>
      <p>Average Precision (AP) is the area under the precision-recall curve P(R), the dependence of
precision on recall, i.e. AP = ∫ P(R) dR, integrated over recall from 0 to 1. The mean average precision
for the Jaccard coefficient threshold IoU = 0.5 is defined as:</p>
      <p>mAP50 = mAP(IoU = 0.5). (5)</p>
      <p>The mean average precision in the range of IoU thresholds from 0.5 to 0.95 with a step of 0.05 is
calculated by the formula:</p>
      <p>mAP50-95 = (1/10) Σ mAP(IoU = t), t ∈ {0.50, 0.55, …, 0.95}. (6)</p>
    </sec>
    <sec id="sec-5">
      <title>5. Data preparation, training, and object detection stages</title>
      <p>For this task, we collected about 11500 images of cars from open sources using an automated parser
based on the Selenium library [9]. To annotate the data, we used the CVAT (Computer Vision
Annotation Tool) [10], which generates a text annotation of the required format according to the
rectangular area with an object selected in the image. In our case, the objects we are looking for in
the images are license plates. They are assigned to the class with the identifier 0; that is, the
model is trained exclusively on detecting license plates without dividing them into subcategories.</p>
      <p>The resulting sample was randomly divided into training and test samples. The training set
contains 10000 images that were used to train the models, while the remaining 1500 images were
included in the test set to evaluate the quality of license plate recognition and determine the speed
of the model.</p>
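The random 10000/1500 split described above can be sketched as follows. The file names are synthetic placeholders, and the fixed seed is an assumption added here for reproducibility.

```python
import random

def split_dataset(image_names, n_train, seed=42):
    """Shuffle deterministically, then cut into training and test lists."""
    names = list(image_names)
    random.Random(seed).shuffle(names)
    return names[:n_train], names[n_train:]

# 11500 collected images -> 10000 for training, 1500 for testing
images = [f"car_{i:05d}.jpg" for i in range(11500)]
train_set, test_set = split_dataset(images, n_train=10000)
```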
      <p>For the study, we trained the YOLOv5, YOLOv6, YOLOv8, YOLOv9, YOLOv10, and YOLOv11
models. YOLOv7 was not included in the experiments due to the lack of significant improvements in
object detection compared to other versions. The obtained results allowed us to determine the most
optimal architecture for the license plate detection task, taking into account the balance between
accuracy, speed and stability of the model.</p>
      <p>The process of training YOLO models of different generations lasted 100 epochs using standard
settings (batch size – 16, optimizer – Adam, input image size – 640 pixels, frame error weight – 7.5,
class identification error weight – 0.5). The training was performed using PyTorch [11] on a system
with an Intel i5-14600KF, Nvidia RTX 4070, and 32 GB of RAM. The total training time was:</p>
      <p>YOLOv5 – 7494.9 seconds
YOLOv6 – 7099.8 seconds
YOLOv8 – 6819.7 seconds
YOLOv9 – 6584.6 seconds
YOLOv10 – 8744.7 seconds
YOLOv11 – 8073.9 seconds</p>
      <p>Using the trained YOLO models, the search for (prediction of) a license plate in a new
image consists of several main stages (Fig. 4).</p>
      <p>At the first stage, the model accepts an image of arbitrary size (Fig. 4.a), which must be scaled to
a standard size in accordance with the requirements of the input layer of the neural network. In the
case of YOLO, the image is scaled to a square, usually 640×640 pixels (Fig. 4.b).</p>
      <p>After that, the model makes predictions: the neural network processes the image and finds
potential objects based on the obtained characteristics. The output of the model contains the
parameters of the detected bounding box, the object class, and the level of confidence in the
prediction. Figure 4.c shows the result of prediction on a 640×640 pixels image.</p>
      <p>At the last stage, the frame is scaled and applied to the original image (Fig. 4.d). This is necessary
to preserve the input resolution and correctly display the detection results in real conditions, i.e., the
exact positioning of the bounding box and the correspondence of the prediction to the real
coordinates of the license plate.</p>
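The last stage above, mapping a box predicted on the 640×640 network input back to the original image resolution, can be sketched as follows. This assumes simple uniform resizing to the square input (no letterbox padding), which is an illustrative simplification of what the frameworks actually do.

```python
def rescale_box(box, net_size, orig_w, orig_h):
    """Map a (x0, y0, w, h) box from network-input pixels back to original-image pixels."""
    sx = orig_w / net_size   # horizontal scale factor
    sy = orig_h / net_size   # vertical scale factor
    x0, y0, w, h = box
    return (x0 * sx, y0 * sy, w * sx, h * sy)

# A plate detected at (200, 300, 120, 40) on the 640x640 input of a 1920x1080 photo:
plate = rescale_box((200, 300, 120, 40), 640, 1920, 1080)
```

The rescaled box can then be drawn directly on the original photograph.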
    </sec>
    <sec id="sec-6">
      <title>6. Comparison of the performance of YOLO models of different generations</title>
      <p>Using the test sample, we calculated the performance indicators of each model. Table 1 shows the
results of evaluating the performance of different generations of YOLO based on four key metrics:
Precision (P), Recall (R), mean average precision at IoU = 0.5 (mAP50), and average of the mean
average precision at IoU from 0.5 to 0.95 (mAP50-95). Based on these values, we calculated the overall
fitness of the model, which allows us to assess its balance between accuracy, completeness, and
overall detection quality.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Comparison of prediction accuracy of YOLO models of different generation</title>
      <p>For the first experiment on license plate detection, three images with different types of Ukrainian
license plates were selected: green license plates for electric cars, Cyrillic license plates of the
standard format, and American license plates.</p>
      <p>The largest deviation in license plate detection is observed in Figure 6.b, likely due to the
challenging angle and specific lighting conditions. The car is positioned at an angle that alters the
perspective of the license plate, potentially complicating accurate detection by YOLO models.
YOLOv11 delivered the weakest result on this image. Meanwhile, other YOLO versions provided
similar predictions, confirming the general robustness of the architecture to such challenges. This
highlights potential directions for further optimization, particularly in improving the adaptability of
models to variations in license plate orientation.</p>
      <p>The non-standard height of the license plate in Figure 6.c contributes to differences in the
predicted bounding box dimensions.</p>
      <p>For the second experiment, images with challenging conditions for automatic license plate
detection were selected (Fig. 7): an angled license plate, poor lighting, and non-standard placement.
Table 3 contains the parameters of the detected bounding boxes for the license plates, obtained using
YOLO models of different generations.</p>
      <p>The results of model testing on the presented images demonstrate generally high accuracy in
license plate detection, with minor deviations caused by lighting and angle variations. However, in
all of the given cases, the YOLOv11 model showed the weakest performance, with more significant
errors in determining the bounding box position, especially in terms of height. Other models
produced results close to the ground truth coordinates.</p>
      <p>The analysis confirms that YOLO models are generally effective for license plate detection under
standard conditions. However, challenging factors such as angled views, insufficient lighting, and
non-standard plate placement can impact prediction accuracy, indicating potential directions for
further model optimization.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusions</title>
      <p>This study presents a comparative analysis of the effectiveness of various generations of YOLO
models (YOLOv5, YOLOv6, YOLOv8, YOLOv9, YOLOv10, and YOLOv11) for the task of detecting
Ukrainian vehicle license plates. The evaluation was based on key metrics: precision, recall, mean
average precision (mAP) at different Intersection over Union (IoU) thresholds, and image processing
speed. Based on the results, YOLOv9 was identified as the most effective model, providing an optimal
balance between accuracy and speed.</p>
      <p>Testing the models on images with different formats of Ukrainian license plates (green for electric
vehicles, Cyrillic, and American-style) as well as under challenging conditions (angled view, poor
lighting, and non-standard placement) demonstrated the overall robustness of YOLO models.
However, YOLOv11 showed the lowest stability among all tested models, particularly in cases
involving difficult angles and non-standard plate positioning.</p>
      <p>Therefore, YOLOv9 is recommended as the optimal architecture for real-world applications of
automated Ukrainian license plate detection due to its high accuracy, speed, and reliability. Future
research may focus on improving model performance under more complex conditions and
integrating the models into practical monitoring and video surveillance systems.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used X-GPT-4 and Gramby for grammar
and spelling checking. After using these services, the authors reviewed and edited the content as needed
and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-10">
      <title>References</title>
      <p>[4] S. M. Silva, C. R. Jung, License plate detection and recognition in unconstrained scenarios, in:
European Conference on Computer Vision (ECCV), 2018, pp. 580-596.
https://openaccess.thecvf.com/content_ECCV_2018/papers/Sergio_Silva_License_Plate_Detection_ECCV_2018_paper.pdf</p>
      <p>[5] R. Laroca, E. Severo, L. Zanlorensi et al., A robust real-time automatic license plate recognition
based on the YOLO detector, Journal of Visual Communication and Image Representation,
2021, volume 74, 103013.
https://www.researchgate.net/publication/323444033_A_Robust_Real-Time_Automatic_License_Plate_Recognition_Based_on_the_YOLO_Detector</p>
      <p>[6] J. Redmon, A. Farhadi, YOLO9000: better, faster, stronger, in: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 7263-7271.
https://ieeexplore.ieee.org/document/8100173</p>
      <p>[7] Incoresoft. VEZHA LPR. https://vezhavms.com/moduli-vezha/rozpiznavannya-avtonomerivlpr/</p>
      <p>[8] Hyperparameter Evolution for YOLOv5. https://docs.ultralytics.com/yolov5/tutorials/hyperparameter_evolution/</p>
      <p>[9] Selenium Documentation. https://www.selenium.dev</p>
      <p>[10] Computer Vision Annotation Tool. https://app.cvat.ai/tasks</p>
      <p>[11] PyTorch Documentation. https://pytorch.org/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          , YOLO Explained. https://medium.com/analytics-vidhya/yolo-explained5b6f4564f31
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Subramanyam</surname>
          </string-name>
          ,
          <article-title>Non Max Suppression (NMS)</article-title>
          . Medium. https://medium.com/analytics-vidhya/non-max-suppression-nms-6623e6572536
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dixit</surname>
          </string-name>
          ,
          <article-title>Understanding Multi-Headed YOLO-v9 for Object Detection and Segmentation</article-title>
          . Medium. https://medium.com/@srddev/understanding-multi-headed-yolo-v9-for-objectdetection-and-segmentation-8923ee21b652
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>